server resetting
PostgreSQL 7.4.7 (yes, I've been telling them we need to upgrade to the
latest 7.4)
Red Hat Enterprise Linux ES release 3
We are having problems with the PostgreSQL server resetting and dropping
all user connections. A core file is generated, and I've attached
a backtrace. I'm about to dig into the source to see what I can find,
but if anyone can put their finger on the problem, I would appreciate
it. I do realize that exec_stmt() appears to be receiving a NULL
statement pointer (stmt=0x0), which I suspect is the issue. Why a
NULL is being passed is what I plan to look into.
Thanks for any info, here's the backtrace:
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `postgres: bwoods exp [local] INSERT '.
Program terminated with signal 11, Segmentation fault.
#0 exec_stmt (estate=0xfeff8a90, stmt=0x0) at pl_exec.c:928
in pl_exec.c
#0 exec_stmt (estate=0xfeff8a90, stmt=0x0) at pl_exec.c:928
#1 0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x90fa9e0)
at pl_exec.c:903
#2 0x0083f4f2 in exec_stmt_if (estate=0xfeff8a90, stmt=0x90fab78)
at pl_exec.c:1139
#3 0x0083f0ca in exec_stmt (estate=0xfeff8a90, stmt=0x90fab78)
at pl_exec.c:947
#4 0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x90fab90)
at pl_exec.c:903
#5 0x0083f4f2 in exec_stmt_if (estate=0xfeff8a90, stmt=0x90fad20)
at pl_exec.c:1139
#6 0x0083f0ca in exec_stmt (estate=0xfeff8a90, stmt=0x90fad20)
at pl_exec.c:947
#7 0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x9133e60)
at pl_exec.c:903
#8 0x0083f4f2 in exec_stmt_if (estate=0xfeff8a90, stmt=0x90d97b8)
at pl_exec.c:1139
#9 0x0083f0ca in exec_stmt (estate=0xfeff8a90, stmt=0x90d97b8)
at pl_exec.c:947
#10 0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x9118408)
at pl_exec.c:903
#11 0x0083ee15 in exec_stmt_block (estate=0xfeff8a90, block=0x90d97e8)
at pl_exec.c:859
#12 0x0083e77a in plpgsql_exec_trigger (func=0x9149ae0, trigdata=0xfeff8ca0)
at pl_exec.c:645
#13 0x0083b053 in plpgsql_call_handler (fcinfo=0xfeff8b50) at
pl_handler.c:121
#14 0x080f1c8e in ExecCallTriggerFunc (trigdata=0xfeff8ca0, finfo=0x935e260,
per_tuple_context=0x0) at trigger.c:1150
#15 0x080f2be7 in DeferredTriggerExecute (event=0x92af050, itemno=0,
rel=0x8,
trigdesc=0x935daf0, finfo=0xfeff8a90, per_tuple_context=0x0)
at trigger.c:1859
#16 0x080f2fee in deferredTriggerInvokeEvents (immediate_only=1 '\001')
at trigger.c:2000
#17 0x080f314f in DeferredTriggerEndQuery () at trigger.c:2135
#18 0x08178ae8 in finish_xact_command () at postgres.c:1749
#19 0x08177816 in exec_simple_query (
query_string=0x8fe2438 "INSERT INTO logs
(seq,level,event_code,event_date,event_time,city,province,user_id,est_dsp_date,est_dsp_time,country,edilate,carr_code,notes,trac_notes,order_num)
VALUES ('2','6','TAS','09/14/06','19:"...)
at postgres.c:905
#20 0x08179f09 in PostgresMain (argc=4, argv=0x8f94b48,
username=0x8f94ab8 "bwoods") at postgres.c:2871
#21 0x08153c90 in BackendFork (port=0x8fa6af0) at postmaster.c:2564
#22 0x08153683 in BackendStartup (port=0x8fa6af0) at postmaster.c:2207
#23 0x08151be8 in ServerLoop () at postmaster.c:1119
#24 0x081512ae in PostmasterMain (argc=5, argv=0x8f92688) at
postmaster.c:897
#25 0x08121163 in main (argc=5, argv=0xfeff9e44) at main.c:214
--
Until later, Geoffrey
Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
- Benjamin Franklin
Maybe I'm restating the obvious, but it looks to me like the procedural
trigger fired by the SQL query "INSERT INTO logs
(seq,level,event_code,event_date,event_time,city,province,user_id,est_dsp_date,est_dsp_time,country,edilate,carr_code,notes,trac_notes,order_num)
VALUES ('2','6','TAS','09/14/06','19:"... is the culprit, probably 3-4
IF (or other conditional) statements deep. Check this trigger to see if
it handles NULLs correctly.
Looking at the change logs
(http://www.postgresql.org/docs/7.4/interactive/release.html) it looks
like there were significant fixes in 7.4.8. It's possible that this is
a known bug that has already been fixed.
--
Brandon Aiken
CS/IT Systems Engineer
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Geoffrey
Sent: Monday, September 18, 2006 10:06 AM
To: PostgreSQL List
Subject: [GENERAL] server resetting
Geoffrey <esoteric@3times25.net> writes:
Program terminated with signal 11, Segmentation fault.
#0 exec_stmt (estate=0xfeff8a90, stmt=0x0) at pl_exec.c:928
in pl_exec.c
#0 exec_stmt (estate=0xfeff8a90, stmt=0x0) at pl_exec.c:928
#1 0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x90fa9e0)
at pl_exec.c:903
#2 0x0083f4f2 in exec_stmt_if (estate=0xfeff8a90, stmt=0x90fab78)
at pl_exec.c:1139
It seems you've got a corrupt "compiled statements" data structure for
a plpgsql trigger function. Offhand this does not look like any of the
known post-7.4.7 bug fixes. Can you show us the source code for that
trigger?
regards, tom lane
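To make the "corrupt compiled statements" idea concrete: the backtrace shows the executor recursing through three nested IF blocks (exec_stmts, exec_stmt, exec_stmt_if, exec_stmts, ...) until exec_stmt receives stmt=0x0. Below is a minimal C sketch of that shape using a hypothetical, heavily simplified statement tree; Stmt, StmtList, run_stmt and run_stmts are invented for illustration and are not PL/pgSQL's actual structures. It shows how one clobbered entry in a nested list only surfaces as a NULL pointer several levels down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified model of a compiled statement tree. */
typedef enum { STMT_ASSIGN, STMT_IF } StmtKind;

typedef struct Stmt Stmt;

typedef struct {
    size_t count;
    Stmt **items;        /* one pointer per statement in the block */
} StmtList;

struct Stmt {
    StmtKind kind;
    int cond;            /* STMT_IF only: pretend-evaluated condition */
    StmtList then_body;  /* STMT_IF only: statements run when cond is true */
};

static int run_stmts(const StmtList *list, int depth);

/* Returns 0 on success, -1 if a NULL statement pointer was found. */
static int run_stmt(const Stmt *stmt, int depth)
{
    if (stmt == NULL) {
        /* An unguarded interpreter would dereference this and get SIGSEGV. */
        fprintf(stderr, "NULL statement at nesting depth %d\n", depth);
        return -1;
    }
    switch (stmt->kind) {
    case STMT_ASSIGN:
        return 0;
    case STMT_IF:
        return stmt->cond ? run_stmts(&stmt->then_body, depth + 1) : 0;
    }
    return 0;
}

/* Walks a statement list, recursing into IF bodies via run_stmt. */
static int run_stmts(const StmtList *list, int depth)
{
    assert(list != NULL);
    for (size_t i = 0; i < list->count; i++)
        if (run_stmt(list->items[i], depth) != 0)
            return -1;
    return 0;
}
```

The crash at pl_exec.c:928 suggests the real executor simply dereferences the bad pointer. A guard like the one above would only report the symptom; the corrupted list entry itself has to be traced back to whatever overwrote it.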
Tom Lane wrote:
It seems you've got a corrupt "compiled statements" data structure for
a plpgsql trigger function. Offhand this does not look like any of the
known post-7.4.7 bug fixes. Can you show us the source code for that
trigger?
The problem is that I don't see a correlation among the backtraces for
this reset issue. Most of them are in fact related to inserts, but at
least three different tables are involved, and some don't involve an
INSERT at all. I've attached three more backtraces from different core
files to provide further data and, hopefully, help pinpoint this issue.
We're assuming a common cause here; maybe that's our first mistake.
--
Until later, Geoffrey
Geoffrey <esoteric@3times25.net> writes:
Problem is, we seem to be having a problem with this reset issue and I
don't see a correlation in the backtraces. Most of them are in fact
related to inserts, but there are at least three different tables
involved. There are also some where an INSERT is not involved. I've
attached three more backtraces from different core files to provide
further data and hopefully pinpoint this issue.
Well, these make it clear that you've got some pretty big chunks of
nonstandard code in the backend, so my first thought is that there's a
memory-clobber bug somewhere in that. It might be worth trying to run
the code with a debugging malloc library (ElectricFence or some such)
to try to locate the culprit.
regards, tom lane
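A debugging malloc such as ElectricFence helps here because it places each allocation right up against an inaccessible page, so an overrun faults at the offending write instead of corrupting a neighbor that crashes much later in unrelated code (such as the PL/pgSQL executor). A minimal sketch of that guard-page idea, assuming a POSIX system with mmap and mprotect; guarded_alloc and guarded_free are hypothetical helpers, not ElectricFence's API:

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of whole data pages needed for size bytes (at least one). */
static size_t data_pages_for(size_t size, size_t page)
{
    size_t pages = (size + page - 1) / page;
    return pages == 0 ? 1 : pages;
}

/* Hypothetical helper: returns a block whose last byte sits directly before
 * a PROT_NONE guard page, so writing even one byte past the end faults. */
static void *guarded_alloc(size_t size)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t data = data_pages_for(size, page) * page;
    unsigned char *base = mmap(NULL, data + page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + data, page, PROT_NONE) != 0) {
        munmap(base, data + page);
        return NULL;
    }
    return base + data - size;     /* right-align the block against the guard */
}

/* Releases a block obtained from guarded_alloc(size) with the same size. */
static void guarded_free(void *ptr, size_t size)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t data = data_pages_for(size, page) * page;
    unsigned char *guard = (unsigned char *) ptr + size;
    assert((uintptr_t) guard % page == 0);  /* guard page must be page-aligned */
    munmap(guard - data, data + page);
}
```

For example, after p = guarded_alloc(100), writing p[100] lands on the guard page and raises SIGSEGV immediately, with the faulting frame pointing at the buggy code. In practice you would link the real library in (e.g. with -lefence) rather than write this yourself; note that right-aligning blocks can misalign them for types wider than a byte, which real debugging allocators make configurable.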
Tom Lane wrote:
Geoffrey <esoteric@3times25.net> writes:
Problem is, we seem to be having a problem with this reset issue and I
don't see a correlation in the backtraces. Most of them are in fact
related to inserts, but there are at least three different tables
involved. There are also some where an INSERT is not involved. I've
attached three more backtraces from different core files to provide
further data and hopefully pinpoint this issue.
Well, these make it clear that you've got some pretty big chunks of
nonstandard code in the backend, so my first thought is that there's a
memory-clobber bug somewhere in that. It might be worth trying to run
the code with a debugging malloc library (ElectricFence or some such)
to try to locate the culprit.
I'm not sure what you mean by 'nonstandard code'; could you expand on
that? All the trigger code is written in PL/pgSQL. Are you suggesting
we're stomping on our own memory within the trigger code we've written?
--
Until later, Geoffrey
Geoffrey <esoteric@3times25.net> writes:
Tom Lane wrote:
Well, these make it clear that you've got some pretty big chunks of
nonstandard code in the backend, so my first thought is that there's a
memory-clobber bug somewhere in that.
I'm not sure what you mean by 'nonstandard code,' could you expand on
that?
The traces include code from /usr/local/lib/libgrid.so and
/usr/local/lib/libpcmsrv.so ... I don't know what those are,
but I'm quite sure they are not invoked by a standard Postgres build.
I also find it suggestive that they appear to have been written in C++
... we've seen problems before from trying to link C++ code into the
backend, because it tends to bring along its own incompatible ideas
about how to do error recovery and memory management.
regards, tom lane
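One concrete way such incompatible error-recovery assumptions can bite, assuming the backend recovers from errors with longjmp (as, to my understanding, elog(ERROR) did in the 7.4-era backend): a longjmp unwinds straight past any cleanup the intermediate code expected to run, including C++ destructors. A minimal C sketch; report_error, vendor_routine and the live_resources counter are hypothetical stand-ins, not backend or vendor functions:

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

static jmp_buf error_context;   /* stands in for the backend's recovery point */
static int live_resources;      /* acquired but not yet released */

/* Hypothetical error reporter that recovers by longjmp, skipping callers. */
static void report_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    longjmp(error_context, 1);
}

/* Hypothetical library routine that expects its release step to run. */
static void vendor_routine(int fail)
{
    live_resources++;                   /* "acquire": e.g. allocate a buffer */
    if (fail)
        report_error("lookup failed");  /* longjmp: the release never runs */
    live_resources--;                   /* "release": e.g. free, destructor */
}

/* Returns how many resources are still held after one call. */
static int run_once(int fail)
{
    live_resources = 0;
    if (setjmp(error_context) == 0)
        vendor_routine(fail);
    assert(live_resources >= 0);
    return live_resources;
}
```

Here run_once(0) leaves nothing held, while run_once(1) leaves one resource behind because the release step was jumped over. In a C++ library the skipped step would be a destructor, and leaked or half-updated state of that kind is the sort of thing that can later show up as memory corruption elsewhere in the backend.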
Tom Lane wrote:
Geoffrey <esoteric@3times25.net> writes:
Tom Lane wrote:
Well, these make it clear that you've got some pretty big chunks of
nonstandard code in the backend, so my first thought is that there's a
memory-clobber bug somewhere in that.
I'm not sure what you mean by 'nonstandard code,' could you expand on
that?
The traces include code from /usr/local/lib/libgrid.so and
/usr/local/lib/libpcmsrv.so ... I don't know what those are,
but I'm quite sure they are not invoked by a standard Postgres build.
I also find it suggestive that they appear to have been written in C++
... we've seen problems before from trying to link C++ code into the
backend, because it tends to bring along its own incompatible ideas
about how to do error recovery and memory management.
libpcmsrv is a vendor-provided library for looking up mileage. I'll
have to check on the other one; it may be part of the same package.
Thanks for the heads-up on C++ code.
It seems we may have located a memory problem in a library that is used
throughout our code, so we are looking into that now.
Thanks again.
--
Until later, Geoffrey