Possible bug in PostgreSQL 8.3beta4 (postgres process segfaults)

Started by Lawrence Oluyedeover 18 years ago9 messagesgeneral
Jump to latest
#1Lawrence Oluyede
l.oluyede@gmail.com

While developing a Python program I encountered a situation that makes
on of the PostgreSQL 8.3b4's processes crash badly with a segfault.
Let me explain.

After enabling "debug5" as log level I was able to locate and
reproduce the exact sequence of SQL queries made against my test
database
and crash postgres again using only "psql". See for yourself:

foodb=# \o out.txt
foodb=# SELECT * FROM foobaz.instrument WHERE code = 'TEST0118';
LOG: duration: 3.683 ms statement: SELECT * FROM foobaz.instrument
WHERE code = 'TEST0118';
foodb=# SELECT xpath('//sp:description/text()', content,
ARRAY[ARRAY['sp', 'http://www.foobaz.com/']])
foodb-# FROM foobaz.instrument WHERE code = 'TEST0018';
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The pgsql log file contains these error logs (the pid 753 is a
postgres child process):

"""""""""
[loluyede:foodb:2007-12-11 14:38:20.992 CET]LOG: 00000: duration:
0.090 ms statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ
COMMITTED
[loluyede:foodb:2007-12-11 14:38:20.992 CET]LOCATION:
exec_simple_query, postgres.c:1040
[::2007-12-11 14:38:20.993 CET]LOG: 00000: server process (PID 753)
was terminated by signal 11: Segmentation fault
[::2007-12-11 14:38:20.993 CET]LOCATION: LogChildExit, postmaster.c:2510
[::2007-12-11 14:38:20.993 CET]LOG: 00000: terminating any other
active server processes
[::2007-12-11 14:38:20.993 CET]LOCATION: HandleChildCrash, postmaster.c:2355
[::2007-12-11 14:38:20.997 CET]LOG: 00000: all server processes
terminated; reinitializing
[::2007-12-11 14:38:20.997 CET]LOCATION: PostmasterStateMachine,
postmaster.c:2663
[::2007-12-11 14:38:21.000 CET]LOG: 00000: database system was
interrupted; last known up at 2007-12-11 14:38:02 CET
[::2007-12-11 14:38:21.000 CET]LOCATION: StartupXLOG, xlog.c:4789
[::2007-12-11 14:38:21.000 CET]DEBUG: 00000: checkpoint record is at 0/2AB634A0
""""""""

The out.txt file contains the output of the "SELECT *" statement

code | kind | last_modified |
created | modified_by | created_by |
content
----------+------------+-------------------------------+------------------------------+-------------+------------+----------------------------------------------------------------------
TEST0118 | bazfoo | 2007-12-12 08:17:01.846919+01 | 2007-12-12
08:17:00.32337+01 | anonymous | anonymous | <xml ... >
(1 row)

The "instrument" table has the following schema:

"""
CREATE TABLE foobaz.instrument
(
-- Inherited: code character varying(32) NOT NULL,
-- Inherited: kind character varying(32) NOT NULL DEFAULT
'bazfoo'::character varying,
last_modified timestamp with time zone NOT NULL,
created timestamp with time zone NOT NULL DEFAULT
('now'::text)::timestamp(3) with time zone,
modified_by character varying(64),
created_by character varying(64) NOT NULL DEFAULT
'anonymous'::character varying,
content xml NOT NULL,
CONSTRAINT instrument_pkey PRIMARY KEY (code),
CONSTRAINT instrument_kind_check CHECK (kind::text = 'bazfoo'::text)
)
INHERITS (foobaz.instrument_base)
WITH (OIDS=FALSE);
"""

Its "parent" table is instrument_base and has the following schema:

"""
CREATE TABLE foobaz.instrument_base
(
code character varying(32) NOT NULL,
kind character varying(32) NOT NULL,
CONSTRAINT instrument_base_pkey PRIMARY KEY (code)
)
WITH (OIDS=FALSE);
"""

PostgreSQL is again 8.3beta 4 on Ubuntu Linux 7.10.
It's compiled with the following flags:
./configure --with-python --with-openssl --with-pam --with-libxml
--with-libxslt --enable-thread-safety --enable-debug

libxml is 2.6.30, libxslt is 1.1.21

I think that is all.

Is it a bug or am I doing something wrong?

--
Lawrence, oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Lawrence Oluyede (#1)
Re: Possible bug in PostgreSQL 8.3beta4 (postgres process segfaults)

Hello

this is bug. Please send backtrace from core file.

Regards
Pavel Stehule

Show quoted text

On 12/12/2007, Lawrence Oluyede <l.oluyede@gmail.com> wrote:

While developing a Python program I encountered a situation that makes
on of the PostgreSQL 8.3b4's processes crash badly with a segfault.
Let me explain.

After enabling "debug5" as log level I was able to locate and
reproduce the exact sequence of SQL queries made against my test
database
and crash postgres again using only "psql". See for yourself:

foodb=# \o out.txt
foodb=# SELECT * FROM foobaz.instrument WHERE code = 'TEST0118';
LOG: duration: 3.683 ms statement: SELECT * FROM foobaz.instrument
WHERE code = 'TEST0118';
foodb=# SELECT xpath('//sp:description/text()', content,
ARRAY[ARRAY['sp', 'http://www.foobaz.com/&#39;]])
foodb-# FROM foobaz.instrument WHERE code = 'TEST0018';
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The pgsql log file contains these error logs (the pid 753 is a
postgres child process):

"""""""""
[loluyede:foodb:2007-12-11 14:38:20.992 CET]LOG: 00000: duration:
0.090 ms statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ
COMMITTED
[loluyede:foodb:2007-12-11 14:38:20.992 CET]LOCATION:
exec_simple_query, postgres.c:1040
[::2007-12-11 14:38:20.993 CET]LOG: 00000: server process (PID 753)
was terminated by signal 11: Segmentation fault
[::2007-12-11 14:38:20.993 CET]LOCATION: LogChildExit, postmaster.c:2510
[::2007-12-11 14:38:20.993 CET]LOG: 00000: terminating any other
active server processes
[::2007-12-11 14:38:20.993 CET]LOCATION: HandleChildCrash, postmaster.c:2355
[::2007-12-11 14:38:20.997 CET]LOG: 00000: all server processes
terminated; reinitializing
[::2007-12-11 14:38:20.997 CET]LOCATION: PostmasterStateMachine,
postmaster.c:2663
[::2007-12-11 14:38:21.000 CET]LOG: 00000: database system was
interrupted; last known up at 2007-12-11 14:38:02 CET
[::2007-12-11 14:38:21.000 CET]LOCATION: StartupXLOG, xlog.c:4789
[::2007-12-11 14:38:21.000 CET]DEBUG: 00000: checkpoint record is at 0/2AB634A0
""""""""

The out.txt file contains the output of the "SELECT *" statement

code | kind | last_modified |
created | modified_by | created_by |
content
----------+------------+-------------------------------+------------------------------+-------------+------------+----------------------------------------------------------------------
TEST0118 | bazfoo | 2007-12-12 08:17:01.846919+01 | 2007-12-12
08:17:00.32337+01 | anonymous | anonymous | <xml ... >
(1 row)

The "instrument" table has the following schema:

"""
CREATE TABLE foobaz.instrument
(
-- Inherited: code character varying(32) NOT NULL,
-- Inherited: kind character varying(32) NOT NULL DEFAULT
'bazfoo'::character varying,
last_modified timestamp with time zone NOT NULL,
created timestamp with time zone NOT NULL DEFAULT
('now'::text)::timestamp(3) with time zone,
modified_by character varying(64),
created_by character varying(64) NOT NULL DEFAULT
'anonymous'::character varying,
content xml NOT NULL,
CONSTRAINT instrument_pkey PRIMARY KEY (code),
CONSTRAINT instrument_kind_check CHECK (kind::text = 'bazfoo'::text)
)
INHERITS (foobaz.instrument_base)
WITH (OIDS=FALSE);
"""

Its "parent" table is instrument_base and has the following schema:

"""
CREATE TABLE foobaz.instrument_base
(
code character varying(32) NOT NULL,
kind character varying(32) NOT NULL,
CONSTRAINT instrument_base_pkey PRIMARY KEY (code)
)
WITH (OIDS=FALSE);
"""

PostgreSQL is again 8.3beta 4 on Ubuntu Linux 7.10.
It's compiled with the following flags:
./configure --with-python --with-openssl --with-pam --with-libxml
--with-libxslt --enable-thread-safety --enable-debug

libxml is 2.6.30, libxslt is 1.1.21

I think that is all.

Is it a bug or am I doing something wrong?

--
Lawrence, oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

#3Lawrence Oluyede
l.oluyede@gmail.com
In reply to: Pavel Stehule (#2)
Re: Possible bug in PostgreSQL 8.3beta4 (postgres process segfaults)

Here it is:

postgres$ gdb /usr/local/pgsql/bin/postgres core.1600
(gdb) bt
#0 0x082c101c in pfree (pointer=0x8472f00) at mcxt.c:591
#1 0xb7e46513 in xmlCleanupCharEncodingHandlers () from /usr/lib/libxml2.so.2
#2 0xb7e4f091 in xmlCleanupParser () from /usr/lib/libxml2.so.2
#3 0x082940e4 in xpath (fcinfo=0xbfee59c4) at xml.c:3441
#4 0x0817107f in ExecMakeFunctionResult (fcache=0x84723d0,
econtext=0x8472348, isNull=0x8472fa8 "", isDone=0x8472fb8) at
execQual.c:1351
#5 0x0816f522 in ExecProject (projInfo=0x8472e98, isDone=0xbfee5c78)
at execQual.c:4601
#6 0x08175987 in ExecScan (node=0x84722c0, accessMtd=0x817e310
<IndexNext>) at execScan.c:143
#7 0x0817e2e7 in ExecIndexScan (node=0x84722c0) at nodeIndexscan.c:147
#8 0x0816ee7a in ExecProcNode (node=0x84722c0) at execProcnode.c:338
#9 0x0816e113 in ExecutorRun (queryDesc=0x845e908,
direction=ForwardScanDirection, count=0) at execMain.c:1233
#10 0x08204ea0 in PortalRunSelect (portal=0x84577a0, forward=<value
optimized out>, count=0, dest=0x845d1c8) at pquery.c:943
#11 0x08205eaa in PortalRun (portal=0x84577a0, count=2147483647,
isTopLevel=1 '\001', dest=0x845d1c8, altdest=0x845d1c8,
completionTag=0xbfee5ee8 "") at pquery.c:797
#12 0x082013c3 in exec_simple_query (
query_string=0x8453790 "SELECT xpath('//sp:description/text()',
content, ARRAY[ARRAY['sp',
'http://www.statpro.net/xml/structure/1.0&#39;]])\n FROM instrument
WHERE code = 'TEST0018';") at postgres.c:963
#13 0x08202ebc in PostgresMain (argc=4, argv=<value optimized out>,
username=0x83d2330 "postgres") at postgres.c:3531
#14 0x081d76cf in ServerLoop () at postmaster.c:3180
#15 0x081d83a5 in PostmasterMain (argc=3, argv=0x83ccb08) at postmaster.c:1028
#16 0x081909a0 in main (argc=3, argv=0x82c0c00) at main.c:188
(gdb) l
586 header = (StandardChunkHeader *)
587 ((char *) pointer - STANDARDCHUNKHEADERSIZE);
588
589 AssertArg(MemoryContextIsValid(header->context));
590
591 (*header->context->methods->free_p) (header->context, pointer);
592 }
593
594 /*
595 * repalloc

I also uploaded the entire core file on the net, if you want to inspect it:
http://www.oluyede.org/temp/core.1600 (it's ~ 30M)

thank you

--
Lawrence, oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair

#4Pavel Stehule
pavel.stehule@gmail.com
In reply to: Lawrence Oluyede (#3)
Re: Possible bug in PostgreSQL 8.3beta4 (postgres process segfaults)

Hello

I cannot repeat this bug on my test date. Can you send some small set
that reproduce the bug?

Regards
Pavel

Show quoted text

On 12/12/2007, Lawrence Oluyede <l.oluyede@gmail.com> wrote:

Here it is:

postgres$ gdb /usr/local/pgsql/bin/postgres core.1600
(gdb) bt
#0 0x082c101c in pfree (pointer=0x8472f00) at mcxt.c:591
#1 0xb7e46513 in xmlCleanupCharEncodingHandlers () from /usr/lib/libxml2.so.2
#2 0xb7e4f091 in xmlCleanupParser () from /usr/lib/libxml2.so.2
#3 0x082940e4 in xpath (fcinfo=0xbfee59c4) at xml.c:3441
#4 0x0817107f in ExecMakeFunctionResult (fcache=0x84723d0,
econtext=0x8472348, isNull=0x8472fa8 "", isDone=0x8472fb8) at
execQual.c:1351
#5 0x0816f522 in ExecProject (projInfo=0x8472e98, isDone=0xbfee5c78)
at execQual.c:4601
#6 0x08175987 in ExecScan (node=0x84722c0, accessMtd=0x817e310
<IndexNext>) at execScan.c:143
#7 0x0817e2e7 in ExecIndexScan (node=0x84722c0) at nodeIndexscan.c:147
#8 0x0816ee7a in ExecProcNode (node=0x84722c0) at execProcnode.c:338
#9 0x0816e113 in ExecutorRun (queryDesc=0x845e908,
direction=ForwardScanDirection, count=0) at execMain.c:1233
#10 0x08204ea0 in PortalRunSelect (portal=0x84577a0, forward=<value
optimized out>, count=0, dest=0x845d1c8) at pquery.c:943
#11 0x08205eaa in PortalRun (portal=0x84577a0, count=2147483647,
isTopLevel=1 '\001', dest=0x845d1c8, altdest=0x845d1c8,
completionTag=0xbfee5ee8 "") at pquery.c:797
#12 0x082013c3 in exec_simple_query (
query_string=0x8453790 "SELECT xpath('//sp:description/text()',
content, ARRAY[ARRAY['sp',
'http://www.statpro.net/xml/structure/1.0&#39;]])\n FROM instrument
WHERE code = 'TEST0018';") at postgres.c:963
#13 0x08202ebc in PostgresMain (argc=4, argv=<value optimized out>,
username=0x83d2330 "postgres") at postgres.c:3531
#14 0x081d76cf in ServerLoop () at postmaster.c:3180
#15 0x081d83a5 in PostmasterMain (argc=3, argv=0x83ccb08) at postmaster.c:1028
#16 0x081909a0 in main (argc=3, argv=0x82c0c00) at main.c:188
(gdb) l
586 header = (StandardChunkHeader *)
587 ((char *) pointer - STANDARDCHUNKHEADERSIZE);
588
589 AssertArg(MemoryContextIsValid(header->context));
590
591 (*header->context->methods->free_p) (header->context, pointer);
592 }
593
594 /*
595 * repalloc

I also uploaded the entire core file on the net, if you want to inspect it:
http://www.oluyede.org/temp/core.1600 (it's ~ 30M)

thank you

--
Lawrence, oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#4)
Re: Possible bug in PostgreSQL 8.3beta4 (postgres process segfaults)

sorry

s/date/data/g

Pavel

#6Lawrence Oluyede
l.oluyede@gmail.com
In reply to: Pavel Stehule (#4)
Re: Possible bug in PostgreSQL 8.3beta4 (postgres process segfaults)

On Dec 12, 2007 1:29 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I cannot repeat this bug on my test date. Can you send some small set
that reproduce the bug?

I dumped the entire table here: http://www.oluyede.org/temp/instrument.txt

Let me know if you need anything else

--
Lawrence, oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Lawrence Oluyede (#6)
Re: Possible bug in PostgreSQL 8.3beta4 (postgres process segfaults)

"Lawrence Oluyede" <l.oluyede@gmail.com> writes:

On Dec 12, 2007 1:29 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I cannot repeat this bug on my test date. Can you send some small set
that reproduce the bug?

I dumped the entire table here: http://www.oluyede.org/temp/instrument.txt

I didn't have any luck reproducing the crash either.

Please try to put together a SQL script that reproduces the crash for
you, starting from an empty database. It's not exactly clear to me
whether I'm doing the same things you did.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Lawrence Oluyede (#3)
Re: Possible bug in PostgreSQL 8.3beta4 (postgres process segfaults)

"Lawrence Oluyede" <l.oluyede@gmail.com> writes:

Here it is:
postgres$ gdb /usr/local/pgsql/bin/postgres core.1600
(gdb) bt
#0 0x082c101c in pfree (pointer=0x8472f00) at mcxt.c:591
#1 0xb7e46513 in xmlCleanupCharEncodingHandlers () from /usr/lib/libxml2.so.2
#2 0xb7e4f091 in xmlCleanupParser () from /usr/lib/libxml2.so.2
#3 0x082940e4 in xpath (fcinfo=0xbfee59c4) at xml.c:3441

I think we finally found the problem. Please see if things are more
stable with 8.3RC1 plus this patch:

http://archives.postgresql.org/pgsql-committers/2008-01/msg00190.php

regards, tom lane

#9Lawrence Oluyede
l.oluyede@gmail.com
In reply to: Tom Lane (#8)
Re: Possible bug in PostgreSQL 8.3beta4 (postgres process segfaults)

I think we finally found the problem. Please see if things are more
stable with 8.3RC1 plus this patch:

http://archives.postgresql.org/pgsql-committers/2008-01/msg00190.php

Sorry for being so late. I'm currently using PostgreSQL RC2 at work
and I notice no crashes.
Thanks!

--
Lawrence, stacktrace.it - oluyede.org - neropercaso.it
"It is difficult to get a man to understand
something when his salary depends on not
understanding it" - Upton Sinclair