BUG #5004: pg_freespacemap make a SegFault

Started by Sébastien Lardière almost 17 years ago | 5 messages | bugs

slardiere@hi-media.com

almost 17 years ago

The following bug has been logged online:

Bug reference: 5004
Logged by: Sebastien Lardiere
Email address: slardiere@hi-media.com
PostgreSQL version: 8.3.7
Operating system: Debian Etch
Description: pg_freespacemap make a SegFault
Details:

Our cluster crashed. Nothing was found in the logfile except a message
about a segfault, so I collected a core dump; here is the backtrace:

Core was generated by `postgres: postgres postgres [local] SELECT '.
Program terminated with signal 11, Segmentation fault.
#0 pg_freespacemap_pages (fcinfo=0x7fff4a9bc250) at pg_freespacemap.c:162
162 fctx->record[i].reltablespace = fsmrel->key.spcNode;
(gdb) bt
#0 pg_freespacemap_pages (fcinfo=0x7fff4a9bc250) at pg_freespacemap.c:162
#1 0x0000000000526781 in ExecMakeTableFunctionResult (funcexpr=0x29c2408, econtext=0x29c1b70, expectedDesc=0x29c1ed0, returnDesc=0x7fff4a9bc6d0) at execQual.c:1566
#2 0x00000000005330d2 in FunctionNext (node=0x29bf620) at nodeFunctionscan.c:68
#3 0x000000000052881c in ExecScan (node=0x7fc03f6c5370, accessMtd=0x533030 <FunctionNext>) at execScan.c:68
#4 0x0000000000521f6d in ExecProcNode (node=0x29bf620) at execProcnode.c:356
#5 0x000000000052ca40 in ExecAgg (node=0x29c17f0) at nodeAgg.c:874
#6 0x0000000000521fed in ExecProcNode (node=0x29c17f0) at execProcnode.c:394
#7 0x0000000000520ffd in ExecutorRun (queryDesc=<value optimized out>, direction=ForwardScanDirection, count=0) at execMain.c:1335
#8 0x00000000005ba0d6 in PortalRunSelect (portal=0x29b47a0, forward=<value optimized out>, count=0, dest=0x29af198) at pquery.c:943
#9 0x00000000005bb159 in PortalRun (portal=0x29b47a0, count=9223372036854775807, isTopLevel=1 '\001', dest=0x29af198, altdest=0x29af198, completionTag=0x7fff4a9bcf40 "") at pquery.c:769
#10 0x00000000005b6d2d in exec_simple_query (query_string=0x2969070 "select count(*) as pages from pg_freespacemap_pages ") at postgres.c:1004
#11 0x00000000005b8071 in PostgresMain (argc=4, argv=<value optimized out>, username=0x28bf4b0 "postgres") at postgres.c:3631
#12 0x000000000058ca1b in ServerLoop () at postmaster.c:3207
#13 0x000000000058d73e in PostmasterMain (argc=5, argv=0x28ba310) at postmaster.c:1029
#14 0x0000000000544c15 in main (argc=5, argv=<value optimized out>) at main.c:188

The backtrace shows that contrib/pg_freespacemap is involved. A munin plugin
has been sending the query "select count(*) as pages from pg_freespacemap_pages"
every 5 minutes for about a year now, and we graph the result.

I notice that, according to the graph, our free space map has been nearly
empty (a few thousand pages) since our first crash. Sometimes the number of
pages increases, and then we get a crash.
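
For readers unfamiliar with this kind of monitoring, here is a minimal Python sketch of what such a plugin might look like. It is not the plugin used in this report, which is not shown in the thread. The function names fetch_fsm_pages and munin_value_line are hypothetical, the connection is assumed to be any Python DB-API connection to the monitored server, and the field name "pages" is taken from the column alias in the query.

```python
# Query taken from the bug report; everything else here is an assumption.
FSM_QUERY = "select count(*) as pages from pg_freespacemap_pages"


def fetch_fsm_pages(conn):
    """Run the polling query on a DB-API connection and return the count."""
    cur = conn.cursor()
    try:
        cur.execute(FSM_QUERY)
        return cur.fetchone()[0]
    finally:
        cur.close()


def munin_value_line(pages):
    """Format one sample as a munin 'field.value N' line (field name assumed)."""
    return "pages.value %d" % pages
```

On the 8.3.7 server described here, each such poll calls pg_freespacemap_pages, the function at the top of the backtrace, so the plugin itself is what triggers the crash.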

If you want more details, just ask.

Thanks,

PS: Sorry for my poor English

Tom Lane

tgl@sss.pgh.pa.us

almost 17 years ago

In reply to: Sébastien Lardière (#1)

Re: BUG #5004: pg_freespacemap make a SegFault

"Sebastien Lardiere" <slardiere@hi-media.com> writes:

Description: pg_freespacemap make a SegFault

There's a post-8.3.7 fix that might cure this:

http://archives.postgresql.org/pgsql-committers/2009-04/msg00108.php

regards, tom lane

hubert depesz lubaczewski

depesz@depesz.com

almost 17 years ago

In reply to: Sébastien Lardière (#1)

Re: BUG #5004: pg_freespacemap make a SegFault

On Fri, Aug 21, 2009 at 04:26:11PM +0000, Sebastien Lardiere wrote:

I've got a crash with a cluster. Nothing found in the logfile, but a message
about a Segfault, so I get a coredump and here is the backtrace :

Can you check if you had any vacuums running at the time of crash?

It might be in logs, something like:
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
STATEMENT: vacuum

If yes, how many vacuum jobs were there?
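
One way to answer that question is to scan the server log for the crash warning and tally the STATEMENT line that follows it. The following is a minimal Python sketch, assuming the log has the shape shown above (a WARNING line, then DETAIL and HINT lines, then STATEMENT). The function name statements_at_crash is hypothetical, and a real log may need adjustments for log_line_prefix or multi-line statements.

```python
import re
from collections import Counter

# Text of the WARNING line quoted in the message above.
CRASH_MARKER = "terminating connection because of crash of another server process"


def statements_at_crash(log_text):
    """Count the statements reported right after each crash warning."""
    counts = Counter()
    in_crash_entry = False
    for line in log_text.splitlines():
        if CRASH_MARKER in line:
            in_crash_entry = True
            continue
        if not in_crash_entry:
            continue
        match = re.search(r"STATEMENT:\s*(.*)", line)
        if match:
            # Record the statement of the terminated session, then stop
            # looking until the next crash warning.
            counts[match.group(1).strip().lower()] += 1
            in_crash_entry = False
        elif not re.search(r"\b(DETAIL|HINT|CONTEXT):", line):
            # Any unrelated line ends the current crash entry.
            in_crash_entry = False
    return counts
```

Calling statements_at_crash(open(path).read()) on a log file and looking at the count for "vacuum" gives the number of vacuum sessions that were terminated by a crash. Note that these are the sessions that were interrupted, not necessarily the one that crashed.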

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

Sébastien Lardière

slardiere@hi-media.com

almost 17 years ago

In reply to: hubert depesz lubaczewski (#3)

Re: BUG #5004: pg_freespacemap make a SegFault

On 22/08/2009 19:52, hubert depesz lubaczewski wrote:

Can you check if you had any vacuums running at the time of crash?

Yes, autovacuum is on. It was not a "normal" vacuum that was running during
the crash, but the latter.

Nevertheless, the day before the first crash, I ran a big delete of 23
million rows, and pg_freespacemap showed a big increase in the number
of pages in the FSM. Since then, whenever the number of pages in the FSM
increases, Pg crashes; but:

It might be in logs, something like:
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
STATEMENT: vacuum

If yes, how many vacuum jobs were there?

I have never seen these messages with vacuum in the logs; Pg always crashes
with the query:

"select count(*) as pages from pg_freespacemap_pages"

The behavior is visible in Munin (graph attached):

First the big increase, then the first crash, and after that, each time
there is a significant increase, a crash with a reset of the FSM.

I have disabled the plugin, so there are no more queries against
pg_freespacemap, and no more crashes.

--
Sébastien Lardière

Sébastien Lardière

slardiere@hi-media.com

almost 17 years ago

In reply to: Tom Lane (#2)

Re: BUG #5004: pg_freespacemap make a SegFault

On 21/08/2009 18:51, Tom Lane wrote:

"Sebastien Lardiere"<slardiere@hi-media.com> writes:

Description: pg_freespacemap make a SegFault

There's a post-8.3.7 fix that might cure this:

http://archives.postgresql.org/pgsql-committers/2009-04/msg00108.php

regards, tom lane

OK, I'll try to apply this patch.

Thanks,

--
Sébastien Lardière
