Wal sender segfault

Started by Dmitriy Sarafannikov | about 10 years ago | 5 messages | bugs
#1 Dmitriy Sarafannikov
dimon99901@mail.ru

Hi, I'm trying to test logical decoding on a server under load.
I launched pg_recvlogical with the 'test_decoding' plugin, and the WAL sender crashed with a segfault after several minutes of work.

pg_recvlogical --start --slot test_slot --no-loop -d dbname -h 127.0.0.1 -p5432 -U dbuser -w -f /tmp/test_logical.xlog

postgres=# select version();
version
-----------------------------------------------------------------------------------------------
PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
(1 row)

I have a core dump file (size 66G).

I launched gdb with the core file and got an incomplete backtrace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fa9248af742 in ReorderBufferGetTupleBuf ()
(gdb) bt
#0 0x00007fa9248af742 in ReorderBufferGetTupleBuf ()
#1 0x00007fa9248acde2 in LogicalDecodingProcessRecord ()
#2 0x00007fa9248b53b4 in ?? ()
#3 0x00007fa9248b6ed3 in ?? ()
#4 0x00007fa9248b7d8a in exec_replication_command ()
#5 0x00007fa9248f39fe in PostgresMain ()
#6 0x00007fa9246bb92e in ?? ()
#7 0x00007fa92489e58b in PostmasterMain ()
#8 0x00007fa9246bcac2 in main ()

What can I do to get more info about the reason for this segfault?

--
Best regards,
Dmitriy Sarafannikov

#2 Andres Freund
andres@anarazel.de
In reply to: Dmitriy Sarafannikov (#1)
Re: Wal sender segfault

Hi,

Thanks for the report!

On 2016-01-22 15:45:27 +0300, Dmitriy Sarafannikov wrote:

> Hi, I'm trying to test logical decoding on a server under load.
> I launched pg_recvlogical with the 'test_decoding' plugin, and the WAL sender crashed with a segfault after several minutes of work.
>
> pg_recvlogical --start --slot test_slot --no-loop -d dbname -h 127.0.0.1 -p5432 -U dbuser -w -f /tmp/test_logical.xlog
>
> postgres=# select version();
> version
> -----------------------------------------------------------------------------------------------
> PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
> (1 row)
>
> I have a core dump file (size 66G).
>
> I launched gdb with the core file and got an incomplete backtrace:
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007fa9248af742 in ReorderBufferGetTupleBuf ()
> (gdb) bt
> #0 0x00007fa9248af742 in ReorderBufferGetTupleBuf ()
> #1 0x00007fa9248acde2 in LogicalDecodingProcessRecord ()
> #2 0x00007fa9248b53b4 in ?? ()
> #3 0x00007fa9248b6ed3 in ?? ()
> #4 0x00007fa9248b7d8a in exec_replication_command ()
> #5 0x00007fa9248f39fe in PostgresMain ()
> #6 0x00007fa9246bb92e in ?? ()
> #7 0x00007fa92489e58b in PostmasterMain ()
> #8 0x00007fa9246bcac2 in main ()

Any chance that one of the modified tables in question uses REPLICA IDENTITY
FULL? There's an open bug about overly large rows produced by that, which
we don't currently handle correctly. I'm working on fixing that bug.
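
For readers who want to check this on their own server, here is a minimal sketch. It relies on the relreplident column of pg_class, where 'f' marks REPLICA IDENTITY FULL. The function name is hypothetical, and the connection options in the usage comment are simply copied from the report above.

```shell
# Hypothetical helper: print a query that lists ordinary tables
# whose replica identity is FULL (pg_class.relreplident = 'f').
replica_identity_full_query() {
  cat <<'SQL'
SELECT n.nspname AS schema_name, c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND c.relreplident = 'f'
ORDER BY 1, 2;
SQL
}

# Usage (connection options taken from the original report):
#   replica_identity_full_query | psql -d dbname -h 127.0.0.1 -p 5432 -U dbuser
```

Printing the query instead of running it keeps the helper free of side effects; pipe it into psql against the database being decoded.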

> What can I do to get more info about the reason for this segfault?

You could post a reproducible example... Other than that, it's usually
helpful to build postgres with debugging symbols enabled; that'd already
give more context.
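
A rough sketch of the second suggestion, under stated assumptions: the version string in the report mentions Debian, where debugging symbols are usually shipped in a separate package, so a full rebuild may not be necessary. The package name and binary path in the comments are assumptions to verify locally, and the function name is hypothetical.

```shell
# When building from source, symbols are enabled with:
#   ./configure --enable-debug
#
# Assumption: on a Debian-packaged 9.4 install the symbols live in a
# separate package; confirm the exact name before installing, e.g.:
#   apt-cache search postgresql-9.4 | grep dbg

# Hypothetical helper: print a full backtrace from a core file.
# $1 = path to the postgres binary, $2 = path to the core file.
backtrace_from_core() {
  gdb --batch -ex 'bt full' "$1" "$2"
}

# Usage (binary path is an assumption for Debian's 9.4 layout):
#   backtrace_from_core /usr/lib/postgresql/9.4/bin/postgres /path/to/core
```

Running gdb in batch mode makes it easy to redirect the backtrace to a file and attach it to a follow-up message.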

Regards,

Andres Freund

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3 Dmitriy Sarafannikov
dimon99901@mail.ru
In reply to: Andres Freund (#2)
Re[2]: [BUGS] Wal sender segfault

> Any chance that one of the modified tables in question uses REPLICA IDENTITY
> FULL? There's an open bug about overly large rows produced by that, which
> we don't currently handle correctly. I'm working on fixing that bug.

Thanks,

Yes, I have 3 tables with REPLICA IDENTITY FULL. Two of these tables have many fields and big text fields.

--
Best regards,
Dmitriy Sarafannikov

#4 Michael Paquier
michael@paquier.xyz
In reply to: Dmitriy Sarafannikov (#3)
Re: Re[2]: [BUGS] Wal sender segfault

On Fri, Jan 22, 2016 at 11:19 PM, Dmitriy Sarafannikov
<dimon99901@mail.ru> wrote:

>> Any chance that one of the modified tables in question uses REPLICA IDENTITY
>> FULL? There's an open bug about overly large rows produced by that, which
>> we don't currently handle correctly. I'm working on fixing that bug.
>
> Thanks,
>
> Yes, I have 3 tables with REPLICA IDENTITY FULL. Two of these tables have many
> fields and big text fields.

The original bug is here:
/messages/by-id/CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
--
Michael

#5 Dmitriy Sarafannikov
dimon99901@mail.ru
In reply to: Michael Paquier (#4)
Re[2]: [BUGS] Re[2]: [BUGS] Wal sender segfault

Is this the same bug?

(gdb) bt
#0 slist_pop_head_node (head=0x7fa92656bc10) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/include/lib/ilist.h:649
#1 ReorderBufferGetTupleBuf (rb=0x7fa92656bb90) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/logical/reorderbuffer.c:456
#2 0x00007fa9248acde2 in DecodeUpdate (ctx=<optimized out>, ctx=<optimized out>, buf=0x7fffecd7e300) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/logical/decode.c:651
#3 DecodeHeapOp (buf=0x7fffecd7e300, ctx=0x7fa92655db60) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/logical/decode.c:430
#4 LogicalDecodingProcessRecord (ctx=0x7fa92655db60, record=<optimized out>) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/logical/decode.c:115
#5 0x00007fa9248b53b4 in XLogSendLogical () at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/walsender.c:2428
#6 0x00007fa9248b6ed3 in WalSndLoop (send_data=send_data@entry=0x7fa9248b5350 <XLogSendLogical>) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/walsender.c:1829
#7 0x00007fa9248b7d8a in StartLogicalReplication (cmd=<optimized out>) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/walsender.c:996
#8 exec_replication_command (cmd_string=cmd_string@entry=0x7fa926494820 "START_REPLICATION SLOT \"test_slot\" LOGICAL 0/0")
at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/replication/walsender.c:1321
#9 0x00007fa9248f39fe in PostgresMain (argc=<optimized out>, argv=argv@entry=0x7fa92647b448, dbname=0x7fa92647b370 "dbname", username=<optimized out>)
at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/tcop/postgres.c:4077
#10 0x00007fa9246bb92e in BackendRun (port=0x7fa9264bc3d0) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/postmaster/postmaster.c:4252
#11 BackendStartup (port=0x7fa9264bc3d0) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/postmaster/postmaster.c:3917
#12 ServerLoop () at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/postmaster/postmaster.c:1678
#13 0x00007fa92489e58b in PostmasterMain (argc=5, argv=<optimized out>) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/postmaster/postmaster.c:1287
#14 0x00007fa9246bcac2 in main (argc=5, argv=0x7fa92647a570) at /build/postgresql-9.4-MZhK6O/postgresql-9.4-9.4.5/build/../src/backend/main/main.c:228

Sunday, 24 January 2016, 0:08 +09:00 from Michael Paquier <michael.paquier@gmail.com>:

> On Fri, Jan 22, 2016 at 11:19 PM, Dmitriy Sarafannikov
> <dimon99901@mail.ru> wrote:
>
>>> Any chance that one of the modified tables in question uses REPLICA IDENTITY
>>> FULL? There's an open bug about overly large rows produced by that, which
>>> we don't currently handle correctly. I'm working on fixing that bug.
>>
>> Thanks,
>>
>> Yes, I have 3 tables with REPLICA IDENTITY FULL. Two of these tables have many
>> fields and big text fields.
>
> The original bug is here:
> /messages/by-id/CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
> --
> Michael

--
Best regards,
Dmitriy Sarafannikov