BUG #14218: pg_logical_slot_get_changes causes segmentation fault

Started by Alexey Kuntsevich · almost 10 years ago · 4 messages · pgsql-bugs
#1 Alexey Kuntsevich
alexey.kuntsevich@gmail.com

The following bug has been logged on the website:

Bug reference: 14218
Logged by: Alexey Kuntsevich
Email address: alexey.kuntsevich@gmail.com
PostgreSQL version: 9.5.2
Operating system: Debian Jessie 8.4 x64
Description:

We enabled logical replication on our PostgreSQL cluster, created a new
replication slot with the vanilla test_decoding plugin, and ran an app that
polls 10,000 rows from this slot once per second. It ran fine for a day and
survived several bulk uploads of ~1 million rows, until we did a bulk upload
of ~8 million rows at once. During the upload, our PostgreSQL instance
disconnected all clients and reported that it was in recovery. A core dump
was created, and when we checked the logs we saw:

2016-06-28 22:16:55 GMT [23598]: [28-1] db=,user=,app=,client= LOG: 00000:
server process (PID 8369) was terminated by signal 11: Segmentation fault
2016-06-28 22:16:55 GMT [23598]: [29-1] db=,user=,app=,client= DETAIL:
Failed process was running: SELECT * FROM
pg_logical_slot_get_changes('regression_slot', NULL, NULL) LIMIT 10000;
2016-06-28 22:16:55 GMT [23598]: [30-1] db=,user=,app=,client= LOCATION:
LogChildExit, postmaster.c:3471
2016-06-28 22:16:55 GMT [23598]: [31-1] db=,user=,app=,client= LOG: 00000:
terminating any other active server processes
2016-06-28 22:16:55 GMT [23598]: [32-1] db=,user=,app=,client= LOCATION:
HandleChildCrash, postmaster.c:3191
2016-06-28 22:16:55 GMT [24845]: [85-1]
db=dw,user=app,app=monitor_node,client=<hidden ip1> WARNING: 57P02:
terminating connection because of crash of another server process
2016-06-28 22:16:55 GMT [24845]: [86-1]
db=dw,user=app,app=monitor_node,client=<hidden ip1> DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2016-06-28 22:16:55 GMT [24845]: [87-1]
db=dw,user=app,app=monitor_node,client=<hidden ip1> HINT: In a moment you
should be able to reconnect to the database and repeat your command.
2016-06-28 22:16:55 GMT [24845]: [88-1]
db=dw,user=app,app=monitor_node,client=<hidden ip1> LOCATION: quickdie,
postgres.c:2612
2016-06-28 22:16:55 GMT [24849]: [193-1]
db=dw,user=app,app=monitor_node,client=<hidden ip2> WARNING: 57P02:
terminating connection because of crash of another server process
2016-06-28 22:16:55 GMT [24849]: [194-1]
db=dw,user=app,app=monitor_node,client=<hidden ip2> DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2016-06-28 22:16:55 GMT [24849]: [195-1]
db=dw,user=app,app=monitor_node,client=<hidden ip2> HINT: In a moment you
should be able to reconnect to the database and repeat your command.
2016-06-28 22:16:55 GMT [24849]: [196-1]
db=dw,user=app,app=monitor_node,client=<hidden ip2> LOCATION: quickdie,
postgres.c:2612
2016-06-28 22:16:55 GMT [8370]: [58059-1]
db=dw,user=app,app=BonusService,client=<hidden ip2> WARNING: 57P02:
terminating connection because of crash of another server process
2016-06-28 22:16:55 GMT [8370]: [58060-1]
db=dw,user=app,app=BonusService,client=<hidden ip2> DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2016-06-28 22:16:55 GMT [8370]: [58061-1]
db=dw,user=app,app=BonusService,client=<hidden ip2> HINT: In a moment you
should be able to reconnect to the database and repeat your command.
2016-06-28 22:16:55 GMT [8370]: [58062-1]
db=dw,user=app,app=BonusService,client=<hidden ip2> LOCATION: quickdie,
postgres.c:2612
2016-06-28 22:16:55 GMT [24848]: [63-1]
db=dw,user=app,app=monitor_node,client=<hidden ip2> WARNING: 57P02:
terminating connection because of crash of another server process
2016-06-28 22:16:55 GMT [24848]: [64-1]
db=dw,user=app,app=monitor_node,client=<hidden ip2> DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2016-06-28 22:16:55 GMT [24848]: [65-1]
db=dw,user=app,app=monitor_node,client=<hidden ip2> HINT: In a moment you
should be able to reconnect to the database and repeat your command.

We are able to reproduce the issue at the moment.
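For reference, a minimal setup along the lines described above might look as follows. The slot name 'regression_slot' and the crashing query are taken from the report; the table name and bulk-insert statement are illustrative, not from the original report:

```sql
-- Create a logical replication slot using the vanilla test_decoding plugin
SELECT pg_create_logical_replication_slot('regression_slot', 'test_decoding');

-- Simulate a large bulk upload (table and data are hypothetical)
CREATE TABLE bulk_data (id serial PRIMARY KEY, payload text);
INSERT INTO bulk_data (payload)
SELECT md5(random()::text) FROM generate_series(1, 8000000);

-- Poll changes from the slot, as the crashing query did
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL) LIMIT 10000;
```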

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Peter Geoghegan
pg@heroku.com
In reply to: Alexey Kuntsevich (#1)
Re: BUG #14218: pg_logical_slot_get_changes causes segmentation fault

On Wed, Jun 29, 2016 at 12:10 AM, <alexey.kuntsevich@gmail.com> wrote:

We enabled logical replication on our postgresql cluster, created a new
replication slot with vanilla test_decoding decoder and ran an app that
polls 10000 rows from this slot once per second. It ran fine for a day,
survived several bulk uploads of ~1mln rows until we did bulk upload of
~8mln rows at once. During the upload our postgresql instance disconnected
all clients and reported that it is in recovery. Core dump was created and
when we checked the logs we saw

Can you show a backtrace from the coredump?

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

--
Peter Geoghegan


#3 Alexey Kuntsevich
alexey.kuntsevich@gmail.com
In reply to: Peter Geoghegan (#2)
Re: BUG #14218: pg_logical_slot_get_changes causes segmentation fault

Hi Peter,

Here is the backtrace:

Reading symbols from /usr/lib/postgresql/9.5/bin/postgres...(no debugging
symbols found)...done.
[New LWP 33731]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: docker turf 172.30.32.56(50865) SELECT
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fdd58521f20 in ReorderBufferCommit ()
(gdb) bt
#0 0x00007fdd58521f20 in ReorderBufferCommit ()
#1 0x00007fdd5851d7b0 in LogicalDecodingProcessRecord ()
#2 0x00007fdd5851f1c5 in ?? ()
#3 0x00007fdd58466ac2 in ExecMakeTableFunctionResult ()
#4 0x00007fdd5847be52 in ?? ()
#5 0x00007fdd58468c73 in ExecScan ()
#6 0x00007fdd58461498 in ExecProcNode ()
#7 0x00007fdd5845e34e in standard_ExecutorRun ()
#8 0x00007fdd5856b1ff in ?? ()
#9 0x00007fdd5856c808 in PortalRun ()
#10 0x00007fdd58569501 in PostgresMain ()
#11 0x00007fdd58303c31 in ?? ()
#12 0x00007fdd5850d54e in PostmasterMain ()
#13 0x00007fdd58304db7 in main ()

On Wed, Jun 29, 2016 at 9:19 AM, Peter Geoghegan <pg@heroku.com> wrote:

On Wed, Jun 29, 2016 at 12:10 AM, <alexey.kuntsevich@gmail.com> wrote:

We enabled logical replication on our postgresql cluster, created a new
replication slot with vanilla test_decoding decoder and ran an app that
polls 10000 rows from this slot once per second. It ran fine for a day,
survived several bulk uploads of ~1mln rows until we did bulk upload of
~8mln rows at once. During the upload our postgresql instance disconnected
all clients and reported that it is in recovery. Core dump was created and
when we checked the logs we saw

Can you show a backtrace from the coredump?

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

--
Peter Geoghegan

--
Best regards,
Alexey Kuntsevich

#4 Peter Geoghegan
pg@heroku.com
In reply to: Alexey Kuntsevich (#3)
Re: BUG #14218: pg_logical_slot_get_changes causes segmentation fault

On Wed, Jun 29, 2016 at 4:55 AM, Alexey Kuntsevich
<alexey.kuntsevich@gmail.com> wrote:

Here is the backtrace:

Reading symbols from /usr/lib/postgresql/9.5/bin/postgres...(no debugging
symbols found)...done.

That isn't very useful, since you don't have debugging symbols. See
that Wiki page.
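On Debian, one way to obtain a symbol-rich backtrace is roughly the following. The dbg package name matches the apt.postgresql.org 9.5 packaging, and the core-file path is hypothetical; adjust both for your installation:

```shell
# Install debugging symbols for the PostgreSQL 9.5 server binaries
apt-get install postgresql-9.5-dbg

# Re-open the core dump against the server binary and print a full
# backtrace, including function arguments and local variables
gdb -batch -ex 'bt full' /usr/lib/postgresql/9.5/bin/postgres /path/to/core
```

With symbols installed, the `??` frames in the earlier backtrace should resolve to named functions, which makes the crash site in ReorderBufferCommit actionable.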

--
Peter Geoghegan
