BUG #14180: Segmentation fault on replication slave
The following bug has been logged on the website:
Bug reference: 14180
Logged by: Bo Ørsted Andresen
Email address: boa@neogrid.dk
PostgreSQL version: 9.5.3
Operating system: Ubuntu 16.04 LTS
Description:
Hello,
We have a replication slot setup where the replication causes a segmentation
fault within eight hours after a rebuild of the slave.
In the following the master is IP 10.0.0.2 and the slave is IP 10.0.0.3.
On the master we have the defaults and the following:
/etc/postgresql/9.5/main/postgresql.conf
----------------------------------------
listen_addresses = '*'
port = 5433
wal_level = archive
archive_mode = on
archive_command = 'test ! -f /mnt/postgres_archive/%f && cp %p /mnt/postgres_archive/%f'
max_wal_senders = 5
wal_keep_segments = 4000
max_replication_slots = 1
timezone = 'UTC'
----------------------------------------
/etc/postgresql/9.5/main/pg_hba.conf
----------------------------------------
host replication postgres 10.0.0.3/32 trust
----------------------------------------
On the slave we have the defaults and the timezone changed:
/etc/postgresql/9.5/main/postgresql.conf
----------------------------------------
timezone = 'UTC'
----------------------------------------
On the master we run the SQL query:
SELECT * FROM pg_create_physical_replication_slot('slave');
On the slave we run the command:
pg_basebackup -P -R -X stream -c fast -h 10.0.0.2 -p 5433 -U postgres -D /var/lib/postgresql/9.5/main
After this recovery.conf looks like this (where we added the slot line):
standby_mode = 'on'
primary_conninfo = 'user=postgres host=10.0.0.2 port=5433 sslmode=prefer sslcompression=1 krbsrvname=postgres'
primary_slot_name = 'slave'
Then we fix the ownership and start the slave database. After a while
(anything between ten minutes and eight hours) we get this error in the log
file:
2016-06-03 05:55:27 UTC [27303-4] LOG: startup process (PID 27305) was terminated by signal 11: Segmentation fault
2016-06-03 05:55:27 UTC [27303-5] LOG: terminating any other active server processes
If we attach with gdb before the segmentation fault we get:
# gdb -p 30524
GNU gdb (Ubuntu 7.11-0ubuntu1) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 30524
Reading symbols from /usr/lib/postgresql/9.5/bin/postgres...Reading symbols
from
/usr/lib/debug/.build-id/c6/7444cae2dbc6bcac46e8052921c01c06780d72.debug...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libxml2.so.2...Reading
symbols from
/usr/lib/debug/.build-id/a1/55c7bc345d0e0b711be09120204bd88f475f9e.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libpam.so.0...(no debugging
symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libssl.so.1.0.0...Reading symbols
from
/usr/lib/debug/.build-id/82/2754695e4b31ae82937258bdff3d52efa0ba36.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0...Reading
symbols from
/usr/lib/debug/.build-id/b7/5a96c59be1b5b54fbf1a91ed722bec9406288e.debug...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from
/usr/lib/debug//lib/x86_64-linux-gnu/librt-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from
/usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols from
/usr/lib/debug//lib/x86_64-linux-gnu/libm-2.23.so...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2...Reading
symbols from
/usr/lib/debug/.build-id/ad/f6f41f223d42193165fa0c55871f02d915fb19.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from
/usr/lib/debug//lib/x86_64-linux-gnu/libc-2.23.so...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libicuuc.so.55...Reading
symbols from
/usr/lib/debug/.build-id/32/3e4878073bb4e0d7b174ae24e383ec5e05d68a.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...Reading symbols from
/usr/lib/debug/.build-id/34/0b7b463f981b8a0fb3451751f881df1b0c2f74.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/liblzma.so.5...(no debugging
symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libaudit.so.1...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libkrb5.so.3...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libk5crypto.so.3...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libcom_err.so.2...Reading symbols
from /usr/lib/debug//lib/x86_64-linux-gnu/libcom_err.so.2.1...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libkrb5support.so.0...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols
from
/usr/lib/debug/.build-id/b7/7847cc9cacbca3b5753d0d25a32e5795afe75b.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from
/usr/lib/debug//lib/x86_64-linux-gnu/ld-2.23.so...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2...Reading
symbols from
/usr/lib/debug/.build-id/6b/9f4061a1d44813a54da4dbb0088f529d8d78ea.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libresolv.so.2...Reading symbols
from /usr/lib/debug//lib/x86_64-linux-gnu/libresolv-2.23.so...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libsasl2.so.2...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libgssapi.so.3...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libgnutls.so.30...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libicudata.so.55...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...(no
debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...Reading symbols
from /usr/lib/debug//lib/x86_64-linux-gnu/libgcc_s.so.1...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libkeyutils.so.1...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libheimntlm.so.0...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libkrb5.so.26...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libasn1.so.8...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libhcrypto.so.4...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libroken.so.18...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libp11-kit.so.0...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libidn.so.11...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libtasn1.so.6...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libnettle.so.6...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libhogweed.so.4...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libgmp.so.10...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libwind.so.0...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libheimbase.so.1...(no
debugging symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libhx509.so.5...(no debugging
symbols found)...done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libsqlite3.so.0...Reading
symbols from
/usr/lib/debug/.build-id/d9/782ba023caec26b15d8676e3a5d07b55e121ef.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libcrypt.so.1...Reading symbols
from /usr/lib/debug//lib/x86_64-linux-gnu/libcrypt-2.23.so...done.
done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libffi.so.6...Reading symbols
from
/usr/lib/debug/.build-id/9d/9c958f1f4894afef6aecd90d1c430ea29ac34f.debug...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...Reading
symbols from
/usr/lib/debug//lib/x86_64-linux-gnu/libnss_files-2.23.so...done.
done.
0x00007f819925de70 in __poll_nocancel () at
../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) set pagination off
(gdb) set logging file /tmp/gdb.log
(gdb) set logging on
Copying output to /tmp/gdb.log
(gdb) handle SIGUSR1 nostop
Signal Stop Print Pass to program Description
SIGUSR1 No Yes Yes User defined signal 1
(gdb) handle SIGUSR1 noprint
Signal Stop Print Pass to program Description
SIGUSR1 No No Yes User defined signal 1
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
_bt_restore_page (page=0x7f816fce2b40 "", from=0x55a0945abb70 "\036",
len=<optimized out>) at
/build/postgresql-9.5-xp9utH/postgresql-9.5-9.5.3/build/../src/backend/access/nbtree/nbtxlog.c:57
57
/build/postgresql-9.5-xp9utH/postgresql-9.5-9.5.3/build/../src/backend/access/nbtree/nbtxlog.c:
No such file or directory.
(gdb) bt
#0 _bt_restore_page (page=0x7f816fce2b40 "", from=0x55a0945abb70 "\036",
len=<optimized out>) at
/build/postgresql-9.5-xp9utH/postgresql-9.5-9.5.3/build/../src/backend/access/nbtree/nbtxlog.c:57
#1 0x0000000000000000 in ?? ()
(gdb) p from
$1 = 0x55a0945abb70 "\036"
(gdb) p end
$2 = 0x55a0945ac928 "\305cO"
(gdb) p i
$3 = 3324
(gdb) p len
$4 = <optimized out>
(gdb) p &itupdata
$5 = (IndexTupleData *) 0x7ffe83ea84e0
(gdb) p items
$6 = {0x0 <repeats 408 times>}
(gdb) p &items
$7 = (Item (*)[408]) 0x7ffe83ea8820
(gdb) p itemsz
$8 = <optimized out>
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/lib/postgresql/9.5/bin/postgres
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
"root" execution of the PostgreSQL server is not permitted.
The server must be started under an unprivileged user ID to prevent
possible system security compromise. See the documentation for
more information on how to properly start the server.
[Inferior 1 (process 1887) exited with code 01]
(gdb) quit
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Hi,
thanks for reporting this issue.
On 2016-06-07 09:16:18 +0000, boa@neogrid.dk wrote:
> (gdb) bt
> #0 _bt_restore_page (page=0x7f816fce2b40 "", from=0x55a0945abb70 "\036",
> len=<optimized out>) at
> /build/postgresql-9.5-xp9utH/postgresql-9.5-9.5.3/build/../src/backend/access/nbtree/nbtxlog.c:57
> #1 0x0000000000000000 in ?? ()
Uhm, this is a bit odd. There's no backtrace, but types and such are
known? I guess you do have the debug symbols installed?
Greetings,
Andres Freund
On 2016-06-07 19:21, Andres Freund wrote:
> Uhm, this is a bit odd. There's no backtrace, but types and such are known? I
> guess you do have the debug symbols installed?
Yeah, it's confusing. I installed the result of ./list-dbgsym-packages-v2.1.sh -p $pid which gave:
libcomerr2-dbg libffi6-dbg libgcc1-dbg libicu55-dbg libldap-2.4-2-dbg libsqlite3-0-dbg libssl1.0.0-dbg libxml2-dbg postgresql-9.5-dbg zlib1g-dbg
Not sure what else I can do short of recompiling postgresql myself.
Greetings,
Bo Ørsted Andresen
On 2016-06-07 17:36:38 +0000, Bo Ørsted Andresen wrote:
> Yeah, it's confusing. I installed the result of ./list-dbgsym-packages-v2.1.sh -p $pid which gave:
> libcomerr2-dbg libffi6-dbg libgcc1-dbg libicu55-dbg libldap-2.4-2-dbg libsqlite3-0-dbg libssl1.0.0-dbg libxml2-dbg postgresql-9.5-dbg zlib1g-dbg
> Not sure what else I can do short of recompiling postgresql myself.
Any chance the running version of postgres is out of date with the
installed binaries / debug symbols?
On 2016-06-07 19:41, Andres Freund wrote:
> Any chance the running version of postgres is out of date with the installed
> binaries / debug symbols?
You mean that I upgraded without restarting postgres before the segfault?
Before this I had the same problem on Ubuntu 14.04 with PostgreSQL 9.5.3 from a PPA. I then built a new VM based on Ubuntu 16.04 on the 3rd of June to see if there was something wrong with the slave VM. So it's a clean install where I installed the latest postgresql once and ran it. I've reproduced the problem three times since then, with one reboot in between. Of course, what hasn't changed during all of this is the master.
The versions of postgresql-client-9.5, libpq5, postgresql-9.5, postgresql-contrib-9.5 and postgresql-9.5-dbg are all 9.5.3-0ubuntu0.16.04.
Regards,
Bo Ørsted Andresen
On 2016-06-07 17:53:46 +0000, Bo Ørsted Andresen wrote:
> On 2016-06-07 19:41, Andres Freund wrote:
> > Any chance the running version of postgres is out of date with the installed
> > binaries / debug symbols?
> You mean that I upgraded without restarting postgres before the segfault?
Yes, that's what I was wondering. But alas, that's apparently not the
reason.
This is going to be a bit more complicated, sorry :(
Could you try to reproduce the problem, and do 'p/x ReadRecPtr'? That
should give you something like 0x5434343496. If you rewrite this as
first-four-bytes/last-four-bytes e.g. 54/34343496 you get the LSN. With
that, could you try
pg_xlogdump -p /path/to/data/directory -s 54/34343496 -n 100
and send the output?
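As a rough illustration of the conversion described above: the 64-bit value that gdb prints can be split into its upper and lower 32-bit halves, each printed in hex. This is only a sketch; the function name format_lsn is made up for this example and is not part of PostgreSQL.

```python
def format_lsn(rec_ptr: int) -> str:
    """Render a 64-bit WAL position in the HIGH/LOW hex form used for LSNs."""
    if not 0 <= rec_ptr < 2**64:
        raise ValueError("WAL position must fit in 64 bits")
    high = rec_ptr >> 32          # upper four bytes
    low = rec_ptr & 0xFFFFFFFF    # lower four bytes
    return f"{high:X}/{low:X}"


# Value taken from the example in the message above.
print(format_lsn(0x5434343496))  # 54/34343496
```

The resulting string is what would be passed to the -s option of pg_xlogdump in the command above.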
Regards,
Andres
On 2016-06-07 20:00, Andres Freund wrote:
> Could you try to reproduce the problem, and do 'p/x ReadRecPtr'? That
> should give you something like 0x5434343496. If you rewrite this as
> first-four-bytes/last-four-bytes e.g. 54/34343496 you get the LSN. With
> that, could you try
> pg_xlogdump -p /path/to/data/directory -s 54/34343496 -n 100
> and send the output?
Will do. May take a while.
Regards,
Bo Ørsted Andresen
On 2016-06-07 18:04:50 +0000, Bo Ørsted Andresen wrote:
> Will do. May take a while.
To clarify: After the crash, does recovery continue successfully? Or do you
have to scrap the standby?
Regards,
Andres
Bo Ørsted Andresen <boa@neogrid.dk> writes:
> On 2016-06-07 19:41, Andres Freund wrote:
> > Any chance the running version of postgres is out of date with the installed
> > binaries / debug symbols?
> You mean that I upgraded without restarting postgres before the segfault?
I think the reason for the lack of useful backtrace info is that we've
smashed the stack. Note that the original report shows i == 3324 which is
much larger than the available length of the local items[] array (408).
So presumably, the passed-in "len" was bogus (much too large).
If you're prepared to build a custom version of Postgres, you could
try adding this to _bt_restore_page():
         /* Need to copy tuple header due to alignment considerations */
         memcpy(&itupdata, from, sizeof(IndexTupleData));
         itemsz = IndexTupleDSize(itupdata);
         itemsz = MAXALIGN(itemsz);
 
+        if (i >= lengthof(items))
+            elog(PANIC, "too many items on btree page");
+
         items[i] = (Item) from;
         itemsizes[i] = itemsz;
         i++;
 
         from += itemsz;
and then you should get a core dump before the stack is clobbered.
I wonder whether we shouldn't add such a check to the regular sources...
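For context, here is a small Python model of where the 408 comes from and what the proposed check guards against. It is a sketch under stated assumptions, not PostgreSQL code: the constants assume a default x86_64 build (8 kB blocks, 8-byte maximum alignment, 24-byte page header, 8-byte index tuple header, 4-byte line pointer), and the helper names are invented for this illustration.

```python
BLCKSZ = 8192              # assumed default block size
PAGE_HEADER_SIZE = 24      # assumed page header size
INDEX_TUPLE_HEADER = 8     # assumed sizeof(IndexTupleData)
ITEM_ID_SIZE = 4           # assumed size of one line pointer
MAXIMUM_ALIGNOF = 8        # assumed maximum alignment on x86_64


def maxalign(size: int) -> int:
    """Round size up to the next multiple of MAXIMUM_ALIGNOF."""
    return (size + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)


def max_index_tuples_per_page() -> int:
    """Page space divided by the smallest tuple (header plus one byte) and its line pointer."""
    per_tuple = maxalign(INDEX_TUPLE_HEADER + 1) + ITEM_ID_SIZE
    return (BLCKSZ - PAGE_HEADER_SIZE) // per_tuple


def collect_item_sizes(raw_sizes, capacity):
    """Model of the guarded loop: fail loudly instead of writing past the array."""
    sizes = []
    for raw in raw_sizes:
        if len(sizes) >= capacity:
            raise OverflowError("too many items on btree page")
        sizes.append(maxalign(raw))
    return sizes


capacity = max_index_tuples_per_page()
print(capacity)  # 408
try:
    # 3324 is the value of i seen in the original gdb session.
    collect_item_sizes([16] * 3324, capacity)
except OverflowError as exc:
    print(exc)  # too many items on btree page
```

In C there is no such automatic bounds check on a local array, which is why the loop without the guard can overwrite the stack and leave gdb with no usable backtrace.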
regards, tom lane
On 2016-06-07 20:07, Andres Freund wrote:
> To clarify: After the crash recovery continues successfully? Or do you have to
> scrap the standby?
The crash recovery does not continue successfully. I don't know of a way to attach gdb to the crashing process before it crashes that does not involve scrapping the standby.
Regards,
Bo Ørsted Andresen
On 2016-06-07 18:15:14 +0000, Bo Ørsted Andresen wrote:
> The crash recovery does not continue successfully. I don't know of a way to attach in gdb to the process that crashes before it already crashed, which does not involve scrapping the standby.
gdb --args /path/to/postgres --single postgres -D /path/to/datadir
(gdb) run
should probably do the trick.
Regards,
Andres
From: [mailto:andres@anarazel.de]
Sent: 07 June 2016 20:17
To: Bo Ørsted Andresen <boa@neogrid.dk>
Cc: pgsql-bugs@postgresql.org
Subject: Re: SV: [BUGS] BUG #14180: Segmentation fault on replication slave
On 2016-06-07 20:17, Andres Freund wrote:
> > The crash recovery does not continue successfully. I don't know of a way to
> > attach in gdb to the process that crashes before it already crashed, which
> > does not involve scrapping the standby.
> gdb --args /path/to/postgres --single postgres -D /path/to/datadir
> (gdb) run
> should probably do the trick.
Will try that next time, thanks. Unfortunately it's gone. :(
Regards,
Bo Ørsted Andresen
On 2016-06-07 20:08, Tom Lane wrote:
> If you're prepared to build a custom version of Postgres, you could try adding
> this to _bt_restore_page():
> + if (i >= lengthof(items))
> + elog(PANIC, "too many items on btree page");
> and then you should get a core dump before the stack is clobbered.
> I wonder whether we shouldn't add such a check to the regular sources...
Will give it a shot tomorrow.
Regards,
Bo Ørsted Andresen
On 2016-06-07 20:00, Andres Freund wrote:
> Could you try to reproduce the problem, and do 'p/x ReadRecPtr'? That
> should give you something like 0x5434343496. If you rewrite this as
> first-four-bytes/last-four-bytes e.g. 54/34343496 you get the LSN. With
> that, could you try
> pg_xlogdump -p /path/to/data/directory -s 54/34343496 -n 100
> and send the output?
Output attached.
Thanks,
Bo Ørsted Andresen
On 2016-06-07 20:08, Tom Lane wrote:
> If you're prepared to build a custom version of Postgres, you could
> try adding this to _bt_restore_page():
> + if (i >= lengthof(items))
> + elog(PANIC, "too many items on btree page");
> and then you should get a core dump before the stack is clobbered.
> I wonder whether we shouldn't add such a check to the regular sources...
Logged:
LOG: started streaming WAL from primary at 631/7000000 on timeline 1
PANIC: too many items on btree page
CONTEXT: xlog redo Btree/SPLIT_R: level 0, firstright 139
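When following up with pg_xlogdump it can help to know which WAL segment file holds a given position, such as the streaming start point logged above. A minimal sketch, assuming the default 16 MB segment size and the usual 24-hex-digit segment names (timeline followed by two 8-digit fields); parse_lsn and wal_segment_name are names made up for this example.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size (16 MB)


def parse_lsn(text: str) -> int:
    """Turn an LSN such as '631/7000000' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn_text: str, timeline: int = 1) -> str:
    """Name of the segment file expected to contain the given LSN."""
    segno = parse_lsn(lsn_text) // WAL_SEGMENT_SIZE
    segs_per_id = 0x100000000 // WAL_SEGMENT_SIZE  # segments per 4 GB "log id"
    return f"{timeline:08X}{segno // segs_per_id:08X}{segno % segs_per_id:08X}"


# Streaming start position and timeline from the log lines above.
print(wal_segment_name("631/7000000", timeline=1))  # 000000010000063100000007
```

Note that this is the segment where streaming started, not necessarily the one containing the record that triggered the PANIC.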
Backtrace:
# gdb -p 10069
Attaching to process 10069
Reading symbols from /usr/local/pgsql/bin/postgres...done.
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/.build-id/b7/7847cc9cacbca3b5753d0d25a32e5795afe75b.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libnss_files-2.23.so...done.
done.
0x00007ffff73f3e70 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) set pagination off
(gdb) set logging file /tmp/debuglog-20160608-2.txt
(gdb) set logging on
Copying output to /tmp/debuglog-20160608-2.txt.
(gdb) handle SIGUSR1 nostop
Signal Stop Print Pass to program Description
SIGUSR1 No Yes Yes User defined signal 1
(gdb) handle SIGUSR1 noprint
Signal Stop Print Pass to program Description
SIGUSR1 No No Yes User defined signal 1
(gdb) cont
Continuing.
(gdb)
Program received signal SIGABRT, Aborted.
0x00007ffff732e418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff732e418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff733001a in __GI_abort () at abort.c:89
#2 0x000000000078ccaa in errfinish (dummy=dummy@entry=0) at elog.c:551
#3 0x000000000079074a in elog_finish (elevel=elevel@entry=22, fmt=fmt@entry=0x7cb187 "too many items on btree page") at elog.c:1368
#4 0x00000000004ae437 in _bt_restore_page (page=page@entry=0x7fffefa2cb40 "", from=<optimized out>, from@entry=0xc52e70 "\036", len=<optimized out>) at nbtxlog.c:58
#5 0x00000000004ae8a4 in btree_xlog_split (onleft=onleft@entry=0 '\000', isroot=isroot@entry=0 '\000', record=record@entry=0xc3b840) at nbtxlog.c:241
#6 0x00000000004aee1c in btree_redo (record=0xc3b840) at nbtxlog.c:984
#7 0x00000000004d5c2b in StartupXLOG () at xlog.c:6825
#8 0x000000000064e212 in StartupProcessMain () at startup.c:215
#9 0x00000000004e3168 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7fffffffe3e0) at bootstrap.c:418
#10 0x000000000064b698 in StartChildProcess (type=StartupProcess) at postmaster.c:5199
#11 0x000000000064dc84 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0xc1b9f0) at postmaster.c:1284
#12 0x0000000000467950 in main (argc=3, argv=0xc1b9f0) at main.c:228
Regards,
Bo Ørsted Andresen