BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault

Started by TAKATSUKA Haruka over 12 years ago · 4 messages · bugs

harukat@sraoss.co.jp

over 12 years ago

The following bug has been logged on the website:

Bug reference: 8397
Logged by: TAKATSUKA Haruka
Email address: harukat@sraoss.co.jp
PostgreSQL version: 9.2.4
Operating system: Linux (CentOS6)
Description:

Hi.

I am reporting a small bug:
pg_basebackup -x run against a new standby server sometimes causes a
segmentation fault.

(1) Create a new standby server directory with pg_basebackup, without -x.
(2) Start the new standby server.
(3) Run pg_basebackup against the new standby server, with -x.
(!) When the new standby has no WAL files in pg_xlog yet, its WAL sender
crashes.

Backtrace from the new standby server's core file:

Core was generated by `postgres: wal sender process postgres ::1(55210)
sending backup "pg_basebackup'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.107.el6.x86_64 libxml2-2.7.6-4.el6.x86_64
zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0 0x0000003b7368ac66 in __rawmemchr_sse2 () from /lib64/libc.so.6
#1 0x0000003b73675990 in _IO_str_init_static_internal () from
/lib64/libc.so.6
#2 0x0000003b73669935 in vsscanf () from /lib64/libc.so.6
#3 0x0000003b736639a8 in sscanf () from /lib64/libc.so.6
#4 0x0000000000622351 in perform_base_backup (opt=0x7fffc2e22300,
tblspcdir=0xd424c0) at basebackup.c:304
#5 0x0000000000622c50 in SendBaseBackup (cmd=<value optimized out>)
at basebackup.c:558
#6 0x000000000061f5b0 in HandleReplicationCommand () at walsender.c:482
#7 WalSndHandshake () at walsender.c:257
#8 WalSenderMain () at walsender.c:181
#9 0x0000000000650b12 in PostgresMain (argc=1, argv=<value optimized out>,
dbname=0xc82a90 "", username=0xc82a70 "postgres") at postgres.c:3715
#10 0x000000000060c4f1 in BackendRun () at postmaster.c:3614
#11 BackendStartup () at postmaster.c:3304
#12 ServerLoop () at postmaster.c:1367
#13 0x000000000060f031 in PostmasterMain (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:1127
#14 0x00000000005ae140 in main (argc=5, argv=0xc80bb0) at main.c:199

./backend/replication/basebackup.c:304
XLogFromFileName(walFiles[0], &tli, &logid, &logseg);

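In this case nWalFiles is 0 and walFiles[] was palloc'd with zero size, so
walFiles[0] reads past the end of the allocation and sscanf() is handed a
garbage pointer.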

Although pg_basebackup does not need to succeed in this rare case,
we should add something like "if (nWalFiles <= 0) ereport(...);".

regards,

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Magnus Hagander

magnus@hagander.net

over 12 years ago

In reply to: TAKATSUKA Haruka (#1)

Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault

On Sat, Aug 24, 2013 at 1:46 PM, <harukat@sraoss.co.jp> wrote:

(snip)

Although pg_basebackup does not need to succeed in this rare case,
we should add something like "if (nWalFiles <= 0) ereport(...);".

Yes, we definitely need better error checking there - a crash is never
the right answer.

Does this happen only when you take a backup "really quickly" after
setting up the new standby, or is there some scenario later in its
lifetime when it can happen? In the first case, throwing a hard error
seems quite reasonable, but if it's repeatable, perhaps there is
something better we can do?

Also, while we definitely need a sanity check at this point, might it
be worth putting a second check earlier in the process as well,
since AFAICT this error gets thrown only after all the data has
already been sent?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

TAKATSUKA Haruka

harukat@sraoss.co.jp

over 12 years ago

In reply to: Magnus Hagander (#2)

Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault

Thanks for the response.

On Sat, 24 Aug 2013 17:04:21 +0200
Magnus Hagander <magnus@hagander.net> wrote:

(snip)

Does this happen only when you take a backup "really quickly" after
setting up the new standby,

It happens only in the first case, so we consider it a usage problem.

regards,

______________________________________________________
harukat@sraoss.co.jp (SRA OSS, Inc. http://www.sraoss.co.jp)

Magnus Hagander

magnus@hagander.net

over 12 years ago

In reply to: TAKATSUKA Haruka (#3)

Re: BUG #8397: pg_basebackup -x from new standby server sometimes causes Segmentation fault

On Sun, Aug 25, 2013 at 9:05 AM, TAKATSUKA Haruka <harukat@sraoss.co.jp> wrote:

(snip)

Does this happen only when you take a backup "really quickly" after
setting up the new standby,

It happens only in the first case, so we consider it a usage problem.

Yeah. OK, for now we have the patch I applied yesterday, which makes
this an error instead of a crash, as you suggested. And in case I
failed to mention it, thanks for the report!
