Psql crashes with Segmentation fault on copy from

Started by Francisco Reyesalmost 18 years ago12 messagesgeneral
Jump to latest
#1Francisco Reyes
lists@stringsutils.com

I am setting up a new machine and preparing some standard benchmark tests.
While trying to load some data using "copy from" psql is crashing.

OS Red Hat Enterprise Linux ES release 4 (Nahant Update 6)
Postgresql 8.2.7 installed from RPMs

I ran strace on psql and got:

read(5, "DAIRY QUEEN        Q68\t5814\t1215"..., 4096) = 4096
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
...... 36 other lines like the ones above/below.
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(5, "TORE 2653\t5311\t93\t5080284\n2qLMfi"..., 4096) = 3796
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
...... 36 other lines like the ones above/below.
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(5, "", 4096)                       = 0
gettimeofday({1211915058, 706827}, NULL) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

On the server I increased the verbosity to debug levels:
DEBUG: 00000: checkpoint starting
LOCATION: CreateCheckPoint, xlog.c:5584
DEBUG: 00000: checkpoint complete; 0 transaction log file(s) added, 0
removed, 0 recycled
LOCATION: CreateCheckPoint, xlog.c:5712
LOG: 08P01: unexpected EOF on client connection
LOCATION: SocketBackend, postgres.c:307
DEBUG: 00000: proc_exit(0)
LOCATION: proc_exit, ipc.c:94
DEBUG: 00000: shmem_exit(0)
LOCATION: shmem_exit, ipc.c:125
LOG: 00000: disconnection: session time: 0:28:57.253 user=benchmark
database=benchmark host=[local]
LOCATION: log_disconnections, postgres.c:3840
DEBUG: 00000: exit(0)
LOCATION: proc_exit, ipc.c:112
DEBUG: 00000: server process (PID 3112) exited with exit code 0
LOCATION: LogChildExit, postmaster.c:2425

The same data file loads into another machine with Postgresql 8.2.4 and the
same version of RedHat..

Checked the disk with the badblocks command. No errors found.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Francisco Reyes (#1)
Re: Psql crashes with Segmentation fault on copy from

"Francisco Reyes" <lists@stringsutils.com> writes:

I am setting up a new machine and preparing some standard benchmark tests.
While trying to load some data using "copy from" psql is crashing.

Can you get us a stack trace from the crash? (You'd likely need to
install the postgresql-debuginfo RPM to get a useful trace.)
Alternatively, can you provide a self-contained test case?

regards, tom lane

#3Francisco Reyes
lists@stringsutils.com
In reply to: Tom Lane (#2)
Re: Psql crashes with Segmentation fault on copy from

On 6:28 pm 05/27/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:

Can you get us a stack trace from the crash?

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182894175648 (LWP 4487)]
0x0000003cc31723e6 in memcpy () from /lib64/tls/libc.so.6
(gdb) bt
#0 0x0000003cc31723e6 in memcpy () from /lib64/tls/libc.so.6
#1 0x000000364bf0e0ae in PQunescapeBytea () from /usr/lib64/libpq.so.5
#2 0x000000364bf0e230 in PQunescapeBytea () from /usr/lib64/libpq.so.5
#3 0x000000364bf0c09e in PQsendQuery () from /usr/lib64/libpq.so.5
#4 0x000000364bf0c788 in PQexec () from /usr/lib64/libpq.so.5
#5 0x0000000000406e95 in ?? ()
#6 0x0000000000409dfa in ?? ()
#7 0x000000000040408d in ?? ()
#8 0x00000000004057cd in ?? ()
#9 0x0000000000406286 in ?? ()
#10 0x0000000000409f66 in ?? ()
#11 0x000000000040408d in ?? ()
#12 0x00000000004057cd in ?? ()
#13 0x0000000000406286 in ?? ()
#14 0x0000000000409f66 in ?? ()
#15 0x000000000040c687 in ?? ()
#16 0x0000003cc311c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#17 0x0000000000403d2a in ?? ()
#18 0x0000007fbffff558 in ?? ()
#19 0x000000000000001c in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0x0000007fbffff795 in ?? ()
#22 0x0000000000000000 in ?? ()

Is that what you need?
I installed
compat-postgresql-libs-debuginfo-3-2PGDG.rhel4.x86_64.rpm
postgresql-debuginfo-8.2.7-1PGDG.rhel4.x86_64.rpm

What is compat-postgresql-libs-debuginfo?
Neither of those two RPMs are described in the RPM install PDF by Devrim
and Lamar.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Francisco Reyes (#3)
Re: Psql crashes with Segmentation fault on copy from

"Francisco Reyes" <lists@stringsutils.com> writes:

(gdb) bt
#0 0x0000003cc31723e6 in memcpy () from /lib64/tls/libc.so.6
#1 0x000000364bf0e0ae in PQunescapeBytea () from /usr/lib64/libpq.so.5
#2 0x000000364bf0e230 in PQunescapeBytea () from /usr/lib64/libpq.so.5
#3 0x000000364bf0c09e in PQsendQuery () from /usr/lib64/libpq.so.5
#4 0x000000364bf0c788 in PQexec () from /usr/lib64/libpq.so.5
#5 0x0000000000406e95 in ?? ()
#6 0x0000000000409dfa in ?? ()
#7 0x000000000040408d in ?? ()

Is that what you need?

Well, it would be if it were right; but PQsendQuery doesn't call
PQunescapeBytea, so there's something wrong with the debug info.
(The fact that we're not seeing any parameter values is another
tip that it's not right. Apparently gdb doesn't think the debuginfo
matches at all.)

I installed
compat-postgresql-libs-debuginfo-3-2PGDG.rhel4.x86_64.rpm
postgresql-debuginfo-8.2.7-1PGDG.rhel4.x86_64.rpm

Do those *exactly* match the versions of the Postgres RPMs you're
using?

regards, tom lane

#5Francisco Reyes
lists@stringsutils.com
In reply to: Tom Lane (#4)
Re: Psql crashes with Segmentation fault on copy from

On 11:09 am 05/28/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:

I installed
compat-postgresql-libs-debuginfo-3-2PGDG.rhel4.x86_64.rpm
postgresql-debuginfo-8.2.7-1PGDG.rhel4.x86_64.rpm

Do those *exactly* match the versions of the Postgres RPMs you're
using?

I got them from the same directory as the rest of the 8.2.7 RPMs I
downloaded.

I am going to try uninstalling the RPMs and using the Source RPMS and
report back. There is no production data yet since I am working on testing
the machine and creating some benchmarks so I can redo the entire setup as
needed.

#6Francisco Reyes
lists@stringsutils.com
In reply to: Tom Lane (#4)
Re: Psql crashes with Segmentation fault on copy from

On 11:09 am 05/28/08 Tom Lane <tgl@sss.pgh.pa.us> wrote:

Re-send. Didn't do reply-all before.

Well, it would be if it were right; but PQsendQuery doesn't call
PQunescapeBytea, so there's something wrong with the debug info.

..

compat-postgresql-libs-debuginfo-3-2PGDG.rhel4.x86_64.rpm

That was the wrong lib.
Installed
postgresql-debuginfo-8.2.7-1PGDG.rhel4.x86_64.rpm and got
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182895355296 (LWP 5008)]
0x0000003cc31723e6 in memcpy () from /lib64/tls/libc.so.6
(gdb) bt
#0 0x0000003cc31723e6 in memcpy () from /lib64/tls/libc.so.6
#1 0x0000002a955820ae in pqPutMsgBytes (buf=0x2a9860a010, len=2147483647,
conn=0x560a80) at fe-misc.c:475
#2 0x0000002a95582230 in pqPuts (
s=0x2a9860a010 "yjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t10.47\t08620667742MDF1\t\tMCDONALD'S F7742\t5814\t1003\t1187954\nyjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t16.99\t00614111869AWG1\t\tPIGGLY WIGGLY#52\t5411"...,
conn=0x560a80) at fe-misc.c:151
#3 0x0000002a9558009e in PQsendQuery (conn=0x560a80,
query=0x2a9860a010 "yjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t10.47\t08620667742MDF1\t\tMCDONALD'S F7742\t5814\t1003\t1187954\nyjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t16.99\t00614111869AWG1\t\tPIGGLY WIGGLY#52\t5411"...)
at fe-exec.c:687
#4 0x0000002a95580788 in PQexec (conn=0x560a80,
query=0x2a9860a010 "yjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t10.47\t08620667742MDF1\t\tMCDONALD'S F7742\t5814\t1003\t1187954\nyjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t16.99\t00614111869AWG1\t\tPIGGLY WIGGLY#52\t5411"...)
at fe-exec.c:1288
#5 0x0000000000406e95 in SendQuery (
query=0x2a9860a010 "yjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t10.47\t08620667742MDF1\t\tMCDONALD'S F7742\t5814\t1003\t1187954\nyjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t16.99\t00614111869AWG1\t\tPIGGLY WIGGLY#52\t5411"...)
at common.c:878
#6 0x0000000000409dfa in MainLoop (source=0x5414f0) at mainloop.c:340
#7 0x000000000040408d in process_file (filename=0x541d70
"data/usb_t_60M.sql", single_txn=0 '\0') at command.c:1394
#8 0x00000000004057cd in exec_command (cmd=0x541bc0 "i",
scan_state=0x549b30, query_buf=0x549890) at command.c:546
#9 0x0000000000406286 in HandleSlashCmds (scan_state=0x549b30,
query_buf=0x549890) at command.c:92
#10 0x0000000000409f66 in MainLoop (source=0x541070) at mainloop.c:259
#11 0x000000000040408d in process_file (filename=0x560750
"bench-index-after.sql", single_txn=0 '\0') at command.c:1394
#12 0x00000000004057cd in exec_command (
cmd=0x569360 "Hxnh9aOwMviVVP9I=\t200803\t2008-03-10\t0\t15.54\t434240509886\t\tNU-YALE CLEANERS (BARD\t4225\t0\t11636644\n3MM9gwBGMHiyPGmUrqIEOOha1Ao=\t200803\t2008-03-11\t0\t739.95\t554172100100007\t\tUSAA P&C PREMIUM\t6300\t0\t183718"...,
scan_state=0x549c20, query_buf=0x54a480) at command.c:546
#13 0x0000000000406286 in HandleSlashCmds (scan_state=0x549c20,
query_buf=0x54a480) at command.c:92
#14 0x0000000000409f66 in MainLoop (source=0x3cc332fb00) at mainloop.c:259
#15 0x000000000040c687 in main (argc=80207, argv=0x0) at startup.c:367
#16 0x0000003cc311c3fb in __libc_start_main () from /lib64/tls/libc.so.6
#17 0x0000000000403d2a in _start ()
#18 0x0000007fbffff558 in ?? ()
#19 0x000000000000001c in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0x0000007fbffff795 in ?? ()
#22 0x0000000000000000 in ?? ()

#7Bruce Momjian
bruce@momjian.us
In reply to: Francisco Reyes (#6)
Re: Psql crashes with Segmentation fault on copy from

"Francisco Reyes" <lists@stringsutils.com> writes:

#1 0x0000002a955820ae in pqPutMsgBytes (buf=0x2a9860a010, len=2147483647,

So it's trying to execute an sql query that's MAXINT bytes long which is
pretty off-course.

#5 0x0000000000406e95 in SendQuery (
query=0x2a9860a010 "yjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t10.47\t08620667742MDF1\t\tMCDONALD'S F7742\t5814\t1003\t1187954\nyjbjK8WKLRHKavptOqlPD4QeI24=\t200803\t2008-03-11\t0\t16.99\t00614111869AWG1\t\tPIGGLY WIGGLY#52\t5411"...)
at common.c:878
#6 0x0000000000409dfa in MainLoop (source=0x5414f0) at mainloop.c:340
#7 0x000000000040408d in process_file (filename=0x541d70
"data/usb_t_60M.sql", single_txn=0 '\0') at command.c:1394
#8 0x00000000004057cd in exec_command (cmd=0x541bc0 "i",
scan_state=0x549b30, query_buf=0x549890) at command.c:546

Could you post the first couple lines of data/usb_t_60M.sql ? Does it really
have a COPY command at the beginning? Are you really doing \i
data/usb_t_60M.sql or were you trying to do a copy from this file?

#9 0x0000000000406286 in HandleSlashCmds (scan_state=0x549b30,
query_buf=0x549890) at command.c:92
#10 0x0000000000409f66 in MainLoop (source=0x541070) at mainloop.c:259
#11 0x000000000040408d in process_file (filename=0x560750
"bench-index-after.sql", single_txn=0 '\0') at command.c:1394
#12 0x00000000004057cd in exec_command (
cmd=0x569360 "Hxnh9aOwMviVVP9I=\t200803\t2008-03-10\t0\t15.54\t434240509886\t\tNU-YALE CLEANERS (BARD\t4225\t0\t11636644\n3MM9gwBGMHiyPGmUrqIEOOha1Ao=\t200803\t2008-03-11\t0\t739.95\t554172100100007\t\tUSAA P&C PREMIUM\t6300\t0\t183718"...,
scan_state=0x549c20, query_buf=0x54a480) at command.c:546

I'm a bit puzzled by this. The command being executed here seems to have been
overwritten by data from the later command -- before it was overwritten it was
apparently "\i bench-index-after.sql". It seems something with the
PsqlScanState is not being sufficient to make the lexer completely reentrant
here.

What version of flex is this built with?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

#8Francisco Reyes
lists@stringsutils.com
In reply to: Bruce Momjian (#7)
Re: Psql crashes with Segmentation fault on copy from

On 3:09 pm 05/28/08 Gregory Stark <stark@enterprisedb.com> wrote:

Does it really have a COPY command at the beginning? Are you really doing >\i data/usb_t_60M.sql or were you trying to do a copy from this file?

Argh..That's it.
When I re-organized the scripts I must have taken the copy command from the top of the data file and did not put a 'copy from' in the calling script.

Thanks!!!

Is there a slowdown by keeping the debug RPM?
Or should I only have it when trying to troubleshoot a problem?

#9Martijn van Oosterhout
kleptog@svana.org
In reply to: Francisco Reyes (#8)
Re: Psql crashes with Segmentation fault on copy from

On Wed, May 28, 2008 at 03:42:47PM -0400, Francisco Reyes wrote:

Is there a slowdown by keeping the debug RPM?
Or should I only have it when trying to troubleshoot a problem?

Debug info costs nothing if its not being used.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Francisco Reyes (#8)
Re: Psql crashes with Segmentation fault on copy from

"Francisco Reyes" <lists@stringsutils.com> writes:

On 3:09 pm 05/28/08 Gregory Stark <stark@enterprisedb.com> wrote:

Does it really have a COPY command at the beginning? Are you really doing >\i data/usb_t_60M.sql or were you trying to do a copy from this file?

Argh..That's it.
When I re-organized the scripts I must have taken the copy command from the top of the data file and did not put a 'copy from' in the calling script.

Hmm ... even so, it shouldn't have crashed, it should at worst have
given you an out-of-memory error message.

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#10)
Re: Psql crashes with Segmentation fault on copy from

I wrote:

"Francisco Reyes" <lists@stringsutils.com> writes:

On 3:09 pm 05/28/08 Gregory Stark <stark@enterprisedb.com> wrote:

Does it really have a COPY command at the beginning? Are you really doing >\i data/usb_t_60M.sql or were you trying to do a copy from this file?

Argh..That's it.
When I re-organized the scripts I must have taken the copy command from the top of the data file and did not put a 'copy from' in the calling script.

Hmm ... even so, it shouldn't have crashed, it should at worst have
given you an out-of-memory error message.

Hmm, I see the problem: the pqexpbuffer code maxes out at INT_MAX bytes
in the string buffer, which would be all right except that it has no
good way to report the error and so the input is just getting truncated
at that length. But then what happens is that pqPuts computes
strlen(s) + 1, which is still all right because size_t is 64 bits
on this machine, and passes that to pqPutMsgBytes, which computes
conn->outMsgEnd + len *and smashes that to int*. So the value passed
to pqCheckOutBufferSpace is negative, and the latter falsely concludes
there's enough space in the buffer, and then the memcpy goes boom.

A minimum solution would be to make pqCheckOutBufferSpace deal in
size_t not int, but I have a feeling there are a lot of other similar
gotchas when running this code on a 64-bit machine. We use int
arithmetic an awful lot for stuff that probably should be size_t
or ssize_t ...

regards, tom lane

#12Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#11)
Re: Psql crashes with Segmentation fault on copy from

Tom Lane wrote:

I wrote:

"Francisco Reyes" <lists@stringsutils.com> writes:

On 3:09 pm 05/28/08 Gregory Stark <stark@enterprisedb.com> wrote:

Does it really have a COPY command at the beginning? Are you really doing >\i data/usb_t_60M.sql or were you trying to do a copy from this file?

Argh..That's it.
When I re-organized the scripts I must have taken the copy command from the top of the data file and did not put a 'copy from' in the calling script.

Hmm ... even so, it shouldn't have crashed, it should at worst have
given you an out-of-memory error message.

Hmm, I see the problem: the pqexpbuffer code maxes out at INT_MAX bytes
in the string buffer, which would be all right except that it has no
good way to report the error and so the input is just getting truncated
at that length. But then what happens is that pqPuts computes
strlen(s) + 1, which is still all right because size_t is 64 bits
on this machine, and passes that to pqPutMsgBytes, which computes
conn->outMsgEnd + len *and smashes that to int*. So the value passed
to pqCheckOutBufferSpace is negative, and the latter falsely concludes
there's enough space in the buffer, and then the memcpy goes boom.

A minimum solution would be to make pqCheckOutBufferSpace deal in
size_t not int, but I have a feeling there are a lot of other similar
gotchas when running this code on a 64-bit machine. We use int
arithmetic an awful lot for stuff that probably should be size_t
or ssize_t ...

I assume this is not a TODO item.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +