GiST: Bad newtup On Exit From gistSplit() ?

Started by Itai Zukermanabout 23 years ago8 messageshackers
Jump to latest
#1Itai Zukerman
zukerman@math-hat.com

Just before the return from gistSplit() I see this:

(gdb) p (*newtup)[0]
$147 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 34}, ip_posid = 1}, t_info = 136}
(gdb) p (*newtup)[1]
$148 = {t_tid = {ip_blkid = {bi_hi = 65510, bi_lo = 65535}, ip_posid = 65535}, t_info = 24575}
(gdb) p nlen
$149 = 2

(*newtup)[1] doesn't look right, correct? Everything *seems* OK. I
need to go back and trace gistFormTuple() and ItemPointerSet(), maybe?

--
Itai Zukerman <http://www.math-hat.com/~zukerman/&gt;

#2Itai Zukerman
zukerman@math-hat.com
In reply to: Itai Zukerman (#1)
Re: GiST: Bad newtup On Exit From gistSplit() ?

(gdb) p (*newtup)[0]
$147 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 34}, ip_posid = 1}, t_info = 136}
(gdb) p (*newtup)[1]
$148 = {t_tid = {ip_blkid = {bi_hi = 65510, bi_lo = 65535}, ip_posid = 65535}, t_info = 24575}
(gdb) p nlen
$149 = 2

(*newtup)[1] doesn't look right, correct?

Sorry, my fault, that should've been *(newtup[1]), which indeed looks
fine.

I'm still not having any luck tracking down my SEGV. If anyone is
interested in debugging or even just reproducing this, I'd be happy to
post or E-mail you my code. It looks like shortly after calling
gistSplit() there's maybe a double-free() somewhere? Here's a typical
session; the NOTICEs are generated by my picksplit function:

$ gdb ./postgres
(gdb) run -D /var/lib/postgres/data test <x

[... output as many tuples are inserted, and finally ...]

NOTICE: g_sig_picksplit: <--
NOTICE: g_sig_picksplit: left ( 9) *.*.***..*..**....*....*.***.**.*..*.*.****....*..**..**.....*****...***..*.*.*.***..*.**..**.*..*.**.*..**.***.*...*.**.**.......*......***....**......*.***.*......*.*..**....**.*..*..**.*..*.*.***.*.*.**.*....*.**..*...*..*.*****.*.*...*..*.*...*..**..*..***....***.*.*.*..*...**.......*....**.*.*.*..*.*..***.*.****.....*.....**..**...*.*....*..**..**.***...*......*..*....*.*.*.....*.*..*....**......****.*.***.*........***..*....*.*....**.*..***.*.*...*..**.*...*...*...*.......*.*.....**......*............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
NOTICE: g_sig_picksplit: right ( 50) .**..********************************************************.*.********.****************************************************************.********.***************************************.**********************************.********************.*.*************.********.****************************.***.**.*******************.*.************************************************.******************************.*****.*******************************************************************************.********............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

Program received signal SIGSEGV, Segmentation fault.
0x4028e2ff in mallopt () from /lib/libc.so.6
(gdb) bt
#0 0x4028e2ff in mallopt () from /lib/libc.so.6
#1 0x4028d2ca in free () from /lib/libc.so.6
#2 0x081d05e5 in AllocSetDelete (context=0x82ae098) at aset.c:460
#3 0x081d0f03 in MemoryContextDelete (context=0x82ae098) at mcxt.c:188
#4 0x081d0f36 in MemoryContextDeleteChildren (context=0x82ae010) at mcxt.c:207
#5 0x081d0e8f in MemoryContextDelete (context=0x82ae010) at mcxt.c:161
#6 0x08094070 in AtCommit_Memory () at xact.c:685
#7 0x080943e6 in CommitTransaction () at xact.c:1033
#8 0x080946b6 in CommitTransactionCommand (forceCommit=0 '\0') at xact.c:1304
#9 0x08160966 in finish_xact_command (forceCommit=0 '\0') at postgres.c:978
#10 0x081607ef in pg_exec_query_string (query_string=0x82e4570, dest=Debug, parse_context=0x82adf88) at postgres.c:897
#11 0x081619f0 in PostgresMain (argc=4, argv=0x82960c0, username=0x82966c0 "postgres") at postgres.c:2013
#12 0x08110625 in main (argc=4, argv=0xbffffc14) at main.c:235

--
Itai Zukerman <http://www.math-hat.com/~zukerman/&gt;

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Itai Zukerman (#2)
Re: GiST: Bad newtup On Exit From gistSplit() ?

Itai Zukerman <zukerman@math-hat.com> writes:

I'm still not having any luck tracking down my SEGV. If anyone is
interested in debugging or even just reproducing this, I'd be happy to
post or E-mail you my code. It looks like shortly after calling
gistSplit() there's maybe a double-free() somewhere?

It might help to build with --enable-cassert, if you didn't already.

regards, tom lane

#4Itai Zukerman
zukerman@math-hat.com
In reply to: Tom Lane (#3)
Re: GiST: Bad newtup On Exit From gistSplit() ?

I'm still not having any luck tracking down my SEGV. If anyone is
interested in debugging or even just reproducing this, I'd be happy to
post or E-mail you my code. It looks like shortly after calling
gistSplit() there's maybe a double-free() somewhere?

It might help to build with --enable-cassert, if you didn't already.

After recompiling with --enable-cassert, nserting into an empty table,
I get:

TRAP: FailedAssertion("!((VfdCache[0].fd == (-1)))", File: "fd.c", Line: 1113)

Program received signal SIGABRT, Aborted.
0x400eda51 in kill () from /lib/libc.so.6
(gdb) bt
#0 0x400eda51 in kill () from /lib/libc.so.6
#1 0x400ed872 in raise () from /lib/libc.so.6
#2 0x400ee986 in abort () from /lib/libc.so.6
#3 0x081dc4d7 in ExceptionalCondition (conditionName=0x8262d92 "!((VfdCache[0].fd == (-1)))", errorType=0x8262a13 "FailedAssertion", fileName=0x8262a0e "fd.c",
lineNumber=1113) at assert.c:46
#4 0x0816a085 in AtEOXact_Files () at fd.c:1113
#5 0x080a1921 in CommitTransaction () at xact.c:1031
#6 0x080a1c85 in CommitTransactionCommand (forceCommit=0 '\0') at xact.c:1304
#7 0x08177aaa in finish_xact_command (forceCommit=0 '\0') at postgres.c:978
#8 0x08177933 in pg_exec_query_string (query_string=0x8314054, dest=Debug, parse_context=0x82ddb18) at postgres.c:897
#9 0x08178ba7 in PostgresMain (argc=4, argv=0x82c5e00, username=0x82c6400 "postgres") at postgres.c:2013
#10 0x081233b9 in main (argc=4, argv=0xbffffc14) at main.c:235

But if I restart postgres and start with some data in the table, I
get, as before:

Program received signal SIGSEGV, Segmentation fault.
0x401312ff in mallopt () from /lib/libc.so.6
(gdb) bt
#0 0x401312ff in mallopt () from /lib/libc.so.6
#1 0x401302ca in free () from /lib/libc.so.6
#2 0x081ea205 in AllocSetDelete (context=0x82efffc) at aset.c:460
#3 0x081eb107 in MemoryContextDelete (context=0x82efffc) at mcxt.c:188
#4 0x081eb172 in MemoryContextDeleteChildren (context=0x82eff70) at mcxt.c:207
#5 0x081eb093 in MemoryContextDelete (context=0x82eff70) at mcxt.c:161
#6 0x080a1587 in AtCommit_Memory () at xact.c:685
#7 0x080a192b in CommitTransaction () at xact.c:1033
#8 0x080a1c85 in CommitTransactionCommand (forceCommit=0 '\0') at xact.c:1304
#9 0x08177aaa in finish_xact_command (forceCommit=0 '\0') at postgres.c:978
#10 0x08177933 in pg_exec_query_string (query_string=0x8313fcc, dest=Debug, parse_context=0x82efee4) at postgres.c:897
#11 0x08178ba7 in PostgresMain (argc=4, argv=0x82c5e00, username=0x82c6400 "postgres") at postgres.c:2013
#12 0x081233b9 in main (argc=4, argv=0xbffffc14) at main.c:235

Since this only happens after a call to gistSplit(), I still suspect
GiST. But I've been going over the GiST code the past 3 days and I
still can't find anything substantially wrong.

--
Itai Zukerman <http://www.math-hat.com/~zukerman/&gt;

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Itai Zukerman (#4)
Re: GiST: Bad newtup On Exit From gistSplit() ?

Itai Zukerman <zukerman@math-hat.com> writes:

After recompiling with --enable-cassert, nserting into an empty table,
I get:

TRAP: FailedAssertion("!((VfdCache[0].fd == (-1)))", File: "fd.c", Line: 1113)

Begins to look like a plain old wild store: ain't *nothing* should ever
write into VfdCache[0].fd. If that's repeatable, you could try homing
in on the place that is clobbering that variable by stepping through
major routines with gdb.

regards, tom lane

#6Kevin Brown
kevin@sysexperts.com
In reply to: Tom Lane (#5)
Re: GiST: Bad newtup On Exit From gistSplit() ?

Tom Lane wrote:

Itai Zukerman <zukerman@math-hat.com> writes:

After recompiling with --enable-cassert, nserting into an empty table,
I get:

TRAP: FailedAssertion("!((VfdCache[0].fd == (-1)))", File: "fd.c", Line: 1113)

Begins to look like a plain old wild store: ain't *nothing* should ever
write into VfdCache[0].fd. If that's repeatable, you could try homing
in on the place that is clobbering that variable by stepping through
major routines with gdb.

If this is a system that supports hardware watchpoints (like Linux on
x86), then you should be able to do a "watch VfdCache[0].fd". In
fact, if you know the specific value it's going to be set to, you can
do a conditional watchpoint: "watch VfdCache[0].fd if VfdCache[0] ==
<value>" (you can set the condition to be pretty much anything).

The program will stop on the statement immediately following the one
that scribbled on the area you're watching when the condition you
specify is met. Note, though, that the condition is tested *after*
the modification happens.

--
Kevin Brown kevin@sysexperts.com

#7Itai Zukerman
zukerman@math-hat.com
In reply to: Kevin Brown (#6)
Re: GiST: Bad newtup On Exit From gistSplit() ?

TRAP: FailedAssertion("!((VfdCache[0].fd == (-1)))", File: "fd.c", Line: 1113)

If this is a system that supports hardware watchpoints (like Linux on
x86), then you should be able to do a "watch VfdCache[0].fd".

That's exactly what I did, and tracked the problem down to a typo in
my code *blush*. Thanks for all the great suggestions!

--
Itai Zukerman <http://www.math-hat.com/~zukerman/&gt;

#8Teodor Sigaev
teodor@sigaev.ru
In reply to: Itai Zukerman (#2)
Re: GiST: Bad newtup On Exit From gistSplit() ?

Send me 'x' file and I'll try to help you.

Itai Zukerman wrote:

(gdb) p (*newtup)[0]
$147 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 34}, ip_posid = 1}, t_info = 136}
(gdb) p (*newtup)[1]
$148 = {t_tid = {ip_blkid = {bi_hi = 65510, bi_lo = 65535}, ip_posid = 65535}, t_info = 24575}
(gdb) p nlen
$149 = 2

(*newtup)[1] doesn't look right, correct?

Sorry, my fault, that should've been *(newtup[1]), which indeed looks
fine.

I'm still not having any luck tracking down my SEGV. If anyone is
interested in debugging or even just reproducing this, I'd be happy to
post or E-mail you my code. It looks like shortly after calling
gistSplit() there's maybe a double-free() somewhere? Here's a typical
session; the NOTICEs are generated by my picksplit function:

$ gdb ./postgres
(gdb) run -D /var/lib/postgres/data test <x

[... output as many tuples are inserted, and finally ...]

NOTICE: g_sig_picksplit: <--
NOTICE: g_sig_picksplit: left ( 9) *.*.***..*..**....*....*.***.**.*..*.*.****....*..**..**.....*****...***..*.*.*.***..*.**..**.*..*.**.*..**.***.*...*.**.**.......*......***....**......*.***.*......*.*..**....**.*..*..**.*..*.*.***.*.*.**.*....*.**..*...*..*.*****.*.*...*..*.*...*..**..*..***....***.*.*.*..*...**.......*....**.*.*.*..*.*..***.*.****.....*.....**..**...*.*....*..**..**.***...*......*..*....*.*.*.....*.*..*....**......****.*.***.*........***..*....*.*....**.*..***.*.*...*..**.*...*...*...*.......*.*.....**......*.................................................................................................................................................................................................................................................................................................................................................................................................................................................................

...........................................................................

NOTICE: g_sig_picksplit: right ( 50) .**..********************************************************.*.********.****************************************************************.********.***************************************.**********************************.********************.*.*************.********.****************************.***.**.*******************.*.************************************************.******************************.*****.*******************************************************************************.********.................................................................................................................................................................................................................................................................................................................................................................................................................................................................

...........................................................................

Program received signal SIGSEGV, Segmentation fault.
0x4028e2ff in mallopt () from /lib/libc.so.6
(gdb) bt
#0 0x4028e2ff in mallopt () from /lib/libc.so.6
#1 0x4028d2ca in free () from /lib/libc.so.6
#2 0x081d05e5 in AllocSetDelete (context=0x82ae098) at aset.c:460
#3 0x081d0f03 in MemoryContextDelete (context=0x82ae098) at mcxt.c:188
#4 0x081d0f36 in MemoryContextDeleteChildren (context=0x82ae010) at mcxt.c:207
#5 0x081d0e8f in MemoryContextDelete (context=0x82ae010) at mcxt.c:161
#6 0x08094070 in AtCommit_Memory () at xact.c:685
#7 0x080943e6 in CommitTransaction () at xact.c:1033
#8 0x080946b6 in CommitTransactionCommand (forceCommit=0 '\0') at xact.c:1304
#9 0x08160966 in finish_xact_command (forceCommit=0 '\0') at postgres.c:978
#10 0x081607ef in pg_exec_query_string (query_string=0x82e4570, dest=Debug, parse_context=0x82adf88) at postgres.c:897
#11 0x081619f0 in PostgresMain (argc=4, argv=0x82960c0, username=0x82966c0 "postgres") at postgres.c:2013
#12 0x08110625 in main (argc=4, argv=0xbffffc14) at main.c:235

--
Teodor Sigaev
teodor@stack.net