BUG #3752: query yields "could not find block containing chunk", then server crashes
The following bug has been logged online:
Bug reference: 3752
Logged by: Michael Charnoky
Email address: noky@nextbus.com
PostgreSQL version: 8.3beta2
Operating system: Linux (Fedora Core 3) 2.6.17
Description: query yields "could not find block containing chunk",
then server crashes
Details:
I installed PG8.3beta2 and created a db instance using pg_restore. (The
dump was created using the pg8.3beta2 pg_dump util, from a db on a pg8.1
server). Data restored with no errors, later our app encountered an sql
error while querying data in the db. Here's the relevant log snippet:
2007-11-15 15:38:03.880 PST: ERROR: could not find block containing chunk
0x902fb98
2007-11-15 15:38:03.880 PST: STATEMENT: SELECT path_tag, dayset_tag,
time2secs(ts_endtime), segtimes
FROM pathtimes where rev=(select rev from projects) ORDER BY
time2secs(ts_endtime);
2007-11-15 15:38:29.821 PST: LOG: server process (PID 17777) was terminated
by signal 11: Segmentation fault
2007-11-15 15:38:29.821 PST: LOG: terminating any other active server
processes
2007-11-15 15:38:29.825 PST: LOG: all server processes terminated;
reinitializing
2007-11-15 15:38:29.887 PST: LOG: database system was interrupted; last
known up at 2007-11-15 15:28:27 PST
2007-11-15 15:38:29.887 PST: LOG: database system was not properly shut
down; automatic recovery in progress
2007-11-15 15:38:30.044 PST: FATAL: the database system is in recovery
mode
2007-11-15 15:38:30.285 PST: LOG: record with zero length at 0/7CB47A08
2007-11-15 15:38:30.286 PST: LOG: redo is not required
2007-11-15 15:38:30.714 PST: LOG: autovacuum launcher started
2007-11-15 15:38:30.715 PST: LOG: database system is ready to accept
connections
*** glibc detected *** free(): invalid next size (normal): 0x09045378 ***
2007-11-15 15:38:41.463 PST: LOG: server process (PID 17811) was terminated
by signal 6: Aborted
2007-11-15 15:38:41.463 PST: LOG: terminating any other active server
processes
2007-11-15 15:38:41.464 PST: LOG: all server processes terminated;
reinitializing
2007-11-15 15:38:41.516 PST: LOG: database system was interrupted; last
known up at 2007-11-15 15:38:30 PST
2007-11-15 15:38:41.516 PST: LOG: database system was not properly shut
down; automatic recovery in progress
2007-11-15 15:38:41.683 PST: LOG: record with zero length at 0/7CB47A48
2007-11-15 15:38:41.683 PST: LOG: redo is not required
2007-11-15 15:38:41.702 PST: FATAL: the database system is in recovery
mode
2007-11-15 15:38:41.806 PST: LOG: autovacuum launcher started
2007-11-15 15:38:41.807 PST: LOG: database system is ready to accept
connections
Do you have a core file? Can you provide stack trace output?
Thanks
Michael Charnoky wrote:
<snip>
2007-11-15 15:38:03.880 PST: ERROR: could not find block containing chunk
0x902fb98
This message is raised by the AllocSetFree or AllocSetRealloc function in
aset.c. In both functions it means that the given memory context does not
contain the memory block. In my opinion there are two probable
scenarios:
1) The memory block does not exist. For AllocSetFree that means e.g. a
double free; for AllocSetRealloc it means that somebody wants to realloc
memory which was already freed.
2) The memory is still allocated, but in a different context. However,
palloc and pfree should handle that.
In my opinion it is a double free problem, but without a stack trace or a
reproduction scenario it is difficult to find.
Zdenek
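The two scenarios above can be illustrated with a toy model in C. This is only a sketch and not the aset.c code: ToyContext, ToyBlock, toy_alloc and toy_free are hypothetical names, and the fixed static pool is an assumption made so that no pointer is ever dangling. Each context keeps a list of the blocks it owns, and freeing a chunk fails with the same message when no block in that context holds the chunk.

```c
#include <stddef.h>
#include <stdio.h>

#define POOL_SLOTS 8
#define SLOT_BYTES 64

/* Hypothetical toy types; PostgreSQL's real structures differ. */
typedef struct ToyBlock {
    struct ToyBlock *next;          /* next block owned by the same context */
    int in_use;                     /* nonzero while the slot is allocated */
    unsigned char data[SLOT_BYTES]; /* the single chunk held by this block */
} ToyBlock;

typedef struct ToyContext {
    ToyBlock *blocks;               /* list of blocks owned by this context */
} ToyContext;

static ToyBlock pool[POOL_SLOTS];

/* Hand out one chunk and record its block in the given context. */
static void *toy_alloc(ToyContext *cxt)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].next = cxt->blocks;
            cxt->blocks = &pool[i];
            return pool[i].data;
        }
    }
    return NULL;                    /* pool exhausted */
}

/* Release a chunk. Returns 0 on success, -1 if cxt owns no such block. */
static int toy_free(ToyContext *cxt, void *chunk)
{
    for (ToyBlock **link = &cxt->blocks; *link != NULL; link = &(*link)->next) {
        ToyBlock *block = *link;
        if ((void *) block->data == chunk) {
            *link = block->next;    /* unlink the block from the context */
            block->next = NULL;
            block->in_use = 0;
            return 0;
        }
    }
    fprintf(stderr, "ERROR: could not find block containing chunk %p\n", chunk);
    return -1;
}
```

With two contexts a and b and a chunk p obtained from toy_alloc(&a), calling toy_free(&b, p) fails (scenario 2), toy_free(&a, p) succeeds, and a second toy_free(&a, p) fails again (scenario 1, the double free).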
"Michael Charnoky" <noky@nextbus.com> writes:
2007-11-15 15:38:03.880 PST: ERROR: could not find block containing chunk
0x902fb98
We can't do much about this without a self-contained test case.
2007-11-15 15:38:03.880 PST: STATEMENT: SELECT path_tag, dayset_tag,
time2secs(ts_endtime), segtimes
FROM pathtimes where rev=(select rev from projects) ORDER BY
time2secs(ts_endtime);
Is this query using any complex views? If so, I'd assume the bug is
somehow triggered by that, and try to extract a test case using the
view definition(s).
regards, tom lane
Just forwarding this info along as Zdenek requested...
Turns out this problem is not a bug in pg8.3, it was a problem with our
custom data type. I have since dropped the custom data type and am now
using standard pg float4 arrays. Did the dump and restore, and our app
works just fine, no crash when the query is run.
BTW- PG8.3 seriously rocks! We've got some large tables that had very
poor performance in PG8.1... things are really snappy now, HOT usage
really helps our app (as shown by the handy pg_stat_all_tables).
Mike
Zdenek Kotala wrote:
Mike Charnoky wrote:
It seems this problem has to do with a custom data type we are using. I
am working on eliminating this custom data type, as it is becoming too
much of a pain to support (it is basically float4[]). If the problem
persists after the data type conversion, I will follow up. Otherwise, I
think this was an error in our custom type code (maybe corruption during
dump/reload)
Thanks for update.
Would the stack trace still be useful? Where would I find the dump
file? I didn't see anything...
If you are sure, that it is in your data type implementation then
probably not. You can find core file usually in postgres data directory
if you have core file generation enabled by ulimit command. You can get
stack trace by gdb.
Zdenek
Mike