BUG #1800: "unexpected chunk number" during pg_dump

Started by Aaron Harsh over 20 years ago, 6 messages, list: bugs

ajh@rentrak.com

over 20 years ago

The following bug has been logged online:

Bug reference: 1800
Logged by: Aaron Harsh
Email address: ajh@rentrak.com
PostgreSQL version: 7.4.6
Operating system: RedHat ES release 3, x86_64
Description: "unexpected chunk number" during pg_dump
Details:

Our regular pg_dump aborted this afternoon with this output:

pg_dump: ERROR: unexpected chunk number 0 (expected 1) for toast value
4294879152
pg_dump: SQL command to dump the contents of table "dataset_cache" failed:
PQendcopy() failed.
pg_dump: Error message from server: ERROR: unexpected chunk number 0
(expected 1) for toast value 4294879152
pg_dump: The command was: COPY public.dataset_cache (checksum, version_no,
bind_params, sql_statement, serialized_value, date_created) TO stdout;
pg_dump: *** aborted because of error

We saw the same message when we tried to cluster the affected table. The
problem went away after truncating the table.

I've searched the pgsql-bugs archives and found reports of this problem, but
haven't seen a solution. Is there a solution to keep this from happening in
the future? (Version upgrade maybe?)
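For readers unfamiliar with the message, the following is a minimal sketch of the kind of sequence check that produces it. It is not the server's actual code (which is C inside the backend); the function name and the tuple layout are hypothetical. It assumes only that a toasted value is stored as chunks numbered from 0, and that a reader expects them in consecutive order. The duplicate chunk 0 in the usage below is purely illustrative; the thread does not establish what the table actually contained.

```python
def reassemble_toast_value(value_id, chunks):
    """Join chunks of one out-of-line value, checking their sequence numbers.

    `chunks` is an iterable of (chunk_seq, data) tuples in the order a scan
    would return them. Hypothetical model, not PostgreSQL's implementation.
    """
    expected = 0
    parts = []
    for chunk_seq, data in chunks:
        if chunk_seq != expected:
            # Same wording as the error reported in this thread.
            raise ValueError(
                f"unexpected chunk number {chunk_seq} (expected {expected}) "
                f"for toast value {value_id}"
            )
        parts.append(data)
        expected += 1
    return b"".join(parts)


if __name__ == "__main__":
    # A healthy value: chunks 0 and 1 arrive in order.
    print(reassemble_toast_value(1, [(0, b"ab"), (1, b"cd")]))
    # A second chunk numbered 0 where chunk 1 was expected triggers the error.
    try:
        reassemble_toast_value(4294879152, [(0, b"a"), (0, b"b")])
    except ValueError as err:
        print(err)
```

In this model, "chunk number 0 (expected 1)" means the reader saw a first chunk, then saw another chunk numbered 0 under the same value id instead of chunk 1.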

Alvaro Herrera

alvherre@2ndquadrant.com

over 20 years ago

In reply to: Aaron Harsh (#1)

Re: BUG #1800: "unexpected chunk number" during pg_dump

On Mon, Aug 01, 2005 at 06:02:30AM +0100, Aaron Harsh wrote:

pg_dump: ERROR: unexpected chunk number 0 (expected 1) for toast value
4294879152
pg_dump: SQL command to dump the contents of table "dataset_cache" failed:
PQendcopy() failed.
pg_dump: Error message from server: ERROR: unexpected chunk number 0
(expected 1) for toast value 4294879152
pg_dump: The command was: COPY public.dataset_cache (checksum, version_no,
bind_params, sql_statement, serialized_value, date_created) TO stdout;
pg_dump: *** aborted because of error

We saw the same message when we tried to cluster the affected table. The
problem went away after truncating the table.

I've searched the pgsql-bugs archives and found reports of this problem, but
haven't seen a solution. Is there a solution to keep this from happening in
the future? (Version upgrade maybe?)

Looks very much like the table was corrupted. Maybe you should try to
test your RAM and disks. Not sure how to do that on x86-64 though,
unless the test utility at www.memtest86.com has been ported to it.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"La naturaleza, tan frágil, tan expuesta a la muerte... y tan viva"

Oliver Jowett

oliver@opencloud.com

over 20 years ago

In reply to: Alvaro Herrera (#2)

Re: BUG #1800: "unexpected chunk number" during pg_dump

Alvaro Herrera wrote:

Looks very much like the table was corrupted. Maybe you should try to
test your RAM and disks. Not sure how to do that on x86-64 though,
unless the test utility at www.memtest86.com has been ported to it.

x86-64 systems will still boot and run 32-bit code fine (although
obviously memtest86 isn't going to test memory it can't address in
32-bit mode)

-O

Aaron Harsh

ajh@rentrak.com

over 20 years ago

In reply to: Oliver Jowett (#3)

Re: BUG #1800: "unexpected chunk number" during pg_dump

Alvaro Herrera <alvherre@alvh.no-ip.org> 08/10/05 9:03 AM >>>

On Mon, Aug 01, 2005 at 06:02:30AM +0100, Aaron Harsh wrote:

pg_dump: ERROR: unexpected chunk number 0 (expected 1) for toast value
...

Looks very much like the table was corrupted. Maybe you should try to
test your RAM and disks. Not sure how to do that on x86-64 though,
unless the test utility at www.memtest86.com has been ported to it.

The server is running off of ECC RAM on a RAID-10 set, so a one-off disk/RAM failure seems unlikely. The server had been running beautifully for 6 months prior to this error, and hasn't been evidencing the problem since, so it seems unlikely that this is due to a bad DIMM or RAID controller.

The timing might be a coincidence, but this error happened within a day of our OID counter wrapping around back to 0. (Although Tom Lane mentioned in pgsql-general that he was inclined to consider the timing a coincidence).

--
Aaron Harsh
ajh@rentrak.com
503-284-7581 x347

Alvaro Herrera

alvherre@2ndquadrant.com

over 20 years ago

In reply to: Aaron Harsh (#4)

Re: BUG #1800: "unexpected chunk number" during pg_dump

On Wed, Aug 10, 2005 at 06:07:24PM -0700, Aaron Harsh wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> 08/10/05 9:03 AM >>>

On Mon, Aug 01, 2005 at 06:02:30AM +0100, Aaron Harsh wrote:

pg_dump: ERROR: unexpected chunk number 0 (expected 1) for toast value
...

Looks very much like the table was corrupted. Maybe you should try to
test your RAM and disks. Not sure how to do that on x86-64 though,
unless the test utility at www.memtest86.com has been ported to it.

The server is running off of ECC RAM on a RAID-10 set, so a one-off
disk/RAM failure seems unlikely. The server had been running
beautifully for 6 months prior to this error, and hasn't been
evidencing the problem since, so it seems unlikely that this is due to
a bad DIMM or RAID controller.

The timing might be a coincidence, but this error happened within a
day of our OID counter wrapping around back to 0. (Although Tom Lane
mentioned in pgsql-general that he was inclined to consider the timing
a coincidence).

Not sure what else to attribute the failure to then. But I should point
out that Oid normally wraps to FirstNormalObjectId (known as
BootstrapObjectIdData on previous sources), which is 16384, not 0.
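The wraparound behaviour described above can be illustrated with a small model. This is only a sketch: the 16384 floor is taken from the message above, OIDs are assumed to be unsigned 32-bit values, and the function name `next_oid` is hypothetical rather than anything in the PostgreSQL source.

```python
FIRST_NORMAL_OBJECT_ID = 16384   # value quoted in the message above
OID_MODULUS = 2 ** 32            # assumption: OIDs are unsigned 32-bit


def next_oid(current):
    """Return the OID handed out after `current` in this simplified model."""
    candidate = (current + 1) % OID_MODULUS
    # After overflow the counter skips the low, reserved range
    # instead of restarting at 0.
    if candidate < FIRST_NORMAL_OBJECT_ID:
        return FIRST_NORMAL_OBJECT_ID
    return candidate


if __name__ == "__main__":
    print(next_oid(4294879152))       # an ordinary increment
    print(next_oid(OID_MODULUS - 1))  # the wrap lands on 16384, not 0
```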

Anyway I was originally thinking the problem data was 4294879152
(0xFFFEA7B0), not the 0. Have you tried to manually extract the data
from the dataset_cache table? You could try figuring out what page
contains the bad data, and manually peek into it using pg_filedump.
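As a quick check of the numbers quoted above, the arithmetic below converts the toast value to hexadecimal and measures how far it sits from the top of a 32-bit range. The only assumption is that OIDs are unsigned 32-bit values; the variable names are illustrative.

```python
TOAST_VALUE = 4294879152   # toast value id from the error message
OID_SPACE = 2 ** 32        # assumption: OIDs are unsigned 32-bit

as_hex = hex(TOAST_VALUE)
remaining = OID_SPACE - TOAST_VALUE

print(as_hex)      # 0xfffea7b0
print(remaining)   # 88144
```

So the value id is 88144 below 2^32, which places it near the upper end of the 32-bit range and is at least consistent with the report that the OID counter wrapped at about the same time. It does not by itself show a causal link.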

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"

Aaron Harsh

ajh@rentrak.com

over 20 years ago

In reply to: Alvaro Herrera (#5)

Re: BUG #1800: "unexpected chunk number" during pg_dump

Alvaro Herrera <alvherre@alvh.no-ip.org> 08/11/05 9:52 AM >>>
Anyway I was originally thinking the problem data was 4294879152
(0xFFFEA7B0), not the 0. Have you tried to manually extract the data
from the dataset_cache table? You could try figuring out what page
contains the bad data, and manually peek into it using pg_filedump.

Unfortunately, the table doesn't show any problems now (I truncated it after the pg_dump failed), so there's not much further detail I can give you. I suppose this means we'll have to wait until the problem shows up again before we can continue.

Thanks for your help.

--
Aaron Harsh
ajh@rentrak.com
503-284-7581 x347
