BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

Started by 中嶋信二 (Shinji Nakajima) about 16 years ago · 6 messages · bugs

sinakaj@jops.co.jp

about 16 years ago

The following bug has been logged online:

Bug reference: 5507
Logged by: Shinji Nakajima
Email address: sinakaj@jops.co.jp
PostgreSQL version: 8.3.8
Operating system: Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Description: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX
Details:

A "missing chunk number" error occurred when I selected a specific column
of a specific table.
I had not updated this record; it suddenly ended up in this state.
Someone else seems to have run into a similar problem:
http://www.ruizs.org/archives/138
Deleting the record restored the system, but the root cause is unknown.
Could this be a bug in the database?

tgl@sss.pgh.pa.us

about 16 years ago

In reply to: 中嶋信二 (#1)

Re: BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

"Shinji Nakajima" <sinakaj@jops.co.jp> writes:

> A "missing chunk number" error occurred when I selected a specific column
> of a specific table.

This might indicate that the toast table's index was corrupted.

> Deleting the record restored the system, but the root cause is unknown.
> Could this be a bug in the database?

Perhaps, but there's not a lot we can do without a lot more information...

regards, tom lane

bruce@momjian.us

about 16 years ago

In reply to: 中嶋信二 (#1)

Re: BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

On Mon, Jun 14, 2010 at 11:28 AM, Shinji Nakajima <sinakaj@jops.co.jp> wrote:

> PostgreSQL version: 8.3.8
> Description: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

> Deleting the record restored the system, but the root cause is unknown.
> Could this be a bug in the database?

Probably. Or possibly bad hardware. Assuming you didn't manually go in
and delete that record from the toast table, which would be a strange
thing to do.

The problem is it could have happened a long time ago and you just
discovered it now. Have you had any other significant events on this
machine? Any system crashes or power failures? Any drive crashes or
signs of bad memory?

In the postgres logs are there any instances of unusual error messages
or warnings?

--
greg

Kevin.Grittner@wicourts.gov

about 16 years ago

In reply to: 中嶋信二 (#1)

Re: BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

"Shinji Nakajima" <sinakaj@jops.co.jp> wrote:

> A "missing chunk number" error occurred when I selected a specific
> column of a specific table.

> Deleting the record restored the system, but the root cause is
> unknown. Could this be a bug in the database?

Errors like this are usually caused by hardware problems. I think
the second-most common cause is running in a configuration with
fsync = off or full_page_writes = off, and suffering a power outage
or OS crash. I would recommend that you check your configuration
for these unsafe settings and schedule a check of your hardware and
drivers.

-Kevin
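Kevin's first suggestion, checking the configuration for fsync = off or full_page_writes = off, can be sketched as a small Python check over the text of postgresql.conf. This is a simplified, hypothetical helper (`unsafe_settings` is not a PostgreSQL tool): it assumes one `name = value` per line (PostgreSQL also accepts the value without the equals sign), treats off/false/no/0 as false, lets a later line override an earlier one, and ignores include directives, `#` inside quoted values, and settings changed outside the file. On a running server, `SHOW fsync;` and `SHOW full_page_writes;` report the values actually in effect.

```python
import re

# The two settings named above as unsafe when turned off.
WATCHED = ("fsync", "full_page_writes")
# Common spellings of boolean false (prefix abbreviations are not handled here).
FALSE_WORDS = {"off", "false", "no", "0"}

# name, optional '=', value; surrounding whitespace is ignored.
LINE_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*(.*?)\s*$")


def unsafe_settings(conf_text: str) -> list[str]:
    """Return the watched settings whose last assignment in conf_text is false."""
    last = {}
    for raw in conf_text.splitlines():
        # Drop comments (simplification: does not respect '#' inside quotes).
        line = raw.split("#", 1)[0]
        m = LINE_RE.match(line)
        if not m or not m.group(2):
            continue
        name = m.group(1).lower()
        value = m.group(2).strip("'").lower()
        last[name] = value  # a later line overrides an earlier one
    return [name for name in WATCHED if last.get(name) in FALSE_WORDS]


sample = "fsync = off\n#full_page_writes = off\n"
print(unsafe_settings(sample))  # ['fsync']
```

Reading the file text rather than connecting to the server keeps the sketch self-contained; the `SHOW` commands remain the authoritative check.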

sinakaj@jops.co.jp

about 16 years ago

In reply to: Bruce Momjian (#3)

Re: BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

Thank you for a reply, everybody.

> On Mon, Jun 14, 2010 at 11:28 AM, Shinji Nakajima <sinakaj@jops.co.jp> wrote:
>
>> PostgreSQL version: 8.3.8
>> Description: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX
>
>> Deleting the record restored the system, but the root cause is unknown.
>> Could this be a bug in the database?
>
> Probably. Or possibly bad hardware. Assuming you didn't manually go in
> and delete that record from the toast table, which would be a strange
> thing to do.

The table has been restored.
However, when I checked the other tables, I found problems there too:
some tables contained duplicate primary key values,
and selecting all of their rows produced a similar error message.

> The problem is it could have happened a long time ago and you just
> discovered it now. Have you had any other significant events on this
> machine? Any system crashes or power failures? Any drive crashes or
> signs of bad memory?

Our PostgreSQL setup is duplexed (active/standby).
Red Hat Cluster Suite monitors the processes of each service.
PGDATA is on shared storage.

At one point the standby server started:
the cluster judged that PostgreSQL had stopped
and began switching over to the other server.

I do not understand how both servers were able to start against the same PGDATA.
I was definitely able to run SQL with psql on both servers.
Logging was not enabled at that time.

> In the postgres logs are there any instances of unusual error messages
> or warnings?
> --
> greg

Errors continue to occur, for example:
"could not read block 17 of relation 1663/16872/2840: read only 0 of 8192 bytes"

A data file seems to be broken...

If the cause was a double start, how could two postmasters sharing one
PGDATA both have started? Is there any precedent for this?
Could that be what damaged the data files?

Regards,
Nakajima
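The relation in the later error, 1663/16872/2840, is a tablespace OID / database OID / relfilenode triple. Assuming the default tablespace (OID 1663, pg_default), an 8 kB block size (matching "of 8192 bytes" in the message) and the default 1 GB segment size, the hypothetical helper `locate_block` below maps it to the file under PGDATA and the byte offset of block 17. A read that returns 0 of 8192 bytes at that offset suggests the file ended before that block, which fits the data file damage discussed in this thread.

```python
BLCKSZ = 8192                 # default block size, matching "of 8192 bytes"
SEGMENT_BLOCKS = 131072       # default 1 GB segment / 8 kB blocks (build-time setting)
PG_DEFAULT_TABLESPACE = 1663  # OID of the pg_default tablespace


def locate_block(rel_path: str, block: int) -> tuple[str, int]:
    """Map 'tablespace/database/relfilenode' plus a block number to a file
    path relative to PGDATA and a byte offset within that file.

    Hypothetical helper: only the default tablespace is handled.
    """
    spc, db, relfilenode = (int(part) for part in rel_path.split("/"))
    if spc != PG_DEFAULT_TABLESPACE:
        raise ValueError("only the default tablespace (1663) is handled here")
    segno, block_in_seg = divmod(block, SEGMENT_BLOCKS)
    path = f"base/{db}/{relfilenode}"
    if segno > 0:
        path += f".{segno}"  # segments after the first get a .N suffix
    return path, block_in_seg * BLCKSZ


path, offset = locate_block("1663/16872/2840", 17)
print(path, offset)  # base/16872/2840 139264
```

Comparing the returned offset with the actual size of that file is a quick way to see whether the block lies past the end of the file.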

Kevin.Grittner@wicourts.gov

about 16 years ago

In reply to: 中嶋信二 (#5)

Re: BUG #5507: missing chunk number 0 for toast value XXXXX in pg_toast_XXXXX

中嶋信二 <sinakaj@jops.co.jp> wrote:

> Our PostgreSQL setup is duplexed (active/standby).
> Red Hat Cluster Suite monitors the processes of each service.
> PGDATA is on shared storage.
>
> At one point the standby server started:
> the cluster judged that PostgreSQL had stopped
> and began switching over to the other server.
>
> I do not understand how both servers were able to start against
> the same PGDATA.
> I was definitely able to run SQL with psql on both servers.
> Logging was not enabled at that time.

> If the cause was a double start, how could two postmasters sharing
> one PGDATA both have started? Is there any precedent for this?
> Could that be what damaged the data files?

I'm not sure I totally understand, but it sounds like you had two
postmasters running against a single data directory. If so, that
could cause all kinds of corruption. It's hard to see how that
could happen unless you deleted a PostgreSQL data directory, or at
least the postmaster.pid file, while an instance was running.

I would start by capturing "ps auxf" output, to be able to
understand what postgres processes were running and when they
started. Then I would probably make sure they all got stopped.
Then I would be seriously looking at restoring from backup, unless
this was a development database which could just be recreated from
scratch.

-Kevin
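Kevin's remark about postmaster.pid refers to the lock file a running postmaster keeps in the data directory; its first line holds the postmaster's PID. The sketch below is a simplified illustration of a stale-lock check, not PostgreSQL's actual startup code (which is more involved), and the function names are hypothetical. It treats the lock as stale when no process with the recorded PID exists. Because `os.kill(pid, 0)` only probes processes on the local host, a check of this kind cannot see a postmaster running on another cluster node that shares the same PGDATA, which is why a shared-storage failover setup has to rely on the cluster manager to keep the server running on only one node at a time.

```python
import os


def read_lockfile_pid(pid_path: str) -> int:
    """Return the PID stored on the first line of a postmaster.pid-style file."""
    with open(pid_path, encoding="utf-8") as f:
        return int(f.readline().strip())


def pid_is_alive_locally(pid: int) -> bool:
    """Probe for a process with signal 0 (no signal is delivered).

    This only sees processes on THIS host; a postmaster on another
    node sharing the same data directory is invisible to it.
    """
    if pid <= 0:
        return False  # 0 and negative values address process groups, not a PID
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lockfile_looks_stale(pid_path: str) -> bool:
    """Hypothetical check: stale if the recorded PID is not running locally."""
    return not pid_is_alive_locally(read_lockfile_pid(pid_path))
```

On the node where a failover happened, a "stale" result here says nothing about the other node, so the `ps auxf` capture Kevin suggests should be taken on every machine that mounts the shared storage.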