random system table corruption ...

Started by Hans-Jürgen Schönig, almost 21 years ago, 7 messages, hackers

postgres@cybertec.at

almost 21 years ago

in the past we have faced a couple of problems with corrupted system
tables. this seems to be a version independent problem which occurs on
hackers' from time to time.
i have checked a broken file and i have seen that the corrupted page has
actually been zeroed out.

my question is: are there any options to implement something which makes
system tables more robust? the problem is: the described error happens
only once in a while and cannot be reproduced. maybe there is a way to
add some more sanity checks before the page is actually written.

any suggestions?

best regards,

hans

--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

Martijn van Oosterhout

kleptog@svana.org

almost 21 years ago

In reply to: Hans-Jürgen Schönig (#1)

Re: random system table corruption ...

On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote:

in the past we have faced a couple of problems with corrupted system
tables. this seems to be a version independent problem which occurs on
hackers' from time to time.
i have checked a broken file and i have seen that the corrupted page has
actually been zeroed out.

Near as I can tell, the only times pages are zeroed out is if
zero_damaged_pages is set (destroying the evidence) or during WAL
recovery.

my question is: are there any options to implement something which makes
system tables more robust? the problem is: the described error happens
only once in a while and cannot be reproduced. maybe there is a way to
add some more sanity checks before the page is actually written.

Well, the most common cause is dodgy memory. Other than that I guess
you could arrange for bgwriter to check the pages it is writing. I
imagine it already does check the header; checking the data requires
knowledge about the actual table and attributes. And about the only
thing that says "I'm broken" is a varlena value with a long value.
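
A header check of the kind mentioned above could look roughly like the following. This is only a sketch under stated assumptions, not PostgreSQL's actual code: the names PagePointers and page_pointers_sane are hypothetical, the 8192-byte block size is the usual default, and the 24-byte header size is an assumed value that differs between versions. It only tests that the three offsets stored in a page header are ordered sensibly.

```python
from dataclasses import dataclass

BLCKSZ = 8192  # assumed default block size


@dataclass
class PagePointers:
    """Offsets read from a page header (hypothetical container)."""
    lower: int    # end of the line-pointer array
    upper: int    # start of the tuple data
    special: int  # start of the special space


def page_pointers_sane(p: PagePointers, header_size: int = 24,
                       blcksz: int = BLCKSZ) -> bool:
    """Return True if the offsets are ordered the way a valid page needs.

    header_size is an assumption; the real value depends on the version.
    """
    return header_size <= p.lower <= p.upper <= p.special <= blcksz
```

A header consisting only of zero bytes fails this test, because its lower offset is smaller than any plausible header size. It says nothing about whether the tuple data itself is intact.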

As they say, the only thing sure would be to have a backup. The only
thing I can imagine being really useful would be a restore mode where
you feed it the schema so it can reconstruct the pg_class and
pg_attribute just enough for you to dump it to reconstruct
everything...

You know, VACUUM FREEZE BACKUP on pg_catalog, physically copy the
datafiles and offer the option to blat your catalog with an old one...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

Tom Lane

tgl@sss.pgh.pa.us

almost 21 years ago

In reply to: Hans-Jürgen Schönig (#1)

Re: random system table corruption ...

Hans-Jürgen Schönig <postgres@cybertec.at> writes:

in the past we have faced a couple of problems with corrupted system
tables. this seems to be a version independent problem which occurs on
hackers' from time to time.
i have checked a broken file and i have seen that the corrupted page has
actually been zeroed out.

That sounds to me like a hardware problem --- disk or disk controller
momentarily writing zeroes instead of what it should write. Have you
seen this on more than one physical machine? Do you have any evidence
for the implication that it only happens to system tables and not user
tables?

Also, you don't have zero_damaged_pages turned on by any chance?

regards, tom lane

Hans-Jürgen Schönig

postgres@cybertec.at

almost 21 years ago

In reply to: Tom Lane (#3)

Re: random system table corruption ...

Tom Lane wrote:

Hans-Jürgen Schönig <postgres@cybertec.at> writes:

in the past we have faced a couple of problems with corrupted system
tables. this seems to be a version independent problem which occurs on
hackers' from time to time.
i have checked a broken file and i have seen that the corrupted page has
actually been zeroed out.

That sounds to me like a hardware problem --- disk or disk controller
momentarily writing zeroes instead of what it should write. Have you
seen this on more than one physical machine? Do you have any evidence
for the implication that it only happens to system tables and not user
tables?

Also, you don't have zero_damaged_pages turned on by any chance?

regards, tom lane

tom,

well, there is some evidence that this is not a hardware related issue.
we have only seen this problem from time to time but it happened on
different machines. it cannot be reproduced. it can even happen when
somebody runs a script which has been called a million times before.
in my current scenario the page header only consists of 0x00 bytes and
therefore the page check fails when reading the system table.
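
For anyone who wants to locate such pages in a copy of a damaged relation file, a minimal sketch follows. It assumes the default 8192-byte block size and treats the first 24 bytes of each block as the header; both values are assumptions and should be adjusted for the installation at hand. The function name find_zeroed_pages is hypothetical. Run it against a copy of the file, never the live data directory.

```python
BLCKSZ = 8192      # assumed default block size
HEADER_BYTES = 24  # assumed header length; varies between versions


def find_zeroed_pages(path, blcksz=BLCKSZ, header_bytes=HEADER_BYTES):
    """Return a list of (page_number, kind) for suspicious pages.

    kind is "all-zero" when every byte of the page is 0x00, and
    "zero-header" when only the assumed header region is all 0x00.
    """
    findings = []
    with open(path, "rb") as f:
        page_no = 0
        while True:
            page = f.read(blcksz)
            if not page:
                break  # end of file
            if not any(page):
                findings.append((page_no, "all-zero"))
            elif not any(page[:header_bytes]):
                findings.append((page_no, "zero-header"))
            page_no += 1
    return findings
```

The result only says where the zeroes are. It cannot tell whether they came from the database, the filesystem, or the hardware.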

i have never seen this in data files up to now (at least not when the
hardware was still intact).

did anybody face similar problems? maybe on sun?
by the way: currently the broken system is running PostgreSQL 7.4 but as
I said - we have also seen that on 8.0 once.

best regards,

hans

--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

Alvaro Herrera

alvherre@2ndquadrant.com

almost 21 years ago

In reply to: Hans-Jürgen Schönig (#1)

Re: random system table corruption ...

On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote:

in the past we have faced a couple of problems with corrupted system
tables. this seems to be a version independent problem which occurs on
hackers' from time to time.
i have checked a broken file and i have seen that the corrupted page has
actually been zeroed out.

IIRC the XFS filesystem zeroes out pages that it recovers from the
journal but did not have a fsync on them (AFAIK XFS journals only
metadata, so page creation but not the content itself). I don't think
this would be applicable to your case, because we do fsync modified
files on checkpoint, and rewrite them completely from WAL images after
that. But I thought I'd mention it.

--
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
"Just treat us the way you want to be treated + some extra allowance
for ignorance." (Michael Brusser)

Hans-Jürgen Schönig

postgres@cybertec.at

almost 21 years ago

In reply to: Alvaro Herrera (#5)

Re: random system table corruption ...

Alvaro Herrera wrote:

On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote:

in the past we have faced a couple of problems with corrupted system
tables. this seems to be a version independent problem which occurs on
hackers' from time to time.
i have checked a broken file and i have seen that the corrupted page has
actually been zeroed out.

IIRC the XFS filesystem zeroes out pages that it recovers from the
journal but did not have a fsync on them (AFAIK XFS journals only
metadata, so page creation but not the content itself). I don't think
this would be applicable to your case, because we do fsync modified
files on checkpoint, and rewrite them completely from WAL images after
that. But I thought I'd mention it.

alvaro,

thanks a lot.
we have some reports about sun systems.
meanwhile i got the impression that the filesystem might be doing
something wrong. i have seen that the page is not completely zeroed out.
at some strange positions there are 2 bytes of crap (i have overlooked
that at first glance). the first couple of hundred bytes are crap,
however. very strange ...
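
To pin down exactly where the stray bytes sit inside an otherwise zeroed page, something along these lines may help. It is a sketch only: nonzero_runs is a hypothetical helper, and it expects the raw bytes of a single page that has already been read from a copy of the file.

```python
def nonzero_runs(page: bytes):
    """Return [(offset, length), ...] for each run of non-zero bytes."""
    runs = []
    start = None  # offset where the current run began, if any
    for i, b in enumerate(page):
        if b != 0 and start is None:
            start = i
        elif b == 0 and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        # The page ended in the middle of a run.
        runs.append((start, len(page) - start))
    return runs
```

Comparing the offsets across several damaged pages might show whether the leftovers line up with sector or filesystem block boundaries.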

best regards,

hans

--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

Hans-Jürgen Schönig

postgres@cybertec.at

almost 21 years ago

In reply to: Alvaro Herrera (#5)

Re: random system table corruption ...

alvaro,

what concerns me here: this is a sun system and the problem happened
during normal operation.
there should not be a recovery related operation. something which is
also interesting: there are two corrupted pages in there (page number
22 and 26).
strange thing :(.

thanks a lot,

hans

On 11 Sep 2005, at 20:01, Alvaro Herrera wrote:

On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote:

in the past we have faced a couple of problems with corrupted system
tables. this seems to be a version independent problem which
occurs on
hackers' from time to time.
i have checked a broken file and i have seen that the corrupted
page has
actually been zeroed out.

IIRC the XFS filesystem zeroes out pages that it recovers from the
journal but did not have a fsync on them (AFAIK XFS journals only
metadata, so page creation but not the content itself). I don't think
this would be applicable to your case, because we do fsync modified
files on checkpoint, and rewrite them completely from WAL images after
that. But I thought I'd mention it.

--
Alvaro Herrera -- Valdivia, Chile Architect,
www.EnterpriseDB.com
"Just treat us the way you want to be treated + some extra allowance
for ignorance." (Michael Brusser)
