Strange database corruption with PostgreSQL 7.4.x on Debian Sarge
Hello!
We're running the latest release of PostgreSQL 7.4 (7.4.13) on a Debian Sarge
machine. We compiled Postgres ourselves.
We have a fairly large database on this machine, about 6.4 GB in size. One
table contains about 55 million rows, and we insert about 500000 rows into
it each day. Our problem is that the database becomes corrupted for no
obvious reason. The messages we get are:
invalid page header in block 437702 of relation "xxxx"
We have already tried some other versions of 7.4. On another machine
running Debian Woody with PostgreSQL 7.4.10 we don't have any problems.
Kernels are 2.4.33 on the Sarge machine, 2.4.28 on the Woody machine. Both
are SMP kernels.
Does anyone have a hint as to what's going wrong here?
Best regards,
Matthias
On Wed, 2006-09-20 at 14:34 +0200, Matthias.Pitzl@izb.de wrote:
> Hello!
> We're running the latest release of PostgreSQL 7.4.13 on a Debian Sarge
> machine. Postgres has been compiled by ourselves.
> We have a pretty big database running on this machine, it has about 6.4 GB
> approximately. One table contains about 55 million rows.
> Into this table we insert about 500000 rows each day. Our problem is that
> without any obvious reason the database gets corrupt. The messages we get
> are:
> invalid page header in block 437702 of relation "xxxx"
> We already have tried out some other versions of 7.4. On another machine
> running Debian Woody with PostgreSQL 7.4.10 we don't have any problems.
> Kernels are 2.4.33 on the Sarge machine, 2.4.28 on the Woody machine. Both
> are SMP kernels.
> Does anyone of you perhaps have some hints what's going wrong here?
The most likely causes in these cases tend to be bad memory, a bad hard
drive, a bad CPU, a bad RAID / IDE / SCSI controller, or loss of power
while writing to IDE drives or to RAID controllers whose cache has no
battery backup.
I.e. check your hardware.
Matthias.Pitzl@izb.de writes:
> invalid page header in block 437702 of relation "xxxx"
I concur with Scott that this sounds suspiciously like a hardware
problem ... but have you tried dumping out the bad pages with
pg_filedump or even just od? The pattern of damage would help to
confirm or disprove the theory.
You can find pg_filedump source code at
http://sources.redhat.com/rhdb/
regards, tom lane
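
For anyone following the suggestion to dump the bad page by hand, here is a
rough sketch of how the block number from the error message could be turned
into a file position. It assumes the default build settings (8192-byte
pages, relation files split into 1 GB segments); a server compiled with
other values needs different constants. The function names and the path
argument are made up for illustration and are not part of PostgreSQL or
pg_filedump.

```python
BLCKSZ = 8192            # assumed default PostgreSQL page size in bytes
RELSEG_BLOCKS = 131072   # assumed default: 1 GB segment / 8192-byte pages


def locate_block(block_no, blcksz=BLCKSZ, seg_blocks=RELSEG_BLOCKS):
    """Return (segment number, byte offset inside that segment file)."""
    segment, block_in_segment = divmod(block_no, seg_blocks)
    return segment, block_in_segment * blcksz


def read_block(relfile_path, block_no, blcksz=BLCKSZ, seg_blocks=RELSEG_BLOCKS):
    """Read one page. Segment 0 is the bare file, later ones get a .N suffix."""
    segment, offset = locate_block(block_no, blcksz, seg_blocks)
    path = relfile_path if segment == 0 else f"{relfile_path}.{segment}"
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(blcksz)


def hexdump(data, width=16):
    """Format bytes as offset plus hex columns, similar in spirit to od."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        lines.append(f"{i:06x}  " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(lines)


if __name__ == "__main__":
    segment, offset = locate_block(437702)
    print(f"block 437702 -> segment suffix .{segment}, byte offset {offset}")
```

Under those assumptions, block 437702 lands in the segment file with
suffix .3 at byte offset 364429312. Passing the relation's file path to
read_block and printing hexdump of the result shows the raw page, which is
where a pattern such as a run of zeros or a block of unrelated data would
become visible.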