Missing files under pg_data following power failure

Started by Jack Orensteinover 18 years ago4 messagesgeneral
Jump to latest
#1Jack Orenstein
jack.orenstein@hds.com

Our testing involves cutting power to machines running postgres 7.4,
while under load (lots of reads and writes). When we do this, we often
lose some files under pg_data storing table content. I.e., the file
named for a table's pg_class.oid value is simply gone. This can affect
many tables following a power outage. We know this problem has
occurred when we are unable to access a table, e.g.

ERROR: relation "xxx" does not exist

The table is present in the system tables, but the file storing table
content under pg_data is missing.

Can anyone provide insight on this phenomenon? Why are these files
lost? Are they really lost, or have they simply moved somewhere? What
happens to the disk blocks formerly occupied by the files?

Getting back in service following this file loss is not a problem; I'm
just trying to understand how postgres gets into this state.

Jack Orenstein

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Jack Orenstein (#1)
Re: Missing files under pg_data following power failure

On Fri, Oct 12, 2007 at 11:28:04AM -0400, Jack Orenstein wrote:

Our testing involves cutting power to machines running postgres 7.4,
while under load (lots of reads and writes). When we do this, we often
lose some files under pg_data storing table content. I.e., the file
named for a table's pg_class.oid value is simply gone. This can affect
many tables following a power outage. We know this problem has
occurred when we are unable to access a table, e.g.

Really, the files should be under management of the operating system.
What OS are you using and is the filesystems you're using crash safe?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#3Scott Marlowe
scott.marlowe@gmail.com
In reply to: Jack Orenstein (#1)
Re: Missing files under pg_data following power failure

On 10/12/07, Jack Orenstein <jack.orenstein@hds.com> wrote:

Our testing involves cutting power to machines running postgres 7.4,
while under load (lots of reads and writes). When we do this, we often
lose some files under pg_data storing table content. I.e., the file
named for a table's pg_class.oid value is simply gone. This can affect
many tables following a power outage. We know this problem has
occurred when we are unable to access a table, e.g.

ERROR: relation "xxx" does not exist

The table is present in the system tables, but the file storing table
content under pg_data is missing.

Can anyone provide insight on this phenomenon? Why are these files
lost? Are they really lost, or have they simply moved somewhere? What
happens to the disk blocks formerly occupied by the files?

Getting back in service following this file loss is not a problem; I'm
just trying to understand how postgres gets into this state.

First of all, this should not happen on a machine with proper
fsyncing. The possible causes are generally either fsync is off in
postgresql.conf or the drive array <--> OS layer is lying about fsync
operations.

The most common hardware cause is IDE / SATA drives / controllers that
do not enforce fsync, but instead fsync when the data is written to
drive / controller buffer memory and continue on. On IDE / SATA
drives you can often fix this by turning off the cache.

The best way to work with this is to get a known reliable battery
backed caching RAID controller which will make the system fast as well
as reliable. LSI, Areca, and Escalade are all known to make reliable
controllers. Late model Adaptecs have gotten some good reports also,
but their earlier controllers were terrible.

#4Brad Nicholson
bnichols@ca.afilias.info
In reply to: Scott Marlowe (#3)
Re: Missing files under pg_data following power failure

On Fri, 2007-10-12 at 10:54 -0500, Scott Marlowe wrote:

On 10/12/07, Jack Orenstein <jack.orenstein@hds.com> wrote:

Our testing involves cutting power to machines running postgres 7.4,
while under load (lots of reads and writes). When we do this, we often
lose some files under pg_data storing table content. I.e., the file
named for a table's pg_class.oid value is simply gone. This can affect
many tables following a power outage. We know this problem has
occurred when we are unable to access a table, e.g.

ERROR: relation "xxx" does not exist

The table is present in the system tables, but the file storing table
content under pg_data is missing.

Can anyone provide insight on this phenomenon? Why are these files
lost? Are they really lost, or have they simply moved somewhere? What
happens to the disk blocks formerly occupied by the files?

Getting back in service following this file loss is not a problem; I'm
just trying to understand how postgres gets into this state.

First of all, this should not happen on a machine with proper
fsyncing. The possible causes are generally either fsync is off in
postgresql.conf or the drive array <--> OS layer is lying about fsync
operations.

What filesystem are you using? I've seen similar problems on JFS2, it
was something with logredo or fsck being busted on a Big Blue OS.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.