Durability?

Started by Emmanuel Cecchet over 16 years ago, 4 messages
#1Emmanuel Cecchet
manu@frogthinker.org

Hi,

I got an error like this:

ERROR: xlog flush request 1/C121E998 is not satisfied --- flushed only to 1/BCBCB440
CONTEXT: writing block 529 of relation 1663/233690/1247
WARNING: could not write block 529 of 1663/233690/1247
DETAIL: Multiple failures --- write error might be permanent.

The xrecoff value (logs show 1/xrecoff) advances a few times during the day, but the message keeps appearing.

I am not sure I clearly understand the consequences of such an error, since Postgres continues to accept new transactions. If my WAL is corrupted, are my transactions still durable?
If this is a violation of durability, is there a way to force Postgres to terminate on such an error?

Thanks in advance for the clarification.
Emmanuel

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Emmanuel Cecchet (#1)
Re: Durability?

Emmanuel Cecchet <manu@frogthinker.org> writes:

I got an error like this:

ERROR: xlog flush request 1/C121E998 is not satisfied --- flushed only to 1/BCBCB440
CONTEXT: writing block 529 of relation 1663/233690/1247
WARNING: could not write block 529 of 1663/233690/1247
DETAIL: Multiple failures --- write error might be permanent.

The xrecoff value (logs show 1/xrecoff) advances a few times during the day, but the message keeps appearing.

It looks like you've got a corrupted page in shared buffers, and every
time the system tries to flush it to disk for a checkpoint, it fails.
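
To make the numbers in the error message concrete, here is a minimal sketch that parses the two WAL positions and compares them. It assumes each position is a pair of hexadecimal fields (xlogid/xrecoff, as the log lines above suggest) and that a simple subtraction of xrecoff is meaningful only when both positions share the same xlogid. The helper names `parse_flush_error` and `gap_bytes` are hypothetical and not part of PostgreSQL.

```python
import re

# Pattern built from the log line quoted in this thread; the helper names
# below are hypothetical and exist only for this illustration.
FLUSH_ERROR = re.compile(
    r"xlog flush request ([0-9A-Fa-f]+)/([0-9A-Fa-f]+) is not satisfied"
    r" --- flushed only to ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)"
)


def parse_flush_error(line):
    """Return ((req_xlogid, req_xrecoff), (flushed_xlogid, flushed_xrecoff)),
    or None if the line is not a flush-request error."""
    m = FLUSH_ERROR.search(line)
    if m is None:
        return None
    requested = (int(m.group(1), 16), int(m.group(2), 16))
    flushed = (int(m.group(3), 16), int(m.group(4), 16))
    return requested, flushed


def gap_bytes(requested, flushed):
    """Byte distance between the two positions when they share an xlogid.

    Assumption: positions in different xlogid files are not compared here,
    so None is returned in that case.
    """
    if requested[0] != flushed[0]:
        return None
    return requested[1] - flushed[1]


line = ("ERROR: xlog flush request 1/C121E998 is not satisfied "
        "--- flushed only to 1/BCBCB440")
requested, flushed = parse_flush_error(line)
print(requested > flushed)            # is the requested position ahead of the flushed one?
print(gap_bytes(requested, flushed))  # distance in bytes within the same xlogid
```

For the message in this thread, the requested position is about 70 MB ahead of the flushed one, which fits a page carrying a bogus WAL position rather than a WAL that merely lags by a few records.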

What I'd try for getting out of this is to kill -9 some backend in order
to force a database restart. Of course, if you want to investigate
what caused it, you should dig around in shared memory first and try
to get a copy of that buffer's contents.

regards, tom lane

#3Emmanuel Cecchet
manu@frogthinker.org
In reply to: Tom Lane (#2)
Re: Durability?

Tom Lane wrote:

Emmanuel Cecchet <manu@frogthinker.org> writes:

I got an error like this:

ERROR: xlog flush request 1/C121E998 is not satisfied --- flushed only to 1/BCBCB440
CONTEXT: writing block 529 of relation 1663/233690/1247
WARNING: could not write block 529 of 1663/233690/1247
DETAIL: Multiple failures --- write error might be permanent.

The xrecoff value (logs show 1/xrecoff) advances a few times during the day, but the message keeps appearing.

It looks like you've got a corrupted page in shared buffers, and every
time the system tries to flush it to disk for a checkpoint, it fails.

What I'd try for getting out of this is to kill -9 some backend in order
to force a database restart. Of course, if you want to investigate
what caused it, you should dig around in shared memory first and try
to get a copy of that buffer's contents.

Will the database be able to restart with a corrupted WAL?
If the database restarts, what transactions will be missing:
- just the block that couldn't be flushed?
- all transactions that were committed after the faulty block?
- more?

Thanks
Emmanuel

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Emmanuel Cecchet (#3)
Re: Durability?

Emmanuel Cecchet <manu@frogthinker.org> writes:

Tom Lane wrote:

It looks like you've got a corrupted page in shared buffers, and every
time the system tries to flush it to disk for a checkpoint, it fails.

Will the database be able to restart with a corrupted WAL?

I don't think you have a corrupted WAL.

regards, tom lane