fault tolerance...

Started by Christopher Quinn · about 24 years ago · 3 messages · pgsql-hackers
#1 Christopher Quinn
cq@htec.demon.co.uk

Hello,

I've been wondering how pgsql guarantees data integrity in the
face of soft failures, in particular whether it uses an
alternative to the double-root-block technique. That technique
writes, as the final indication that new log records are valid,
some meta information (including the location of the last log
record written) to alternating disk blocks at fixed locations.
This is the only technique I know of - does pgsql use
something analogous?
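For concreteness, here is roughly the scheme I mean - an
illustrative Python sketch of my own, not taken from any real
system; the block size, layout, and CRC guard are all invented
for the example:

```python
import json, os, zlib

BLOCK_SIZE = 512  # invented fixed size for each of the two root blocks

def write_root(f, slot, seq, last_record_pos):
    """Write meta info to one of two fixed 'root' blocks (slot 0 or 1).

    Writers alternate slots, so a torn write can clobber at most
    the older of the two copies.
    """
    payload = json.dumps({"seq": seq, "last": last_record_pos}).encode()
    body = payload + b"\x00" * (BLOCK_SIZE - 4 - len(payload))
    block = body + zlib.crc32(body).to_bytes(4, "big")  # detect torn writes
    f.seek(slot * BLOCK_SIZE)
    f.write(block)
    f.flush()
    os.fsync(f.fileno())

def read_root(f):
    """Return the newest root block whose CRC validates, or None."""
    best = None
    for slot in (0, 1):
        f.seek(slot * BLOCK_SIZE)
        block = f.read(BLOCK_SIZE)
        if len(block) < BLOCK_SIZE:
            continue
        body, crc = block[:-4], int.from_bytes(block[-4:], "big")
        if zlib.crc32(body) != crc:
            continue  # torn or never-written slot: ignore it
        meta = json.loads(body.rstrip(b"\x00"))
        if best is None or meta["seq"] > best["seq"]:
            best = meta
    return best
```

On recovery, the surviving root with the higher sequence number
tells you where the last valid log record is.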

Also, I note the comment in the developer docs on caching disk
drives: can anyone supply a reference on this subject (I have
been on the lookout for a long time without success), and
perhaps more generally on what exactly can go wrong with a disk
write interrupted by power failure?

Lastly, is there any form of integrity checking on disk-block-level
data? I have vague recollections of seeing mention of CRC/XOR
in relation to Oracle or DB2.
Whether or not pgsql uses such a scheme, I am curious to know
the rationale for its use - it makes me wonder what, if
anything, can be relied on 100%!

Thanks,
Chris Quinn

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Quinn (#1)
Re: fault tolerance...

Christopher Quinn <cq@htec.demon.co.uk> writes:
> I've been wondering how pgsql guarantees data integrity in the
> face of soft failures, in particular whether it uses an
> alternative to the double-root-block technique. That technique
> writes, as the final indication that new log records are valid,
> some meta information (including the location of the last log
> record written) to alternating disk blocks at fixed locations.
> This is the only technique I know of - does pgsql use
> something analogous?

The WAL log uses per-record CRCs plus sequence numbers (both per-record
and per-page) as a way of determining where valid information stops.
I don't see any need for relying on a "root block" in the sense you
describe.
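
To illustrate the idea (just a Python sketch with an invented
record layout - not the actual WAL format or code):

```python
import struct, zlib

# Invented layout: [len(4)][seq(4)][crc(4)][payload] per record.
def append_record(buf, seq, payload):
    """Append one record; the CRC covers the sequence number and payload."""
    crc = zlib.crc32(struct.pack(">I", seq) + payload)
    return buf + struct.pack(">III", len(payload), seq, crc) + payload

def scan_log(buf):
    """Replay records until a CRC or sequence mismatch marks the end."""
    records, pos, expect_seq = [], 0, 0
    while pos + 12 <= len(buf):
        length, seq, crc = struct.unpack_from(">III", buf, pos)
        payload = buf[pos + 12 : pos + 12 + length]
        if (len(payload) != length
                or seq != expect_seq
                or zlib.crc32(struct.pack(">I", seq) + payload) != crc):
            break  # stale or torn data: valid information stops here
        records.append(payload)
        pos += 12 + length
        expect_seq += 1
    return records
```

Replay stops at the first record whose CRC or sequence number
fails to check out; everything beyond that point is treated as
garbage, with no separate end-of-log pointer needed.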

> Lastly, is there any form of integrity checking on disk-block-level
> data? I have vague recollections of seeing mention of CRC/XOR
> in relation to Oracle or DB2.

At present we rely on the disk drive to not drop data once it's been
successfully fsync'd (at least not without detecting a read error later).
There was some discussion of adding per-page CRCs as a second-layer
check, but no one seems very excited about it. The performance costs
would be nontrivial and we have not seen all that many reports of field
failures in which a CRC would have improved matters.

regards, tom lane

#3 Christopher Quinn
cq@htec.demon.co.uk
In reply to: Tom Lane (#2)
Re: fault tolerance...

Tom Lane wrote:
> Christopher Quinn <cq@htec.demon.co.uk> writes:
>
> The WAL log uses per-record CRCs plus sequence numbers (both per-record
> and per-page) as a way of determining where valid information stops.
> I don't see any need for relying on a "root block" in the sense you
> describe.

Yes, I see.
I imagine that if a raw device were used for the log (with no
file EOF to mark the end of valid data), the space beyond the
last valid record could contain stale bytes which appear to form
another valid record ... if it weren't for the safeguard of a CRC.
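
To make that concrete, a toy example (invented layout, nothing to
do with the real WAL): a record left over from an earlier pass
over the device parses fine and even carries a valid CRC of its
own, so it is the sequence-number check alongside the CRC that
rejects it:

```python
import struct, zlib

# Invented layout: [seq(4)][len(4)][crc(4)][payload] - purely illustrative.
def make_record(seq, payload):
    crc = zlib.crc32(struct.pack(">I", seq) + payload)
    return struct.pack(">III", seq, len(payload), crc) + payload

def record_valid(buf, pos, expect_seq):
    """Accept a record only if both CRC and sequence number check out."""
    seq, length, crc = struct.unpack_from(">III", buf, pos)
    payload = buf[pos + 12 : pos + 12 + length]
    return (seq == expect_seq
            and len(payload) == length
            and zlib.crc32(struct.pack(">I", seq) + payload) == crc)

# A raw "device": one current record, then a stale record from an
# earlier cycle through the log. The stale record is internally
# consistent - only its out-of-order sequence number betrays it.
current = make_record(0, b"current record")
stale = make_record(7, b"old data from a previous cycle")
device = current + stale
```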

> There was some discussion of adding per-page CRCs as a second-layer
> check, but no one seems very excited about it. The performance costs
> would be nontrivial and we have not seen all that many reports of field
> failures in which a CRC would have improved matters.

Access to hard data on such corruption, or on its theoretical
likelihood, would be nice!
Did you consult any material yourself when deciding what
measures to implement to achieve the level of data security
pgsql currently offers?

Thanks,
Chris