Re: TODO list
Bruce,
Two changes for the TODO list.
1. Under "RELIABILITY/MISC", add:
Write out a CRC with each data block, and verify it on reading.
2. Under SOURCE CODE, I believe Tom has already implemented:
Correct CRC WAL code to be a real CRC64 algorithm
TODO updated. I know we did number 2, but did we agree on #1 and is it
done?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Two changes for the TODO list.
1. Under "RELIABILITY/MISC", add:
Write out a CRC with each data block, and verify it on reading.
2. Under SOURCE CODE, I believe Tom has already implemented:
Correct CRC WAL code to be a real CRC64 algorithm
TODO updated. I know we did number 2, but did we agree on #1 and is it
done?
#2 is indeed done. #1 is not done, and possibly not agreed to ---
I think Vadim had doubts about its usefulness, though personally I'd
like to see it.
regards, tom lane
TODO updated. I know we did number 2, but did we agree on #1 and is it
done?
#2 is indeed done. #1 is not done, and possibly not agreed to ---
I think Vadim had doubts about its usefulness, though personally I'd
like to see it.
That was my recollection too. This was the discussion about testing the
disk hardware. #1 removed.
--
Bruce Momjian
1. Under "RELIABILITY/MISC", add:
Write out a CRC with each data block, and verify it on reading.
TODO updated. I know we did number 2, but did we agree on #1 and is it
done?
Has anybody done performance and reliability tests with CRC64?
I think it must be a CPU eater. It looks a lot more complex than a CRC32.
Since we need to guard a maximum of 32k bytes for pg pages I would - if at all -
consider using a 32-bit Adler checksum instead of a CRC, since that is a lot cheaper
to calculate.
Andreas
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
Has anybody done performance and reliability tests with CRC64?
I think it must be a CPU eater. It looks a lot more complex than a CRC32.
On my box (PA-RISC) the inner loop is about 14 cycles/byte, vs. about
7 cycles/byte for CRC32. On almost any machine, either one will be
negligible in comparison to the cost of disk I/O.
Since we need to guard a maximum of 32k bytes for pg pages I would -
if at all - consider using a 32-bit Adler checksum instead of a CRC, since that
is a lot cheaper to calculate.
You are several months too late to re-open that argument. It's done and
it's not changing for 7.1.
regards, tom lane
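
For readers who want to try the two checksum families on a page-sized buffer, here is a minimal sketch. It assumes Python's standard `zlib` module as a stand-in: `zlib.crc32` and `zlib.adler32` are not the CRC64 code discussed above, and `PAGE_SIZE` and `checksums` are names made up for the illustration. It only shows that both checksums change when a single bit of a 32k page flips; it makes no claim about relative speed.

```python
import zlib

PAGE_SIZE = 32 * 1024  # the 32k upper bound mentioned above (assumed here)

def checksums(page):
    """Return (CRC32, Adler-32) of a page as unsigned 32-bit integers."""
    return zlib.crc32(page) & 0xFFFFFFFF, zlib.adler32(page) & 0xFFFFFFFF

# A deterministic test page: the byte values 0..255 repeated.
page = bytes(range(256)) * (PAGE_SIZE // 256)
crc_before, adler_before = checksums(page)

# Flip one bit in the middle of the page, as a damaged sector might.
damaged = bytearray(page)
damaged[PAGE_SIZE // 2] ^= 0x01
crc_after, adler_after = checksums(bytes(damaged))

print("CRC32 changed:", crc_before != crc_after)
print("Adler-32 changed:", adler_before != adler_after)
```

Timing either function on your own hardware is the only way to settle the cost question for a given machine.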
TODO updated. I know we did number 2, but did we agree on #1 and is it
done?
#2 is indeed done. #1 is not done, and possibly not agreed to ---
I think Vadim had doubts about its usefulness, though personally I'd
like to see it.
That was my recollection too. This was the discussion about testing the
disk hardware. #1 removed.
What is recommended in the bible (Gray and Reuter), especially for larger
disk block sizes that may not be written atomically, is to have a word at
the end of the block that must match a word at the beginning of the block. It
gets changed each time you write the block.
Ken Hirsch
All your database are belong to us.
On Thu, Apr 05, 2001 at 04:25:42PM -0400, Ken Hirsch wrote:
TODO updated. I know we did number 2, but did we agree on #1 and is it
done?
#2 is indeed done. #1 is not done, and possibly not agreed to ---
I think Vadim had doubts about its usefulness, though personally I'd
like to see it.
That was my recollection too. This was the discussion about testing the
disk hardware. #1 removed.
What is recommended in the bible (Gray and Reuter), especially for larger
disk block sizes that may not be written atomically, is to have a word at
the end of the block that must match a word at the beginning of the block. It
gets changed each time you write the block.
That only works if your blocks are atomic. Even SCSI disks reorder
sector writes, and they are free to write the first and last sectors
of an 8k-32k block, and not have written the intermediate sectors
before the power goes out. On IDE disks it is of course far worse.
(On many (most?) IDE drives, even when they have been told to report
write completion only after data is physically on the platter, they will
"forget" if they see activity that looks like benchmarking. Others just
ignore the command, and in any case they all default to unsafe mode.)
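
The reordering argument above can be sketched in a few lines. This is an illustration under stated assumptions, not anything from the PostgreSQL source: the block layout (a 4-byte stamp at each end), the 512-byte sector size, and the helper names `make_block` and `stamps_match` are all hypothetical, and `zlib.crc32` stands in for whatever checksum a real implementation would use.

```python
import zlib

SECTOR = 512   # assumed sector size
BLOCK = 8192   # 16 sectors per block

def make_block(stamp, fill):
    """Hypothetical layout: 4-byte stamp | payload | same 4-byte stamp."""
    s = stamp.to_bytes(4, "big")
    return s + bytes([fill]) * (BLOCK - 8) + s

def stamps_match(block):
    """The begin/end word check: first and last words must agree."""
    return block[:4] == block[-4:]

old = make_block(1, 0xAA)      # version already on disk
new = make_block(2, 0xBB)      # version being written
new_crc = zlib.crc32(new)      # checksum the writer would have recorded

# Torn write: the first and last sectors of the new version reach the
# platter, but the middle sectors still hold the old version.
torn = new[:SECTOR] + old[SECTOR:-SECTOR] + new[-SECTOR:]

stamp_check_passes = stamps_match(torn)          # begin/end words agree
crc_check_passes = zlib.crc32(torn) == new_crc   # whole-block checksum

print("stamp check passes on torn block:", stamp_check_passes)
print("CRC check passes on torn block:", crc_check_passes)
```

The begin/end words agree because both come from the new version, so that check cannot see the stale middle sectors; a checksum over the whole block can.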
If the reason that a block CRC isn't on the TODO list is that Vadim
objects, maybe we should hear some reasons why he objects? Maybe
the objections could be dealt with, and everyone satisfied.
Nathan Myers
ncm@zembu.com
If the reason that a block CRC isn't on the TODO list is that Vadim
objects, maybe we should hear some reasons why he objects? Maybe
the objections could be dealt with, and everyone satisfied.
Unordered disk writes are covered by backing up modified blocks
in log. It allows not only catch such writes, as would CRC do,
but *avoid* them.
So, for what CRC could be used? To catch disk damages?
Disk has its own CRC for this.
Vadim
On Thu, Apr 05, 2001 at 02:27:48PM -0700, Mikheev, Vadim wrote:
If the reason that a block CRC isn't on the TODO list is that Vadim
objects, maybe we should hear some reasons why he objects? Maybe
the objections could be dealt with, and everyone satisfied.
Unordered disk writes are covered by backing up modified blocks
in log. It allows not only catch such writes, as would CRC do,
but *avoid* them.
So, for what CRC could be used? To catch disk damages?
Disk has its own CRC for this.
OK, this was already discussed, maybe while Vadim was absent.
Should I re-post the previous text?
Nathan Myers
ncm@zembu.com
So, for what CRC could be used? To catch disk damages?
Disk has its own CRC for this.
OK, this was already discussed, maybe while Vadim was absent.
Should I re-post the previous text?
Let's return to this discussion *after* 7.1 release.
My main objection was (and is) - no time to deal with
this issue for 7.1
Vadim
On Thu, Apr 05, 2001 at 02:47:41PM -0700, Mikheev, Vadim wrote:
So, for what CRC could be used? To catch disk damages?
Disk has its own CRC for this.
OK, this was already discussed, maybe while Vadim was absent.
Should I re-post the previous text?
Let's return to this discussion *after* 7.1 release.
My main objection was (and is) - no time to deal with
this issue for 7.1.
OK, everybody agreed on that before.
This doesn't read like an objection to having it on the TODO list for
some future release.
Nathan Myers
ncm@zembu.com
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
If the reason that a block CRC isn't on the TODO list is that Vadim
objects, maybe we should hear some reasons why he objects? Maybe
the objections could be dealt with, and everyone satisfied.
Unordered disk writes are covered by backing up modified blocks
in log. It allows not only catch such writes, as would CRC do,
but *avoid* them.
So, for what CRC could be used? To catch disk damages?
Disk has its own CRC for this.
Oh, I see. For anyone else who has trouble reading between the lines:
Blocks that have recently been written, but failed to make it down to
the disk platter intact, should be restorable from the WAL log. So we
do not need a block-level CRC to guard against partial writes.
A block-level CRC might be useful to guard against long-term data
lossage, but Vadim thinks that the disk's own CRCs ought to be
sufficient for that (and I can't say I disagree).
So the only real benefit of a block-level CRC would be to guard against
bits dropped in transit from the disk surface to someplace else, ie,
during read or during a "cp -r" type copy of the database to another
location. That's not a totally negligible risk, but is it worth the
overhead of updating and checking block CRCs? Seems dubious at best.
regards, tom lane
So, for what CRC could be used? To catch disk damages?
Disk has its own CRC for this.
Oh, I see. For anyone else who has trouble reading between the lines:
Blocks that have recently been written, but failed to make it down to
the disk platter intact, should be restorable from the WAL log. So we
do not need a block-level CRC to guard against partial writes.
A block-level CRC might be useful to guard against long-term data
lossage, but Vadim thinks that the disk's own CRCs ought to be
sufficient for that (and I can't say I disagree).
So the only real benefit of a block-level CRC would be to guard against
bits dropped in transit from the disk surface to someplace else, ie,
during read or during a "cp -r" type copy of the database to another
location. That's not a totally negligible risk, but is it worth the
overhead of updating and checking block CRCs? Seems dubious at best.
Agreed.
--
Bruce Momjian
On Thu, Apr 05, 2001 at 06:25:17PM -0400, Tom Lane wrote:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
If the reason that a block CRC isn't on the TODO list is that Vadim
objects, maybe we should hear some reasons why he objects? Maybe
the objections could be dealt with, and everyone satisfied.
Unordered disk writes are covered by backing up modified blocks
in log. It allows not only catch such writes, as would CRC do,
but *avoid* them.
So, for what CRC could be used? To catch disk damages?
Disk has its own CRC for this.
Blocks that have recently been written, but failed to make it down to
the disk platter intact, should be restorable from the WAL log. So we
do not need a block-level CRC to guard against partial writes.
If a block is missing some sectors in the middle, how would you know
to reconstruct it from the WAL, without a block CRC telling you that
the block is corrupt?
A block-level CRC might be useful to guard against long-term data
lossage, but Vadim thinks that the disk's own CRCs ought to be
sufficient for that (and I can't say I disagree).
The people who make the disks don't agree.
They publish the error rate they guarantee, and they meet it, more
or less. They publish a rate that is _just_ low enough to satisfy
noncritical requirements (on the correct assumption that they can't
satisfy critical requirements in any case) and high enough not to
interfere with benchmarks. They assume that if you need better
reliability you can and will provide it yourself, and rely on their
CRC only as a performance optimization.
At the raw sector level, they get (and correct) errors very frequently;
when they are not getting "enough" errors, they pack the bits more
densely until they do, and sell a higher-density drive.
So the only real benefit of a block-level CRC would be to guard against
bits dropped in transit from the disk surface to someplace else, ie,
during read or during a "cp -r" type copy of the database to another
location. That's not a totally negligible risk, but is it worth the
overhead of updating and checking block CRCs? Seems dubious at best.
Vadim didn't want to re-open this discussion until after 7.1 is out
the door, but that "dubious at best" demands an answer. See the archive
posting:
http://www.postgresql.org/mhonarc/pgsql-hackers/2001-01/msg00473.html
...
Incidentally, is the page at
http://www.postgresql.org/mhonarc/pgsql-hackers/2001-01/
the best place to find old messages? It's never worked right for me.
Nathan Myers
ncm@zembu.com
At 18:25 5/04/01 -0400, Tom Lane wrote:
A block-level CRC might be useful to guard against long-term data
lossage, but Vadim thinks that the disk's own CRCs ought to be
sufficient for that (and I can't say I disagree).
So the only real benefit of a block-level CRC would be to guard against
bits dropped in transit from the disk surface to someplace else
What about guarding against file system problems, like blocks of one
(non-PG) file erroneously writing to blocks of another (PG table) file?
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
Blocks that have recently been written, but failed to make
it down to the disk platter intact, should be restorable from
the WAL log. So we do not need a block-level CRC to guard
against partial writes.
If a block is missing some sectors in the middle, how would you know
to reconstruct it from the WAL, without a block CRC telling you that
the block is corrupt?
On recovery we unconditionally copy *entire* block content from the log
for each block modified since last checkpoint. And we do not write a new
checkpoint record (ie do not advance the recovery start point) until we know
that all data blocks are flushed to disk (including blocks modified before
the checkpointer started).
Vadim
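
The recovery rule described above can be sketched as follows. This is a toy model under loud assumptions: `LogRecord`, `recover`, the `lsn` field, and the dictionary standing in for the disk are all hypothetical names for illustration, and nothing here reflects the actual WAL record format.

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    lsn: int            # hypothetical log position
    block_no: int       # which data block the record covers
    page_image: bytes   # full copy of the block as it should look

def recover(disk, log, checkpoint_lsn):
    """Overwrite every block logged after the checkpoint with its image."""
    for rec in sorted(log, key=lambda r: r.lsn):
        if rec.lsn > checkpoint_lsn:
            # Unconditional copy: no need to detect whether the on-disk
            # block was torn, because it is replaced either way.
            disk[rec.block_no] = rec.page_image

# Block 1 was only partly written when the crash happened.
disk = {0: b"A" * 8, 1: b"bb??" + b"B" * 4}
log = [
    LogRecord(lsn=5, block_no=0, page_image=b"A" * 8),    # before checkpoint
    LogRecord(lsn=12, block_no=1, page_image=b"b" * 8),   # after checkpoint
]

recover(disk, log, checkpoint_lsn=10)
print(disk)
```

Because the copy is unconditional, the torn block is repaired without any check that it was corrupt, which is the answer to the question quoted above. The sketch does not cover damage to blocks that were last modified before the checkpoint.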
Philip Warner <pjw@rhyme.com.au> writes:
So the only real benefit of a block-level CRC would be to guard against
bits dropped in transit from the disk surface to someplace else
What about guarding against file system problems, like blocks of one
(non-PG) file erroneously writing to blocks of another (PG table) file?
Well, what about it? Can you offer numbers demonstrating that this risk
is probable enough to justify the effort and runtime cost of a block
CRC?
If we're in the business of expending cycles to guard against
nil-probability risks, let's checksum our executables every time we
start up, to make sure they're not overwritten. Actually, we'd better
re-checksum program text memory every few seconds, in case RAM dropped
a bit since we looked last. And let's follow every memcpy by a memcmp
to make sure that didn't drop a bit. Heck, let's keep a CRC on every
palloc'd memory block. And so on and so forth. Sooner or later you've
got to draw the line at diminishing returns, both for runtime costs
and for the programming effort you spent on this stuff (instead of on
finding/fixing bugs that might bite you with far greater frequency than
anything a CRC might catch for you).
To be perfectly clear: I have actually seen bug reports trace to
problems that I think a block-level CRC might have detected (not
corrected, of course, but at least the user might have realized he had
flaky hardware a little sooner). So I do not say that the upside to
a block CRC is nil. But I am unconvinced that it exceeds the downside,
in development effort, runtime, false failure reports (is that CRC error
really due to hardware trouble, or a software bug that failed to update
the CRC? and how do you get around the CRC error to get at your data??)
etc etc.
regards, tom lane
If we're in the business of expending cycles to guard against
nil-probability risks, let's checksum our executables every time we
start up, to make sure they're not overwritten. Actually, we'd better
re-checksum program text memory every few seconds, in case RAM dropped
a bit since we looked last. And let's follow every memcpy by a memcmp
to make sure that didn't drop a bit. Heck, let's keep a CRC on every
Why does it sound like you have problems with radiation eating away at
your live memory for satellite operations?
At 22:52 5/04/01 -0400, Tom Lane wrote:
What about guarding against file system problems, like blocks of one
(non-PG) file erroneously writing to blocks of another (PG table) file?
Well, what about it? Can you offer numbers demonstrating that this risk
is probable enough to justify the effort and runtime cost of a block
CRC?
Rhetorical crap aside, I've had more file system failures (including badly
mapped file data) than I have had disk hardware failures. So, if you are
considering 'bits dropped in transit', you should also be considering data
corruption not related to the hardware.
----------------------------------------------------------------
Philip Warner
To be perfectly clear: I have actually seen bug reports trace to
problems that I think a block-level CRC might have detected (not
corrected, of course, but at least the user might have realized he had
flaky hardware a little sooner). So I do not say that the upside to
a block CRC is nil. But I am unconvinced that it exceeds the
downside, in development effort, runtime, false failure reports
(is that CRC error really due to hardware trouble, or a software bug
that failed to update the CRC? and how do you get around the CRC error
to get at your data??) etc etc.
Something to remember: currently we update t_infomask (set
HEAP_XMAX_COMMITTED etc) while holding share lock on buffer -
we have to change this before block CRC implementation.
Vadim
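
A small sketch of why in-place flag updates matter for a block CRC. Everything here is an assumption made for illustration: the page is a bare byte buffer with a 4-byte trailer, `HINT_BIT` is a made-up stand-in for a flag such as HEAP_XMAX_COMMITTED, and `seal` and `verify` are hypothetical helpers built on `zlib.crc32`.

```python
import zlib

CRC_LEN = 4
HINT_BIT = 0x08  # hypothetical stand-in for a t_infomask flag

def seal(body):
    """Append a CRC32 of the body as a 4-byte trailer."""
    return body + zlib.crc32(body).to_bytes(CRC_LEN, "big")

def verify(page):
    """Recompute the CRC over the body and compare with the trailer."""
    body, stored = page[:-CRC_LEN], page[-CRC_LEN:]
    return zlib.crc32(body).to_bytes(CRC_LEN, "big") == stored

page = bytearray(seal(bytes(64)))
ok_initially = verify(bytes(page))

# A flag is set in place without recomputing the trailer, as could
# happen if the update is made under a share lock.
page[0] |= HINT_BIT
ok_with_stale_crc = verify(bytes(page))

# Recomputing the trailer after the update makes the page valid again.
page[:] = seal(bytes(page[:-CRC_LEN]))
ok_after_reseal = verify(bytes(page))

print(ok_initially, ok_with_stale_crc, ok_after_reseal)
```

A page that reaches disk in the middle state would fail verification although the hardware did nothing wrong, which is the kind of false failure report raised earlier in the thread.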
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
Something to remember: currently we update t_infomask (set
HEAP_XMAX_COMMITTED etc) while holding share lock on buffer -
we have to change this before block CRC implementation.
Yeah, we'd lose some concurrency there.
regards, tom lane