wal_checksum = on (default) | off

Started by Simon Riggs over 19 years ago, 39 messages, hackers
#1Simon Riggs
simon@2ndQuadrant.com

In this thread, I outlined an idea for reducing cost of WAL CRC checking
http://archives.postgresql.org/pgsql-hackers/2006-10/msg01299.php

wal_checksum = on (default) | off

Recovery can occur with/without same setting of wal_checksum, to avoid
complications from crashes immediately after turning GUC on.

Patch enclosed here against CVS HEAD, passes make check.

Useful reduction in CPU for both normal operation and recovery.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Attachments:

wal_checksum.v1.patch (text/x-patch; charset=UTF-8), +131 -116

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#1)
Re: wal_checksum = on (default) | off

"Simon Riggs" <simon@2ndquadrant.com> writes:

In this thread, I outlined an idea for reducing cost of WAL CRC checking
http://archives.postgresql.org/pgsql-hackers/2006-10/msg01299.php
wal_checksum = on (default) | off

This still seems awfully dangerous to me.

Recovery can occur with/without same setting of wal_checksum, to avoid
complications from crashes immediately after turning GUC on.

Surely not. Otherwise even the "on" setting is not really a defense.

regards, tom lane

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#2)
Re: wal_checksum = on (default) | off

On Thu, 2007-01-04 at 10:00 -0500, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

In this thread, I outlined an idea for reducing cost of WAL CRC checking
http://archives.postgresql.org/pgsql-hackers/2006-10/msg01299.php
wal_checksum = on (default) | off

This still seems awfully dangerous to me.

Understood.

Recovery can occur with/without same setting of wal_checksum, to avoid
complications from crashes immediately after turning GUC on.

Surely not. Otherwise even the "on" setting is not really a defense.

Only when the CRC is exactly zero, which happens very very rarely.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#3)
Re: wal_checksum = on (default) | off

"Simon Riggs" <simon@2ndquadrant.com> writes:

On Thu, 2007-01-04 at 10:00 -0500, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

Recovery can occur with/without same setting of wal_checksum, to avoid
complications from crashes immediately after turning GUC on.

Surely not. Otherwise even the "on" setting is not really a defense.

Only when the CRC is exactly zero, which happens very very rarely.

"It works most of the time" doesn't exactly satisfy me. What's the
use-case for changing the variable on the fly anyway? Seems a better
solution is just to lock down the setting at postmaster start.

regards, tom lane

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#4)
Re: [HACKERS] wal_checksum = on (default) | off

On Thu, 2007-01-04 at 11:09 -0500, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

On Thu, 2007-01-04 at 10:00 -0500, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

Recovery can occur with/without same setting of wal_checksum, to avoid
complications from crashes immediately after turning GUC on.

Surely not. Otherwise even the "on" setting is not really a defense.

Only when the CRC is exactly zero, which happens very very rarely.

"It works most of the time" doesn't exactly satisfy me. What's the
use-case for changing the variable on the fly anyway? Seems a better
solution is just to lock down the setting at postmaster start.

That would prevent us from using the secondary checkpoint location, in
the case of a crash affecting the primary checkpoint when it is a
shutdown checkpoint where we changed the setting of wal_checksum. It
seemed safer to allow a very rare error through to the next level of
error checking rather than to close the door so tight that recovery
would not be possible in a very rare case.

If you're good with server start, so am I.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#6Florian Weimer
fweimer@bfk.de
In reply to: Simon Riggs (#3)
Re: [HACKERS] wal_checksum = on (default) | off

* Simon Riggs:

Surely not. Otherwise even the "on" setting is not really a defense.

Only when the CRC is exactly zero, which happens very very rarely.

Have you tried switching to Adler32 instead of CRC32?

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

#7Simon Riggs
simon@2ndQuadrant.com
In reply to: Florian Weimer (#6)
Re: [HACKERS] wal_checksum = on (default) | off

On Thu, 2007-01-04 at 17:58 +0100, Florian Weimer wrote:

* Simon Riggs:

Surely not. Otherwise even the "on" setting is not really a defense.

Only when the CRC is exactly zero, which happens very very rarely.

Have you tried switching to Adler32 instead of CRC32?

No. Please explain further.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Weimer (#6)
Re: [HACKERS] wal_checksum = on (default) | off

Florian Weimer <fweimer@bfk.de> writes:

Have you tried switching to Adler32 instead of CRC32?

Is anything known about the error detection capabilities of Adler32?
There's a lot of math behind CRCs but AFAIR Adler's method is pretty
much ad-hoc.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#5)
Re: [HACKERS] wal_checksum = on (default) | off

"Simon Riggs" <simon@2ndquadrant.com> writes:

On Thu, 2007-01-04 at 11:09 -0500, Tom Lane wrote:

"It works most of the time" doesn't exactly satisfy me.

It seemed safer to allow a very rare error through to the next level of
error checking rather than to close the door so tight that recovery
would not be possible in a very rare case.

If a DBA is turning checksums off at all, he's already bought into the
assumption that he's prepared to recover from backups. What you don't
seem to get here is that this "feature" is pretty darn questionable in
the first place, and for it to have a side effect of poking a hole in
the system's reliability even when it's off is more than enough to get
it rejected outright. It's just a No Sale.

I don't believe that the hole is real small, either, as
overwrite-with-zeroes is not exactly an unheard-of failure mode for
filesystems.

regards, tom lane
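
To make the overwrite-with-zeroes concern concrete, here is a minimal Python sketch. It assumes, as the exchange above suggests but the patch itself is not shown here to confirm, that a stored CRC of zero is read as "checksum was off" and the record is accepted without verification. The function names are hypothetical, and zlib.crc32 merely stands in for PostgreSQL's own WAL CRC code.

```python
import zlib


def record_ok_with_zero_bypass(payload: bytes, stored_crc: int) -> bool:
    """Hypothetical recovery check: a stored CRC of 0 means 'not checksummed'."""
    if stored_crc == 0:
        return True  # assumed bypass: nothing is verified for this record
    return zlib.crc32(payload) == stored_crc


def record_ok_strict(payload: bytes, stored_crc: int) -> bool:
    """Check with no special value: the stored CRC must always match."""
    return zlib.crc32(payload) == stored_crc


# A made-up record body and its correct checksum.
payload = b"example WAL record body"
good_crc = zlib.crc32(payload)

# Simulate a filesystem failure that overwrites the whole record,
# CRC field included, with zeroes.
zeroed_payload = bytes(len(payload))
zeroed_crc = 0

print("intact record, bypass check:", record_ok_with_zero_bypass(payload, good_crc))
print("zeroed record, bypass check:", record_ok_with_zero_bypass(zeroed_payload, zeroed_crc))
print("zeroed record, strict check:", record_ok_strict(zeroed_payload, zeroed_crc))
```

The strict check rejects the zeroed record, while the zero-bypass check accepts it, and it does so regardless of how the writing server was configured.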

#10Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#9)
Re: [HACKERS] wal_checksum = on (default) | off

On Thu, 2007-01-04 at 12:13 -0500, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

On Thu, 2007-01-04 at 11:09 -0500, Tom Lane wrote:

"It works most of the time" doesn't exactly satisfy me.

It seemed safer to allow a very rare error through to the next level of
error checking rather than to close the door so tight that recovery
would not be possible in a very rare case.

If a DBA is turning checksums off at all, he's already bought into the
assumption that he's prepared to recover from backups. What you don't
seem to get here is that this "feature" is pretty darn questionable in
the first place, and for it to have a side effect of poking a hole in
the system's reliability even when it's off is more than enough to get
it rejected outright. It's just a No Sale.

I get it, and I listened. I was/am happy to do it the way you
suggested; I was merely explaining that I had considered the issue.

New patch enclosed.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Attachments:

wal_checksum.v2.patch (text/x-patch; charset=UTF-8), +131 -116

#11Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Tom Lane (#8)
Re: [HACKERS] wal_checksum = on (default) | off

Tom Lane wrote:

Florian Weimer <fweimer@bfk.de> writes:

Have you tried switching to Adler32 instead of CRC32?

Is anything known about the error detection capabilities of Adler32?
There's a lot of math behind CRCs but AFAIR Adler's method is pretty
much ad-hoc.

As I understand it, it's kinda well studied; but has known
weaknesses in its ability to detect errors under some conditions.

Quoting wikipedia:
"Adler-32 has a weakness for short messages with few hundred bytes,
because the checksums for these messages have a poor coverage of
the 32 available bits...Jonathan Stone discovered in 2001 that Adler-32
has a weakness...An extended explanation can be found in RFC 3309,
which mandates the use of CRC32 instead of Adler-32...."

I'm not sure if I'm kidding or not here, but I wonder if the not
uncommon requests on the lists of weakening protective features
in postgresql (full-page writes, fsync off, "but mysql says", etc)
suggest that a "dont_protect_against_os_or_hardware_failures" mode
might be in demand for non-critical / development instances.

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron Mayer (#11)
Re: [HACKERS] wal_checksum = on (default) | off

Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:

Quoting wikipedia:
"Adler-32 has a weakness for short messages with few hundred bytes,
because the checksums for these messages have a poor coverage of
the 32 available bits...Jonathan Stone discovered in 2001 that Adler-32
has a weakness...An extended explanation can be found in RFC 3309,
which mandates the use of CRC32 instead of Adler-32...."

[ looks at the RFC... ] Yeah, so that pretty much kills it for WAL
entries, which are mostly short.

regards, tom lane
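
The short-message weakness is easy to see with Python's zlib module. Adler-32 packs two 16-bit running sums into its 32-bit result, and for short inputs neither sum gets large enough to wrap around the modulus 65521, so part of the output range is never used. The sketch below is only an illustration: the 16-byte record length is an arbitrary choice and the helper name is hypothetical.

```python
import zlib


def adler_halves(data: bytes):
    """Split an Adler-32 value into its two 16-bit running sums (a, b)."""
    checksum = zlib.adler32(data)
    return checksum & 0xFFFF, checksum >> 16


RECORD_LEN = 16  # illustrative short-record size, not a PostgreSQL constant

# Both sums grow with every byte value, so a record of all 0xFF bytes
# gives the largest values any record of this length can produce.
worst_case = b"\xff" * RECORD_LEN
max_a, max_b = adler_halves(worst_case)

# 'a' can never exceed 1 + 255 * RECORD_LEN here, so its top bits stay zero.
print("largest a:", max_a, "bits needed:", max_a.bit_length())
print("largest b:", max_b, "bits needed:", max_b.bit_length())

# CRC-32 has no comparable structural ceiling for short inputs.
print("crc32 of the same record:", hex(zlib.crc32(worst_case)))
```

For 16-byte inputs the low half never exceeds 4081, so its top four bits are always zero, and the high half never exceeds 34696.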

#13Florian Weimer
fw@deneb.enyo.de
In reply to: Tom Lane (#8)
Re: [HACKERS] wal_checksum = on (default) | off

* Tom Lane:

Florian Weimer <fweimer@bfk.de> writes:

Have you tried switching to Adler32 instead of CRC32?

Is anything known about the error detection capabilities of Adler32?
There's a lot of math behind CRCs but AFAIR Adler's method is pretty
much ad-hoc.

Correct me if I'm wrong, but the main reason for the WAL CRC is to
detect partial WAL writes (due to improper caching, for instance).
This means that you're out of the realm of traditional CRC analysis
anyway, because the things you are guarding against are neither burst
errors nor n-bit errors (for small n).

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Weimer (#13)
Re: [HACKERS] wal_checksum = on (default) | off

Florian Weimer <fw@deneb.enyo.de> writes:

* Tom Lane:

There's a lot of math behind CRCs but AFAIR Adler's method is pretty
much ad-hoc.

Correct me if I'm wrong, but the main reason for the WAL CRC is to
detect partial WAL writes (due to improper caching, for instance).

Well, that's *a* reason, but not the only one, and IMHO not one that
gives any particular guidance on what kind of checksum to use.

This means that you're out of the realm of traditional CRC analysis
anyway, because the things you are guarding against are neither burst
errors nor n-bit errors (for small n).

I think short burst errors are fairly likely: the kind of scenario I'm
worried about is a wild store corrupting a word of a WAL entry while
it's waiting around to be written in the WAL buffers. So the CRC math
does give me some comfort that that'll be detected.

regards, tom lane
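
A rough Python sketch of the wild-store scenario: one 4-byte word of a buffered record is overwritten after the CRC was computed. A 32-bit CRC is guaranteed to detect any single burst error of 32 bits or fewer, so every such overwrite that actually changes the record is caught. The record contents and the helper name are made up for illustration, and zlib.crc32 stands in for the WAL CRC.

```python
import zlib


def wild_store(record: bytes, offset: int, word: bytes) -> bytes:
    """Return a copy of record with len(word) bytes overwritten at offset."""
    return record[:offset] + word + record[offset + len(word):]


# A made-up 64-byte record and the CRC computed when it was assembled.
record = bytes(range(64))
crc_at_insert = zlib.crc32(record)

stray_word = b"\xde\xad\xbe\xef"
undetected = 0

# Try the stray write at every word-aligned offset in the record.
for offset in range(0, len(record), 4):
    damaged = wild_store(record, offset, stray_word)
    if damaged != record and zlib.crc32(damaged) == crc_at_insert:
        undetected += 1

print("undetected wild stores:", undetected)
```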

#15Florian Weimer
fw@deneb.enyo.de
In reply to: Tom Lane (#14)
Re: [HACKERS] wal_checksum = on (default) | off

* Tom Lane:

I think short burst errors are fairly likely: the kind of scenario I'm
worried about is a wild store corrupting a word of a WAL entry while
it's waiting around to be written in the WAL buffers.

Ah, does this mean that each WAL entry gets its own checksum? In this
case, Adler32 is indeed suboptimal because it doesn't use the full 32
bits for short inputs. It might still catch many wild stores, but the
statistics are worse than for CRC32.

(I had assumed that PostgreSQL's WAL checksumming was justified by the
partial write issue. The wild store could easily occur with a heap
page, too, and AFAIK, tuples aren't checksummed. Which would be an
interesting option, I guess.)

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian Weimer (#15)
Re: [HACKERS] wal_checksum = on (default) | off

Florian Weimer <fw@deneb.enyo.de> writes:

Ah, does this mean that each WAL entry gets its own checksum?

Right.

(I had assumed that PostgreSQL's WAL checksumming was justified by the
partial write issue. The wild store could easily occur with a heap
page, too, and AFAIK, tuples aren't checksummed. Which would be an
interesting option, I guess.)

We've discussed it but there's never been a pressing reason to do it.

regards, tom lane

#17Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Simon Riggs (#5)
Re: [HACKERS] wal_checksum = on (default) | off

Recovery can occur with/without same setting of wal_checksum, to avoid
complications from crashes immediately after turning GUC on.

Surely not. Otherwise even the "on" setting is not really a defense.

Only when the CRC is exactly zero, which happens very very rarely.

"It works most of the time" doesn't exactly satisfy me.

Agreed

What's the use-case for changing the variable on the fly anyway? Seems a
better solution is just to lock down the setting at postmaster start.

I guess that the use case is more for a WAL based replicate, that
has/wants a different setting. Maybe we want a WAL entry for the change,
or force a log switch (so you can interrupt the replicate, change its
setting and proceed with the next log)?

Maybe a 3rd mode for replicates that ignores 0 CRCs?

Andreas

#18Simon Riggs
simon@2ndQuadrant.com
In reply to: Zeugswetter Andreas SB SD (#17)
Re: [HACKERS] wal_checksum = on (default) | off

On Fri, 2007-01-05 at 11:01 +0100, Zeugswetter Andreas ADI SD wrote:

What's the use-case for changing the variable on the fly anyway? Seems a
better solution is just to lock down the setting at postmaster start.

I guess that the use case is more for a WAL based replicate, that
has/wants a different setting. Maybe we want a WAL entry for the change,
or force a log switch (so you can interrupt the replicate, change its
setting and proceed with the next log)?

Maybe a 3rd mode for replicates that ignores 0 CRCs?

Well, wal_checksum allows you to have this turned ON for the main server
and OFF on a Warm Standby.

The recovery process doesn't check for postgresql.conf reloads, so
setting it at server start is effectively the same thing in that case.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#19Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Simon Riggs (#18)
Re: [HACKERS] wal_checksum = on (default) | off

What's the use-case for changing the variable on the fly anyway? Seems a
better solution is just to lock down the setting at postmaster start.

I guess that the use case is more for a WAL based replicate, that
has/wants a different setting. Maybe we want a WAL entry for the change,
or force a log switch (so you can interrupt the replicate, change its
setting and proceed with the next log)?

Maybe a 3rd mode for replicates that ignores 0 CRCs?

Well, wal_checksum allows you to have this turned ON for the main server
and OFF on a Warm Standby.

Ok, so when you need CRCs on a replicate (but not on the master) you
turn it off during standby replay, but turn it on when you start the
replicate for normal operation.

Andreas

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#19)
Re: [HACKERS] wal_checksum = on (default) | off

"Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:

Ok, so when you need CRCs on a replicate (but not on the master) you
turn it off during standby replay, but turn it on when you start the
replicate for normal operation.

Thought: even when it's off, the CRC had better be computed for
shutdown-checkpoint records. Else there's no way to turn it on even
with a postmaster restart --- unless we accept the idea of poking a hole
in the normal mode. (Which I still dislike, and even more so if the
special value is zero. Almost any other value would be safer than zero.)

On the whole, though, I still don't want to put this in. I don't think
Simon has thought it through sufficiently, and we haven't even seen any
demonstration of a big speedup.

(Another hole in the implementation as given: pg_resetxlog.)

regards, tom lane
