Enabling Checksums
As I understand it, the main part of the remaining work to be done for
the checksums patch (at least the first commit) is to have a better way
to enable/disable them.
For the sake of simplicity (implementation as well as usability), it
seems like there is agreement that checksums should be enabled or
disabled for the entire instance, not per-table.
I don't think a GUC entirely makes sense (in its current form, anyway).
We basically care about 3 states:
1. Off: checksums are not written, nor are they verified. Pages that
are newly dirtied have the checksum information in the header cleared.
2. Enabling: checksums are written for every dirty page, but only
verified for pages where the checksum is present (as determined by
information in the page header).
3. On: checksums are written for every dirty page, and verified for
every page that's read. If a page does not have a checksum, it's
corrupt.
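In pseudo-C, the intended read/write behavior of the three states might look like this (the enum and function names are just illustrative, not identifiers from the patch):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch only; names are invented for this example. */
typedef enum { CHECKSUMS_OFF, CHECKSUMS_ENABLING, CHECKSUMS_ON } ChecksumState;

/* Stamp a checksum when writing out a dirty page? */
bool checksum_on_write(ChecksumState state)
{
    return state != CHECKSUMS_OFF;      /* written in Enabling and On */
}

/* Attempt verification on read?  page_has_checksum comes from the
 * page header bits. */
bool checksum_on_read(ChecksumState state, bool page_has_checksum)
{
    if (state == CHECKSUMS_OFF)
        return false;
    if (state == CHECKSUMS_ENABLING)
        return page_has_checksum;       /* only verify if one is present */
    return true;                        /* On: always verify; a missing
                                         * checksum means corruption */
}
```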
Does it make sense to store this information in pg_control? That doesn't
require adding any new file, and it has the benefit that it's already
checksummed. It's available during recovery and can be made available
pretty easily in the places where we write data.
And the next question is what commands to add to change state. Ideas:
CHECKSUMS ENABLE; -- set state to "Enabling"
CHECKSUMS DISABLE; -- set state to "Off"
And then to get to the "On" state, you have to run a system-wide VACUUM
while in the "Enabling" state. Or, if the above syntax causes problems,
we can make all of these into VACUUM options.
Thoughts?
Regards,
Jeff Davis
Jeff Davis wrote:
And the next question is what commands to add to change state. Ideas:
CHECKSUMS ENABLE; -- set state to "Enabling"
CHECKSUMS DISABLE; -- set state to "Off"
And then to get to the "On" state, you have to run a system-wide VACUUM
while in the "Enabling" state. Or, if the above syntax causes problems,
we can make all of these into VACUUM options.
There's no such thing as a system-wide VACUUM. The most you can get is
a database-wide VACUUM, which means you'd have to store the state
per-database somewhere (presumably the pg_database catalog), and perhaps
pg_control could have it as a system-wide value that's computed as the
minimum of all database states (so it stays "enabling" until all
databases have upgraded to "on").
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Friday, November 09, 2012 6:32 AM Jeff Davis wrote:
As I understand it, the main part of the remaining work to be done for
the checksums patch (at least the first commit) is to have a better way
to enable/disable them.
For the sake of simplicity (implementation as well as usability), it
seems like there is agreement that checksums should be enabled or
disabled for the entire instance, not per-table.
I don't think a GUC entirely makes sense (in its current form, anyway).
We basically care about 3 states:
1. Off: checksums are not written, nor are they verified. Pages that
are newly dirtied have the checksum information in the header cleared.
2. Enabling: checksums are written for every dirty page, but only
verified for pages where the checksum is present (as determined by
information in the page header).
3. On: checksums are written for every dirty page, and verified for
every page that's read. If a page does not have a checksum, it's
corrupt.
Does it make sense to store this information in pg_control? That doesn't
require adding any new file, and it has the benefit that it's already
checksummed. It's available during recovery and can be made available
pretty easily in the places where we write data.
And the next question is what commands to add to change state. Ideas:
CHECKSUMS ENABLE; -- set state to "Enabling"
CHECKSUMS DISABLE; -- set state to "Off"
And then to get to the "On" state, you have to run a system-wide VACUUM
while in the "Enabling" state. Or, if the above syntax causes problems,
we can make all of these into VACUUM options.
I think one thing that may need to be taken care of during such a VACUUM
operation is not to allow the user to say
CHECKSUM DISABLE.
Also, how about the following ways:
1. Allow CHECKSUM ENABLE only during initdb, as mentioned by Robert.
Allow the user to only do CHECKSUM DISABLE after initdb.
2. Do the checksum only for particular pages (SLRU) or for system
tables only.
With Regards,
Amit Kapila.
On 09/11/12 02:01, Jeff Davis wrote:
As I understand it, the main part of the remaining work to be done for
the checksums patch (at least the first commit) is to have a better way
to enable/disable them.
For the sake of simplicity (implementation as well as usability), it
seems like there is agreement that checksums should be enabled or
disabled for the entire instance, not per-table.
I can definitely see that simplicity is an argument here, but
I can easily imagine that some performance-hungry users
would prefer to be able to disable the functionality on a
per-table level: UNCHECKSUMMED TABLES (similar to UNLOGGED TABLES).
I would definitely put our system in state 2 of your
description if it was available.
--
Jesper
Jeff,
On 11/09/2012 02:01 AM, Jeff Davis wrote:
For the sake of simplicity (implementation as well as usability), it
seems like there is agreement that checksums should be enabled or
disabled for the entire instance, not per-table.
Agreed. I briefly thought about making it a per-database setting, but
what about shared system catalogs? Let's keep it simple and have a
single per-cluster switch for now.
I don't think a GUC entirely makes sense (in its current form, anyway).
We basically care about 3 states:
1. Off: checksums are not written, nor are they verified. Pages that
are newly dirtied have the checksum information in the header cleared.
2. Enabling: checksums are written for every dirty page, but only
verified for pages where the checksum is present (as determined by
information in the page header).
3. On: checksums are written for every dirty page, and verified for
every page that's read. If a page does not have a checksum, it's
corrupt.
Sounds sane, yes.
And the next question is what commands to add to change state. Ideas:
CHECKSUMS ENABLE; -- set state to "Enabling"
CHECKSUMS DISABLE; -- set state to "Off"
Yet another SQL command doesn't feel like the right thing for such a
switch. Quick googling revealed that CHECKSUM is a system function in MS
SQL, and MySQL has a CHECKSUM TABLE command. And you never know what
the committee is coming up with next.
Apart from that, I'd like something more descriptive than just
"checksums". Block checksums? Heap checksums? Data checksums?
Regards
Markus Wanner
On 11/09/2012 06:18 AM, Jesper Krogh wrote:
I would definitely put our system in state 2 of your
description if it was available.
Hm.. that's an interesting statement.
What's probably worst when switching from OFF to ON is the VACUUM run
that needs to touch every page (provided you haven't ever turned
checksumming on before). Maybe you want to save that step and still get
the additional safety for newly dirtied pages, right?
A use case worth supporting?
Regards
Markus Wanner
Jeff,
I don't think a GUC entirely makes sense (in its current form, anyway).
We basically care about 3 states:
Huh? Why would a GUC not make sense? How else would you make sure that
checksums were on when you started the system?
1. Off: checksums are not written, nor are they verified. Pages that
are newly dirtied have the checksum information in the header cleared.
2. Enabling: checksums are written for every dirty page, but only
verified for pages where the checksum is present (as determined by
information in the page header).
3. On: checksums are written for every dirty page, and verified for
every page that's read. If a page does not have a checksum, it's
corrupt.
Well, large databases would tend to be stuck permanently in "Enabling",
because the user would never vacuum old, cold partitions in order to
checksum them. So we need to be prepared for this to be the end state
for a lot of databases.
In fact, we'd need three settings for the checksum GUC:
OFF -- don't checksum anything, equal to state (1) above
WRITES -- checksum pages which are being written anyway, but ignore
tables which aren't touched. Permanent "Enabling" state.
ALL -- checksum everything you can. In particular, autovacuum would
checksum any table which was not already checksummed at the next vacuum
of that table. Goal is to get to state 3 above.
Does it make sense to store this information in pg_control? That doesn't
require adding any new file, and it has the benefit that it's already
checksummed. It's available during recovery and can be made available
pretty easily in the places where we write data.
And the next question is what commands to add to change state. Ideas:
CHECKSUMS ENABLE; -- set state to "Enabling"
CHECKSUMS DISABLE; -- set state to "Off"
Don't like this, please make it a GUC.
And then to get to the "On" state, you have to run a system-wide VACUUM
while in the "Enabling" state. Or, if the above syntax causes problems,
we can make all of these into VACUUM options.
As there's no such thing as system-wide vacuum, we're going to have to
track whether a table is "fully checksummed" in the system catalogs.
We'll also need:
VACUUM ( CHECKSUM ON )
... which would vacuum an entire table, skipping no pages and writing
checksums for every page, unless the table were marked fully checksummed
already, in which case it would do a regular vacuum.
Once a table was flagged as "all checksummed", then the system could
start producing errors (or warnings?) whenever a page with a missing
checksum was found.
Hmmm, better to have a 2nd GUC:
checksum_fail_action = WARNING | ERROR
... since some people want the write or read to fail, and others just
want to see it in the logs.
So, thinking about it, state (3) is never the state of an entire
installation; it's always the state of individual tables.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Fri, 2012-11-09 at 15:42 +0100, Markus Wanner wrote:
On 11/09/2012 06:18 AM, Jesper Krogh wrote:
I would definitely put our system in state 2 of your
description if it was available.
Hm, that's an interesting statement.
What's probably worst when switching from OFF to ON is the VACUUM run
that needs to touch every page (provided you haven't ever turned
checksumming on before). Maybe you want to save that step and still get
the additional safety for newly dirtied pages, right?
A use case worth supporting?
One problem is telling which pages are protected and which aren't. We
can have a couple bits in the header indicating that a checksum is
present, but it's a little disappointing to have only a few bits
protecting a 16-bit checksum.
Also, I think that people will want to have a way to protect their old
data somehow.
Regards,
Jeff Davis
On Thu, 2012-11-08 at 23:33 -0300, Alvaro Herrera wrote:
There's no such thing as a system-wide VACUUM. The most you can get is
a database-wide VACUUM, which means you'd have to store the state
per-database somewhere (presumably the pg_database catalog), and perhaps
pg_control could have it as a system-wide value that's computed as the
minimum of all database states (so it stays "enabling" until all
databases have upgraded to "on").
That's a good point. Maybe this should be done as an offline operation
using a command-line utility?
Regards,
Jeff Davis
On 11/09/2012 07:53 PM, Jeff Davis wrote:
One problem is telling which pages are protected and which aren't. We
can have a couple bits in the header indicating that a checksum is
present, but it's a little disappointing to have only a few bits
protecting a 16-bit checksum.
Given your description of option 2 I was under the impression that each
page already has a bit indicating whether or not the page is protected
by a checksum. Why do you need more bits than that?
Also, I think that people will want to have a way to protect their old
data somehow.
Well, given that this specific set of users is not willing to go through
a rewrite of each and every page of its database, it's hard to see how
we can protect their old data any better.
However, we certainly need to provide the option to go through the
rewrite for other users, who are willing to bite that bullet.
From a user's perspective, the trade-off seems to be: if you want your
old data to be covered by checksums, you need to go through such an
expensive VACUUM run that touches every page in your database.
If you don't want to or cannot do that, you can still turn on
checksumming for newly written pages. You won't get full protection and
it's hard to tell what data is protected and what not, but it's still
better than no checksumming at all. Especially for huge databases, that
might be a reasonable compromise.
One could even argue that this just leads to a prolonged migration and
with time, the remaining VACUUM step becomes less and less frightening.
Do you see any real foot-guns or other show-stoppers for permanently
allowing that in-between-state?
Or do we have other viable options that prolong the migration and thus
spread the load better over time?
Regards
Markus Wanner
On Fri, 2012-11-09 at 20:48 +0100, Markus Wanner wrote:
Given your description of option 2 I was under the impression that each
page already has a bit indicating whether or not the page is protected
by a checksum. Why do you need more bits than that?
The bit indicating that a checksum is present may be lost due to
corruption.
However, we certainly need to provide the option to go through the
rewrite for other users, who are willing to bite that bullet.
That's the use case that I've been focusing on, but perhaps you are
right that it's not the only important one.
Do you see any real foot-guns or other show-stoppers for permanently
allowing that in-between-state?
The biggest problem that I see is that a few bits indicating the
presence of a checksum may be vulnerable to more kinds of corruption.
Regards,
Jeff Davis
On Nov10, 2012, at 00:08 , Jeff Davis <pgsql@j-davis.com> wrote:
On Fri, 2012-11-09 at 20:48 +0100, Markus Wanner wrote:
Given your description of option 2 I was under the impression that each
page already has a bit indicating whether or not the page is protected
by a checksum. Why do you need more bits than that?The bit indicating that a checksum is present may be lost due to
corruption.
Though that concern mostly goes away if instead of a separate bit we use a
special checksum value, say 0xDEAD, to indicate that the page isn't
checksummed, no?
If checksums were always enabled, the probability of a random corruption
going undetected is N/N^2 = 1/N where N is the number of distinct checksum
values, since out of the N^2 equally likely pairs of computed and stored
checksums values, N show two identical values.
With the 0xDEAD scheme, the probability of a random corruption going
undetected is (N-1 + N)/N^2 = 2/N - 1/N^2, since there are (N-1) pairs
with identical values != 0xDEAD, and N pairs where the stored checksum
value is 0xDEAD.
So instead of a 1 in 65536 chance of a corruption going undetected, the
0xDEAD scheme gives (approximately) a chance of 1 in 32768, i.e. the
strength of the checksum is reduced by one bit. That's still acceptable,
I'd say.
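To double-check the arithmetic, here is a quick standalone calculation of the two probabilities (just restating the formulas above in C):

```c
#include <assert.h>

/* Undetected-corruption probability with N possible checksum values:
 * of the N^2 equally likely (computed, stored) pairs, N are identical. */
double p_plain(double N)
{
    return 1.0 / N;
}

/* Same, when one value (e.g. 0xDEAD) is reserved to mean "not
 * checksummed": N-1 identical non-magic pairs, plus N pairs whose
 * stored value happens to be the magic one. */
double p_magic(double N)
{
    return (N - 1.0 + N) / (N * N);
}
```

For N = 65536, the ratio p_magic/p_plain works out to 2 - 1/N, i.e. just under 2, which is the "one bit of strength" claim.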
In practice, 0xDEAD may be a bad choice because of its widespread use
as an uninitialized marker for blocks of memory. A randomly picked value
would probably be a better choice.
best regards,
Florian Pflug
On Sat, 2012-11-10 at 14:46 +0100, Florian Pflug wrote:
The bit indicating that a checksum is present may be lost due to
corruption.
Though that concern mostly goes away if instead of a separate bit we use a
special checksum value, say 0xDEAD, to indicate that the page isn't
checksummed, no?
Right. But then we have an upgrade impact to set the checksum to 0xDEAD
on all existing pages, which seems to eliminate most of the possible
reason for it.
Also, we'd need to tweak the algorithm to make sure that it never landed
on that magic value. So if we think we might want this in the future, we
should reserve that magic value now.
But I can't think of many reasons for it, unless we expect people to be
turning checksums on and off repeatedly.
Regards,
Jeff Davis
On Fri, 2012-11-09 at 09:57 -0800, Josh Berkus wrote:
Huh? Why would a GUC not make sense? How else would you make sure that
checksums were on when you started the system?
If we stored the information in pg_control, you could check with
pg_controldata. We could have a separate utility, pg_checksums, that can
alter the state and/or do an offline verification. And initdb would take
an option that would start everything out fully protected with
checksums.
The problem with a GUC is that checksums aren't really something you can
change by just changing the variable and restarting, unless you are only
using checksums opportunistically (only write checksums when a page is
dirtied and only verify a checksum if the header indicates that it's
present).
There are also usability issues. If someone has a fully-protected
instance, and turns the GUC off, and starts the server, they'll lose the
"fully-protected" status on the first write, and have to re-read all the
data to get back to fully protected. That just doesn't seem right to me.
Well, large databases would tend to be stuck permanently in "Enabling",
because the user would never vacuum old, cold partitions in order to
checksum them. So we need to be prepared for this to be the end state
for a lot of databases.
That may be true, but if that's the case, it's more like a 3-bit
checksum than a 16-bit checksum, because of the page-header corruption
problem. I don't know of any way to give those users more than that,
which won't be good enough for the set-at-initdb time users.
In fact, we'd need three settings for the checksum GUC:
OFF -- don't checksum anything, equal to state (1) above
WRITES -- checksum pages which are being written anyway, but ignore
tables which aren't touched. Permanent "Enabling" state.
ALL -- checksum everything you can. In particular, autovacuum would
checksum any table which was not already checksummed at the next vacuum
of that table. Goal is to get to state 3 above.
That's slightly more eager, but it's basically the same as the WRITES
state. In order to get to the fully-protected state, you still need to
somehow make sure that all of the old data is checksummed.
And the "fully protected" state is important in my opinion, because
otherwise we aren't protected against corrupt page headers that say they
have no checksum (even when it really should have a checksum).
Does it make sense to store this information in pg_control? That doesn't
require adding any new file, and it has the benefit that it's already
checksummed. It's available during recovery and can be made available
pretty easily in the places where we write data.
And the next question is what commands to add to change state. Ideas:
CHECKSUMS ENABLE; -- set state to "Enabling"
CHECKSUMS DISABLE; -- set state to "Off"
Don't like this, please make it a GUC.
I'll see if you have ideas about how to resolve the problems with a GUC
that I mentioned above. But if not, then what about using a utility,
perhaps called pg_checksums? That way we wouldn't need new syntax.
As there's no such thing as system-wide vacuum, we're going to have to
track whether a table is "fully checksummed" in the system catalogs.
It seems like this is going down the road of per-table checksums. I'm
not opposed to that, but that has a low chance of making 9.3.
Let's try to do something simpler now that leaves open the possibility
of more flexibility later. I'm inclined to agree with Robert that the
first patch should probably be an initdb-time option. Then, we can allow
a lazy mode (like your WRITES state) and an eager offline check with a
pg_checksums utility. Then we can work towards per-table checksums,
control via VACUUM, protecting the SLRU, treating zero pages as invalid,
protecting temp files (which can be a GUC), replication integration,
etc.
Hmmm, better to have a 2nd GUC:
checksum_fail_action = WARNING | ERROR
... since some people want the write or read to fail, and others just
want to see it in the logs.
Checksums don't introduce new failure modes on writes, only on reads.
And for reads, I think we have a problem doing anything less than an
ERROR. If we allow the read to succeed, we either risk a crash (or
silently corrupting other buffers in shared memory), or we have to put a
zero page in its place. But we already have the zero_damaged_pages
option, which I think is better because reading corrupt data is only
useful for data recovery efforts.
So, thinking about it, state (3) is never the state of an entire
installation; it's always the state of individual tables.
That contradicts the idea of using a GUC then. It would make more sense
to have extra syntax or extra VACUUM modes to accomplish that per-table.
Unfortunately, I'm worried that the per-table approach will not be
completed by 9.3. Do you see something about my proposal that makes it
harder to get where we want to go in the future?
If we do ultimately get per-table checksums, then I agree that a flag in
pg_control may be a bit of a wart, but it's easy enough to remove later.
Regards,
Jeff Davis
Hello
Does it make sense to store this information in pg_control? That doesn't
require adding any new file, and it has the benefit that it's already
checksummed. It's available during recovery and can be made available
pretty easily in the places where we write data.
And the next question is what commands to add to change state. Ideas:
CHECKSUMS ENABLE; -- set state to "Enabling"
CHECKSUMS DISABLE; -- set state to "Off"
Don't like this, please make it a GUC.
I'll see if you have ideas about how to resolve the problems with a GUC
that I mentioned above. But if not, then what about using a utility,
perhaps called pg_checksums? That way we wouldn't need new syntax.
I don't think GUCs are good for this purpose, but I don't like
single-purpose statements either.
What do you think about enhancing the ALTER DATABASE statement,
something like
ALTER DATABASE name ENABLE CHECKSUMS and ALTER DATABASE name DISABLE CHECKSUMS
Regards
Pavel
On Sun, 2012-11-11 at 21:20 +0100, Pavel Stehule wrote:
I don't think GUCs are good for this purpose, but I don't like
single-purpose statements either.
What do you think about enhancing the ALTER DATABASE statement,
something like
ALTER DATABASE name ENABLE CHECKSUMS and ALTER DATABASE name DISABLE CHECKSUMS
Per-database does sound easier than per-table. I'd have to think about
how that would affect shared catalogs though.
For now, I'm leaning toward an offline utility to turn checksums on or
off, called pg_checksums. It could do so lazily (just flip a switch to
"enabling" in pg_control), or it could do so eagerly and turn it into a
fully-protected instance.
For the first patch, it might just be an initdb-time option for
simplicity.
Regards,
Jeff Davis
On 11/11/2012 05:52 PM, Jeff Davis wrote:
On Sun, 2012-11-11 at 21:20 +0100, Pavel Stehule wrote:
I don't think GUCs are good for this purpose, but I don't like
single-purpose statements either.
What do you think about enhancing the ALTER DATABASE statement,
something like
ALTER DATABASE name ENABLE CHECKSUMS and ALTER DATABASE name DISABLE CHECKSUMS
Per-database does sound easier than per-table. I'd have to think about
how that would affect shared catalogs though.
For now, I'm leaning toward an offline utility to turn checksums on or
off, called pg_checksums. It could do so lazily (just flip a switch to
"enabling" in pg_control), or it could do so eagerly and turn it into a
fully-protected instance.
For the first patch, it might just be an initdb-time option for
simplicity.
+1
I haven't followed this too closely, but I did wonder several days ago
why this wasn't being made an initdb-time decision.
cheers
andrew
On 11/11/12 2:56 PM, Jeff Davis wrote:
We could have a separate utility, pg_checksums, that can
alter the state and/or do an offline verification. And initdb would take
an option that would start everything out fully protected with
checksums.
Adding an initdb option to start out with everything checksummed seems
an uncontroversial good first thing to have available. It seems like a
proper 9.3 target to aim at even if per-table upgrading gets bogged down
in details. I have an argument below that the area between initdb and
per-table upgrades is fundamentally uncertain and therefore not worth
chasing after, based on reasons you already started to outline. There's
not much useful middle ground there.
Won't a pg_checksums program just grow until it looks like a limited
version of vacuum though? It's going to iterate over most of the table;
it needs the same cost controls as autovacuum (and to respect the load
of concurrent autovacuum work) to keep I/O under control; and those cost
control values might change if there's a SIGHUP to reload parameters.
It looks so much like vacuum that I think there needs to be a really
compelling reason to split it into something new. Why can't this be yet
another autovacuum worker that does its thing?
In order to get to the fully-protected state, you still need to
somehow make sure that all of the old data is checksummed.
And the "fully protected" state is important in my opinion, because
otherwise we aren't protected against corrupt page headers that say
they have no checksum (even when it really should have a checksum).
I think it's useful to step back for a minute and consider the larger
uncertainty an existing relation has, which amplifies just how ugly this
situation is. The best guarantee I think online checksumming can offer
is to tell the user "after transaction id X, all new data in relation R
is known to be checksummed". Unless you do this at initdb time, any
conversion case is going to have the possibility that a page is
corrupted before you get to it--whether you're adding the checksum as
part of a "let's add them while we're writing anyway" page update or the
conversion tool is hitting it.
That's why I don't think anyone will find online conversion really
useful until they've done a full sweep updating the old pages. And if
you accept that, a flexible checksum upgrade utility, one that co-exists
with autovacuum activity costs, becomes a must.
One of the really common cases I was expecting here is that conversions
are done by kicking off a slow background VACUUM CHECKSUM job that might
run in pieces. I was thinking of an approach like this:
-Initialize a last_checked_block value for each table
-Loop:
--Grab the next block after the last checked one
--When on the last block of the relation, grab an exclusive lock to
protect against race conditions with extension
--If it's marked as checksummed and the checksum matches, skip it
---Otherwise, add a checksum and write it out
--When that succeeds, update last_checked_block
--If that was the last block, save some state saying the whole table is
checksummed
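In toy C, ignoring the locking and cost-control details entirely, that loop might look as follows (all of the page layout and function names here are invented for illustration, not PostgreSQL internals):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a page: a "has checksum" header bit, the checksum
 * itself, and one byte standing in for the page contents. */
typedef struct
{
    bool     has_checksum;
    uint16_t checksum;
    uint8_t  data;
} ToyPage;

/* Stand-in checksum; the real one would cover the whole page. */
uint16_t toy_checksum(const ToyPage *p)
{
    return (uint16_t) (p->data * 31u + 7u);
}

/* One conversion sweep: resume after last_checked_block, stamp a
 * checksum on any page that lacks a valid one, and return the new
 * high-water mark. Persisting that mark after each block is what
 * makes the sweep restartable after an interruption. */
int convert_relation(ToyPage pages[], int nblocks, int last_checked_block)
{
    for (int blk = last_checked_block + 1; blk < nblocks; blk++)
    {
        ToyPage *p = &pages[blk];

        /* Skip pages already marked and matching; otherwise add a
         * checksum and "write it out". */
        if (!(p->has_checksum && p->checksum == toy_checksum(p)))
        {
            p->checksum = toy_checksum(p);
            p->has_checksum = true;
        }
        last_checked_block = blk;   /* save progress here */
    }
    return last_checked_block;      /* == nblocks-1: table fully done */
}
```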
With that logic, there is at least a forward moving pointer that removes
the uncertainty around whether pages have been updated or not. It will
keep going usefully if interrupted, too. One obvious way this can
fail is:
1) A late page in the relation is updated and a checksummed page written
2) The page is corrupted such that the "is this checksummed?" bits are
not consistent anymore, along with other damage to it
3) The conversion process gets to this page eventually
4) The corruption of (2) isn't detected
But I think that this possibility--that a page might get quietly
corrupted after being checked once, while still in the middle of
checking a relation--is both impossible to remove and a red herring.
How do we
know that this page of the relation wasn't corrupted on disk before we
even started? We don't, and we can't.
The only guarantee I see that we can give for online upgrades is that
after a VACUUM CHECKSUM sweep is done, and every page is known to both
have a valid checksum on it and have its checksum bits set, *then* any
page that doesn't have both set bits and a matching checksum is garbage.
Until reaching that point, any old data is suspect. The idea of
operating in a "we'll convert on write but never convert old pages"
mode can't offer any useful guarantees about data integrity that I can
see. As you say, you don't ever gain the ability to tell pages that
were checksummed but have since been corrupted from ones that were
corrupt all along in that path.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
On 12/11/12 05:55, Greg Smith wrote:
The only guarantee I see that we can give for online upgrades is that
after a VACUUM CHECKSUM sweep is done, and every page is known to both
have a valid checksum on it and have its checksum bits set, *then* any
page that doesn't have both set bits and a matching checksum is
garbage. Until reaching that point, any old data is suspect. The
idea of operating in a "we'll convert on write but never convert old
pages" mode can't offer any useful guarantees about data integrity
that I can see. As you say, you don't ever gain the ability to tell
pages that were checksummed but have since been corrupted from ones
that were corrupt all along in that path.
You're right about that, but I'd just like some rough guard against
hardware/OS-related data corruption, and that is more likely to hit
data blocks constantly flying in and out of the system.
I'm currently running a +2TB database, and the capability to just see
some kind of corruption earlier rather than later is a major benefit by
itself. Currently, corruption can go undetected if it just happens to
hit data-only parts of the database.
But I totally agree that the scheme described, integrating it into an
autovacuum process, would be very close to ideal, even on a database
like the one I'm running.
--
Jesper
On 11/12/12 12:55 AM, Jesper Krogh wrote:
I'd just like some rough guard against
hardware/OS-related data corruption, and that is more likely to hit
data blocks constantly flying in and out of the system.
I get that. I think that some of the design ideas floating around since
this feature was first proposed have been innovating in the hope of
finding a clever halfway point here. Ideally we'd be able to get online
checksum conversion up and running easily, reliably, and without adding
a lot of code. I have given up on that now though.
The approach of doing a heavy per-table conversion with more state
information than we'd like seems unavoidable, if you want to do it right
and allow people to (slowly but surely) reach a trustworthy state. I
think we should stop searching for a clever way around and just slog
through doing it.
aside a good block of time to beat my head against that particular wall
over the next couple of months.
But I totally agree that the scheme described, integrating it into an
autovacuum process, would be very close to ideal, even on a database
like the one I'm running.
I am sadly all too familiar with how challenging it is to keep a 2TB
PostgreSQL database running reliably. One of my recent catch phrases
for talks is "if you have a big Postgres database, you also have a
vacuum problem". I think it's unreasonable to consider online
conversion solutions that don't recognize that, and allow coordinating
the work with the challenges of vacuuming larger systems too.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com