WIP checksums patch
This is just a rebased version of the patch by Simon here:
http://archives.postgresql.org/message-id/CA
+U5nMKw_GBs6qQ_Y8-RjGL1V7MVW2HWBHartB8LoJhnPfxL8g@mail.gmail.com
There are other things I'd like to do, like:
* include page# in checksum, and perhaps relfilenode or object OID (but
those two might cause difficulties)
* use TLI field instead of page version, if that is the consensus
(though it seems some may still disagree with that)
* we might want to make it slightly easier for external utilities, like
for backup/replication, to verify the pages
* optionally protect temporary files and local buffers
* come up with a robust way to determine that all pages in the database
are protected
* protect uninitialized pages
But not all of these will make the first patch. This is just a starting
point to reignite the discussion.
I welcome any feedback.
Regards,
Jeff Davis
On Fri, 2012-09-14 at 17:58 -0700, Jeff Davis wrote:
This is just a rebased version of the patch by Simon here:
http://archives.postgresql.org/message-id/CA
+U5nMKw_GBs6qQ_Y8-RjGL1V7MVW2HWBHartB8LoJhnPfxL8g@mail.gmail.com
Now with attachment.
Regards,
Jeff Davis
Attachments:
checksums_rebase.patch.gzapplication/x-gzip; name=checksums_rebase.patch.gzDownload+979-340
On Fri, 2012-09-14 at 17:58 -0700, Jeff Davis wrote:
This is just a rebased version of the patch by Simon here:
I just noticed the following note in the docs for this patch:
The default is <literal>off</> for backwards compatibility and
to allow upgrade. The recommended setting is <literal>on</> though
this should not be enabled until upgrade is successfully complete
with full set of new backups.
I don't understand what that means -- if they have the page_checksums
GUC available, then surely upgrade is complete, right? And what is the
backwards-compatibility issue?
Also, it looks out of date, because the default in guc.c is set to true.
I think we should probably default to true, because it's safer and it
can always be disabled at runtime, but I don't have a strong opinion
about that.
Regards,
Jeff Davis
On Fri, 2012-09-14 at 17:58 -0700, Jeff Davis wrote:
* we might want to make it slightly easier for external utilities, like
for backup/replication, to verify the pages
Ideally, PageVerificationInfoOK should be available to external
utilities, so that someone might script a background job to verify
pages. Or, perhaps we should just include a new utility,
pg_verify_checksums?
Either way, where is a good place to put the function so that it's
shared between the server and the utility? In src/port somewhere?
Regards,
Jeff Davis
On Sun, Sep 30, 2012 at 07:09:20PM -0700, Jeff Davis wrote:
On Fri, 2012-09-14 at 17:58 -0700, Jeff Davis wrote:
This is just a rebased version of the patch by Simon here:
I just noticed the following note in the docs for this patch:
The default is <literal>off</> for backwards compatibility and
to allow upgrade. The recommended setting is <literal>on</> though
this should not be enabled until upgrade is successfully complete
with full set of new backups.I don't understand what that means -- if they have the page_checksums
GUC available, then surely upgrade is complete, right? And what is the
backwards-compatibility issue?Also, it looks out of date, because the default in guc.c is set to true.
I think we should probably default to true, because it's safer and it
can always be disabled at runtime, but I don't have a strong opinion
about that.
I think this need to clearly state "pg_upgrade", not a dump/restore
upgrade, which would be fine. It would be interesting to have
pg_upgrade change this setting, or tell the user to change it. I am not
sure enough people are using pg_upgrade to change a default value.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Mon, 2012-10-01 at 10:43 -0400, Bruce Momjian wrote:
The default is <literal>off</> for backwards compatibility and
to allow upgrade. The recommended setting is <literal>on</> though
this should not be enabled until upgrade is successfully complete
with full set of new backups.I don't understand what that means -- if they have the page_checksums
GUC available, then surely upgrade is complete, right? And what is the
backwards-compatibility issue?
I think this need to clearly state "pg_upgrade", not a dump/restore
upgrade, which would be fine. It would be interesting to have
pg_upgrade change this setting, or tell the user to change it. I am not
sure enough people are using pg_upgrade to change a default value.
I still don't understand why pg_upgrade and page_checksums don't mix.
Regards,
Jeff Davis
On Mon, Oct 1, 2012 at 09:25:43AM -0700, Jeff Davis wrote:
On Mon, 2012-10-01 at 10:43 -0400, Bruce Momjian wrote:
The default is <literal>off</> for backwards compatibility and
to allow upgrade. The recommended setting is <literal>on</> though
this should not be enabled until upgrade is successfully complete
with full set of new backups.I don't understand what that means -- if they have the page_checksums
GUC available, then surely upgrade is complete, right? And what is the
backwards-compatibility issue?I think this need to clearly state "pg_upgrade", not a dump/restore
upgrade, which would be fine. It would be interesting to have
pg_upgrade change this setting, or tell the user to change it. I am not
sure enough people are using pg_upgrade to change a default value.I still don't understand why pg_upgrade and page_checksums don't mix.
The heap/index files are copied unmodified from the old cluster, so
there are no checksums on the pages.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Mon, 2012-10-01 at 12:35 -0400, Bruce Momjian wrote:
The heap/index files are copied unmodified from the old cluster, so
there are no checksums on the pages.
That's fine though, the patch still reads the old format the same way,
and the pages are treated as though they have no checksum. How is that a
reason for defaulting the GUC to off?
I'm missing something here. Are we worried about users who turn the GUC
on and then expect all of their old data pages to magically be
protected?
Regards,
Jeff Davis
On Fri, 2012-09-14 at 17:58 -0700, Jeff Davis wrote:
This is just a rebased version of the patch by Simon here:
http://archives.postgresql.org/message-id/CA
+U5nMKw_GBs6qQ_Y8-RjGL1V7MVW2HWBHartB8LoJhnPfxL8g@mail.gmail.com
Another thing I noticed about the design of this patch:
It looks like the checksum will usually be wrong in a backup block in
WAL, because it writes the backup block before calculating the checksum
(which isn't done until the time the block is written out).
I think that's OK, because it's still protected by the WAL CRC, and
there's no expectation that the checksum is correct in shared buffers,
and the correct checksum should be set on the next checkpoint. Just an
observation.
Regards,
Jeff Davis
On 1 October 2012 18:04, Jeff Davis <pgsql@j-davis.com> wrote:
On Mon, 2012-10-01 at 12:35 -0400, Bruce Momjian wrote:
The heap/index files are copied unmodified from the old cluster, so
there are no checksums on the pages.That's fine though, the patch still reads the old format the same way,
and the pages are treated as though they have no checksum.
How is that a
reason for defaulting the GUC to off?
It's not.
Are we worried about users who turn the GUC
on and then expect all of their old data pages to magically be
protected?
Yes, but as you say, that is a separate consideration.
You are missing large parts of the previous thread, giving various
opinions on what the UI should look like for enabling checksums.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 1, 2012 at 10:04:09AM -0700, Jeff Davis wrote:
On Mon, 2012-10-01 at 12:35 -0400, Bruce Momjian wrote:
The heap/index files are copied unmodified from the old cluster, so
there are no checksums on the pages.That's fine though, the patch still reads the old format the same way,
and the pages are treated as though they have no checksum. How is that a
reason for defaulting the GUC to off?
No idea then.
I'm missing something here. Are we worried about users who turn the GUC
on and then expect all of their old data pages to magically be
protected?
Perhaps we don't allow this to be turned per page, but rather per
cluster, and per-cluster would require the entire cluster to be
rewritten.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
I think that's OK, because it's still protected by the WAL CRC, and
there's no expectation that the checksum is correct in shared buffers,
and the correct checksum should be set on the next checkpoint. Just an
observation.
We'd need to document that emphatically. Otherwise folks running on ZFS
and/or FusionIO with atomic writes (and, in the future, BTRFS) will
assume that they can turn "full_page_writes" off and checksums on, and
clearly that won't work with the current code. I think that's an
acceptable limitation, I just think we need to document it carefully,
and maybe throw a warning if people start up in that configuration.
Perhaps we don't allow this to be turned per page, but rather per
cluster, and per-cluster would require the entire cluster to be
rewritten.
We dicussed this last year, and options which require a total rewrite of
the database in order to turn on the option were rejected as impractical
for users.
People did say it was desirable to have a manual option which says
"rewrite this entire table with checksums". However, that's not
required for the initial patch. For that matter, it will become
desirable to turn on checksums only for specific tables by they user
(again, future feature).
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Mon, 2012-10-01 at 18:14 +0100, Simon Riggs wrote:
You are missing large parts of the previous thread, giving various
opinions on what the UI should look like for enabling checksums.
I read through all of the discussion that I could find. There was quite
a lot, so perhaps I have forgotten pieces of it.
But that one section in the docs does look out of date and/or confusing
to me.
I remember there was discussion about a way to ensure that checksums are
set cluster-wide with some kind of special command (perhaps related to
VACUUM) and a magic file to let recovery know whether checksums are set
everywhere or not. That doesn't necessarily conflict with the GUC though
(the GUC could be a way to write checksums lazily, while this new
command could be a way to write checksums eagerly).
If some consensus was reached on the exact user interface, can you
please send me a link?
Regards,
Jeff Davis
On Monday, October 01, 2012 11:11 PM Jeff Davis wrote:
On Mon, 2012-10-01 at 18:14 +0100, Simon Riggs wrote:
You are missing large parts of the previous thread, giving various
opinions on what the UI should look like for enabling checksums.I read through all of the discussion that I could find. There was quite
a lot, so perhaps I have forgotten pieces of it.But that one section in the docs does look out of date and/or confusing
to me.I remember there was discussion about a way to ensure that checksums are
set cluster-wide with some kind of special command (perhaps related to
VACUUM) and a magic file to let recovery know whether checksums are set
everywhere or not. That doesn't necessarily conflict with the GUC though
(the GUC could be a way to write checksums lazily, while this new
command could be a way to write checksums eagerly).If some consensus was reached on the exact user interface, can you
please send me a link?
AFAICT complete consensus has not been reached but one of the discussions can be found on below link:
http://archives.postgresql.org/pgsql-hackers/2012-03/msg00279.php
Here Robert has given suggestions and then further there is more discussion based on that points.
According to me, the main points where more work for this patch is required as per previous discussions is as follows:
1. Performance impact of WAL log for hint-bits needs to be verified for scenario's other than pg_bench (Such as bulk data load (which I
feel there is some way to optimize, but I don't know if that’s part of this patch)).
2. How to put the information in Page header.
I think general direction is use pd_tli.
Storing whether page has checksum should be done or it needs to be maintained at table or db level is not decided.
3. User Interface for Checksum options.
4. Still not there is consensus about locking the buffer.
5. Any more which I have missed?
Apart from above, one of the concern raised by many members is that there should be page format upgrade infrastructure first
and then we should add thinking of checksums(http://archives.postgresql.org/pgsql-hackers/2012-02/msg01517.php).
The major point for upgrade is that it should be an online upgrade and
the problem which I think there is no clear solution yet is hot to ensure that a database will never have more than 2 page formats.
If the general consensus is we should do it without having upgrade, then I think we can pursue discussion about the main points listed above.
With Regards,
Amit Kapila.
On 10/1/12 12:22 PM, Josh Berkus wrote:
Perhaps we don't allow this to be turned per page, but rather per
cluster, and per-cluster would require the entire cluster to be
rewritten.We dicussed this last year, and options which require a total rewrite of
the database in order to turn on the option were rejected as impractical
for users.
For whatever it's worth... we (and presumably others) still use londiste (or Slony) as our upgrade path, so we could tolerate a cluster-wide setting. We'd just set it when building new clusters via londiste and forget about it.
So I'd rather see this get in at a cluster level than not make it at all while we wait for something better.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On 10/1/12 12:22 PM, Josh Berkus wrote:
Perhaps we don't allow this to be turned per page, but rather per
cluster, and per-cluster would require the entire cluster to be
rewritten.We dicussed this last year, and options which require a total rewrite of
the database in order to turn on the option were rejected as impractical
for users.For whatever it's worth... we (and presumably others) still use londiste
(or Slony) as our upgrade path, so we could tolerate a cluster-wide
setting. We'd just set it when building new clusters via londiste and
forget about it.So I'd rather see this get in at a cluster level than not make it at all
while we wait for something better.
Some still do dump/restore between major releases, so there it is also
applicable. (and preferrable).
I do know a lot of things had been discussed last year, an API I could
easily imagine would work:
ALTER TABLE <tablename> SET ENABLE_CHECKSUMMING=YES
(would make the server do the checksums on all writes on table+indices
going forward, here, verification of the checksums would still be enabled,
but only on "already checksummed pages" ).
ALTER TABLE <tablename> set VERIFY_CHECKSUM=YES
(would cause the server to scan table and index, and rewrite the parts
that hadn't an updated with a checksum already.
Perhaps the names are not well-chosen, but it is based on the assumptions
that check-summing overhead is measurable and it is thereby desireable to
be able to dis/enable it on a per-table level.
Then I could let my system go 6-12 months between the above two commands
and the amount of rewrite would hopefully not require a rewrite of the
entire 2.5TB of PGDATA.
Jesper
--
Jesper Krogh
On Mon, Oct 29, 2012 at 4:31 PM, Jim Nasby <jim@nasby.net> wrote:
For whatever it's worth... we (and presumably others) still use londiste (or
Slony) as our upgrade path, so we could tolerate a cluster-wide setting.
We'd just set it when building new clusters via londiste and forget about
it.So I'd rather see this get in at a cluster level than not make it at all
while we wait for something better.
Yeah. I definitely think that we could shed an enormous amount of
complexity by deciding that this is, for now, an option that can only
be selected at initdb time. That would remove approximately 85% of
everything I've ever disliked about this patch - without, I think,
precluding the possibility of improving things later.
It also occurred to me that another way to reduce the scope of this
change would be to have a first version that does CRCs only for SLRU
pages. That would be useful for verifying the integrity of some of
our most critical data (pg_clog) and be a useful building block toward
a more complete implementation.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Nov 5, 2012 at 12:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 29, 2012 at 4:31 PM, Jim Nasby <jim@nasby.net> wrote:
For whatever it's worth... we (and presumably others) still use londiste
(or
Slony) as our upgrade path, so we could tolerate a cluster-wide setting.
We'd just set it when building new clusters via londiste and forget about
it.So I'd rather see this get in at a cluster level than not make it at all
while we wait for something better.Yeah. I definitely think that we could shed an enormous amount of
complexity by deciding that this is, for now, an option that can only
be selected at initdb time. That would remove approximately 85% of
everything I've ever disliked about this patch - without, I think,
precluding the possibility of improving things later.
I see one thing to be concerned about, there...
I imagine it would not be a totally happy thing if the only way to switch
it on/off was to use Slony or Londiste to replicate into a database with
the opposite setting. (e.g. - This implies that built-in replication may
only replicate into a database with the identical checksum configuration.)
It's not outrageous for it to be a pretty heavyweight operation to switch
polarities, but there's such a thing as too heavy.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
On Thu, Nov 8, 2012 at 9:17 PM, Christopher Browne <cbbrowne@gmail.com> wrote:
I see one thing to be concerned about, there...
I imagine it would not be a totally happy thing if the only way to switch it
on/off was to use Slony or Londiste to replicate into a database with the
opposite setting. (e.g. - This implies that built-in replication may only
replicate into a database with the identical checksum configuration.)
Sure, I agree. I don't think it should stay that way forever, but
removing the burden of dealing with this issue from the initial commit
would likely allow that commit to happen this release cycle, perhaps
even in the next CommitFest. And then we'd have half a loaf, which is
better than none, and we could deal with the issues of switching it on
and off as a further enhancement.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, 2012-11-05 at 12:19 -0500, Robert Haas wrote:
Yeah. I definitely think that we could shed an enormous amount of
complexity by deciding that this is, for now, an option that can only
be selected at initdb time. That would remove approximately 85% of
everything I've ever disliked about this patch - without, I think,
precluding the possibility of improving things later.
That's certainly true, but it introduces one large problem: upgrading
would not work, which (in the past few releases) we've treated as a
major showstopper for many features.
If there is really no other good way to do it, then that might be
reasonable. But it seems within grasp to at least offer an offline way
to set checksums.
It also occurred to me that another way to reduce the scope of this
change would be to have a first version that does CRCs only for SLRU
pages. That would be useful for verifying the integrity of some of
our most critical data (pg_clog) and be a useful building block toward
a more complete implementation.
That also breaks upgrade, right?
Regards,
Jeff Davis