[PATCH] Add pg_disable_checksums() and supporting infrastructure

Started by David Christensenabout 9 years ago11 messageshackers
Jump to latest
#1David Christensen
david@endpoint.com

Extracted from a larger patch, this patch provides the basic infrastructure for turning data
checksums off in a cluster. This also sets up the necessary pg_control fields to support the
necessary multiple states for handling the transitions.

We do a few things:

- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
the potentially-required states for this feature (as well as enabling).

- Add pg_control support for parsing/displaying the new setting.

- Distinguish between the possible checksum states and be specific about whether we care about
checksum read failures or write failures at all call sites, turning DataChecksumsEnabled() into two
functions: DataChecksumsEnabledReads() and DataChecksumsEnabledWrites().

- Create a superuser function pg_disable_checksums() to perform the actual disabling of this in the
cluster.

I have *not* changed the default in initdb to enable checksums, but this would be trivial.

Attachments:

disable-checksums.patchapplication/octet-stream; name=disable-checksums.patch; x-unix-mode=0644Download+169-29
#2Magnus Hagander
magnus@hagander.net
In reply to: David Christensen (#1)
Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

On Thu, Feb 16, 2017 at 9:58 PM, David Christensen <david@endpoint.com>
wrote:

Extracted from a larger patch, this patch provides the basic
infrastructure for turning data
checksums off in a cluster. This also sets up the necessary pg_control
fields to support the
necessary multiple states for handling the transitions.

We do a few things:

- Change "data_checksums" from a simple boolean to "data_checksum_state",
an enum type for all of
the potentially-required states for this feature (as well as enabling).

- Add pg_control support for parsing/displaying the new setting.

- Distinguish between the possible checksum states and be specific about
whether we care about
checksum read failures or write failures at all call sites, turning
DataChecksumsEnabled() into two
functions: DataChecksumsEnabledReads() and DataChecksumsEnabledWrites().

- Create a superuser function pg_disable_checksums() to perform the actual
disabling of this in the
cluster.

I have *not* changed the default in initdb to enable checksums, but this
would be trivial.

Per the point made by somebody else (I think Simon?) on the other thread, I
think it also needs WAL support. Otherwise you turn it off on the master,
but it remains on on a replica which will cause failures once datablocks
without checksum starts replicating.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#3David Christensen
david@endpoint.com
In reply to: Magnus Hagander (#2)
Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

On Feb 17, 2017, at 10:31 AM, Magnus Hagander <magnus@hagander.net> wrote:

Per the point made by somebody else (I think Simon?) on the other thread, I think it also needs WAL support. Otherwise you turn it off on the master, but it remains on on a replica which will cause failures once datablocks without checksum starts replicating.

Good point; I’ll send a revision soon.
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4David Christensen
david@endpoint.com
In reply to: Magnus Hagander (#2)
Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

On Feb 17, 2017, at 10:31 AM, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Feb 16, 2017 at 9:58 PM, David Christensen <david@endpoint.com> wrote:
Extracted from a larger patch, this patch provides the basic infrastructure for turning data
checksums off in a cluster. This also sets up the necessary pg_control fields to support the
necessary multiple states for handling the transitions.

We do a few things:

- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
the potentially-required states for this feature (as well as enabling).

- Add pg_control support for parsing/displaying the new setting.

- Distinguish between the possible checksum states and be specific about whether we care about
checksum read failures or write failures at all call sites, turning DataChecksumsEnabled() into two
functions: DataChecksumsEnabledReads() and DataChecksumsEnabledWrites().

- Create a superuser function pg_disable_checksums() to perform the actual disabling of this in the
cluster.

I have *not* changed the default in initdb to enable checksums, but this would be trivial.

Per the point made by somebody else (I think Simon?) on the other thread, I think it also needs WAL support. Otherwise you turn it off on the master, but it remains on on a replica which will cause failures once datablocks without checksum starts replicating.

Enclosed is a version which supports WAL logging and proper application of the checksum state change. I have verified it works with a replica as far as applying the updated data_checksum_state, though I had to manually call pg_switch_wal() on the master to get it to show up on the replica. (I don’t know if this is a flaw in my patch or just a natural consequence of testing on a low-volume local cluster.)

--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171

Attachments:

disable-checksums.patchapplication/octet-stream; name=disable-checksums.patch; x-unix-mode=0644Download+335-32
#5Robert Haas
robertmhaas@gmail.com
In reply to: David Christensen (#1)
Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

On Fri, Feb 17, 2017 at 2:28 AM, David Christensen <david@endpoint.com> wrote:

- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
the potentially-required states for this feature (as well as enabling).

Color me skeptical. I don't know what CHECKSUMS_ENABLING,
CHECKSUMS_ENFORCING, and CHECKSUMS_REVALIDATING are intended to
represent -- and there's no comments in the patch explaining it -- but
if we haven't yet written the code to enable checksums, how do we know
for sure which states it will require?

If we're going to accept a patch to disable checksums without also
having the capability to enable checksums, I think we should leave out
the speculative elements about what might be needed on the "enable"
side and just implement the minimal "disable" side.

However, FWIW, I don't accept that being able to disable checksums
online is a sufficient advance to justify enabling checksums by
default. Tomas had some good points on another thread about what
might be needed to really make that a good choice, and I'm still
skeptical about whether checksums catch any meaningful number of
errors that wouldn't be caught otherwise, and about the degree to
which any complaints it issues are actionable. I'm not really against
this patch on its own merits, but I think it's a small advance in an
area that needs a lot of work. I think it would be a lot more useful
if we had a way to *enable* checksums online. Then people who find
out that checksums exist and want them have an easier way of getting
them, and anyone who uses the functionality in this patch and then
regrets it has a way to get back.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6David Christensen
david@endpoint.com
In reply to: Robert Haas (#5)
Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

On Feb 19, 2017, at 5:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Feb 17, 2017 at 2:28 AM, David Christensen <david@endpoint.com> wrote:

- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
the potentially-required states for this feature (as well as enabling).

Color me skeptical. I don't know what CHECKSUMS_ENABLING,
CHECKSUMS_ENFORCING, and CHECKSUMS_REVALIDATING are intended to
represent -- and there's no comments in the patch explaining it -- but
if we haven't yet written the code to enable checksums, how do we know
for sure which states it will require?

If we're going to accept a patch to disable checksums without also
having the capability to enable checksums, I think we should leave out
the speculative elements about what might be needed on the "enable"
side and just implement the minimal "disable" side.

However, FWIW, I don't accept that being able to disable checksums
online is a sufficient advance to justify enabling checksums by
default. Tomas had some good points on another thread about what
might be needed to really make that a good choice, and I'm still
skeptical about whether checksums catch any meaningful number of
errors that wouldn't be caught otherwise, and about the degree to
which any complaints it issues are actionable. I'm not really against
this patch on its own merits, but I think it's a small advance in an
area that needs a lot of work. I think it would be a lot more useful
if we had a way to *enable* checksums online. Then people who find
out that checksums exist and want them have an easier way of getting
them, and anyone who uses the functionality in this patch and then
regrets it has a way to get back.

Hi Robert, this is part of a larger patch which *does* enable the checksums online; I’ve been extracting the necessary pieces out with the understanding that some people thought the disable code might be useful in its own merit. I can add documentation for the various states. The CHECKSUM_REVALIDATE is the only one I feel is a little wibbly-wobbly; the other states are required because of the fact that enabling checksums requires distinguishing between “in process of enabling” and “everything is enabled”.

My design notes for the patch were submitted to the list with little comment; see: /messages/by-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8@endpoint.com

I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).]

Best,

David
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: David Christensen (#6)
Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

On 2/19/17 11:02 AM, David Christensen wrote:

My design notes for the patch were submitted to the list with little comment; see: /messages/by-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8@endpoint.com

I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).]

A couple notes:

- AFAIK unlogged tables get checksummed today if checksums are enabled;
the same should hold true if someone enables checksums on the whole cluster.

- Shared relations should be handled as well; you don't mention them.

- If an entire cluster is going to be considered as checksummed, then
even databases that don't allow connections would need to get enabled.

I like the idea of revalidation, but I'd suggest leaving that off of the
first pass.

It might be easier on a first pass to look at supporting per-database
checksums (in this case, essentially treating shared catalogs as their
own database). All normal backends do per-database stuff (such as
setting current_database) during startup anyway. That doesn't really
help for things like recovery and replication though. :/ And there's
still the question of SLRUs (or are those not checksum'd today??).

BTW, it occurs to me that this is related to the problem we have with
trying to make changes that break page binary compatibility. If we had a
method for handling that it would probably be useful for enabling
checksums as well. You'd essentially treat an un-checksum'd page as if
it was an "old page version". The biggest problem there is dealing with
the potential that the new page needs to be larger than the old one was,
but maybe there's some useful progress to be had in this area before
tackling the "page too small" problem.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8David Christensen
david@endpoint.com
In reply to: Jim Nasby (#7)
Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

On Feb 19, 2017, at 8:14 PM, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:

On 2/19/17 11:02 AM, David Christensen wrote:

My design notes for the patch were submitted to the list with little comment; see: /messages/by-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8@endpoint.com

I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).]

A couple notes:

- AFAIK unlogged tables get checksummed today if checksums are enabled; the same should hold true if someone enables checksums on the whole cluster.

Agreed; AFAIK this should already be working if it’s using the PageIsVerified() API, since I just effectively modified the logic there, depending on state.

- Shared relations should be handled as well; you don't mention them.

I agree that they should be handled as well; I thought I had mentioned it later in the design doc, though TBH I’m not sure if there is more involved than just visiting the global relations in pg_class. In addition we need to visit all forks of each relation. I’m not certain if loading those into shared_buffers would be sufficient; I think so, but I’d be glad to be informed otherwise. (As long as they’re already using the PageIsVerified() API call they get this logic for free.

- If an entire cluster is going to be considered as checksummed, then even databases that don't allow connections would need to get enabled.

Yeah, the workaround for now would be to require “datallowconn" to be set to ’t' for all databases before proceeding, unless there’s a way to connect to those databases internally that bypasses that check. Open to ideas, though for a first pass seems like the “datallowconn” approach is the least amount of work.

I like the idea of revalidation, but I'd suggest leaving that off of the first pass.

Yeah, agreed.

It might be easier on a first pass to look at supporting per-database checksums (in this case, essentially treating shared catalogs as their own database). All normal backends do per-database stuff (such as setting current_database) during startup anyway. That doesn't really help for things like recovery and replication though. :/ And there's still the question of SLRUs (or are those not checksum'd today??).

So you’re suggesting that the data_checksums GUC get set per-database context, so once it’s fully enabled in the specific database it treats it as in enforcing state, even if the rest of the cluster hasn’t completed? Hmm, might think on that a bit, but it seems pretty straightforward.

What issues do you see affecting replication and recovery specifically (other than the entire cluster not being complete)? Since the checksum changes are WAL logged, seems you be no worse the wear on a standby if you had to change things.

BTW, it occurs to me that this is related to the problem we have with trying to make changes that break page binary compatibility. If we had a method for handling that it would probably be useful for enabling checksums as well. You'd essentially treat an un-checksum'd page as if it was an "old page version". The biggest problem there is dealing with the potential that the new page needs to be larger than the old one was, but maybe there's some useful progress to be had in this area before tackling the "page too small" problem.

I agree it’s very similar; my issue is I don’t want to have to postpone handling a specific case for some future infrastructure. That being said, what I could see being a general case is the piece which basically walks all pages in the cluster; as long as the page checksum/format validation happens at Page read time, we could do page version checks/conversions at the same time, and the only special code we’d need is to keep track of which files still need to be visited and how to minimize the impact on the cluster via some sort of throttling/leveling. (It’s also similar to vacuum in that way, however we have been going out of our way to make vacuum smart enough to *avoid* visiting every page, so I think it is a different enough use case that we can’t tie the two systems together, nor do I feel like taking that project on.)

We could call the checksum_cycle something else (page_walk_cycle? bikeshed time!) and basically have the infrastructure to initiate online/gradual conversion/processing of all pages for free.

Thoughts?

David
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: David Christensen (#8)
Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

On 2/20/17 11:22 AM, David Christensen wrote:

- If an entire cluster is going to be considered as checksummed, then even databases that don't allow connections would need to get enabled.

Yeah, the workaround for now would be to require “datallowconn" to be set to ’t' for all databases before proceeding, unless there’s a way to connect to those databases internally that bypasses that check. Open to ideas, though for a first pass seems like the “datallowconn” approach is the least amount of work.

The problem with ignoring datallowconn is any database where that's
false is fair game for CREATE DATABASE. I think just enforcing that
everything's connectable is good enough for now.

I like the idea of revalidation, but I'd suggest leaving that off of the first pass.

Yeah, agreed.

It might be easier on a first pass to look at supporting per-database checksums (in this case, essentially treating shared catalogs as their own database). All normal backends do per-database stuff (such as setting current_database) during startup anyway. That doesn't really help for things like recovery and replication though. :/ And there's still the question of SLRUs (or are those not checksum'd today??).

So you’re suggesting that the data_checksums GUC get set per-database context, so once it’s fully enabled in the specific database it treats it as in enforcing state, even if the rest of the cluster hasn’t completed? Hmm, might think on that a bit, but it seems pretty straightforward.

Something like that, yeah.

What issues do you see affecting replication and recovery specifically (other than the entire cluster not being complete)? Since the checksum changes are WAL logged, seems you be no worse the wear on a standby if you had to change things.

I'm specifically worried about the entire cluster not being complete.
That makes it harder for replicas to know what blocks they can and can't
verify the checksum on.

That *might* still be simpler than trying to handle converting the
entire cluster in one shot. If it's not simpler I certainly wouldn't do
it right now.

BTW, it occurs to me that this is related to the problem we have with trying to make changes that break page binary compatibility. If we had a method for handling that it would probably be useful for enabling checksums as well. You'd essentially treat an un-checksum'd page as if it was an "old page version". The biggest problem there is dealing with the potential that the new page needs to be larger than the old one was, but maybe there's some useful progress to be had in this area before tackling the "page too small" problem.

I agree it’s very similar; my issue is I don’t want to have to postpone handling a specific case for some future infrastructure.

Yeah, I was just mentioning it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Robert Haas
robertmhaas@gmail.com
In reply to: David Christensen (#6)
Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

On Sun, Feb 19, 2017 at 12:02 PM, David Christensen <david@endpoint.com> wrote:

Hi Robert, this is part of a larger patch which *does* enable the checksums online; I’ve been extracting the necessary pieces out with the understanding that some people thought the disable code might be useful in its own merit. I can add documentation for the various states. The CHECKSUM_REVALIDATE is the only one I feel is a little wibbly-wobbly; the other states are required because of the fact that enabling checksums requires distinguishing between “in process of enabling” and “everything is enabled”.

Ah, sorry, I had missed that patch.

My design notes for the patch were submitted to the list with little comment; see: /messages/by-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8@endpoint.com

I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).]

I think the concept of a checksum cycle might be overkill. It would
be useful if we ever wanted to change the checksum algorithm, but I
don't see any particular reason why we need to build infrastructure
for that. If we wanted to change the checksum algorithm, I think it
likely that we'd do that in the course of, say, widening it to 4 bytes
as part of a bigger change in the page format, and this infrastructure
would be too narrow to help with that.

While I'm glad to see work finally going on to allow enabling and
disabling checksums, I think it's too late to put this in v10. We
have a rough rule that large new patches shouldn't be appearing for
the first time in the last CommitFest, and I think this falls into
that category. I also think it would be unwise to commit just the
bits of the infrastructure that let us disable checksums without
having time for a thorough review of the whole thing; to me, the
chances that we'll make decisions that we later regret seem fairly
high. I would rather wait for v11 and have a little more time to try
to get everything right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11David Christensen
david@endpoint.com
In reply to: Robert Haas (#10)
Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

On Mar 9, 2017, at 1:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Feb 19, 2017 at 12:02 PM, David Christensen <david@endpoint.com> wrote:

Hi Robert, this is part of a larger patch which *does* enable the checksums online; I’ve been extracting the necessary pieces out with the understanding that some people thought the disable code might be useful in its own merit. I can add documentation for the various states. The CHECKSUM_REVALIDATE is the only one I feel is a little wibbly-wobbly; the other states are required because of the fact that enabling checksums requires distinguishing between “in process of enabling” and “everything is enabled”.

Ah, sorry, I had missed that patch.

My design notes for the patch were submitted to the list with little comment; see: /messages/by-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8@endpoint.com

I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).]

I think the concept of a checksum cycle might be overkill. It would
be useful if we ever wanted to change the checksum algorithm, but I
don't see any particular reason why we need to build infrastructure
for that. If we wanted to change the checksum algorithm, I think it
likely that we'd do that in the course of, say, widening it to 4 bytes
as part of a bigger change in the page format, and this infrastructure
would be too narrow to help with that.

I hear what you are saying, however a boolean on pg_class/pg_database to show if the relation in question is insufficient if we allow arbitrary enabling/disabling while enabling is in progress. In particular, if we disable checksums while the enabling was in progress and had only a boolean to indicate whether the checksums are complete for a relation there will have been a window when new pages in a relation will *not* be checksummed but the system table flag will indicate that the checksum is up-to-date, which is false. This would lead to checksum failures when those pages are encountered. Similar failures will occur as well when doing a pg_upgrade of an in-progress enabling. Saying you can’t disable/cancel the checksum process while it’s running seems undesirable too, as what happens if you have a system failure mid-process.

We could certainly avoid this problem by just saying that we have to start over with the entire cluster and process every page from scratch rather than trying to preserve/track state that we know is good, perhaps only dirtying buffers which have a non-matching checksum while we’re in the conversion state, but this seems undesirable to me, plus if we made it work in a single session we’d have a long-running snapshot open (presumably) which would wreak havoc while it processes the entire database (as someone had suggested by doing it in a single function that just hangs while it’s running)

I think we still need a way to short-circuit the process/incrementally update and note which tables have been checksummed and which ones haven’t. (Perhaps I could make the code which currently checks DataChecksumsEnabled() a little smarter, depending on the relation it’s looking at.)

As per one of Jim’s suggestions, I’ve been looking at making the data_checkum_state localized per database at postinit time, but really it probably doesn’t matter to anything but the buffer code whether it’s a global setting, a per-database setting or what.

While I'm glad to see work finally going on to allow enabling and
disabling checksums, I think it's too late to put this in v10. We
have a rough rule that large new patches shouldn't be appearing for
the first time in the last CommitFest, and I think this falls into
that category. I also think it would be unwise to commit just the
bits of the infrastructure that let us disable checksums without
having time for a thorough review of the whole thing; to me, the
chances that we'll make decisions that we later regret seem fairly
high. I would rather wait for v11 and have a little more time to try
to get everything right.

Makes sense, but to that end I think we need to have at least some agreement on how this should work ahead of time. Obviously it’s easier to critique existing code, but I don’t want to go out in left field of a particular approach is going to be obviously unworkable per some of the more experienced devs here.
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers