Speed up pg_checksums in cases where checksum already set

Started by Greg Sabino Mullanealmost 5 years ago15 messageshackers
Jump to latest
#1Greg Sabino Mullane
greg@turnstep.com

The attached patch makes an optimization to pg_checksums which prevents
rewriting the block if the checksum is already what we expect. This can
lead to much faster runs in cases where it is already set (e.g. enabled ->
disabled -> enable, external helper process, interrupted runs, future
parallel processes). There is also an effort to not sync the data directory
if no changes were written. Finally, added a bit more output on how many
files were actually changed, e.g.:

Checksum operation completed
Files scanned: 1236
Blocks scanned: 23283
Files modified: 38
Blocks modified: 19194
pg_checksums: syncing data directory
pg_checksums: updating control file
Checksums enabled in cluster

Cheers,
Greg

Attachments:

pg_checksums.optimize.writes.patchapplication/octet-stream; name=pg_checksums.optimize.writes.patchDownload+25-3
#2Julien Rouhaud
rjuju123@gmail.com
In reply to: Greg Sabino Mullane (#1)
Re: Speed up pg_checksums in cases where checksum already set

On Thu, May 27, 2021 at 5:24 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:

The attached patch makes an optimization to pg_checksums which prevents rewriting the block if the checksum is already what we expect. This can lead to much faster runs in cases where it is already set (e.g. enabled -> disabled -> enable, external helper process, interrupted runs, future parallel processes). There is also an effort to not sync the data directory if no changes were written. Finally, added a bit more output on how many files were actually changed, e.g.:

I don't know how often this will actually help as probably people
aren't toggling the checksum state that often, but it seems like a
good idea overall. The patch looks sensible to me.

#3Michael Paquier
michael@paquier.xyz
In reply to: Greg Sabino Mullane (#1)
Re: Speed up pg_checksums in cases where checksum already set

On Wed, May 26, 2021 at 05:23:55PM -0400, Greg Sabino Mullane wrote:

The attached patch makes an optimization to pg_checksums which prevents
rewriting the block if the checksum is already what we expect. This can
lead to much faster runs in cases where it is already set (e.g. enabled ->
disabled -> enable, external helper process, interrupted runs, future
parallel processes).

Makes sense.

There is also an effort to not sync the data directory
if no changes were written. Finally, added a bit more output on how many
files were actually changed, e.g.:

-    if (do_sync)
+    if (do_sync && total_files_modified)
     {
 	 pg_log_info("syncing data directory");
         fsync_pgdata(DataDir, PG_VERSION_NUM);

Here, I am on the edge. It could be an advantage to force a flush of
the data folder anyway, no? Say, all the pages have a correct
checksum and they are in the OS cache, but they may not have been
flushed yet. That would emulate what initdb -S does already.

Checksum operation completed
Files scanned: 1236
Blocks scanned: 23283
Files modified: 38
Blocks modified: 19194
pg_checksums: syncing data directory
pg_checksums: updating control file
Checksums enabled in cluster

The addition of the number of files modified looks like an advantage.
--
Michael

#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Greg Sabino Mullane (#1)
Re: Speed up pg_checksums in cases where checksum already set

In one of the checksum patches, there was an understanding that the pages
should be written even if the checksum is correct, to handle replicas.

From the v19 patch:
https://www.postgresql.org/message-id/F7AFCFCD-8F77-4546-8D42-C7F675A4B680%40yesql.se
+                * Mark the buffer as dirty and force a full page write.  We have to
+                * re-write the page to WAL even if the checksum hasn't changed,
+                * because if there is a replica it might have a slightly different
+                * version of the page with an invalid checksum, caused by unlogged
+                * changes (e.g. hintbits) on the master happening while checksums
+                * were off. This can happen if there was a valid checksum on the page
+                * at one point in the past, so only when checksums are first on, then
+                * off, and then turned on again.

pg_checksums(1) says:

| When using a replication setup with tools which perform direct copies of relation file blocks (for example pg_rewind(1)), enabling or disabling checksums can lead to page
| corruptions in the shape of incorrect checksums if the operation is not done consistently across all nodes. When enabling or disabling checksums in a replication setup, it
| is thus recommended to stop all the clusters before switching them all consistently. Destroying all standbys, performing the operation on the primary and finally recreating
| the standbys from scratch is also safe.

Does your patch complicate things for the "stop all the clusters before
switching them all" case?

--
Justin

#5Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#4)
Re: Speed up pg_checksums in cases where checksum already set

On Wed, May 26, 2021 at 09:29:43PM -0500, Justin Pryzby wrote:

In one of the checksum patches, there was an understanding that the pages
should be written even if the checksum is correct, to handle replicas.

From the v19 patch:
https://www.postgresql.org/message-id/F7AFCFCD-8F77-4546-8D42-C7F675A4B680%40yesql.se
+                * Mark the buffer as dirty and force a full page write.  We have to
+                * re-write the page to WAL even if the checksum hasn't changed,
+                * because if there is a replica it might have a slightly different
+                * version of the page with an invalid checksum, caused by unlogged
+                * changes (e.g. hintbits) on the master happening while checksums
+                * were off. This can happen if there was a valid checksum on the page
+                * at one point in the past, so only when checksums are first on, then
+                * off, and then turned on again.

I am not really following the line of argument here. pg_checksums
relies on the fact that the cluster has been safely shut down before
running. So, if this comes to standbys, they would have reached a
consistent point, and the shutdown makes sure that all pages are
flushed.
--
Michael

#6Greg Sabino Mullane
greg@turnstep.com
In reply to: Michael Paquier (#3)
Re: Speed up pg_checksums in cases where checksum already set

Thanks for the quick replies, everyone.

On Wed, May 26, 2021 at 10:17 PM Michael Paquier <michael@paquier.xyz>
wrote:

-    if (do_sync)
+    if (do_sync && total_files_modified)
Here, I am on the edge.  It could be an advantage to force a flush of
the data folder anyway, no?

I was originally on the fence about including this as well, but it seems
like since the database is shut down and already in a consistent state,
there seems no advantage to syncing if we have not made any changes. Things
are no better or worse than when we arrived. However, the real-world use
case of running pg_checksums --enable and getting no changed blocks is
probably fairly rare, so if there is a strong objection, I'm happy
reverting to just (do_sync). (I'm not sure how cheap a sync is, I assume
it's low impact as the database is shut down, I guess it becomes a "might
as well while we are here"?)

On Wed, May 26, 2021 at 10:29 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

In one of the checksum patches, there was an understanding that the pages
should be written even if the checksum is correct, to handle replicas.
...
Does your patch complicate things for the "stop all the clusters before
switching them all" case?

I cannot imagine how it would, but, like Michael, I'm not really
understanding the reasoning here. We only run when safely shutdown, so no
WAL or dirty buffers need concern us :). Of course, once the postmaster is
up and running, fiddling with checksums becomes vastly more complicated, as
evidenced by that thread. I'm happy sticking to and speeding up the offline
version for now.

Cheers,
Greg

#7Michael Paquier
michael@paquier.xyz
In reply to: Greg Sabino Mullane (#6)
Re: Speed up pg_checksums in cases where checksum already set

On Thu, May 27, 2021 at 10:29:14AM -0400, Greg Sabino Mullane wrote:

I was originally on the fence about including this as well, but it seems
like since the database is shut down and already in a consistent state,
there seems no advantage to syncing if we have not made any changes. Things
are no better or worse than when we arrived. However, the real-world use
case of running pg_checksums --enable and getting no changed blocks is
probably fairly rare, so if there is a strong objection, I'm happy
reverting to just (do_sync). (I'm not sure how cheap a sync is, I assume
it's low impact as the database is shut down, I guess it becomes a "might
as well while we are here"?)

I understand that this should be rare, but I don't want to take any
bets either. With this patch, we could finish with cases where some
pages are still in the OS cache but don't get flushed because a
previous cancellation let the cluster in a state where all the page
checksums have been written out but a portion of the files were not
synced. A follow-up run of pg_checksums would see that all the pages
are correct, but would think that no sync is required, incorrectly.
--
Michael

#8Greg Sabino Mullane
greg@turnstep.com
In reply to: Michael Paquier (#7)
Re: Speed up pg_checksums in cases where checksum already set

Fair enough; thanks for the feedback. Attached is a new version that does
an unconditional sync (well, unless do_sync is false, a flag I am not
particularly fond of).

Cheers,
Greg

Attachments:

pg_checksums.optimize.writes.always.sync.patchapplication/octet-stream; name=pg_checksums.optimize.writes.always.sync.patchDownload+24-2
#9Greg Sabino Mullane
greg@turnstep.com
In reply to: Greg Sabino Mullane (#8)
Re: Speed up pg_checksums in cases where checksum already set

Newer version attach that adds a small documentation tweak as well.

Cheers,
Greg

Attachments:

pg_checksums.optimize.writes.and.always.sync.patchapplication/octet-stream; name=pg_checksums.optimize.writes.and.always.sync.patchDownload+26-3
#10Michael Paquier
michael@paquier.xyz
In reply to: Greg Sabino Mullane (#9)
Re: Speed up pg_checksums in cases where checksum already set

On Wed, Jun 02, 2021 at 05:09:36PM -0400, Greg Sabino Mullane wrote:

Newer version attach that adds a small documentation tweak as well.

-   enabling checksums, every file in the cluster is rewritten in-place.
+   enabling checksums, every file in the cluster with a changed checksum is
+   rewritten in-place.

This doc addition is a bit confusing, as it could mean that each file
has just one single checksum. We could be more precise, say:
"When enabling checksums, each relation file block with a changed
checksum is rewritten in place."

Should we also mention that the sync happens even if no blocks are
rewritten based on the reasoning of upthread (aka we'd better do the
final flush as an interrupted pg_checksums may let a portion of the
files as not flushed)?
--
Michael

#11Greg Sabino Mullane
greg@turnstep.com
In reply to: Michael Paquier (#10)
Re: Speed up pg_checksums in cases where checksum already set

On Fri, Jun 18, 2021 at 1:57 AM Michael Paquier <michael@paquier.xyz> wrote:

This doc addition is a bit confusing, as it could mean that each file
has just one single checksum. We could be more precise, say:
"When enabling checksums, each relation file block with a changed
checksum is rewritten in place."

Agreed, I like that wording. New patch attached.

Should we also mention that the sync happens even if no blocks are
rewritten based on the reasoning of upthread (aka we'd better do the
final flush as an interrupted pg_checksums may let a portion of the
files as not flushed)?

I don't know that we need to bother: the default is already to sync and one
has to go out of one's way using the -N argument to NOT sync, so I think
it's a pretty safe assumption to everyone (except those who read my first
version of my patch!) that syncing always happens.

Cheers,
Greg

Attachments:

003.pg_checksums.optimize.writes.and.always.sync.patchapplication/octet-stream; name=003.pg_checksums.optimize.writes.and.always.sync.patchDownload+26-3
#12Michael Paquier
michael@paquier.xyz
In reply to: Greg Sabino Mullane (#11)
Re: Speed up pg_checksums in cases where checksum already set

On Fri, Jun 18, 2021 at 08:01:17PM -0400, Greg Sabino Mullane wrote:

I don't know that we need to bother: the default is already to sync and one
has to go out of one's way using the -N argument to NOT sync, so I think
it's a pretty safe assumption to everyone (except those who read my first
version of my patch!) that syncing always happens.

Perhaps you are right to keep it simple. If people would like to
document that more precisely, it could always be changed if
necessary. What you have here sounds pretty much right to me.
--
Michael

#13Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#12)
Re: Speed up pg_checksums in cases where checksum already set

On Wed, Jun 23, 2021 at 09:39:37AM +0900, Michael Paquier wrote:

Perhaps you are right to keep it simple. If people would like to
document that more precisely, it could always be changed if
necessary. What you have here sounds pretty much right to me.

So, I was looking at this one today, and got confused with the name of
the counters once the patch was in place as this leads to having
things like "blocks" and "total_blocks_modified", which is a bit
confusing as "blocks" stands for the number of blocks scanned,
including new pages. I have simply suffixed "files" and "blocks" with
"_scanned" to be more consistent with the new counters that are named
"_written", giving the attached. We still need to have the per-file
counter in scan_file() with the global counters updated at the end of
a file scan for the sake of the file counter, of course.

Does that look fine to you?
--
Michael

Attachments:

004.pg_checksums.optimize.writes.and.always.sync.patchtext/x-diff; charset=us-asciiDownload+29-7
#14Greg Sabino Mullane
greg@turnstep.com
In reply to: Michael Paquier (#13)
Re: Speed up pg_checksums in cases where checksum already set

On Tue, Jun 29, 2021 at 2:59 AM Michael Paquier <michael@paquier.xyz> wrote:

Does that look fine to you?

Looks great, I appreciate the renaming.

Cheers,
Greg

#15Michael Paquier
michael@paquier.xyz
In reply to: Greg Sabino Mullane (#14)
Re: Speed up pg_checksums in cases where checksum already set

On Tue, Jun 29, 2021 at 09:10:30AM -0400, Greg Sabino Mullane wrote:

Looks great, I appreciate the renaming.

Applied, then.
--
Michael