pg_upgrade: Support for upgrading to checksums enabled
I'm reposting this patch in a separate thread so I can make a separate
commitfest entry for it. The previous discussion is mixed in with [0]/messages/by-id/CAKAnmmKwiMHik5AHmBEdf5vqzbOBbcwEPHo4-PioWeAbzwcTOQ@mail.gmail.com.
The purpose of this patch is to allow using pg_upgrade between clusters
that have different checksum settings. When upgrading between instances
with different checksum settings, the --copy (default) mode
automatically sets (or unsets) the checksum on the fly.
This would be particularly useful if we switched to enabling checksums
by default, as [0]/messages/by-id/CAKAnmmKwiMHik5AHmBEdf5vqzbOBbcwEPHo4-PioWeAbzwcTOQ@mail.gmail.com proposes, but it's also useful without that.
Some discussion points:
- We have added a bunch of different transfer modes to pg_upgrade in
order to give the user control over the expected file transfer
performance. Here, I have added this checksum rewriting to the existing
--copy mode, and I have measured about a 5% overhead. An alternative
would be to make this an explicit mode (--copy-slow,
--copy-and-make-adjustments).
- Windows has a separate code path in the --copy mode. I don't know the
reasons or advantages of that. So it's not clear how the checksum
rewriting mode should be handled in that case. We could switch to the
non-Windows code path in that case, but then the performance difference
between the normal path and the checksum-rewriting path is even more
unclear.
[0]: /messages/by-id/CAKAnmmKwiMHik5AHmBEdf5vqzbOBbcwEPHo4-PioWeAbzwcTOQ@mail.gmail.com
/messages/by-id/CAKAnmmKwiMHik5AHmBEdf5vqzbOBbcwEPHo4-PioWeAbzwcTOQ@mail.gmail.com
Attachments:
v1-0001-pg_upgrade-Support-for-upgrading-to-checksums-ena.patchtext/plain; charset=UTF-8; name=v1-0001-pg_upgrade-Support-for-upgrading-to-checksums-ena.patchDownload+38-17
Hi Peter,
I've applied and tested your patch, it works at least on MacOS with
meson build. A couple of thoughts about this patch inline below.
On Mon, Aug 26, 2024 at 8:23 AM Peter Eisentraut <peter@eisentraut.org> wrote:
The purpose of this patch is to allow using pg_upgrade between clusters
that have different checksum settings. When upgrading between instances
with different checksum settings, the --copy (default) mode
automatically sets (or unsets) the checksum on the fly.
I think the entire idea of this patch is great because it allows us to
remove an additional step in upgrade procedure, i.e. enabling
checksums before upgrade. A part about which I am not quite sure, is
"automatically". It is sufficient in most cases, but maybe also to
have an explicit flag would be a nice option as well.
in the patch itself:
- * We might eventually allow upgrades from checksum to no-checksum - * clusters. - */ - if (oldctrl->data_checksum_version == 0 && - newctrl->data_checksum_version != 0) - pg_fatal("old cluster does not use data checksums but the new one does"); - else if (oldctrl->data_checksum_version != 0 && - newctrl->data_checksum_version == 0) - pg_fatal("old cluster uses data checksums but the new one does not"); - else if (oldctrl->data_checksum_version != newctrl->data_checksum_version) - pg_fatal("old and new cluster pg_controldata checksum versions do not match"); + if (oldctrl->data_checksum_version != newctrl->data_checksum_version) + { + if (user_opts.transfer_mode != TRANSFER_MODE_COPY) + pg_fatal("when upgrading between clusters with different data checksum settings, transfer mode --copy must be used"); + }
I've tried to recall when I see the previous error message "old and
new cluster pg_controldata checksum versions do not match" at most. It
was almost always pg_upgrade with --link option, because we mostly use
pg_upgrade when copy is barely an option. Previous error message gave
a clear statement: go one step behind and enable checksums. The new
one gives imo a wrong message: "your only option now is to use copy".
Would it be better to polish wording in direction "when upgrading
between clusters with different data checksum settings, transfer mode
--copy must be used to enable checksum automatically" or "when
upgrading between clusters with different data checksum settings,
transfer mode --copy must be used or data checksum settings of the old
cluster must be changed manually before upgrade"?
Some discussion points:
- We have added a bunch of different transfer modes to pg_upgrade in
order to give the user control over the expected file transfer
performance. Here, I have added this checksum rewriting to the existing
--copy mode, and I have measured about a 5% overhead. An alternative
would be to make this an explicit mode (--copy-slow,
--copy-and-make-adjustments).
Maybe a separate -k flag to enable this behavior explicitly?
best regards,
Ilya
--
Ilya Kosmodemiansky
CEO, Founder
Data Egret GmbH
Your remote PostgreSQL DBA team
T.: +49 6821 919 3297
ik@dataegret.com
On Mon, Aug 26, 2024 at 08:23:44AM +0200, Peter Eisentraut wrote:
The purpose of this patch is to allow using pg_upgrade between clusters that
have different checksum settings. When upgrading between instances with
different checksum settings, the --copy (default) mode automatically sets
(or unsets) the checksum on the fly.This would be particularly useful if we switched to enabling checksums by
default, as [0] proposes, but it's also useful without that.
Given enabling checksums can be rather expensive, I think it makes sense to
add a way to do it during pg_upgrade versus asking folks to run
pg_checksums separately. I'd anticipate arguments against enabling
checksums automatically, but as you noted, we can move it to a separate
option (e.g., --copy --enable-checksums). Disabling checksums with
pg_checksums is fast because it just updates pg_control, so I don't see any
need for --disable-checkums in pg_upgrade.
- Windows has a separate code path in the --copy mode. I don't know the
reasons or advantages of that. So it's not clear how the checksum rewriting
mode should be handled in that case. We could switch to the non-Windows
code path in that case, but then the performance difference between the
normal path and the checksum-rewriting path is even more unclear.
AFAICT the separate Windows path dates back to before pg_upgrade was first
added to the Postgres tree, and unfortunately I couldn't find any
discussion about it.
--
nathan
On Tue, Aug 27, 2024 at 5:57 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Aug 26, 2024 at 08:23:44AM +0200, Peter Eisentraut wrote:
The purpose of this patch is to allow using pg_upgrade between clusters that
have different checksum settings. When upgrading between instances with
different checksum settings, the --copy (default) mode automatically sets
(or unsets) the checksum on the fly.This would be particularly useful if we switched to enabling checksums by
default, as [0] proposes, but it's also useful without that.Given enabling checksums can be rather expensive, I think it makes sense to
add a way to do it during pg_upgrade versus asking folks to run
pg_checksums separately. I'd anticipate arguments against enabling
checksums automatically, but as you noted, we can move it to a separate
option (e.g., --copy --enable-checksums). Disabling checksums with
pg_checksums is fast because it just updates pg_control, so I don't see any
need for --disable-checkums in pg_upgrade.
The repercussions of either enabling or disabling checksums on
accident is quite high (not for pg_upgrade, but for $futureDBA), so
ISTM an explicit flag for BOTH is the right way to go. In that
scenario, pg_upgrade would check to make sure the clusters match and
then make the appropriate suggestion. In the case someone did
something like --enable-checksums and --link, again, we'd toss an
error that --copy mode is required to --enable-checksums.
Robert Treat
https://xzilla.net
On 21.02.25 00:41, Robert Treat wrote:
On Tue, Aug 27, 2024 at 5:57 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Aug 26, 2024 at 08:23:44AM +0200, Peter Eisentraut wrote:
The purpose of this patch is to allow using pg_upgrade between clusters that
have different checksum settings. When upgrading between instances with
different checksum settings, the --copy (default) mode automatically sets
(or unsets) the checksum on the fly.This would be particularly useful if we switched to enabling checksums by
default, as [0] proposes, but it's also useful without that.Given enabling checksums can be rather expensive, I think it makes sense to
add a way to do it during pg_upgrade versus asking folks to run
pg_checksums separately. I'd anticipate arguments against enabling
checksums automatically, but as you noted, we can move it to a separate
option (e.g., --copy --enable-checksums). Disabling checksums with
pg_checksums is fast because it just updates pg_control, so I don't see any
need for --disable-checkums in pg_upgrade.The repercussions of either enabling or disabling checksums on
accident is quite high (not for pg_upgrade, but for $futureDBA), so
ISTM an explicit flag for BOTH is the right way to go. In that
scenario, pg_upgrade would check to make sure the clusters match and
then make the appropriate suggestion. In the case someone did
something like --enable-checksums and --link, again, we'd toss an
error that --copy mode is required to --enable-checksums.
Here is an updated patch that works more along those lines. It adds a
pg_upgrade option --update-checksums, which activates the code to
rewrite the checksums. You must specify this option if the source and
target clusters have different checksum settings.
Note that this also works to hypothetically upgrade between future
different checksum versions (hence "--update-*", not "--enable-*").
Also, as the patch is currently written, it is also required to specify
this option to downgrade from checksums to no-checksums. (It will then
write a zero into the checksum place, as it would look if you had never
used checksums.) Also, you can optionally specify this option even if
the checksum settings are the same, then it will recalculate the
checksums. Probably not all of this is useful, but perhaps a subset of
it. Thoughts?
Also, I still don't know what to do about the Windows code path in
copyFile(). We could just not support this feature on Windows? Or
maybe the notionally correct thing to do would be to move that code into
copyFileByRange(). But then we'd need a different default on Windows
and it would require more documentation. I don't know what to do here
and I don't have enough context to make a suggestion. But if we don't
answer this, I don't think we can move ahead with this feature.
Attachments:
v2-0001-pg_upgrade-Support-for-upgrading-to-checksums-ena.patchtext/plain; charset=UTF-8; name=v2-0001-pg_upgrade-Support-for-upgrading-to-checksums-ena.patchDownload+49-17
On 11.03.25 11:42, Peter Eisentraut wrote:
Here is an updated patch that works more along those lines. It adds a
pg_upgrade option --update-checksums, which activates the code to
rewrite the checksums. You must specify this option if the source and
target clusters have different checksum settings.Note that this also works to hypothetically upgrade between future
different checksum versions (hence "--update-*", not "--enable-*").
Also, as the patch is currently written, it is also required to specify
this option to downgrade from checksums to no-checksums. (It will then
write a zero into the checksum place, as it would look if you had never
used checksums.) Also, you can optionally specify this option even if
the checksum settings are the same, then it will recalculate the
checksums. Probably not all of this is useful, but perhaps a subset of
it. Thoughts?Also, I still don't know what to do about the Windows code path in
copyFile(). We could just not support this feature on Windows? Or
maybe the notionally correct thing to do would be to move that code into
copyFileByRange(). But then we'd need a different default on Windows
and it would require more documentation. I don't know what to do here
and I don't have enough context to make a suggestion. But if we don't
answer this, I don't think we can move ahead with this feature.
I'm not sensing much enthusiasm for this feature or for working out the
remaining problems, so I'm closing this commitfest entry.
On Thu, Apr 3, 2025 at 3:25 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 11.03.25 11:42, Peter Eisentraut wrote:
Here is an updated patch that works more along those lines. It adds a
pg_upgrade option --update-checksums, which activates the code to
rewrite the checksums. You must specify this option if the source and
target clusters have different checksum settings.Note that this also works to hypothetically upgrade between future
different checksum versions (hence "--update-*", not "--enable-*").
Also, as the patch is currently written, it is also required to specify
this option to downgrade from checksums to no-checksums. (It will then
write a zero into the checksum place, as it would look if you had never
used checksums.) Also, you can optionally specify this option even if
the checksum settings are the same, then it will recalculate the
checksums. Probably not all of this is useful, but perhaps a subset of
it. Thoughts?Also, I still don't know what to do about the Windows code path in
copyFile(). We could just not support this feature on Windows? Or
maybe the notionally correct thing to do would be to move that code into
copyFileByRange(). But then we'd need a different default on Windows
and it would require more documentation. I don't know what to do here
and I don't have enough context to make a suggestion. But if we don't
answer this, I don't think we can move ahead with this feature.I'm not sensing much enthusiasm for this feature or for working out the
remaining problems, so I'm closing this commitfest entry.
That's unfortunate; I think there is enthusiasm for the feature, just
not enough to overcome the questions around Windows support.
Robert Treat
https://xzilla.net