Enable data checksums by default
Please find attached a patch to enable data checksums by default.
Currently, initdb only enables data checksums if passed the
--data-checksums or -k argument. There was some hesitation years ago when
this feature was first added, leading to the current situation where the
default is off. However, many years later, there is wide consensus that
this is an extraordinarily safe, desirable setting. Indeed, most (if not
all) of the major commercial and open source Postgres systems currently
turn this on by default. I posit you would be hard-pressed to find many
systems these days in which it has NOT been turned on. So basically we have
a de-facto standard, and I think it's time we flipped the switch to make it
on by default.
The patch is simple enough: literally flipping the boolean inside of
initdb.c, and adding a new argument '--no-data-checksums' for those
instances that truly want the old behavior. One place that still needs the
old behavior is our internal tests for pg_checksums and pg_amcheck, so I
added a new argument to init() in PostgreSQL/Test/Cluster.pm to allow those
to still pass their tests.
This is just the default - people are still more than welcome to turn it
off with the new flag. The pg_checksums program is another option that
actually argues for having the default "on", as switching to "off" once
initdb has been run is trivial.
Yes, I am aware of the previous discussions on this, but the world moves
fast - wal compression is better than in the past, vacuum is better now,
and data-checksums being on is such a complete default in the wild, it
feels weird and a disservice that we are not running all our tests like
that.
Cheers,
Greg
Attachments:
0001-Make-initdb-enable-data-checksums-by-default.patchapplication/x-patch; name=0001-Make-initdb-enable-data-checksums-by-default.patchDownload
From 12ce067f5ba64414d1d14c5f2e763d04cdfacd13 Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane <greg@turnstep.com>
Date: Tue, 6 Aug 2024 18:18:56 -0400
Subject: [PATCH] Make initdb enable data checksums by default
---
doc/src/sgml/ref/initdb.sgml | 4 ++-
src/bin/initdb/initdb.c | 6 ++++-
src/bin/initdb/t/001_initdb.pl | 31 +++++++++++++++++------
src/bin/pg_amcheck/t/003_check.pl | 2 +-
src/bin/pg_amcheck/t/004_verify_heapam.pl | 2 +-
src/bin/pg_checksums/t/002_actions.pl | 2 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 7 +++++
7 files changed, 41 insertions(+), 13 deletions(-)
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index bdd613e77f..511f489d34 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -267,12 +267,14 @@ PostgreSQL documentation
<para>
Use checksums on data pages to help detect corruption by the
I/O system that would otherwise be silent. Enabling checksums
- may incur a noticeable performance penalty. If set, checksums
+ may incur a small performance penalty. If set, checksums
are calculated for all objects, in all databases. All checksum
failures will be reported in the
<link linkend="monitoring-pg-stat-database-view">
<structname>pg_stat_database</structname></link> view.
See <xref linkend="checksums" /> for details.
+ As of version 18, checksums are enabled by default. They can be
+ disabled by use of <option>--no-data-checksums</option>.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f00718a015..ce7d3e99e5 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -164,7 +164,7 @@ static bool noinstructions = false;
static bool do_sync = true;
static bool sync_only = false;
static bool show_setting = false;
-static bool data_checksums = false;
+static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
@@ -3121,6 +3121,7 @@ main(int argc, char *argv[])
{"waldir", required_argument, NULL, 'X'},
{"wal-segsize", required_argument, NULL, 12},
{"data-checksums", no_argument, NULL, 'k'},
+ {"no-data-checksums", no_argument, NULL, 20},
{"allow-group-access", no_argument, NULL, 'g'},
{"discard-caches", no_argument, NULL, 14},
{"locale-provider", required_argument, NULL, 15},
@@ -3319,6 +3320,9 @@ main(int argc, char *argv[])
if (!parse_sync_method(optarg, &sync_method))
exit(1);
break;
+ case 20:
+ data_checksums = false;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 06a35ac0b7..64395ec531 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -69,16 +69,11 @@ mkdir $datadir;
}
}
-# Control file should tell that data checksums are disabled by default.
+# Control file should tell that data checksums are enabled by default.
command_like(
[ 'pg_controldata', $datadir ],
- qr/Data page checksum version:.*0/,
- 'checksums are disabled in control file');
-# pg_checksums fails with checksums disabled by default. This is
-# not part of the tests included in pg_checksums to save from
-# the creation of an extra instance.
-command_fails([ 'pg_checksums', '-D', $datadir ],
- "pg_checksums fails with data checksum disabled");
+ qr/Data page checksum version:.*1/,
+ 'checksums are enabled in control file');
command_ok([ 'initdb', '-S', $datadir ], 'sync only');
command_fails([ 'initdb', $datadir ], 'existing data directory');
@@ -268,4 +263,24 @@ ok($conf !~ qr/^WORK_MEM = /m, "WORK_MEM should not be configured");
ok($conf !~ qr/^Work_Mem = /m, "Work_Mem should not be configured");
ok($conf =~ qr/^work_mem = 512/m, "work_mem should be in config");
+# Test the no-data-checksums flag
+
+my $datadir_nochecksums = "$tempdir/data_no_checksums";
+
+command_ok([ 'initdb', '--no-data-checksums', $datadir_nochecksums ],
+ 'successful creation without data checksums');
+
+# Control file should tell that data checksums are disabled.
+command_like(
+ [ 'pg_controldata', $datadir_nochecksums ],
+ qr/Data page checksum version:.*0/,
+ 'checksums are disabled in control file');
+
+# pg_checksums fails with checksums disabled. This is
+# not part of the tests included in pg_checksums to save from
+# the creation of an extra instance.
+command_fails(
+ [ 'pg_checksums', '-D', $datadir_nochecksums ],
+ "pg_checksums fails with data checksum disabled");
+
done_testing();
diff --git a/src/bin/pg_amcheck/t/003_check.pl b/src/bin/pg_amcheck/t/003_check.pl
index 4b16bda6a4..a45b14662f 100644
--- a/src/bin/pg_amcheck/t/003_check.pl
+++ b/src/bin/pg_amcheck/t/003_check.pl
@@ -120,7 +120,7 @@ sub perform_all_corruptions()
# Test set-up
$node = PostgreSQL::Test::Cluster->new('test');
-$node->init;
+$node->init(no_checksums => 1);
$node->append_conf('postgresql.conf', 'autovacuum=off');
$node->start;
$port = $node->port;
diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl
index f6d2c5f787..2c17f7d068 100644
--- a/src/bin/pg_amcheck/t/004_verify_heapam.pl
+++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl
@@ -181,7 +181,7 @@ my $aborted_xid;
# autovacuum workers visiting the table could crash the backend.
# Disable autovacuum so that won't happen.
my $node = PostgreSQL::Test::Cluster->new('test');
-$node->init;
+$node->init(no_checksums => 1);
$node->append_conf('postgresql.conf', 'autovacuum=off');
$node->append_conf('postgresql.conf', 'max_prepared_transactions=10');
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 33e7fb53c5..c136febbbb 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -88,7 +88,7 @@ sub check_relation_corruption
# Initialize node with checksums disabled.
my $node = PostgreSQL::Test::Cluster->new('node_checksum');
-$node->init();
+$node->init(no_checksums => 1);
my $pgdata = $node->data_dir;
# Control file should know that checksums are disabled.
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 32ee98aebc..632a2c9ebc 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -504,6 +504,8 @@ On Windows, we use SSPI authentication to ensure the same (by pg_regress
WAL archiving can be enabled on this node by passing the keyword parameter
has_archiving => 1. This is disabled by default.
+Data checksums can be forced off by passing no_checksums => 1.
+
postgresql.conf can be set up for replication by passing the keyword
parameter allows_streaming => 'logical' or 'physical' (passing 1 will also
suffice for physical replication) depending on type of replication that
@@ -530,6 +532,11 @@ sub init
$params{force_initdb} = 0 unless defined $params{force_initdb};
$params{has_archiving} = 0 unless defined $params{has_archiving};
+ if (defined $params{no_checksums})
+ {
+ push @{ $params{extra} }, '--no-data-checksums';
+ }
+
my $initdb_extra_opts_env = $ENV{PG_TEST_INITDB_EXTRA_OPTS};
if (defined $initdb_extra_opts_env)
{
--
Hi,
On Tue, Aug 06, 2024 at 06:46:52PM -0400, Greg Sabino Mullane wrote:
Please find attached a patch to enable data checksums by default.
Currently, initdb only enables data checksums if passed the
--data-checksums or -k argument. There was some hesitation years ago when
this feature was first added, leading to the current situation where the
default is off. However, many years later, there is wide consensus that
this is an extraordinarily safe, desirable setting. Indeed, most (if not
all) of the major commercial and open source Postgres systems currently
turn this on by default. I posit you would be hard-pressed to find many
systems these days in which it has NOT been turned on. So basically we have
a de-facto standard, and I think it's time we flipped the switch to make it
on by default.
[...]
Yes, I am aware of the previous discussions on this, but the world moves
fast - wal compression is better than in the past, vacuum is better now,
and data-checksums being on is such a complete default in the wild, it
feels weird and a disservice that we are not running all our tests like
that.
I agree.
Some review on the patch:
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index bdd613e77f..511f489d34 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -267,12 +267,14 @@ PostgreSQL documentation <para> Use checksums on data pages to help detect corruption by the I/O system that would otherwise be silent. Enabling checksums - may incur a noticeable performance penalty. If set, checksums + may incur a small performance penalty. If set, checksums are calculated for all objects, in all databases. All checksum
I think the last time we dicussed this the consensus was that
computational overhead of computing the checksums is pretty small for
most systems (so the above change seems warranted regardless of whether
we switch the default), but turning on wal_compression also turns on
wal_log_hints, which can increase WAL by quite a lot. Maybe this is
covered elsewhere in the documentation (I just looked at the patch), but
if not, it probably should be added here as a word of caution.
failures will be reported in the <link linkend="monitoring-pg-stat-database-view"> <structname>pg_stat_database</structname></link> view. See <xref linkend="checksums" /> for details. + As of version 18, checksums are enabled by default. They can be + disabled by use of <option>--no-data-checksums</option>.
I think we usually do not mention when a feature was added/changed, do
we? So I'd just write "(default: enabled)" or whatever is the style of
the surrounding options.
</para> </listitem> </varlistentry> diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index f00718a015..ce7d3e99e5 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -164,7 +164,7 @@ static bool noinstructions = false; static bool do_sync = true; static bool sync_only = false; static bool show_setting = false; -static bool data_checksums = false; +static bool data_checksums = true; static char *xlog_dir = NULL; static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024); static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC; @@ -3121,6 +3121,7 @@ main(int argc, char *argv[]) {"waldir", required_argument, NULL, 'X'}, {"wal-segsize", required_argument, NULL, 12}, {"data-checksums", no_argument, NULL, 'k'}, + {"no-data-checksums", no_argument, NULL, 20},
Does it make sense to add -K (capital k) as a short-cut for this? I
think this is how we distinguish on/off for pg_dump (-t/-T etc.) but
maybe that is not wider project policy.
Michael
On Wed, Aug 7, 2024 at 4:43 AM Michael Banck <mbanck@gmx.net> wrote:
I think the last time we dicussed this the consensus was that
computational overhead of computing the checksums is pretty small for
most systems (so the above change seems warranted regardless of whether
we switch the default), but turning on wal_compression also turns on
wal_log_hints, which can increase WAL by quite a lot. Maybe this is
covered elsewhere in the documentation (I just looked at the patch), but
if not, it probably should be added here as a word of caution.
Yeah, that seems something beyond this patch? Certainly we should mention
wal_compression in the release notes if the default changes. I mean, I feel
wal_log_hints should probably default to on as well, but I've honestly
never really given it much thought because my fingers are trained to type
"initdb -k". I've been using data checksums for roughly a decade now. I
think the only time I've NOT used checksums was when I was doing checksum
overhead measurements, or hacking on the pg_checksums program.
I think we usually do not mention when a feature was added/changed, do
we? So I'd just write "(default: enabled)" or whatever is the style of
the surrounding options.
+1
+ {"no-data-checksums", no_argument, NULL, 20},
Does it make sense to add -K (capital k) as a short-cut for this? I
think this is how we distinguish on/off for pg_dump (-t/-T etc.) but
maybe that is not wider project policy.
I'd rather not. Better to keep it explicit rather than some other weird
letter that has no mnemonic value.
Cheers,
Greg
On 07.08.24 00:46, Greg Sabino Mullane wrote:
Currently, initdb only enables data checksums if passed the
--data-checksums or -k argument. There was some hesitation years ago
when this feature was first added, leading to the current situation
where the default is off. However, many years later, there is wide
consensus that this is an extraordinarily safe, desirable setting.
Indeed, most (if not all) of the major commercial and open source
Postgres systems currently turn this on by default. I posit you would be
hard-pressed to find many systems these days in which it has NOT been
turned on. So basically we have a de-facto standard, and I think it's
time we flipped the switch to make it on by default.
I'm sympathetic to this proposal, but I want to raise some concerns.
My understanding was that the reason for some hesitation about adopting
data checksums was the performance impact. Not the checksumming itself,
but the overhead from hint bit logging. The last time I looked into
that, you could get performance impacts on the order of 5% tps. Maybe
that's acceptable, and you of course can turn it off if you want the
extra performance. But I think this should be discussed in this thread.
About the claim that it's already the de-facto standard. Maybe that is
approximately true for "serious" installations. But AFAICT, the popular
packagings don't enable checksums by default, so there is likely a
significant middle tier between "just trying it out" and serious
production use that don't have it turned on.
For those uses, this change would render pg_upgrade useless for upgrades
from an old instance with default settings to a new instance with
default settings. And then users would either need to re-initdb with
checksums turned back off, or I suppose run pg_checksums on the old
instance before upgrading? This is significant additional complication.
And packagers who have built abstractions on top of pg_upgrade (such
as Debian pg_upgradecluster) would also need to implement something to
manage this somehow.
So I think we need to think through the upgrade experience a bit more.
Unfortunately, pg_checksums hasn't gotten to the point that we were
perhaps once hoping for that you could enable checksums on a live
system. I'm thinking pg_upgrade could have a mode where it adds the
checksum during the upgrade as it copies the files (essentially a subset
of pg_checksums). I think that would be useful for that middle tier of
users who just want a good default experience.
On Thu, Aug 08, 2024 at 12:11:38PM +0200, Peter Eisentraut wrote:
So I think we need to think through the upgrade experience a bit more.
Unfortunately, pg_checksums hasn't gotten to the point that we were perhaps
once hoping for that you could enable checksums on a live system. I'm
thinking pg_upgrade could have a mode where it adds the checksum during the
upgrade as it copies the files (essentially a subset of pg_checksums). I
think that would be useful for that middle tier of users who just want a
good default experience.
Well that, or, as a first less ambitious step, pg_upgrade could carry
over the data_checksums setting from the old to the new instance by
essentially disabling it via pg_checksums -d (which is fast) if it the
current default (off) is set on the old instance and the new instance
was created with the new onw (checksums on).
Probably should include a warning or something in that case, though I
guess a lot of users will read just past it. But at least they are not
worse off than before.
Michael
On 8 Aug 2024, at 12:11, Peter Eisentraut <peter@eisentraut.org> wrote:
My understanding was that the reason for some hesitation about adopting data checksums was the performance impact. Not the checksumming itself, but the overhead from hint bit logging. The last time I looked into that, you could get performance impacts on the order of 5% tps. Maybe that's acceptable, and you of course can turn it off if you want the extra performance. But I think this should be discussed in this thread.
That's been my experience as well, the overhead of the checksumming is
negligible but the overhead in WAL can be (having hint bits WAL logged does
carry other benefits as well to be fair).
I think we need to think through the upgrade experience a bit more.
+1
Unfortunately, pg_checksums hasn't gotten to the point that we were perhaps once hoping for that you could enable checksums on a live system.
I don't recall there being any work done (or plans for) using pg_checksums on a
live system. Anyone interested in enabling checksums on a live cluster can
however review the patch for that in:
/messages/by-id/E07A611B-9CF3-4FDB-8CE8-A221E39040EC@yesql.se
I'm thinking pg_upgrade could have a mode where it adds the checksum during the upgrade as it copies the files (essentially a subset of pg_checksums). I think that would be useful for that middle tier of users who just want a good default experience.
As a side-note, I implemented this in pg_upgrade at Greenplum (IIRC it was
submitted to -hackers at the time as well) and it worked well with not a lot of
code.
--
Daniel Gustafsson
Thank you for the feedback. Please find attached three separate patches.
One to add a new flag to initdb (--no-data-checksums), one to adjust the
tests to use this flag as needed, and the final to make the actual switch
of the default value (along with tests and docs).
Cheers,
Greg
Attachments:
0001-Add-new-initdb-argument-no-data-checksums-to-force-checksums-off.patchapplication/octet-stream; name=0001-Add-new-initdb-argument-no-data-checksums-to-force-checksums-off.patchDownload
From 4039a5ee4ee302d6b24c9decec14988b63b2be48 Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane <greg@turnstep.com>
Date: Thu, 8 Aug 2024 12:20:03 -0400
Subject: [PATCH 1/3] Add new initdb argument "--no-data-checksums" to force
checksums off.
---
src/bin/initdb/initdb.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f00718a015..8ead7780f5 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -3121,6 +3121,7 @@ main(int argc, char *argv[])
{"waldir", required_argument, NULL, 'X'},
{"wal-segsize", required_argument, NULL, 12},
{"data-checksums", no_argument, NULL, 'k'},
+ {"no-data-checksums", no_argument, NULL, 20},
{"allow-group-access", no_argument, NULL, 'g'},
{"discard-caches", no_argument, NULL, 14},
{"locale-provider", required_argument, NULL, 15},
@@ -3319,6 +3320,9 @@ main(int argc, char *argv[])
if (!parse_sync_method(optarg, &sync_method))
exit(1);
break;
+ case 20:
+ data_checksums = false;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
--
2.30.2
0002-Allow-tests-to-force-checksums-off-when-calling-init.patchapplication/octet-stream; name=0002-Allow-tests-to-force-checksums-off-when-calling-init.patchDownload
From 7729d5f7dc580f8f78088d55af93a82c3d218a77 Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane <greg@turnstep.com>
Date: Thu, 8 Aug 2024 12:26:51 -0400
Subject: [PATCH 2/3] Allow tests to force checksums off when calling init()
Adjust tests that need non-checksums clusters to work.
---
src/bin/pg_amcheck/t/003_check.pl | 2 +-
src/bin/pg_amcheck/t/004_verify_heapam.pl | 2 +-
src/bin/pg_checksums/t/002_actions.pl | 2 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 7 +++++++
4 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/src/bin/pg_amcheck/t/003_check.pl b/src/bin/pg_amcheck/t/003_check.pl
index 4b16bda6a4..a45b14662f 100644
--- a/src/bin/pg_amcheck/t/003_check.pl
+++ b/src/bin/pg_amcheck/t/003_check.pl
@@ -120,7 +120,7 @@ sub perform_all_corruptions()
# Test set-up
$node = PostgreSQL::Test::Cluster->new('test');
-$node->init;
+$node->init(no_checksums => 1);
$node->append_conf('postgresql.conf', 'autovacuum=off');
$node->start;
$port = $node->port;
diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl
index f6d2c5f787..2c17f7d068 100644
--- a/src/bin/pg_amcheck/t/004_verify_heapam.pl
+++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl
@@ -181,7 +181,7 @@ my $aborted_xid;
# autovacuum workers visiting the table could crash the backend.
# Disable autovacuum so that won't happen.
my $node = PostgreSQL::Test::Cluster->new('test');
-$node->init;
+$node->init(no_checksums => 1);
$node->append_conf('postgresql.conf', 'autovacuum=off');
$node->append_conf('postgresql.conf', 'max_prepared_transactions=10');
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 33e7fb53c5..c136febbbb 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -88,7 +88,7 @@ sub check_relation_corruption
# Initialize node with checksums disabled.
my $node = PostgreSQL::Test::Cluster->new('node_checksum');
-$node->init();
+$node->init(no_checksums => 1);
my $pgdata = $node->data_dir;
# Control file should know that checksums are disabled.
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 32ee98aebc..632a2c9ebc 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -504,6 +504,8 @@ On Windows, we use SSPI authentication to ensure the same (by pg_regress
WAL archiving can be enabled on this node by passing the keyword parameter
has_archiving => 1. This is disabled by default.
+Data checksums can be forced off by passing no_checksums => 1.
+
postgresql.conf can be set up for replication by passing the keyword
parameter allows_streaming => 'logical' or 'physical' (passing 1 will also
suffice for physical replication) depending on type of replication that
@@ -530,6 +532,11 @@ sub init
$params{force_initdb} = 0 unless defined $params{force_initdb};
$params{has_archiving} = 0 unless defined $params{has_archiving};
+ if (defined $params{no_checksums})
+ {
+ push @{ $params{extra} }, '--no-data-checksums';
+ }
+
my $initdb_extra_opts_env = $ENV{PG_TEST_INITDB_EXTRA_OPTS};
if (defined $initdb_extra_opts_env)
{
--
2.30.2
0003-Change-initdb-to-default-to-using-data-checksums.patchapplication/octet-stream; name=0003-Change-initdb-to-default-to-using-data-checksums.patchDownload
From 8860636a97f49cbca88d45d26ce6955f9610460d Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane <greg@turnstep.com>
Date: Thu, 8 Aug 2024 12:32:17 -0400
Subject: [PATCH 3/3] Change initdb to default to using data checksums.
Also adjust the documentation and tests.
---
doc/src/sgml/ref/initdb.sgml | 4 +++-
src/bin/initdb/initdb.c | 2 +-
src/bin/initdb/t/001_initdb.pl | 32 ++++++++++++++++++++++++--------
3 files changed, 28 insertions(+), 10 deletions(-)
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index bdd613e77f..2bd6e05df1 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -267,12 +267,14 @@ PostgreSQL documentation
<para>
Use checksums on data pages to help detect corruption by the
I/O system that would otherwise be silent. Enabling checksums
- may incur a noticeable performance penalty. If set, checksums
+ may incur a small performance penalty. If set, checksums
are calculated for all objects, in all databases. All checksum
failures will be reported in the
<link linkend="monitoring-pg-stat-database-view">
<structname>pg_stat_database</structname></link> view.
See <xref linkend="checksums" /> for details.
+ Data checksums are enabled by default. They can be
+ disabled by use of <option>--no-data-checksums</option>.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 8ead7780f5..ce7d3e99e5 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -164,7 +164,7 @@ static bool noinstructions = false;
static bool do_sync = true;
static bool sync_only = false;
static bool show_setting = false;
-static bool data_checksums = false;
+static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 06a35ac0b7..7345e66665 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -69,16 +69,11 @@ mkdir $datadir;
}
}
-# Control file should tell that data checksums are disabled by default.
+# Control file should tell that data checksums are enabled by default.
command_like(
[ 'pg_controldata', $datadir ],
- qr/Data page checksum version:.*0/,
- 'checksums are disabled in control file');
-# pg_checksums fails with checksums disabled by default. This is
-# not part of the tests included in pg_checksums to save from
-# the creation of an extra instance.
-command_fails([ 'pg_checksums', '-D', $datadir ],
- "pg_checksums fails with data checksum disabled");
+ qr/Data page checksum version:.*1/,
+ 'checksums are enabled in control file');
command_ok([ 'initdb', '-S', $datadir ], 'sync only');
command_fails([ 'initdb', $datadir ], 'existing data directory');
@@ -268,4 +263,25 @@ ok($conf !~ qr/^WORK_MEM = /m, "WORK_MEM should not be configured");
ok($conf !~ qr/^Work_Mem = /m, "Work_Mem should not be configured");
ok($conf =~ qr/^work_mem = 512/m, "work_mem should be in config");
+# Test the no-data-checksums flag
+
+my $datadir_nochecksums = "$tempdir/data_no_checksums";
+
+command_ok([ 'initdb', '--no-data-checksums', $datadir_nochecksums ],
+ 'successful creation without data checksums');
+
+# Control file should tell that data checksums are disabled.
+command_like(
+ [ 'pg_controldata', $datadir_nochecksums ],
+ qr/Data page checksum version:.*0/,
+ 'checksums are disabled in control file');
+
+# pg_checksums fails with checksums disabled. This is
+# not part of the tests included in pg_checksums to save from
+# the creation of an extra instance.
+command_fails(
+ [ 'pg_checksums', '-D', $datadir_nochecksums ],
+ "pg_checksums fails with data checksum disabled");
+
+
done_testing();
--
2.30.2
On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut <peter@eisentraut.org> wrote:
About the claim that it's already the de-facto standard. Maybe that is
approximately true for "serious" installations. But AFAICT, the popular
packagings don't enable checksums by default, so there is likely a
significant middle tier between "just trying it out" and serious
production use that don't have it turned on.
+1.
I'm thinking pg_upgrade could have a mode where it adds the
checksum during the upgrade as it copies the files (essentially a subset
of pg_checksums). I think that would be useful for that middle tier of
users who just want a good default experience.
That would be very nice.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 8/8/24 19:42, Robert Haas wrote:
On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut <peter@eisentraut.org> wrote:
About the claim that it's already the de-facto standard. Maybe that is
approximately true for "serious" installations. But AFAICT, the popular
packagings don't enable checksums by default, so there is likely a
significant middle tier between "just trying it out" and serious
production use that don't have it turned on.+1.
I'm thinking pg_upgrade could have a mode where it adds the
checksum during the upgrade as it copies the files (essentially a subset
of pg_checksums). I think that would be useful for that middle tier of
users who just want a good default experience.That would be very nice.
Yeah, but it might also disable checksums on the new cluster, which
would work for link mode too. So we'd probably want multiple modes, one
to enable checksums during file copy, one to disable checksums, and one
to just fail for incompatible clusters.
--
Tomas Vondra
On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut <peter@eisentraut.org>
wrote:
My understanding was that the reason for some hesitation about adopting
data checksums was the performance impact. Not the checksumming itself,
but the overhead from hint bit logging. The last time I looked into that,
you could get performance impacts on the order of 5% tps. Maybe that's
acceptable, and you of course can turn it off if you want the extra
performance. But I think this should be discussed in this thread.
Fair enough. I think the performance impact is acceptable, as evidenced by
the large number of people that turn it on. And it is easy enough to turn
it off again, either via --no-data-checksums or pg_checksums --disable.
I've come across people who have regretted not throwing a -k into their
initial initdb, but have not yet come across someone who has the opposite
regret. When I did some measurements some time ago, I found numbers much
less than 5%, but of course it depends on a lot of factors.
About the claim that it's already the de-facto standard. Maybe that is
approximately true for "serious" installations. But AFAICT, the popular
packagings don't enable checksums by default, so there is likely a
significant middle tier between "just trying it out" and serious
production use that don't have it turned on.
I would push back on that "significant" a good bit. The number of Postgres
installations in the cloud is very likely to dwarf the total package
installations. Maybe not 10 years ago, but now? Maybe someone from Amazon
can share some numbers. Not that we have any way to compare against package
installs :) But anecdotally the number of people who mention RDS etc. on
the various fora has exploded.
For those uses, this change would render pg_upgrade useless for upgrades
from an old instance with default settings to a new instance with default
settings. And then users would either need to re-initdb with checksums
turned back off, or I suppose run pg_checksums on the old instance before
upgrading? This is significant additional complication.
Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.
And packagers who have built abstractions on top of pg_upgrade (such as
Debian pg_upgradecluster) would also need to implement something to manage
this somehow.
How does it deal with clusters with checksums enabled now?
I'm thinking pg_upgrade could have a mode where it adds the checksum
during the upgrade as it copies the files (essentially a subset
of pg_checksums). I think that would be useful for that middle tier of
users who just want a good default experience.
Hm...might be a bad experience if it forces a switch out of --link mode.
Perhaps a warning at the end of pg_upgrade that suggests running
pg_checksums on your new cluster if you want to enable checksums?
Cheers,
Greg
On Tue, Aug 13, 2024 at 10:42 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
Fair enough. I think the performance impact is acceptable, as evidenced by the large number of people that turn it on. And it is easy enough to turn it off again, either via --no-data-checksums or pg_checksums --disable.
When I did some measurements some time ago, I found numbers much less than 5%, but of course it depends on a lot of factors.
I think the bad case is when you have a write workload that is
significantly bigger than shared_buffers but still small enough to fit
comfortably in the OS cache. When everything fits in shared_buffers,
you only need to write dirty buffers once per checkpoint cycle, so
making it more expensive isn't necessarily a big deal. When you're
constantly going to disk, that's so expensive that you don't notice
the computational overhead. But when you're in that middle zone where
you keep evicting buffers from PG but not actually having to write
them down to the disk, then I think it's pretty noticeable.
I've come across people who have regretted not throwing a -k into their initial initdb, but have not yet come across someone who has the opposite regret.
I don't think this is really a fair comparison, because everything
being a little slower all the time is not something that people are
likely to "regret" in the same way that they regret it when a data
corruption issue goes undetected. An undetected data corruption issue
is a single, very painful event that people are likely to notice,
whereas a small performance loss over time kind of blends into the
background. You don't really regret that kind of thing in the same way
that you regret a bad event that happens at a particular moment in
time.
And it's not like we have statistics anywhere that you can look at to
see how much CPU time you spent computing checksums, so if a user DOES
have a performance problem that would not have occurred if checksums
had been disabled, they'll probably never know it.
For those uses, this change would render pg_upgrade useless for upgrades from an old instance with default settings to a new instance with default settings. And then users would either need to re-initdb with checksums turned back off, or I suppose run pg_checksums on the old instance before upgrading? This is significant additional complication.
Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.
I tend to agree with that, but I would also like to see the sort of
improvements that Peter mentions. It's a lot less work to say "let's
just change the default" and then get mad at anyone who disagrees than
it is to do the engineering to make changing the default less of a
problem. But that kind of engineering really adds a lot of value
compared to just changing the default.
None of that is to say that I'm totally hostile to this change.
Checksums don't actually prevent your data from getting corrupted, or
let you recover it after it does. They just tell you about the
problem, and very often you would have found out anyway. However, they
do have peace-of-mind value. If you've got checksums turned on, you
can verify your checksums regularly and see that they're OK, and
people like that. Whether that's worth the overhead for everyone, I'm
not quite sure.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 08.08.24 19:42, Robert Haas wrote:
I'm thinking pg_upgrade could have a mode where it adds the
checksum during the upgrade as it copies the files (essentially a subset
of pg_checksums). I think that would be useful for that middle tier of
users who just want a good default experience.That would be very nice.
Here is a demo patch for that. It turned out to be quite simple.
I wrote above about a separate mode for that (like
--copy-and-make-adjustments), but it was just as easy to stick it into
the existing --copy mode.
It would be useful to check what the performance overhead of this is
versus a copy that does not have to make adjustments. I expect it's
very little.
A drawback is that as written this does not work on Windows, because
Windows uses a different code path in copyFile(). I don't know the
reasons for that. But it would need to be figured out.
Attachments:
v0-0001-pg_upgrade-support-for-upgrading-to-checksums-ena.patchtext/plain; charset=UTF-8; name=v0-0001-pg_upgrade-support-for-upgrading-to-checksums-ena.patchDownload
From 306c827c876378ddc128b030b8838a8e811743f2 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 15 Aug 2024 08:27:54 +0200
Subject: [PATCH v0] pg_upgrade support for upgrading to checksums enabled
---
src/bin/pg_upgrade/controldata.c | 17 +++++------------
src/bin/pg_upgrade/file.c | 21 ++++++++++++++++++++-
src/bin/pg_upgrade/pg_upgrade.h | 3 ++-
src/bin/pg_upgrade/relfilenumber.c | 2 +-
4 files changed, 28 insertions(+), 15 deletions(-)
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 854c6887a23..16d0b84b337 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -693,18 +693,11 @@ check_control_data(ControlData *oldctrl,
* check_for_isn_and_int8_passing_mismatch().
*/
- /*
- * We might eventually allow upgrades from checksum to no-checksum
- * clusters.
- */
- if (oldctrl->data_checksum_version == 0 &&
- newctrl->data_checksum_version != 0)
- pg_fatal("old cluster does not use data checksums but the new one does");
- else if (oldctrl->data_checksum_version != 0 &&
- newctrl->data_checksum_version == 0)
- pg_fatal("old cluster uses data checksums but the new one does not");
- else if (oldctrl->data_checksum_version != newctrl->data_checksum_version)
- pg_fatal("old and new cluster pg_controldata checksum versions do not match");
+ if (user_opts.transfer_mode != TRANSFER_MODE_COPY)
+ {
+ if (oldctrl->data_checksum_version != newctrl->data_checksum_version)
+ pg_fatal("when upgrading between clusters with different data checksum settings, transfer mode --copy must be used");
+ }
}
diff --git a/src/bin/pg_upgrade/file.c b/src/bin/pg_upgrade/file.c
index 73932504cae..cef33b8f6eb 100644
--- a/src/bin/pg_upgrade/file.c
+++ b/src/bin/pg_upgrade/file.c
@@ -80,12 +80,14 @@ cloneFile(const char *src, const char *dst,
*/
void
copyFile(const char *src, const char *dst,
- const char *schemaName, const char *relName)
+ const char *schemaName, const char *relName,
+ int segno)
{
#ifndef WIN32
int src_fd;
int dest_fd;
char *buffer;
+ BlockNumber blocknum;
if ((src_fd = open(src, O_RDONLY | PG_BINARY, 0)) < 0)
pg_fatal("error while copying relation \"%s.%s\": could not open file \"%s\": %m",
@@ -101,6 +103,8 @@ copyFile(const char *src, const char *dst,
buffer = (char *) pg_malloc(COPY_BUF_SIZE);
+ blocknum = segno * old_cluster.controldata.largesz;
+
/* perform data copying i.e read src source, write to destination */
while (true)
{
@@ -113,6 +117,19 @@ copyFile(const char *src, const char *dst,
if (nbytes == 0)
break;
+ if (new_cluster.controldata.data_checksum_version == 1)
+ {
+ for (int i = 0; i < 50 && i * BLCKSZ < nbytes; i++)
+ {
+ uint16 csum;
+ char *block = buffer + i * BLCKSZ;
+ PageHeader header = (PageHeader) block;
+
+ csum = pg_checksum_page(block, blocknum + i);
+ header->pd_checksum = csum;
+ }
+ }
+
errno = 0;
if (write(dest_fd, buffer, nbytes) != nbytes)
{
@@ -122,6 +139,8 @@ copyFile(const char *src, const char *dst,
pg_fatal("error while copying relation \"%s.%s\": could not write file \"%s\": %m",
schemaName, relName, dst);
}
+
+ blocknum += 50;
}
pg_free(buffer);
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index cdb6e2b7597..c6a4ade53a7 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -404,7 +404,8 @@ bool pid_lock_file_exists(const char *datadir);
void cloneFile(const char *src, const char *dst,
const char *schemaName, const char *relName);
void copyFile(const char *src, const char *dst,
- const char *schemaName, const char *relName);
+ const char *schemaName, const char *relName,
+ int segno);
void copyFileByRange(const char *src, const char *dst,
const char *schemaName, const char *relName);
void linkFile(const char *src, const char *dst,
diff --git a/src/bin/pg_upgrade/relfilenumber.c b/src/bin/pg_upgrade/relfilenumber.c
index 1d3054d78bd..2ae69d71159 100644
--- a/src/bin/pg_upgrade/relfilenumber.c
+++ b/src/bin/pg_upgrade/relfilenumber.c
@@ -250,7 +250,7 @@ transfer_relfile(FileNameMap *map, const char *type_suffix, bool vm_must_add_fro
case TRANSFER_MODE_COPY:
pg_log(PG_VERBOSE, "copying \"%s\" to \"%s\"",
old_file, new_file);
- copyFile(old_file, new_file, map->nspname, map->relname);
+ copyFile(old_file, new_file, map->nspname, map->relname, segno);
break;
case TRANSFER_MODE_COPY_FILE_RANGE:
pg_log(PG_VERBOSE, "copying \"%s\" to \"%s\" with copy_file_range",
--
2.46.0
On Wed, Aug 7, 2024 at 4:18 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Wed, Aug 7, 2024 at 4:43 AM Michael Banck <mbanck@gmx.net> wrote:
I think the last time we dicussed this the consensus was that
computational overhead of computing the checksums is pretty small for
most systems (so the above change seems warranted regardless of whether
we switch the default), but turning on wal_compression also turns on
wal_log_hints, which can increase WAL by quite a lot. Maybe this is
[..]
Yeah, that seems something beyond this patch? Certainly we should mention wal_compression in the release notes if the default changes. I mean, I feel wal_log_hints should probably default to on as well, but I've honestly never really given it much thought because my fingers are trained to type "initdb -k". I've been using data checksums for roughly a decade now. I think the only time I've NOT used checksums was when I was doing checksum overhead measurements, or hacking on the pg_checksums program.
Maybe I don't understand something, but just to be clear:
wal_compression (mentioned above) is not turning wal_log_hints on,
just the wal_log_hints needs to be on when using data checksums
(implicitly, by the XLogHintBitIsNeeded() macro). I suppose Michael
was thinking about the wal_log_hints earlier (?)
-J.
Hi Greg and others
On Tue, Aug 13, 2024 at 4:42 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Thu, Aug 8, 2024 at 6:11 AM Peter Eisentraut <peter@eisentraut.org> wrote:
My understanding was that the reason for some hesitation about adopting data checksums was the performance impact. Not the checksumming itself, but the overhead from hint bit logging. The last time I looked into that, you could get performance impacts on the order of 5% tps. Maybe that's acceptable, and you of course can turn it off if you want the extra performance. But I think this should be discussed in this thread.
Fair enough. I think the performance impact is acceptable, as evidenced by the large number of people that turn it on. And it is easy enough to turn it off again, either via --no-data-checksums or pg_checksums --disable. I've come across people who have regretted not throwing a -k into their initial initdb, but have not yet come across someone who has the opposite regret. When I did some measurements some time ago, I found numbers much less than 5%, but of course it depends on a lot of factors.
Same here, and +1 to data_checksums=on by default for new installations.
The best public measurement of the impact was posted in [1]/messages/by-id/20190330192543.GH4719@development in 2019 by
Tomas to the best of my knowledge, where he explicitly mentioned the
problem with more WAL with hints/checksums: SATA disks (low IOPS). My
take: now we have 2024, and most people are using at least SSDs or
slow-SATA (but in cloud they could just change the class of I/O if
required to get IOPS to avoid too much throttling), therefore the
price of IOPS dropped significantly.
About the claim that it's already the de-facto standard. Maybe that is approximately true for "serious" installations. But AFAICT, the popular packagings don't enable checksums by default, so there is likely a significant middle tier between "just trying it out" and serious
production use that don't have it turned on.I would push back on that "significant" a good bit. The number of Postgres installations in the cloud is very likely to dwarf the total package installations. Maybe not 10 years ago, but now? Maybe someone from Amazon can share some numbers. Not that we have any way to compare against package installs :) But anecdotally the number of people who mention RDS etc. on the various fora has exploded.
Same here. If it helps the case the: 43% of all PostgreSQL DBs
involved in any support case or incident in EDB within last year had
data_checksums=on (at least if they had collected the data using our )
. That's a surprisingly high number (for something that's off by
default), and it makes me think this is because plenty of customers
are either managed by DBAs who care, or assisted by consultants when
deploying, or simply using TPAexec [2]https://www.enterprisedb.com/docs/pgd/4/deployments/tpaexec/ which has this on by default.
Another thing is plenty of people run with wal_log_hints=on (without
data_checksums=off) just to have pg_rewind working. As this is a
strictly standby related tool it means they don't have WAL/network
bandwidth problems, so the WAL rate is not that high in the wild to
cause problems. I found 1 or 2 cases within last year where we would
mention that high WAL generation was attributed to
wal_log_hints=on/XLOG_FPI and they still didn't disable it apparently
(we have plenty of cases related to too much WAL, but it's mostly due
to other basic reasons)
-J.
[1]: /messages/by-id/20190330192543.GH4719@development
[2]: https://www.enterprisedb.com/docs/pgd/4/deployments/tpaexec/
On Thu, Aug 15, 2024 at 09:49:04AM +0200, Jakub Wartak wrote:
On Wed, Aug 7, 2024 at 4:18 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Wed, Aug 7, 2024 at 4:43 AM Michael Banck <mbanck@gmx.net> wrote:
I think the last time we dicussed this the consensus was that
computational overhead of computing the checksums is pretty small for
most systems (so the above change seems warranted regardless of whether
we switch the default), but turning on wal_compression also turns on
wal_log_hints, which can increase WAL by quite a lot. Maybe this is[..]
Yeah, that seems something beyond this patch? Certainly we should
mention wal_compression in the release notes if the default changes.
I mean, I feel wal_log_hints should probably default to on as well,
but I've honestly never really given it much thought because my
fingers are trained to type "initdb -k". I've been using data
checksums for roughly a decade now. I think the only time I've NOT
used checksums was when I was doing checksum overhead measurements,
or hacking on the pg_checksums program.Maybe I don't understand something, but just to be clear:
wal_compression (mentioned above) is not turning wal_log_hints on,
just the wal_log_hints needs to be on when using data checksums
(implicitly, by the XLogHintBitIsNeeded() macro). I suppose Michael
was thinking about the wal_log_hints earlier (?)
Uh, I am pretty sure I meant to say "turning on data_checksums als turns
on wal_log_hints", sorry about the confusion.
I guess the connection is that if you turn on wal_lot_hints (either
directly or via data_checksums) then the number FPIs goes up (possibly
signficantly), and enabling wal_compression could (partly) remedy that.
But I agree with Greg that such a discussion is probably out-of-scope
for this default change.
Michael
Hi all,
On Tue, Aug 13, 2024 at 10:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
And it's not like we have statistics anywhere that you can look at to
see how much CPU time you spent computing checksums, so if a user DOES
have a performance problem that would not have occurred if checksums
had been disabled, they'll probably never know it.
In worst case, per second and per-pid CPU time consumption could be
quantified by having eBPF which is the standard on distros now
(requires kernel headers and bpfcc-tools installed), e.g. here 7918
was PID doing pgbench-related -c 4 workload with checksum=on (sorry
for formatting, but I don't want to use HTML here):
# funclatency-bpfcc --microseconds -i 1 -p 7918
/usr/lib/postgresql/16/bin/postgres:pg_checksum_page
Tracing 1 functions for
"/usr/lib/postgresql/16/bin/postgres:pg_checksum_page"... Hit Ctrl-C
to end.
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 238 |************* |
4 -> 7 : 714 |****************************************|
8 -> 15 : 2 | |
16 -> 31 : 5 | |
32 -> 63 : 0 | |
64 -> 127 : 1 | |
128 -> 255 : 0 | |
256 -> 511 : 1 | |
512 -> 1023 : 1 | |
avg = 6 usecs, total: 6617 usecs, count: 962
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 241 |************* |
4 -> 7 : 706 |****************************************|
8 -> 15 : 11 | |
16 -> 31 : 10 | |
32 -> 63 : 1 | |
avg = 5 usecs, total: 5639 usecs, count: 969
[..refreshes every 1s here..]
So the above can tell us e.g. that this pg_checksum_page() took 5639
us out of 1s full sample time (and with 100% CPU pegged core so that's
gives again ~5% CPU util per this routine; I'm ignoring the WAL/log
hint impact for sure). One could also write a small script using
bpftrace instead, too. Disassembly on Debian version and stock PGDG is
telling me it's ful SSE2 instruction-set, so that's nice and optimal
too.
For those uses, this change would render pg_upgrade useless for upgrades from an old instance with default settings to a new instance with default settings. And then users would either need to re-initdb with checksums turned back off, or I suppose run pg_checksums on the old instance before upgrading? This is significant additional complication.
Meh, re-running initdb with --no-data-checksums seems a fairly low hurdle.
I tend to agree with that, but I would also like to see the sort of
improvements that Peter mentions.
[..]
None of that is to say that I'm totally hostile to this change.
[.,.]
Whether that's worth the overhead for everyone, I'm not quite sure.
Without data checksums there's a risk that someone receives silent-bit
corruption and no one will notice. Shouldn't integrity of data stand
above performance by default, in this case? (and performance could be
opt-in, if someone really wants it).
-J.
On 15.08.24 08:38, Peter Eisentraut wrote:
On 08.08.24 19:42, Robert Haas wrote:
I'm thinking pg_upgrade could have a mode where it adds the
checksum during the upgrade as it copies the files (essentially a subset
of pg_checksums). I think that would be useful for that middle tier of
users who just want a good default experience.That would be very nice.
Here is a demo patch for that. It turned out to be quite simple.
I wrote above about a separate mode for that (like
--copy-and-make-adjustments), but it was just as easy to stick it into
the existing --copy mode.It would be useful to check what the performance overhead of this is
versus a copy that does not have to make adjustments. I expect it's
very little.A drawback is that as written this does not work on Windows, because
Windows uses a different code path in copyFile(). I don't know the
reasons for that. But it would need to be figured out.
Here is an updated patch for this. I simplified the logic a bit and
also handle the case where the read() reads less than a round number of
blocks. I did some performance testing. The overhead of computing the
checksums versus a straight --copy without checksum adjustments appears
to be around 5% wall clock time, which seems ok to me. I also looked
around the documentation to see if there is anything to update, but
didn't find anything.
I think if we can work out what to do on Windows, this could be a useful
little feature for facilitating $subject.
Attachments:
v1-0001-pg_upgrade-Support-for-upgrading-to-checksums-ena.patchtext/plain; charset=UTF-8; name=v1-0001-pg_upgrade-Support-for-upgrading-to-checksums-ena.patchDownload
From a050f2d857b2e5321b9e1110f6a41185d91c51ee Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Thu, 22 Aug 2024 07:56:04 +0200
Subject: [PATCH v1] pg_upgrade: Support for upgrading to checksums enabled
When upgrading between instances with different checksum settings, the
--copy (default) mode automatically sets (or unsets) the checksum on
the fly.
TODO: What to do about the Windows code path?
---
src/bin/pg_upgrade/controldata.c | 17 +++++-----------
src/bin/pg_upgrade/file.c | 31 +++++++++++++++++++++++++++++-
src/bin/pg_upgrade/pg_upgrade.h | 3 ++-
src/bin/pg_upgrade/relfilenumber.c | 2 +-
4 files changed, 38 insertions(+), 15 deletions(-)
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 854c6887a23..583cbf109cf 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -693,18 +693,11 @@ check_control_data(ControlData *oldctrl,
* check_for_isn_and_int8_passing_mismatch().
*/
- /*
- * We might eventually allow upgrades from checksum to no-checksum
- * clusters.
- */
- if (oldctrl->data_checksum_version == 0 &&
- newctrl->data_checksum_version != 0)
- pg_fatal("old cluster does not use data checksums but the new one does");
- else if (oldctrl->data_checksum_version != 0 &&
- newctrl->data_checksum_version == 0)
- pg_fatal("old cluster uses data checksums but the new one does not");
- else if (oldctrl->data_checksum_version != newctrl->data_checksum_version)
- pg_fatal("old and new cluster pg_controldata checksum versions do not match");
+ if (oldctrl->data_checksum_version != newctrl->data_checksum_version)
+ {
+ if (user_opts.transfer_mode != TRANSFER_MODE_COPY)
+ pg_fatal("when upgrading between clusters with different data checksum settings, transfer mode --copy must be used");
+ }
}
diff --git a/src/bin/pg_upgrade/file.c b/src/bin/pg_upgrade/file.c
index 73932504cae..0a16ad79237 100644
--- a/src/bin/pg_upgrade/file.c
+++ b/src/bin/pg_upgrade/file.c
@@ -80,12 +80,14 @@ cloneFile(const char *src, const char *dst,
*/
void
copyFile(const char *src, const char *dst,
- const char *schemaName, const char *relName)
+ const char *schemaName, const char *relName,
+ int segno)
{
#ifndef WIN32
int src_fd;
int dest_fd;
char *buffer;
+ BlockNumber blocknum;
if ((src_fd = open(src, O_RDONLY | PG_BINARY, 0)) < 0)
pg_fatal("error while copying relation \"%s.%s\": could not open file \"%s\": %m",
@@ -101,6 +103,8 @@ copyFile(const char *src, const char *dst,
buffer = (char *) pg_malloc(COPY_BUF_SIZE);
+ blocknum = segno * old_cluster.controldata.largesz;
+
/* perform data copying i.e read src source, write to destination */
while (true)
{
@@ -113,6 +117,31 @@ copyFile(const char *src, const char *dst,
if (nbytes == 0)
break;
+ /*
+ * Update checksums if upgrading between different settings
+ */
+ if (old_cluster.controldata.data_checksum_version != new_cluster.controldata.data_checksum_version)
+ {
+ /* must deal in whole blocks */
+ nbytes = nbytes / BLCKSZ * BLCKSZ;
+
+ for (int i = 0; i * BLCKSZ < nbytes; i++)
+ {
+ char *page = buffer + i * BLCKSZ;
+ PageHeader header = (PageHeader) page;
+
+ if (PageIsNew(page))
+ continue;
+
+ if (new_cluster.controldata.data_checksum_version == 1)
+ header->pd_checksum = pg_checksum_page(page, blocknum);
+ else
+ header->pd_checksum = 0;
+
+ blocknum++;
+ }
+ }
+
errno = 0;
if (write(dest_fd, buffer, nbytes) != nbytes)
{
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index cdb6e2b7597..c6a4ade53a7 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -404,7 +404,8 @@ bool pid_lock_file_exists(const char *datadir);
void cloneFile(const char *src, const char *dst,
const char *schemaName, const char *relName);
void copyFile(const char *src, const char *dst,
- const char *schemaName, const char *relName);
+ const char *schemaName, const char *relName,
+ int segno);
void copyFileByRange(const char *src, const char *dst,
const char *schemaName, const char *relName);
void linkFile(const char *src, const char *dst,
diff --git a/src/bin/pg_upgrade/relfilenumber.c b/src/bin/pg_upgrade/relfilenumber.c
index 1d3054d78bd..2ae69d71159 100644
--- a/src/bin/pg_upgrade/relfilenumber.c
+++ b/src/bin/pg_upgrade/relfilenumber.c
@@ -250,7 +250,7 @@ transfer_relfile(FileNameMap *map, const char *type_suffix, bool vm_must_add_fro
case TRANSFER_MODE_COPY:
pg_log(PG_VERBOSE, "copying \"%s\" to \"%s\"",
old_file, new_file);
- copyFile(old_file, new_file, map->nspname, map->relname);
+ copyFile(old_file, new_file, map->nspname, map->relname, segno);
break;
case TRANSFER_MODE_COPY_FILE_RANGE:
pg_log(PG_VERBOSE, "copying \"%s\" to \"%s\" with copy_file_range",
base-commit: 9bb842f95ef3384f0822c386a4c569780e613e4e
--
2.46.0
On Thu, Aug 22, 2024 at 8:11 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 15.08.24 08:38, Peter Eisentraut wrote:
On 08.08.24 19:42, Robert Haas wrote:
I'm thinking pg_upgrade could have a mode where it adds the
checksum during the upgrade as it copies the files (essentially a subset
of pg_checksums). I think that would be useful for that middle tier of
users who just want a good default experience.That would be very nice.
Here is a demo patch for that. It turned out to be quite simple.
I wrote above about a separate mode for that (like
--copy-and-make-adjustments), but it was just as easy to stick it into
the existing --copy mode.It would be useful to check what the performance overhead of this is
versus a copy that does not have to make adjustments. I expect it's
very little.A drawback is that as written this does not work on Windows, because
Windows uses a different code path in copyFile(). I don't know the
reasons for that. But it would need to be figured out.Here is an updated patch for this. I simplified the logic a bit and
also handle the case where the read() reads less than a round number of
blocks. I did some performance testing. The overhead of computing the
checksums versus a straight --copy without checksum adjustments appears
to be around 5% wall clock time, which seems ok to me. I also looked
around the documentation to see if there is anything to update, but
didn't find anything.I think if we can work out what to do on Windows, this could be a useful
little feature for facilitating $subject.
My take:
1. I wonder if we should or should not by default calculate/enable the
checksums when doing pg_upgrade --copy from cluster with
checksums=off. Maybe we should error on that like we are doing now.
There might be still people want to have them off, but they would use
the proposed-new-defaults-of-initdb with checksums on blindly (so this
should be opt-in via some switch like with let's say
--copy-and-enable-checksums; so the user is in full control).
2. WIN32's copyFile() could then stay as it is, and then that new
--copy-and-enable-checksums on WIN32 would have to fallback to classic
loop.
-J.
On 08.08.24 19:19, Greg Sabino Mullane wrote:
Thank you for the feedback. Please find attached three separate patches.
One to add a new flag to initdb (--no-data-checksums), one to adjust the
tests to use this flag as needed, and the final to make the actual
switch of the default value (along with tests and docs).
I think we can get started with the initdb --no-data-checksums option.
The 0001 patch is missing documentation and --help output for this
option. Also, some of the tests for the option that are in patch 0003
might be better in patch 0001.
Separately, this
- may incur a noticeable performance penalty. If set, checksums
+ may incur a small performance penalty. If set, checksums
should perhaps be committed separately. I don't think the patch 0003
really changes the performance penalty. ;-)
Rebased and reworked patches attached:
v2-0001-Add-new-initdb-argument-no-data-checksums-to-force-checksums-off.patch
- Adds --no-data-checksums to initdb.c, adjusts --help and sgml docs, adds
simple tests
v2-0002-Allow-tests-to-force-checksums-off-when-calling-init.patch
- Adjusts the Perl tests to use the new flag as needed
v2-0003-Change-initdb-to-default-to-using-data-checksums.patch
- Flips the boolean to true, adjusts the tests to match it, tweaks docs
v2-0004-Tweak-docs-to-reduce-possible-impact-of-data-checksums.patch
- Changes "noticeable" penalty to "small" penalty
Cheers,
Greg
Attachments:
v2-0004-Tweak-docs-to-reduce-possible-impact-of-data-checksums.patchapplication/octet-stream; name=v2-0004-Tweak-docs-to-reduce-possible-impact-of-data-checksums.patchDownload
From 09b5fcfe7fb7dec0a89a16864535cbf1d9bab57d Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane <greg@turnstep.com>
Date: Fri, 23 Aug 2024 09:42:12 -0400
Subject: [PATCH v2 4/4] Tweak docs to reduce possible impact of data checksums
---
doc/src/sgml/ref/initdb.sgml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index afc6643412..e7343e9e39 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -267,7 +267,7 @@ PostgreSQL documentation
<para>
Use checksums on data pages to help detect corruption by the
I/O system that would otherwise be silent. Enabling checksums
- may incur a noticeable performance penalty. If set, checksums
+ may incur a small performance penalty. If set, checksums
are calculated for all objects, in all databases. All checksum
failures will be reported in the
<link linkend="monitoring-pg-stat-database-view">
--
2.30.2
v2-0001-Add-new-initdb-argument-no-data-checksums-to-force-checksums-off.patchapplication/octet-stream; name=v2-0001-Add-new-initdb-argument-no-data-checksums-to-force-checksums-off.patchDownload
From ccbbb268374074f767b6cb2daa7124ca7ec3df6b Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane <greg@turnstep.com>
Date: Fri, 23 Aug 2024 09:36:35 -0400
Subject: [PATCH v2 1/4] Add new initdb argument "--no-data-checksums" to force
checksums off
---
doc/src/sgml/ref/initdb.sgml | 9 +++++++++
src/bin/initdb/initdb.c | 5 +++++
src/bin/initdb/t/001_initdb.pl | 12 ++++++++++++
3 files changed, 26 insertions(+)
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index bdd613e77f..a854b32e54 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -277,6 +277,15 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry id="app-initdb-no-data-checksums" xreflabel="no data checksums">
+ <term><option>--no-data-checksums</option></term>
+ <listitem>
+ <para>
+ Prevents initdb from enabling data checksums.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="app-initdb-option-locale">
<term><option>--locale=<replaceable>locale</replaceable></option></term>
<listitem>
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f00718a015..1f36e59286 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2456,6 +2456,7 @@ usage(const char *progname)
printf(_(" --icu-locale=LOCALE set ICU locale ID for new databases\n"));
printf(_(" --icu-rules=RULES set additional ICU collation rules for new databases\n"));
printf(_(" -k, --data-checksums use data page checksums\n"));
+ printf(_(" --no-data-checksums do not use data page checksums\n"));
printf(_(" --locale=LOCALE set default locale for new databases\n"));
printf(_(" --lc-collate=, --lc-ctype=, --lc-messages=LOCALE\n"
" --lc-monetary=, --lc-numeric=, --lc-time=LOCALE\n"
@@ -3121,6 +3122,7 @@ main(int argc, char *argv[])
{"waldir", required_argument, NULL, 'X'},
{"wal-segsize", required_argument, NULL, 12},
{"data-checksums", no_argument, NULL, 'k'},
+ {"no-data-checksums", no_argument, NULL, 20},
{"allow-group-access", no_argument, NULL, 'g'},
{"discard-caches", no_argument, NULL, 14},
{"locale-provider", required_argument, NULL, 15},
@@ -3319,6 +3321,9 @@ main(int argc, char *argv[])
if (!parse_sync_method(optarg, &sync_method))
exit(1);
break;
+ case 20:
+ data_checksums = false;
+ break;
default:
/* getopt_long already emitted a complaint */
pg_log_error_hint("Try \"%s --help\" for more information.", progname);
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 06a35ac0b7..8072adb97f 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -268,4 +268,16 @@ ok($conf !~ qr/^WORK_MEM = /m, "WORK_MEM should not be configured");
ok($conf !~ qr/^Work_Mem = /m, "Work_Mem should not be configured");
ok($conf =~ qr/^work_mem = 512/m, "work_mem should be in config");
+# Test the no-data-checksums flag
+my $datadir_nochecksums = "$tempdir/data_no_checksums";
+
+command_ok([ 'initdb', '--no-data-checksums', $datadir_nochecksums ],
+ 'successful creation without data checksums');
+
+# Control file should tell that data checksums are disabled.
+command_like(
+ [ 'pg_controldata', $datadir_nochecksums ],
+ qr/Data page checksum version:.*0/,
+ 'checksums are disabled in control file');
+
done_testing();
--
2.30.2
v2-0003-Change-initdb-to-default-to-using-data-checksums.patchapplication/octet-stream; name=v2-0003-Change-initdb-to-default-to-using-data-checksums.patchDownload
From c0609d1af1748632532e7cd6d851744ee79a6385 Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane <greg@turnstep.com>
Date: Fri, 23 Aug 2024 09:41:32 -0400
Subject: [PATCH v2 3/4] Change initdb to default to using data checksums.
Also adjust the documentation and tests.
---
doc/src/sgml/ref/initdb.sgml | 2 ++
src/bin/initdb/initdb.c | 2 +-
src/bin/initdb/t/001_initdb.pl | 18 ++++++++++--------
3 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index a854b32e54..afc6643412 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -273,6 +273,8 @@ PostgreSQL documentation
<link linkend="monitoring-pg-stat-database-view">
<structname>pg_stat_database</structname></link> view.
See <xref linkend="checksums" /> for details.
+ Data checksums are enabled by default. They can be
+ disabled by use of <option>--no-data-checksums</option>.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 1f36e59286..1b7d867934 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -164,7 +164,7 @@ static bool noinstructions = false;
static bool do_sync = true;
static bool sync_only = false;
static bool show_setting = false;
-static bool data_checksums = false;
+static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 8072adb97f..7520d3d0dd 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -69,16 +69,11 @@ mkdir $datadir;
}
}
-# Control file should tell that data checksums are disabled by default.
+# Control file should tell that data checksums are enabled by default.
command_like(
[ 'pg_controldata', $datadir ],
- qr/Data page checksum version:.*0/,
- 'checksums are disabled in control file');
-# pg_checksums fails with checksums disabled by default. This is
-# not part of the tests included in pg_checksums to save from
-# the creation of an extra instance.
-command_fails([ 'pg_checksums', '-D', $datadir ],
- "pg_checksums fails with data checksum disabled");
+ qr/Data page checksum version:.*1/,
+ 'checksums are enabled in control file');
command_ok([ 'initdb', '-S', $datadir ], 'sync only');
command_fails([ 'initdb', $datadir ], 'existing data directory');
@@ -280,4 +275,11 @@ command_like(
qr/Data page checksum version:.*0/,
'checksums are disabled in control file');
+# pg_checksums fails with checksums disabled. This is
+# not part of the tests included in pg_checksums to save from
+# the creation of an extra instance.
+command_fails(
+ [ 'pg_checksums', '-D', $datadir_nochecksums ],
+ "pg_checksums fails with data checksum disabled");
+
done_testing();
--
2.30.2
v2-0002-Allow-tests-to-force-checksums-off-when-calling-init.patchapplication/octet-stream; name=v2-0002-Allow-tests-to-force-checksums-off-when-calling-init.patchDownload
From 261b804a96946f0a60b1cfafdc4cc0be8fc5fe45 Mon Sep 17 00:00:00 2001
From: Greg Sabino Mullane <greg@turnstep.com>
Date: Fri, 23 Aug 2024 09:37:59 -0400
Subject: [PATCH v2 2/4] Allow tests to force checksums off when calling init()
---
src/bin/pg_amcheck/t/003_check.pl | 2 +-
src/bin/pg_amcheck/t/004_verify_heapam.pl | 2 +-
src/bin/pg_checksums/t/002_actions.pl | 2 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 7 +++++++
4 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/src/bin/pg_amcheck/t/003_check.pl b/src/bin/pg_amcheck/t/003_check.pl
index 4b16bda6a4..a45b14662f 100644
--- a/src/bin/pg_amcheck/t/003_check.pl
+++ b/src/bin/pg_amcheck/t/003_check.pl
@@ -120,7 +120,7 @@ sub perform_all_corruptions()
# Test set-up
$node = PostgreSQL::Test::Cluster->new('test');
-$node->init;
+$node->init(no_checksums => 1);
$node->append_conf('postgresql.conf', 'autovacuum=off');
$node->start;
$port = $node->port;
diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl
index f6d2c5f787..2c17f7d068 100644
--- a/src/bin/pg_amcheck/t/004_verify_heapam.pl
+++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl
@@ -181,7 +181,7 @@ my $aborted_xid;
# autovacuum workers visiting the table could crash the backend.
# Disable autovacuum so that won't happen.
my $node = PostgreSQL::Test::Cluster->new('test');
-$node->init;
+$node->init(no_checksums => 1);
$node->append_conf('postgresql.conf', 'autovacuum=off');
$node->append_conf('postgresql.conf', 'max_prepared_transactions=10');
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 33e7fb53c5..c136febbbb 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -88,7 +88,7 @@ sub check_relation_corruption
# Initialize node with checksums disabled.
my $node = PostgreSQL::Test::Cluster->new('node_checksum');
-$node->init();
+$node->init(no_checksums => 1);
my $pgdata = $node->data_dir;
# Control file should know that checksums are disabled.
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index fe6ebf10f7..b8f08363d3 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -504,6 +504,8 @@ On Windows, we use SSPI authentication to ensure the same (by pg_regress
WAL archiving can be enabled on this node by passing the keyword parameter
has_archiving => 1. This is disabled by default.
+Data checksums can be forced off by passing no_checksums => 1.
+
postgresql.conf can be set up for replication by passing the keyword
parameter allows_streaming => 'logical' or 'physical' (passing 1 will also
suffice for physical replication) depending on type of replication that
@@ -530,6 +532,11 @@ sub init
$params{force_initdb} = 0 unless defined $params{force_initdb};
$params{has_archiving} = 0 unless defined $params{has_archiving};
+ if (defined $params{no_checksums})
+ {
+ push @{ $params{extra} }, '--no-data-checksums';
+ }
+
my $initdb_extra_opts_env = $ENV{PG_TEST_INITDB_EXTRA_OPTS};
if (defined $initdb_extra_opts_env)
{
--
2.30.2
On Fri, Aug 23, 2024 at 03:17:17PM +0200, Peter Eisentraut wrote:
On 08.08.24 19:19, Greg Sabino Mullane wrote:
Thank you for the feedback. Please find attached three separate patches.
One to add a new flag to initdb (--no-data-checksums), one to adjust the
tests to use this flag as needed, and the final to make the actual
switch of the default value (along with tests and docs).I think we can get started with the initdb --no-data-checksums option.
The 0001 patch is missing documentation and --help output for this option.
Also, some of the tests for the option that are in patch 0003 might be
better in patch 0001.Separately, this
- may incur a noticeable performance penalty. If set, checksums + may incur a small performance penalty. If set, checksumsshould perhaps be committed separately. I don't think the patch 0003 really
changes the performance penalty. ;-)
I think "might" would be more precise than "may" above.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
In general, +1 for $SUBJECT.
printf(_(" -k, --data-checksums use data page checksums\n"));
+ printf(_(" --no-data-checksums do not use data page checksums\n"));
Should we error if both --data-checksum and --no-data-checksums are
specified? IIUC with 0001, we'll use whichever is specified last.
nitpick: these 4 patches are small enough that they could likely be
combined and committed together.
I think it's fair to say we should make the pg_upgrade experience nicer
once the default changes, but IMHO that needn't block actually changing the
default.
--
nathan
On Mon, Aug 26, 2024 at 3:46 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:
Should we error if both --data-checksum and --no-data-checksums are
specified? IIUC with 0001, we'll use whichever is specified last.
Hmmm, that is a good question. We have never (to my recollection) flipped a
default quite like this before. I'm inclined to leave it as "last one
wins", as I can see automated systems appending their desired selection to
the end of the arg list, and expecting it to work.
nitpick: these 4 patches are small enough that they could likely be
combined and committed together.
This was split per request upthread, which I do agree with.
I think it's fair to say we should make the pg_upgrade experience nicer
once the default changes, but IMHO that needn't block actually changing the
default.
+1
Cheers,
Greg
On 27.08.24 15:44, Greg Sabino Mullane wrote:
On Mon, Aug 26, 2024 at 3:46 PM Nathan Bossart <nathandbossart@gmail.com
<mailto:nathandbossart@gmail.com>> wrote:Should we error if both --data-checksum and --no-data-checksums are
specified? IIUC with 0001, we'll use whichever is specified last.Hmmm, that is a good question. We have never (to my recollection)
flipped a default quite like this before. I'm inclined to leave it as
"last one wins", as I can see automated systems appending their desired
selection to the end of the arg list, and expecting it to work.
Yes, last option wins is the normal expected behavior.
On Tue, Aug 27, 2024 at 05:16:51PM +0200, Peter Eisentraut wrote:
On 27.08.24 15:44, Greg Sabino Mullane wrote:
On Mon, Aug 26, 2024 at 3:46 PM Nathan Bossart <nathandbossart@gmail.com
<mailto:nathandbossart@gmail.com>> wrote:Should we error if both --data-checksum and --no-data-checksums are
specified? IIUC with 0001, we'll use whichever is specified last.Hmmm, that is a good question. We have never (to my recollection)
flipped a default quite like this before. I'm inclined to leave it as
"last one wins", as I can see automated systems appending their desired
selection to the end of the arg list, and expecting it to work.Yes, last option wins is the normal expected behavior.
WFM
001_verify_heapam fails with this patch set. I think you may need to use
--no-data-checksums in that test, too. Otherwise, it looks pretty good to
me.
--
nathan
On 27.08.24 17:26, Nathan Bossart wrote:
On Tue, Aug 27, 2024 at 05:16:51PM +0200, Peter Eisentraut wrote:
On 27.08.24 15:44, Greg Sabino Mullane wrote:
On Mon, Aug 26, 2024 at 3:46 PM Nathan Bossart <nathandbossart@gmail.com
<mailto:nathandbossart@gmail.com>> wrote:Should we error if both --data-checksum and --no-data-checksums are
specified? IIUC with 0001, we'll use whichever is specified last.Hmmm, that is a good question. We have never (to my recollection)
flipped a default quite like this before. I'm inclined to leave it as
"last one wins", as I can see automated systems appending their desired
selection to the end of the arg list, and expecting it to work.Yes, last option wins is the normal expected behavior.
WFM
001_verify_heapam fails with this patch set. I think you may need to use
--no-data-checksums in that test, too. Otherwise, it looks pretty good to
me.
I have committed 0001 (the new option) and 0004 (the docs tweak). I
think there is consensus for the rest, too, but I'll leave it for a few
more days to think about. I guess the test failure has to be addressed.
On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
I have committed 0001 (the new option) and 0004 (the docs tweak). I think
there is consensus for the rest, too, but I'll leave it for a few more days
to think about. I guess the test failure has to be addressed.
Here is a rebased patch with the test fix (for cfbot). I have made no
other changes.
--
nathan
Attachments:
v3-0001-use-data-checksums-by-default.patchtext/plain; charset=us-asciiDownload
From 912894763686e29e907a5f5600e0caabca961dad Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Thu, 3 Oct 2024 16:01:19 -0500
Subject: [PATCH v3 1/1] use data checksums by default
---
contrib/amcheck/t/001_verify_heapam.pl | 2 +-
doc/src/sgml/ref/initdb.sgml | 2 ++
src/bin/initdb/initdb.c | 2 +-
src/bin/initdb/t/001_initdb.pl | 18 ++++++++++--------
src/bin/pg_amcheck/t/003_check.pl | 2 +-
src/bin/pg_amcheck/t/004_verify_heapam.pl | 2 +-
src/bin/pg_checksums/t/002_actions.pl | 2 +-
src/test/perl/PostgreSQL/Test/Cluster.pm | 7 +++++++
8 files changed, 24 insertions(+), 13 deletions(-)
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index 481e4dbe4f..b380d1b034 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -15,7 +15,7 @@ my $node;
# Test set-up
#
$node = PostgreSQL::Test::Cluster->new('test');
-$node->init;
+$node->init(no_checksums => 1);
$node->append_conf('postgresql.conf', 'autovacuum=off');
$node->start;
$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index a0ce261999..f9f008888c 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -273,6 +273,8 @@ PostgreSQL documentation
<link linkend="monitoring-pg-stat-database-view">
<structname>pg_stat_database</structname></link> view.
See <xref linkend="checksums" /> for details.
+ Data checksums are enabled by default. They can be
+ disabled by use of <option>--no-data-checksums</option>.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index eac6cbd8e5..9eff46e799 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -164,7 +164,7 @@ static bool noinstructions = false;
static bool do_sync = true;
static bool sync_only = false;
static bool show_setting = false;
-static bool data_checksums = false;
+static bool data_checksums = true;
static char *xlog_dir = NULL;
static int wal_segment_size_mb = (DEFAULT_XLOG_SEG_SIZE) / (1024 * 1024);
static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 8072adb97f..7520d3d0dd 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -69,16 +69,11 @@ mkdir $datadir;
}
}
-# Control file should tell that data checksums are disabled by default.
+# Control file should tell that data checksums are enabled by default.
command_like(
[ 'pg_controldata', $datadir ],
- qr/Data page checksum version:.*0/,
- 'checksums are disabled in control file');
-# pg_checksums fails with checksums disabled by default. This is
-# not part of the tests included in pg_checksums to save from
-# the creation of an extra instance.
-command_fails([ 'pg_checksums', '-D', $datadir ],
- "pg_checksums fails with data checksum disabled");
+ qr/Data page checksum version:.*1/,
+ 'checksums are enabled in control file');
command_ok([ 'initdb', '-S', $datadir ], 'sync only');
command_fails([ 'initdb', $datadir ], 'existing data directory');
@@ -280,4 +275,11 @@ command_like(
qr/Data page checksum version:.*0/,
'checksums are disabled in control file');
+# pg_checksums fails with checksums disabled. This is
+# not part of the tests included in pg_checksums to save from
+# the creation of an extra instance.
+command_fails(
+ [ 'pg_checksums', '-D', $datadir_nochecksums ],
+ "pg_checksums fails with data checksum disabled");
+
done_testing();
diff --git a/src/bin/pg_amcheck/t/003_check.pl b/src/bin/pg_amcheck/t/003_check.pl
index d99b094dba..c140b1622b 100644
--- a/src/bin/pg_amcheck/t/003_check.pl
+++ b/src/bin/pg_amcheck/t/003_check.pl
@@ -120,7 +120,7 @@ sub perform_all_corruptions()
# Test set-up
$node = PostgreSQL::Test::Cluster->new('test');
-$node->init;
+$node->init(no_checksums => 1);
$node->append_conf('postgresql.conf', 'autovacuum=off');
$node->start;
$port = $node->port;
diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl
index f6d2c5f787..2c17f7d068 100644
--- a/src/bin/pg_amcheck/t/004_verify_heapam.pl
+++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl
@@ -181,7 +181,7 @@ my $aborted_xid;
# autovacuum workers visiting the table could crash the backend.
# Disable autovacuum so that won't happen.
my $node = PostgreSQL::Test::Cluster->new('test');
-$node->init;
+$node->init(no_checksums => 1);
$node->append_conf('postgresql.conf', 'autovacuum=off');
$node->append_conf('postgresql.conf', 'max_prepared_transactions=10');
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 33e7fb53c5..c136febbbb 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -88,7 +88,7 @@ sub check_relation_corruption
# Initialize node with checksums disabled.
my $node = PostgreSQL::Test::Cluster->new('node_checksum');
-$node->init();
+$node->init(no_checksums => 1);
my $pgdata = $node->data_dir;
# Control file should know that checksums are disabled.
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 90a842f96a..748f21c4a4 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -504,6 +504,8 @@ On Windows, we use SSPI authentication to ensure the same (by pg_regress
WAL archiving can be enabled on this node by passing the keyword parameter
has_archiving => 1. This is disabled by default.
+Data checksums can be forced off by passing no_checksums => 1.
+
postgresql.conf can be set up for replication by passing the keyword
parameter allows_streaming => 'logical' or 'physical' (passing 1 will also
suffice for physical replication) depending on type of replication that
@@ -530,6 +532,11 @@ sub init
$params{force_initdb} = 0 unless defined $params{force_initdb};
$params{has_archiving} = 0 unless defined $params{has_archiving};
+ if (defined $params{no_checksums})
+ {
+ push @{ $params{extra} }, '--no-data-checksums';
+ }
+
my $initdb_extra_opts_env = $ENV{PG_TEST_INITDB_EXTRA_OPTS};
if (defined $initdb_extra_opts_env)
{
--
2.39.5 (Apple Git-154)
On 3 Oct 2024, at 23:13, Nathan Bossart <nathandbossart@gmail.com> wrote:
On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
I have committed 0001 (the new option) and 0004 (the docs tweak). I think
there is consensus for the rest, too, but I'll leave it for a few more days
to think about. I guess the test failure has to be addressed.Here is a rebased patch with the test fix (for cfbot). I have made no
other changes.
+ Data checksums are enabled by default. They can be
+ disabled by use of <option>--no-data-checksums</option>.
Nitpick, but I think this should be an xref like how we link to --no-locale in
the -E docs: <xref linkend="app-initdb-no-data-checksums"/> instead.
LGTM otherwise.
--
Daniel Gustafsson
On 03.10.24 23:13, Nathan Bossart wrote:
On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
I have committed 0001 (the new option) and 0004 (the docs tweak). I think
there is consensus for the rest, too, but I'll leave it for a few more days
to think about. I guess the test failure has to be addressed.Here is a rebased patch with the test fix (for cfbot). I have made no
other changes.
I have committed the test changes (patch 0002). (I renamed the option
to no_data_checksums to keep the wording consistent with the initdb option.)
Right now, with checksums off by default, this doesn't do much, but you
can test this like
PG_TEST_INITDB_EXTRA_OPTS=--data-checksums meson test ...
and everything will pass. To make that work, I had to adjust the order
of how the initdb options are assembled in Cluster.pm a bit.
I will work on the patch that flips the default next.
On 14.10.24 11:28, Peter Eisentraut wrote:
On 03.10.24 23:13, Nathan Bossart wrote:
On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
I have committed 0001 (the new option) and 0004 (the docs tweak). I
think
there is consensus for the rest, too, but I'll leave it for a few
more days
to think about. I guess the test failure has to be addressed.Here is a rebased patch with the test fix (for cfbot). I have made no
other changes.I have committed the test changes (patch 0002). (I renamed the option
to no_data_checksums to keep the wording consistent with the initdb
option.)Right now, with checksums off by default, this doesn't do much, but you
can test this likePG_TEST_INITDB_EXTRA_OPTS=--data-checksums meson test ...
and everything will pass. To make that work, I had to adjust the order
of how the initdb options are assembled in Cluster.pm a bit.I will work on the patch that flips the default next.
The patch that flips the default has been committed.
I also started a PG18 open items page and made a note that we follow up
on the upgrade experience, as was discussed in this thread.
On 16.10.24 08:54, Peter Eisentraut wrote:
On 14.10.24 11:28, Peter Eisentraut wrote:
On 03.10.24 23:13, Nathan Bossart wrote:
On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
I have committed 0001 (the new option) and 0004 (the docs tweak). I
think
there is consensus for the rest, too, but I'll leave it for a few
more days
to think about. I guess the test failure has to be addressed.Here is a rebased patch with the test fix (for cfbot). I have made no
other changes.I have committed the test changes (patch 0002). (I renamed the option
to no_data_checksums to keep the wording consistent with the initdb
option.)Right now, with checksums off by default, this doesn't do much, but
you can test this likePG_TEST_INITDB_EXTRA_OPTS=--data-checksums meson test ...
and everything will pass. To make that work, I had to adjust the
order of how the initdb options are assembled in Cluster.pm a bit.I will work on the patch that flips the default next.
The patch that flips the default has been committed.
I also started a PG18 open items page and made a note that we follow up
on the upgrade experience, as was discussed in this thread.
Ah yes, and the upgrade tests on the buildfarm don't like this. What
shall we do about this? Maybe adjust the buildfarm scripts to use
--no-data-checksums?
On 16 Oct 2024, at 09:49, Peter Eisentraut <peter@eisentraut.org> wrote:
Ah yes, and the upgrade tests on the buildfarm don't like this. What shall we do about this? Maybe adjust the buildfarm scripts to use --no-data-checksums?
I can't see many other options, and it represents a reasonable use-case, so +1.
--
Daniel Gustafsson
I have just noticed that since this patch was committed as 04bec894a04c,
pg_upgrade's "make check" action is unusable when given the
"olddump/oldinstall" options. We now need to inject '-k' to the initdb
line for old servers, and we don't, so all upgrade tests fail. I think
this patch should be enough to fix it.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"No tengo por qué estar de acuerdo con lo que pienso"
(Carlos Caszeli)
Attachments:
upgrade.patchtext/x-diff; charset=utf-8Download
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 9b51f9e666..27aa27bc8e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -180,6 +180,10 @@ if ($oldnode->pg_version >= 15)
}
}
+# Since checksums are now enabled by default, and weren't before 18,
+# pass '-k' to initdb on old versions so that upgrades work.
+push @initdb_params, '-k' if $oldnode->pg_version < 18;
+
$node_params{extra} = \@initdb_params;
$oldnode->init(%node_params);
$oldnode->start;
On 07.11.24 19:38, Alvaro Herrera wrote:
I have just noticed that since this patch was committed as 04bec894a04c,
pg_upgrade's "make check" action is unusable when given the
"olddump/oldinstall" options. We now need to inject '-k' to the initdb
line for old servers, and we don't, so all upgrade tests fail. I think
this patch should be enough to fix it.
Yes, this fix looks correct. (Or the other way around: Disable
checksums on the new node.)
Is this not being covered by the build farm? Are the upgrade tests
there not using this?
On 2024-Nov-13, Peter Eisentraut wrote:
On 07.11.24 19:38, Alvaro Herrera wrote:
I have just noticed that since this patch was committed as 04bec894a04c,
pg_upgrade's "make check" action is unusable when given the
"olddump/oldinstall" options. We now need to inject '-k' to the initdb
line for old servers, and we don't, so all upgrade tests fail. I think
this patch should be enough to fix it.Yes, this fix looks correct.
Thanks, pushed.
(Or the other way around: Disable checksums on the new node.)
Yeah, I thought about that too, but, I think it'd be less realistic,
because the world is become one where checksums are enabled, not the
other way around.
Is this not being covered by the build farm? Are the upgrade tests there
not using this?
Nope, the buildfarm has separate code to test cross-version upgrades,
https://github.com/PGBuildFarm/client-code/blob/main/PGBuild/Modules/TestUpgradeXversion.pm
This predates our in-core test support. Maybe buildfarm's could be
simplified, but I'm not sure if it's worth it.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"El sentido de las cosas no viene de las cosas, sino de
las inteligencias que las aplican a sus problemas diarios
en busca del progreso." (Ernesto Hernández-Novich)
On 10/16/24 08:54, Peter Eisentraut wrote:
On 14.10.24 11:28, Peter Eisentraut wrote:
On 03.10.24 23:13, Nathan Bossart wrote:
On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:
I have committed 0001 (the new option) and 0004 (the docs tweak). I
think
there is consensus for the rest, too, but I'll leave it for a few
more days
to think about. I guess the test failure has to be addressed.Here is a rebased patch with the test fix (for cfbot). I have made no
other changes.I have committed the test changes (patch 0002). (I renamed the option
to no_data_checksums to keep the wording consistent with the initdb
option.)Right now, with checksums off by default, this doesn't do much, but
you can test this likePG_TEST_INITDB_EXTRA_OPTS=--data-checksums meson test ...
and everything will pass. To make that work, I had to adjust the
order of how the initdb options are assembled in Cluster.pm a bit.I will work on the patch that flips the default next.
The patch that flips the default has been committed.
I also started a PG18 open items page and made a note that we follow up
on the upgrade experience, as was discussed in this thread.
Regarding the open item, can someone explain what exactly are we
planning to evaluate mid-beta?
We went through the open items on the RMT team meeting today, and my
impression was the questions are mostly about performance of having
checksums by default, but now I realize the thread talks about "upgrade
experience" which seems fairly wide.
So, what kind of data we expect to gather in order to evaluate this?
Who's expected to collect it and evaluate this?
regards
--
Tomas Vondra
Re: Tomas Vondra
We went through the open items on the RMT team meeting today, and my
impression was the questions are mostly about performance of having
checksums by default, but now I realize the thread talks about "upgrade
experience" which seems fairly wide.
Fwiw, Debian's pg_upgradecluster already knows about that change. With
the default "dump" upgrade method, clusters will be transitioned to
checksums enabled, while "upgrade" (= pg_upgrade) upgrades will stick
with whatever was there before.
Christoph
On 23.04.25 00:24, Tomas Vondra wrote:
The patch that flips the default has been committed.
I also started a PG18 open items page and made a note that we follow up
on the upgrade experience, as was discussed in this thread.Regarding the open item, can someone explain what exactly are we
planning to evaluate mid-beta?
If you have a PG <=17 installation without checksums (the default), and
you do the usual upgrade procedure to PG 18 involving initdb +
pg_upgrade, then pg_upgrade will reject the upgrade, because the
checksum settings don't match. The workaround is to run initdb with
--no-data-checksums and try again.
That's probably not all that bad, but if this is all below a bunch of
layers of scripts, users will have to do some work on their end to get
this working smoothly.
The point of the open item was (a) to make sure this is adequately
documented, for instance in the release notes, (b) to think about
technological solutions to simplify this, such as [0]/messages/by-id/57957aca-3eae-4106-afb2-3008122b9950@eisentraut.org, and (c) to just
check the general feedback.
Nothing from [0]/messages/by-id/57957aca-3eae-4106-afb2-3008122b9950@eisentraut.org ended up being committed, so that part of obsolete.
The action for beta1 is (a). And then for (c) perhaps monitor the
feedback between beta1 and beta2.
[0]: /messages/by-id/57957aca-3eae-4106-afb2-3008122b9950@eisentraut.org
/messages/by-id/57957aca-3eae-4106-afb2-3008122b9950@eisentraut.org
On 24/04/2025 14:26, Peter Eisentraut wrote:
On 23.04.25 00:24, Tomas Vondra wrote:
The patch that flips the default has been committed.
I also started a PG18 open items page and made a note that we follow up
on the upgrade experience, as was discussed in this thread.Regarding the open item, can someone explain what exactly are we
planning to evaluate mid-beta?If you have a PG <=17 installation without checksums (the default), and
you do the usual upgrade procedure to PG 18 involving initdb +
pg_upgrade, then pg_upgrade will reject the upgrade, because the
checksum settings don't match. The workaround is to run initdb with --
no-data-checksums and try again.That's probably not all that bad, but if this is all below a bunch of
layers of scripts, users will have to do some work on their end to get
this working smoothly.The point of the open item was (a) to make sure this is adequately
documented, for instance in the release notes, (b) to think about
technological solutions to simplify this, such as [0], and (c) to just
check the general feedback.Nothing from [0] ended up being committed, so that part of obsolete. The
action for beta1 is (a). And then for (c) perhaps monitor the feedback
between beta1 and beta2.[0]: /messages/by-id/57957aca-3eae-4106-
afb2-3008122b9950%40eisentraut.org
Ping: It's time to do something about this open item. (Or decide to do
nothing I guess). We're already in beta, but at the same time, we're
still early in the beta and now is the last chance for code changes
before 18 is shipped.
Aside from just documenting it, I see two things we could do:
1. Have pg_upgrade run initdb for you. It's always felt silly that you
need to run initdb with the new version yourself, when there's really
only one correct way to do it. pg_upgrade has all the checks to verify
that you did it right, so why doesn't it just do it itself? I think
that'd be a good long-term solution. Might be too late for 18, but I'm
not sure. If someone wrote the patch we could evaluate it. To use that
mode, the scripts calling pg_upgrade would need to be changed, though,
so we'd perhaps want to do #2 or something else in addition to this.
2. If the new cluster has checksums enabled, but the old one has them
disabled, have pg_upgrade disable checksums in the new cluster.
--
Heikki Linnakangas
Neon (https://neon.tech)
On 23.05.25 10:10, Heikki Linnakangas wrote:
The point of the open item was (a) to make sure this is adequately
documented, for instance in the release notes, (b) to think about
technological solutions to simplify this, such as [0], and (c) to just
check the general feedback.Nothing from [0] ended up being committed, so that part of obsolete.
The action for beta1 is (a). And then for (c) perhaps monitor the
feedback between beta1 and beta2.[0]: /messages/by-id/57957aca-3eae-4106-
afb2-3008122b9950%40eisentraut.orgPing: It's time to do something about this open item. (Or decide to do
nothing I guess). We're already in beta, but at the same time, we're
still early in the beta and now is the last chance for code changes
before 18 is shipped.Aside from just documenting it,
We don't currently have anything in the release notes that calls this
out as a potential upgrading issue, so I propose the attached patch.
I see two things we could do:
1. Have pg_upgrade run initdb for you. It's always felt silly that you
need to run initdb with the new version yourself, when there's really
only one correct way to do it. pg_upgrade has all the checks to verify
that you did it right, so why doesn't it just do it itself? I think
that'd be a good long-term solution. Might be too late for 18, but I'm
not sure. If someone wrote the patch we could evaluate it. To use that
mode, the scripts calling pg_upgrade would need to be changed, though,
so we'd perhaps want to do #2 or something else in addition to this.2. If the new cluster has checksums enabled, but the old one has them
disabled, have pg_upgrade disable checksums in the new cluster.
These would alter the pg_upgrade workflow in significant ways, so I
don't think this would be appropriate to change now. So far I haven't
heard any feedback about this, so I'm content with a documentation change.
Attachments:
0001-doc-PG-18-relnotes-Add-incompatibility-note-about-ch.patchtext/plain; charset=UTF-8; name=0001-doc-PG-18-relnotes-Add-incompatibility-note-about-ch.patchDownload
From e16a2db21688b14bfc9394a22d7bff2e2d445c5b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Fri, 23 May 2025 11:16:04 +0200
Subject: [PATCH] doc PG 18 relnotes: Add incompatibility note about checksums
now default
---
doc/src/sgml/release-18.sgml | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/doc/src/sgml/release-18.sgml b/doc/src/sgml/release-18.sgml
index 246e49ce740..d0867b981c0 100644
--- a/doc/src/sgml/release-18.sgml
+++ b/doc/src/sgml/release-18.sgml
@@ -191,6 +191,27 @@ <title>Migration to Version 18</title>
<para>
These were previously zero-based.
</para>
+</listitem>
+
+<!--
+Author: Peter Eisentraut <peter@eisentraut.org>
+2024-10-16 [04bec894a04] initdb: Change default to using data checksums.
+-->
+
+<listitem>
+<para>
+initdb defaults to enabling data checksums
+<ulink url="&commit_baseurl;04bec894a04">§</ulink>
+</para>
+
+<para>
+The previous default behavior (checksums disabled) can be obtained using the
+new option --no-data-checksums. Note that pg_upgrade will reject upgrading
+between clusters with different checksum settings, so if the old cluster was
+initialized without checksums (the previous default), then the new cluster
+will need to be initialized with --no-data-checksums in order to allow
+pg_upgrade to succeed.
+</para>
</listitem>
</itemizedlist>
--
2.49.0
On 23 May 2025, at 10:10, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Aside from just documenting it, I see two things we could do:
1. Have pg_upgrade run initdb for you. It's always felt silly that you need to run initdb with the new version yourself, when there's really only one correct way to do it. pg_upgrade has all the checks to verify that you did it right, so why doesn't it just do it itself? I think that'd be a good long-term solution. Might be too late for 18, but I'm not sure. If someone wrote the patch we could evaluate it. To use that mode, the scripts calling pg_upgrade would need to be changed, though, so we'd perhaps want to do #2 or something else in addition to this.
I can see this being desired longer term, but as you mention there is likely to
be many moving parts outside of our immediate control making it much harder
than just adding the call to initdb. It doesn't seem like a post-beta patch to
me given the implications for packagers and others in the ecosystem.
2. If the new cluster has checksums enabled, but the old one has them disabled, have pg_upgrade disable checksums in the new cluster.
IF we do this it should be Very visible, since a user otherwise might think
that their upgraded cluster will have checksums since they added them in
initdb.
I think we should document how to deal with checksums in upgrades, and perhaps
even tweak the errormessage in the pg_upgrade check with explanatory comments
if needed, and leave the functionality as is today.
--
Daniel Gustafsson
On 5/23/25 11:22, Peter Eisentraut wrote:
On 23.05.25 10:10, Heikki Linnakangas wrote:
The point of the open item was (a) to make sure this is adequately
documented, for instance in the release notes, (b) to think about
technological solutions to simplify this, such as [0], and (c) to
just check the general feedback.Nothing from [0] ended up being committed, so that part of obsolete.
The action for beta1 is (a). And then for (c) perhaps monitor the
feedback between beta1 and beta2.[0]: /messages/by-id/57957aca-3eae-4106-
afb2-3008122b9950%40eisentraut.orgPing: It's time to do something about this open item. (Or decide to do
nothing I guess). We're already in beta, but at the same time, we're
still early in the beta and now is the last chance for code changes
before 18 is shipped.Aside from just documenting it,
We don't currently have anything in the release notes that calls this
out as a potential upgrading issue, so I propose the attached patch.
Seems reasonable, although maybe it should say
... so if the old cluster does not have checksums enabled ...
instead of
... so if the old cluster was initialized without checksums ...
I mean, what matters is the current state, not how it was initialized.
I see two things we could do:
1. Have pg_upgrade run initdb for you. It's always felt silly that you
need to run initdb with the new version yourself, when there's really
only one correct way to do it. pg_upgrade has all the checks to verify
that you did it right, so why doesn't it just do it itself? I think
that'd be a good long-term solution. Might be too late for 18, but I'm
not sure. If someone wrote the patch we could evaluate it. To use that
mode, the scripts calling pg_upgrade would need to be changed, though,
so we'd perhaps want to do #2 or something else in addition to this.2. If the new cluster has checksums enabled, but the old one has them
disabled, have pg_upgrade disable checksums in the new cluster.These would alter the pg_upgrade workflow in significant ways, so I
don't think this would be appropriate to change now. So far I haven't
heard any feedback about this, so I'm content with a documentation change.
How would #2 change pg_upgrade workflow? Isn't the whole point of that
change to keep the current workflow working?
Also, I'm not sure if "no feedback about this" is reliable. I have no
clue if people did any significant testing. Maybe people did a lot of
testing and the current state is fine. But it's more likely there was
little testing, in which case "no feedback" says nothing.
FWIW I would be +0.5 to just let pg_upgrade disable checksums.
regards
--
Tomas Vondra
On 5/23/25 11:25, Daniel Gustafsson wrote:
On 23 May 2025, at 10:10, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Aside from just documenting it, I see two things we could do:
1. Have pg_upgrade run initdb for you. It's always felt silly that you need to run initdb with the new version yourself, when there's really only one correct way to do it. pg_upgrade has all the checks to verify that you did it right, so why doesn't it just do it itself? I think that'd be a good long-term solution. Might be too late for 18, but I'm not sure. If someone wrote the patch we could evaluate it. To use that mode, the scripts calling pg_upgrade would need to be changed, though, so we'd perhaps want to do #2 or something else in addition to this.
I can see this being desired longer term, but as you mention there is likely to
be many moving parts outside of our immediate control making it much harder
than just adding the call to initdb. It doesn't seem like a post-beta patch to
me given the implications for packagers and others in the ecosystem.2. If the new cluster has checksums enabled, but the old one has them disabled, have pg_upgrade disable checksums in the new cluster.
IF we do this it should be Very visible, since a user otherwise might think
that their upgraded cluster will have checksums since they added them in
initdb.
What counts as "very visible"? Would it be fine if the pg_upgrade docs
say this clearly, and pg_upgrade prints a warning? To me that seems
sufficient.
TBH I can't quite imagine people expecting checksums to just magically
appear after upgrade.
I think we should document how to deal with checksums in upgrades, and perhaps
even tweak the errormessage in the pg_upgrade check with explanatory comments
if needed, and leave the functionality as is today.
Isn't that just an unnecessary breakage of existing tooling? I mean,
there's pretty much just one thing the user can do to make it work, and
that's disabling checksums. Sure, they might also enable checksums on
the old cluster, but that makes the upgrade much longer, and presumably
they use pg_upgrade to upgrade quickly.
That being said, I don't feel very strongly about this, so if the
consensus is to just error-out, so be it.
regards
--
Tomas Vondra
On 23 May 2025, at 11:55, Tomas Vondra <tomas@vondra.me> wrote:
On 5/23/25 11:25, Daniel Gustafsson wrote:
On 23 May 2025, at 10:10, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Aside from just documenting it, I see two things we could do:
1. Have pg_upgrade run initdb for you. It's always felt silly that you need to run initdb with the new version yourself, when there's really only one correct way to do it. pg_upgrade has all the checks to verify that you did it right, so why doesn't it just do it itself? I think that'd be a good long-term solution. Might be too late for 18, but I'm not sure. If someone wrote the patch we could evaluate it. To use that mode, the scripts calling pg_upgrade would need to be changed, though, so we'd perhaps want to do #2 or something else in addition to this.
I can see this being desired longer term, but as you mention there is likely to
be many moving parts outside of our immediate control making it much harder
than just adding the call to initdb. It doesn't seem like a post-beta patch to
me given the implications for packagers and others in the ecosystem.2. If the new cluster has checksums enabled, but the old one has them disabled, have pg_upgrade disable checksums in the new cluster.
IF we do this it should be Very visible, since a user otherwise might think
that their upgraded cluster will have checksums since they added them in
initdb.What counts as "very visible"? Would it be fine if the pg_upgrade docs
say this clearly, and pg_upgrade prints a warning? To me that seems
sufficient.
I was thinking about a warning during processing.
TBH I can't quite imagine people expecting checksums to just magically
appear after upgrade.
It would not be surprised if users expect checksums to be on after reading
(variations of) "checksums are now on by default" messaging.
I think we should document how to deal with checksums in upgrades, and perhaps
even tweak the errormessage in the pg_upgrade check with explanatory comments
if needed, and leave the functionality as is today.Isn't that just an unnecessary breakage of existing tooling? I mean,
there's pretty much just one thing the user can do to make it work, and
that's disabling checksums. Sure, they might also enable checksums on
the old cluster, but that makes the upgrade much longer, and presumably
they use pg_upgrade to upgrade quickly.
We already expect the new cluster to be created in the Right Way (which I agree
isn't very userfriendly and should be improved upon) so requiring this to be
Right is in line with existing tooling IMHO (for better or worse). My concern
is that users will think data checksums are enabled after the upgrade, and will
be annoyed when finding out they're not.
--
Daniel Gustafsson
On Fri, May 23, 2025 at 11:10:47AM +0300, Heikki Linnakangas wrote:
On 24/04/2025 14:26, Peter Eisentraut wrote:
action for beta1 is (a). And then for (c) perhaps monitor the feedback
between beta1 and beta2.Ping: It's time to do something about this open item. (Or decide to do
nothing I guess). We're already in beta, but at the same time, we're still
early in the beta and now is the last chance for code changes before 18 is
shipped.Aside from just documenting it, I see two things we could do:
1. Have pg_upgrade run initdb for you. It's always felt silly that you need
to run initdb with the new version yourself, when there's really only one
correct way to do it. pg_upgrade has all the checks to verify that you did
it right, so why doesn't it just do it itself? I think that'd be a good
long-term solution. Might be too late for 18, but I'm not sure. If someone
wrote the patch we could evaluate it. To use that mode, the scripts calling
pg_upgrade would need to be changed, though, so we'd perhaps want to do #2
or something else in addition to this.
The pg_upgrade docs suggest why having pg_upgrade run initdb could be
limiting:
5. Install extension shared object files
Many extensions and custom modules, whether from contrib or another
source, use shared object files (or DLLs), e.g., pgcrypto.so. If
the old cluster used these, shared object files matching the new
server binary must be installed in the new cluster, usually via
operating system commands. Do not load the schema definitions,
e.g., CREATE EXTENSION pgcrypto, because these will be duplicated
from the old cluster. If extension updates are available,
pg_upgrade will report this and create a script that can be run
later to update them.
6. Copy custom full-text search files
Copy any custom full text search files (dictionary, synonym,
thesaurus, stop words) from the old to the new cluster.
7. Adjust authentication
pg_upgrade will connect to the old and new servers several times,
so you might want to set authentication to peer in pg_hba.conf
or use a ~/.pgpass file (see Section 32.16).
I have also heard of cases where postgresql.conf must be modified for
pg_upgrade to succeed, and of course initdb installs postgresql.conf.
Should #5 be after #6?
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On 23.05.25 11:43, Tomas Vondra wrote:
We don't currently have anything in the release notes that calls this
out as a potential upgrading issue, so I propose the attached patch.Seems reasonable, although maybe it should say
... so if the old cluster does not have checksums enabled ...
instead of
... so if the old cluster was initialized without checksums ...
I mean, what matters is the current state, not how it was initialized.
committed with that change
Re: Heikki Linnakangas
1. Have pg_upgrade run initdb for you. It's always felt silly that you need
to run initdb with the new version yourself, when there's really only one
correct way to do it.
Fwiw, Debian's pg_upgradecluster script would be happy if that problem
would be solved. Currently, we have to do all sorts of inspection of
the old cluster to figure out the locale used, data checksums, and we
are likely missing some more properties that could carried over (like
wal segment size etc). Some of the info isn't even available from the
control file but needs a connection to the running server.
The problem exists for both the pg_upgrade and dump-restore upgrade
methods, so an "initdb --like /path/to/other/cluster" mode would be
handy to have. Perhaps including a "initdb --like --print-command"
mode.
The pg_{backup,restore}cluster scripts have the same problem. It's
unnecessarily complex to recreate an existing setup.
Maybe it would be enough if the initdb options used to create a
cluster would be stored in some file in the data dir. (Perhaps in a
new line in PG_VERSION? As part of the control file?)
Christoph
On 05.06.25 12:37, Christoph Berg wrote:
Maybe it would be enough if the initdb options used to create a
cluster would be stored in some file in the data dir.
That probably wouldn't help if the default behavior changed, as in the
current case.
On Fri, Jun 6, 2025 at 01:40:40PM +0200, Peter Eisentraut wrote:
On 05.06.25 12:37, Christoph Berg wrote:
Maybe it would be enough if the initdb options used to create a
cluster would be stored in some file in the data dir.That probably wouldn't help if the default behavior changed, as in the
current case.
Well, technically we would know the default from the old version, so we
could infer the desired behavior.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
Hi!
So, what should we do with the PG18 open item? We (the RMT team) would
like to know if we shall keep the checksums enabled by default, and if
there's something that still needs to be done for PG18.
We don't have a strong opinion either way, but there doesn't seem to be
much happening on this thread. So if someone feels we should rethink and
disable the checksums by default for PG18, he/she needs to articulate
that proposal. The sooner the better.
The commit that flipped the default (04bec894a04c) is from Peter
Eisentraut, and while that doesn't make him solely responsible for the
change, we think it'd be good to know his opinion on this. But it seems
more like a change requiring a wider consensus, and I'm aware e.g.
Andres expressed some doubts about enabling checksums [2]/messages/by-id/brdaw5wke274lubirrl4v2k4qdacylvgwwqztifn7m27pkth3s@rh7wie47pfcp.
In any case, the default is "on" at the moment, and unless someone
advocates for changing it soon, that's what we'll get in PG19.
My personal opinion is keeping checksums enabled by default seems OK.
It's not perfect, but it does seem like a reasonable trade off to me.
I'm aware of three issues people mention as arguments against checksums:
1) performance
It does have performance impact, and it'll always have performance
impact. Even if we make the checksums infinitely fast, we'll still have
to do more writes etc. There's no way around that. And I think that's
fine/acceptable.
I'm sure there are cases of checksums causing unnecessary regressions.
Commit e6eed40e44419 [3]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e6eed40e44419e3268d01fe0d2daec08a7df68f7 is a good example of such case. I don't know
how many such cases exist, but I don't think it's very many. We need to
find and fix those cases - we just never did, because checksums were off
by default.
In any case, the cost seems reasonable to me.
2) upgrade user experience
I think this was the other concern in this thread - that it'll be hard
for users to do upgrades (using pg_upgrade). But I believe this should
have been sorted out now, am I right?
3) checksums user experience
I'm aware dealing with checksums can be rough. That is, we don't have
great tooling to verify checksums on-line, and once you get a checksum
error, you're mostly on your own. Or rather, the solution is to fail
over to a standby or restore from a backup. Which is fine, and I don't
have a great idea/proposal what else to do.
If we require improving this before enabling checksums by default,
someone probably needs to describe what improvements are needed to make
that happen.
regards
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=04bec894a04cb0d32533f1522ab81b7016141ff1
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=04bec894a04cb0d32533f1522ab81b7016141ff1
[2]: /messages/by-id/brdaw5wke274lubirrl4v2k4qdacylvgwwqztifn7m27pkth3s@rh7wie47pfcp
/messages/by-id/brdaw5wke274lubirrl4v2k4qdacylvgwwqztifn7m27pkth3s@rh7wie47pfcp
[3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e6eed40e44419e3268d01fe0d2daec08a7df68f7
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e6eed40e44419e3268d01fe0d2daec08a7df68f7
--
Tomas Vondra
On Tue, 2025-07-29 at 20:24 +0200, Tomas Vondra wrote:
So, what should we do with the PG18 open item? We (the RMT team) would
like to know if we shall keep the checksums enabled by default, and if
there's something that still needs to be done for PG18.
I don't have a strong opinion, but I lean towards having them on
by default. Yes, that has a performance impact (e.g. WAL logging hints),
and people with reliable storage may be better off disabling checksums.
But I think it is better to set the default values so that they are
good for those people who don't have a lot of PostgreSQL knowledge and
are running less demanding installations on mediocre hardware.
Yours,
Laurenz Albe
On 30 Jul 2025, at 11:58, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2025-07-29 at 20:24 +0200, Tomas Vondra wrote:
So, what should we do with the PG18 open item? We (the RMT team) would
like to know if we shall keep the checksums enabled by default, and if
there's something that still needs to be done for PG18.I don't have a strong opinion, but I lean towards having them on
by default.
I agree with that, while there might be a lot of cases where disabling
checksums is the right move it's still a sane default.
--
Daniel Gustafsson
On Jul 30, 2025, at 8:09 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
On 30 Jul 2025, at 11:58, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2025-07-29 at 20:24 +0200, Tomas Vondra wrote:
So, what should we do with the PG18 open item? We (the RMT team) would
like to know if we shall keep the checksums enabled by default, and if
there's something that still needs to be done for PG18.I don't have a strong opinion, but I lean towards having them on
by default.I agree with that, while there might be a lot of cases where disabling
checksums is the right move it's still a sane default.--
Daniel Gustafsson
I realize I’m late to the conversation, I’ve been lurking...
I agree that enabling checksums by default is the sane default. Databases
should always make a best effort for data integrity, checksums are a
positive step in that direction.
I recall a conversation at the last PGConf.dev (2025) with a representative
from Intel and Jeff Davis (CC’ed) that had to do with checksums and a vast
performance difference between Intel and AMD the latter winning by a mile.
I forget the details, maybe Jeff remembers more than I do. I’m not
suggesting that we disable Intel by default or trying to derail this
conversation (which appears to be reaching consensus), just raising
awareness.
best,
-greg
On 7/31/25 15:39, Greg Burd wrote:
On Jul 30, 2025, at 8:09 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
On 30 Jul 2025, at 11:58, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2025-07-29 at 20:24 +0200, Tomas Vondra wrote:
So, what should we do with the PG18 open item? We (the RMT team) would
like to know if we shall keep the checksums enabled by default, and if
there's something that still needs to be done for PG18.I don't have a strong opinion, but I lean towards having them on
by default.I agree with that, while there might be a lot of cases where disabling
checksums is the right move it's still a sane default.--
Daniel GustafssonI realize I’m late to the conversation, I’ve been lurking...
I agree that enabling checksums by default is the sane default. Databases
should always make a best effort for data integrity, checksums are a
positive step in that direction.I recall a conversation at the last PGConf.dev (2025) with a representative
from Intel and Jeff Davis (CC’ed) that had to do with checksums and a vast
performance difference between Intel and AMD the latter winning by a mile.
I forget the details, maybe Jeff remembers more than I do. I’m not
suggesting that we disable Intel by default or trying to derail this
conversation (which appears to be reaching consensus), just raising
awareness.
I don't know the Intel vs. AMD situation exactly, but e.g. [1]https://www.phoronix.com/news/Linux-CRC32C-VPCLMULQDQ does not
suggest AMD wins by a mile. In fact, it suggests Intel does much better
in this particular benchmark (with AVX-512 improvements). Of course,
this is a fairly recent *kernel* improvement, maybe it wouldn't work for
our data checksums that well.
However, I don't think the cost of the checksum calculation itself is
the main concern. It's probably negligible compared to all the other
costs, triggered by checksums - having to WAL-log hint bits, doing more
expensive checks (that's what the btree regression was about), etc.
[1]: https://www.phoronix.com/news/Linux-CRC32C-VPCLMULQDQ
cheers
--
Tomas Vondra
On Thu, 2025-07-31 at 09:39 -0400, Greg Burd wrote:
I agree that enabling checksums by default is the sane default. Databases
should always make a best effort for data integrity, checksums are a
positive step in that direction.
Having checksums on does not improve data integrity...
Yours,
Laurenz Albe
On Thu, 2025-07-31 at 09:39 -0400, Greg Burd wrote:
I agree that enabling checksums by default is the sane default. Databases
should always make a best effort for data integrity, checksums are a
positive step in that direction.
Having checksums on does not improve data integrity...
Yours,
Laurenz Albe
On Thu, 2025-07-31 at 09:39 -0400, Greg Burd wrote:
I agree that enabling checksums by default is the sane default. Databases
should always make a best effort for data integrity, checksums are a
positive step in that direction.
Having checksums on does not improve data integrity...
Yours,
Laurenz Albe
On Thu, Jul 31, 2025 at 3:21 PM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:
Having checksums on does not improve data integrity...
Maybe not directly, but in the same way that a smoke detector does not
directly prevent fire damage. Checksums do allow you to detect problems and
fix things (esp. bad hardware) before they get worse.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
On Thu, 2025-07-31 at 17:21 +0200, Tomas Vondra wrote:
On 7/31/25 15:39, Greg Burd wrote:
I recall a conversation at the last PGConf.dev (2025) with a
representative
from Intel and Jeff Davis (CC’ed) that had to do with checksums and
a vast
performance difference between Intel and AMD the latter winning by
a mile.I don't know the Intel vs. AMD situation exactly, but e.g. [1] does
not
suggest AMD wins by a mile. In fact, it suggests Intel does much
better
in this particular benchmark (with AVX-512 improvements). Of course,
this is a fairly recent *kernel* improvement, maybe it wouldn't work
for
our data checksums that well.However, I don't think the cost of the checksum calculation itself is
the main concern. It's probably negligible compared to all the other
costs, triggered by checksums - having to WAL-log hint bits, doing
more
expensive checks (that's what the btree regression was about), etc.
The issue Greg and I discussed, explained to me earlier by Andres, was
a memory bandwidth issue.
IIRC (Andres please correct me): The new IO infrastructure enables us
to bypass a memory copy (from userspace to kernel space) when writing
out a page. Unfortunately, checksums require reading the data to
calculate the checksum, which effectively defeats that optimization.
Those memory copies mostly happen in the bgwriter, where the page isn't
generally in the cache, which means that memory bandwidth can become
the bottleneck. Intel seems to have poor per-core memory bandwidth
compared with AMD:
so it's more likely to become the bottleneck on Intel.
That lead to an interesting discussion about calculating the checksum
on a page in the backend eagerly when it dirties a page, while it's
still in cache. As you point out, that's quite cheap.
Regards,
Jeff Davis
On Jul 31, 2025, at 3:21 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2025-07-31 at 09:39 -0400, Greg Burd wrote:
I agree that enabling checksums by default is the sane default. Databases
should always make a best effort for data integrity, checksums are a
positive step in that direction.Having checksums on does not improve data integrity...
Yours,
Laurenz Albe
Hello, thanks for your reply. I agree they don't improve integrity, but they
do improve the ability to detect loss of integrity (corruption), which is a
good thing for databases to do by default. Apologies, my phrasing could have
been better.
best.
-greg
On Jul 31, 2025, at 6:10 PM, Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2025-07-31 at 17:21 +0200, Tomas Vondra wrote:
On 7/31/25 15:39, Greg Burd wrote:
I recall a conversation at the last PGConf.dev (2025) with a
representative
from Intel and Jeff Davis (CC’ed) that had to do with checksums and
a vast
performance difference between Intel and AMD the latter winning by
a mile.I don't know the Intel vs. AMD situation exactly, but e.g. [1] does
not
suggest AMD wins by a mile. In fact, it suggests Intel does much
better
in this particular benchmark (with AVX-512 improvements). Of course,
this is a fairly recent *kernel* improvement, maybe it wouldn't work
for
our data checksums that well.However, I don't think the cost of the checksum calculation itself is
the main concern. It's probably negligible compared to all the other
costs, triggered by checksums - having to WAL-log hint bits, doing
more
expensive checks (that's what the btree regression was about), etc.The issue Greg and I discussed, explained to me earlier by Andres, was
a memory bandwidth issue.IIRC (Andres please correct me): The new IO infrastructure enables us
to bypass a memory copy (from userspace to kernel space) when writing
out a page. Unfortunately, checksums require reading the data to
calculate the checksum, which effectively defeats that optimization.Those memory copies mostly happen in the bgwriter, where the page isn't
generally in the cache, which means that memory bandwidth can become
the bottleneck. Intel seems to have poor per-core memory bandwidth
compared with AMD:so it's more likely to become the bottleneck on Intel.
That lead to an interesting discussion about calculating the checksum
on a page in the backend eagerly when it dirties a page, while it's
still in cache. As you point out, that's quite cheap.Regards,
Jeff Davis
Thanks Jeff for filling in the gaps in my memory. :)
-greg
On Thu, 31 Jul 2025 at 18:21, Tomas Vondra <tomas@vondra.me> wrote:
I don't know the Intel vs. AMD situation exactly, but e.g. [1] does not
suggest AMD wins by a mile. In fact, it suggests Intel does much better
in this particular benchmark (with AVX-512 improvements). Of course,
this is a fairly recent *kernel* improvement, maybe it wouldn't work for
our data checksums that well.
Page checksums are not CRC, but a custom FNV inspired algorithm that
rearranges the calculation into 32 parallel ones to extract more
instruction level parallelism. With recent improvements in execution
capability this is still instruction latency bound - Zen 5 could
execute it 3x faster if we widened it 4x. It is especially bound on
Intel, as they decided soon after we implemented this algorithm to
increase the latency of vpmulld to 10, compared to 3 on AMD. This
requires compiling for a target that supports the wide instructions,
so it could really use runtime CPU detection to switch between
different SIMD width implementations.
Even if we made the checksum algorithm itself faster, the main issue
is actually memory bandwidth. Intel server CPUs have about half the
bandwidth of AMD ones. A checksum has to pull in the whole page in a
few hundred cycles. Without checksums only a part of the page might be
accessed and the accesses are spread over a longer time, making them
easier to hide by out-of-order execution.
But all the above still ends up at being a few hundred nanoseconds per
buffer read. Basically this ends up only mattering measurably for
in-RAM but out of shared buffers workloads. And the easy workaround is
to increase shared buffers. As you said, the main issue is the other
overheads that checksums pull in.
--
Ants Aasma
On Fri, Aug 1, 2025 at 6:37 AM Ants Aasma <ants.aasma@cybertec.at> wrote:
Even if we made the checksum algorithm itself faster, the main issue
is actually memory bandwidth. Intel server CPUs have about half the
bandwidth of AMD ones. A checksum has to pull in the whole page in a
few hundred cycles. Without checksums only a part of the page might be
accessed and the accesses are spread over a longer time, making them
easier to hide by out-of-order execution.But all the above still ends up at being a few hundred nanoseconds per
buffer read. Basically this ends up only mattering measurably for
in-RAM but out of shared buffers workloads. And the easy workaround is
to increase shared buffers. As you said, the main issue is the other
overheads that checksums pull in.
I want to point out that at some point in time there might well be demand
for checksumming pages living in shared_buffers. Modern storage systems
assume that the durable media is going to have errors and already have
robust ways to detect that. But they also assume that ECC memory is
bulletproof (it's not), and that's the biggest benefit to Postgres
checksums: they protect data in the filesystem cache[1]. You obviously lose
that if you size shared_buffers to consume most of available memory.
Obviously trying to address that is way beyond the scope of what's being
discussed here. I'm honestly unsure of how relevant it is, but I wanted to
make sure folks were aware of it.
1: I can't go into details, but I have seen a case where Postgres checksums
led to an investigation that ultimately revealed a memory-related issue. In
other words, data was actually getting corrupted while in the filesystem
cache. Obviously data could (and likely was) also get corrupted in shared
buffers, but the corruption in the FS cache was what prompted the
investigation that ultimately found the hardware issue. Fortunately
shared_buffers was small enough to make it more likely that corruption
would happen outside of Postgres, so it could be detected.
On 7/29/25 20:24, Tomas Vondra wrote:
Hi!
So, what should we do with the PG18 open item? We (the RMT team) would
like to know if we shall keep the checksums enabled by default, and if
there's something that still needs to be done for PG18.We don't have a strong opinion either way, but there doesn't seem to be
much happening on this thread. So if someone feels we should rethink and
disable the checksums by default for PG18, he/she needs to articulate
that proposal. The sooner the better.The commit that flipped the default (04bec894a04c) is from Peter
Eisentraut, and while that doesn't make him solely responsible for the
change, we think it'd be good to know his opinion on this. But it seems
more like a change requiring a wider consensus, and I'm aware e.g.
Andres expressed some doubts about enabling checksums [2].In any case, the default is "on" at the moment, and unless someone
advocates for changing it soon, that's what we'll get in PG19.
I've moved the item to the "resolved before rc1" section.
Given the recent inactivity on this thread, and lack of proposals to
change the default for PG18 (to disable checksums by default), I think
we're going to keep the current default.
We may rethink, but there's not much time to consider such proposal.
regards
--
Tomas Vondra