Offline enabling/disabling of data checksums

Started by Michael Banck about 7 years ago, 149 messages
#1Michael Banck
michael.banck@credativ.de
1 attachment(s)

Hi,

The attached patch adds offline enabling/disabling of checksums to
pg_verify_checksums. It is based on independent work that both Michael
(Paquier) and I did earlier this year and takes changes from both; see
https://github.com/credativ/pg_checksums and
https://github.com/michaelpq/pg_plugins/tree/master/pg_checksums

It adds an (now mandatory) --action parameter that takes either verify,
enable or disable as argument.

This is basically meant as a stop-gap measure in case online activation
of checksums won't make it for v12, but maybe it is independently
useful?

Things I have not done so far:

1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
only verify checksums.

2. Rename the scan_* functions (Michael renamed them to operate_file and
operate_directory), but I am not sure it is worth it.

3. Once that patch is in, there would be a way to disable checksums so
there'd be a case to also change the initdb default to enabled, but that
requires further discussion (and maybe another round of benchmarks).

Michael

--
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to
the following provisions: https://www.credativ.de/datenschutz

Attachments:

offline-activation-of-checksums_V1.patch (text/x-diff; charset=us-ascii)
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 6444fc9ca4..65d6195509 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -1,7 +1,7 @@
 /*
  * pg_verify_checksums
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies/enables/disables page level checksums in an offline cluster
  *
  *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
  *
@@ -13,15 +13,16 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
+#include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
-
 
 static int64 files = 0;
 static int64 blocks = 0;
@@ -31,16 +32,32 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_ACTION_NONE,
+	PG_ACTION_DISABLE,
+	PG_ACTION_ENABLE,
+	PG_ACTION_VERIFY
+} ChecksumAction;
+
+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static ChecksumAction action = PG_ACTION_NONE;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables/disables/verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -A, --action   action to take on the cluster, can be set as\n"));
+	printf(_("                 \"verify\", \"enable\" and \"disable\"\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -80,6 +97,80 @@ skipfile(const char *fn)
 }
 
 static void
+updateControlFile(char *DataDir, ControlFileData *ControlFile)
+{
+	int			fd;
+	char		buffer[PG_CONTROL_FILE_SIZE];
+	char		ControlFilePath[MAXPGPATH];
+
+	Assert(action == PG_ACTION_ENABLE ||
+		   action == PG_ACTION_DISABLE);
+
+	/*
+	 * For good luck, apply the same static assertions as in backend's
+	 * WriteControlFile().
+	 */
+#if PG_VERSION_NUM >= 100000
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
+					 "pg_control is too large for atomic disk writes");
+#endif
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
+					 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
+
+	/* Recalculate CRC of control file */
+	INIT_CRC32C(ControlFile->crc);
+	COMP_CRC32C(ControlFile->crc,
+				(char *) ControlFile,
+				offsetof(ControlFileData, crc));
+	FIN_CRC32C(ControlFile->crc);
+
+	/*
+	 * Write out PG_CONTROL_FILE_SIZE bytes into pg_control by zero-padding
+	 * the excess over sizeof(ControlFileData), to avoid premature EOF related
+	 * errors when reading it.
+	 */
+	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
+	memcpy(buffer, ControlFile, sizeof(ControlFileData));
+
+	snprintf(ControlFilePath, sizeof(ControlFilePath), "%s/%s", DataDir, XLOG_CONTROL_FILE);
+
+	unlink(ControlFilePath);
+
+	fd = open(ControlFilePath,
+			  O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
+			  pg_file_create_mode);
+	if (fd < 0)
+	{
+		fprintf(stderr, _("%s: could not open pg_control file: %s\n"),
+				progname, strerror(errno));
+		exit(1);
+	}
+
+	errno = 0;
+	if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		fprintf(stderr, _("%s: could not write pg_control file: %s\n"),
+				progname, strerror(errno));
+		exit(1);
+	}
+
+	if (fsync(fd) != 0)
+	{
+		fprintf(stderr, _("%s: fsync error: %s\n"), progname, strerror(errno));
+		exit(1);
+	}
+
+	if (close(fd) < 0)
+	{
+		fprintf(stderr, _("%s: could not close control file: %s\n"), progname, strerror(errno));
+		exit(1);
+	}
+}
+
+static void
 scan_file(const char *fn, BlockNumber segmentno)
 {
 	PGAlignedBlock buf;
@@ -87,7 +178,10 @@ scan_file(const char *fn, BlockNumber segmentno)
 	int			f;
 	BlockNumber blockno;
 
-	f = open(fn, O_RDONLY | PG_BINARY, 0);
+	Assert(action == PG_ACTION_ENABLE ||
+		   action == PG_ACTION_VERIFY);
+
+	f = open(fn, O_RDWR | PG_BINARY, 0);
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
@@ -117,18 +211,47 @@ scan_file(const char *fn, BlockNumber segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		if (csum != header->pd_checksum)
+		if (action == PG_ACTION_VERIFY)
 		{
-			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
-						progname, fn, blockno, csum, header->pd_checksum);
-			badblocks++;
+			if (csum != header->pd_checksum)
+			{
+				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+					fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
+							progname, fn, blockno, csum, header->pd_checksum);
+				badblocks++;
+			}
+		}
+		else if (action == PG_ACTION_ENABLE)
+		{
+			/* Set checksum in page header */
+			header->pd_checksum = csum;
+
+			/* Seek back to beginning of block */
+			if (lseek(f, -BLCKSZ, SEEK_CUR) < 0)
+			{
+				fprintf(stderr, _("%s: seek failed for block %u in file \"%s\": %s\n"), progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Write block with checksum */
+			if (write(f, buf.data, BLCKSZ) != BLCKSZ)
+			{
+				fprintf(stderr, _("%s: could not update checksum of block %u in file \"%s\": %s\n"),
+						progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
 		}
 	}
 
 	if (verbose)
-		fprintf(stderr,
-				_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+	{
+		if (action == PG_ACTION_VERIFY)
+			fprintf(stderr,
+					_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+		if (action == PG_ACTION_ENABLE)
+			fprintf(stderr,
+					_("%s: checksums enabled in file \"%s\"\n"), progname, fn);
+	}
 
 	close(f);
 }
@@ -230,6 +353,7 @@ int
 main(int argc, char *argv[])
 {
 	static struct option long_options[] = {
+		{"action", required_argument, NULL, 'A'},
 		{"pgdata", required_argument, NULL, 'D'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
@@ -258,10 +382,31 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:r:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "A:D:r:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
+			case 'A':
+				/* Check for redundant options */
+				if (action != PG_ACTION_NONE)
+				{
+					fprintf(stderr, _("%s: action already specified.\n"), progname);
+					exit(1);
+				}
+
+				if (strcmp(optarg, "verify") == 0)
+					action = PG_ACTION_VERIFY;
+				else if (strcmp(optarg, "disable") == 0)
+					action = PG_ACTION_DISABLE;
+				else if (strcmp(optarg, "enable") == 0)
+					action = PG_ACTION_ENABLE;
+				else
+				{
+					fprintf(stderr, _("%s: incorrect action \"%s\" specified.\n"),
+									progname, optarg);
+					exit(1);
+				}
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -282,6 +427,21 @@ main(int argc, char *argv[])
 		}
 	}
 
+	/*
+	 * Don't allow pg_checksums to be run as root, to avoid overwriting the
+	 * ownership of files in the data directory. We need only check for root
+	 * -- any other user won't have sufficient permissions to modify files in
+	 * the data directory.  This does not matter for the "verify" mode, but
+	 * let's be consistent.
+	 */
+#ifndef WIN32
+	if (geteuid() == 0)
+	{
+		fprintf(stderr, _("%s: cannot be executed by \"root\"\n"), progname);
+		exit(1);
+	}
+#endif
+
 	if (DataDir == NULL)
 	{
 		if (optind < argc)
@@ -308,6 +468,25 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Complain if no action has been requested */
+	if (action == PG_ACTION_NONE)
+	{
+		fprintf(stderr, _("%s: no action specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+						progname);
+		exit(1);
+	}
+
+	/* Relfilenode checking only works in verify mode */
+	if (action != PG_ACTION_VERIFY &&
+		only_relfilenode)
+	{
+		fprintf(stderr, _("%s: relfilenode option only possible with verify action\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+						progname);
+		exit(1);
+	}
+
 	/* Check if cluster is running */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
@@ -319,29 +498,74 @@ main(int argc, char *argv[])
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
+		fprintf(stderr, _("%s: cluster must be shut down\n"), progname);
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version == 0)
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_VERIFY)
 	{
 		fprintf(stderr, _("%s: data checksums are not enabled in cluster\n"), progname);
 		exit(1);
 	}
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_DISABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
+		exit(1);
+	}
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION &&
+		action == PG_ACTION_ENABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already enabled in cluster.\n"), progname);
+		exit(1);
+	}
 
-	/* Scan all files */
+	/*
+	 * When disabling data checksums, only update the control file and call it
+	 * a day.
+	 */
+	if (action == PG_ACTION_DISABLE)
+	{
+		ControlFile->data_checksum_version = 0;
+		updateControlFile(DataDir, ControlFile);
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		printf(_("Checksums disabled in cluster\n"));
+		return 0;
+	}
+
+	/* Operate on all files */
 	scan_directory(DataDir, "global");
 	scan_directory(DataDir, "base");
 	scan_directory(DataDir, "pg_tblspc");
 
-	printf(_("Checksum scan completed\n"));
-	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Checksum operation completed\n"));
 	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
 	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
-	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+	if (action == PG_ACTION_VERIFY)
+	{
+		printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+		printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 
-	if (badblocks > 0)
+		if (badblocks > 0)
 		return 1;
+	}
+
+	/*
+	 * When enabling checksums, wait until the operation has completed
+	 * to do the switch.
+	 */
+	if (action == PG_ACTION_ENABLE)
+	{
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+		updateControlFile(DataDir, ControlFile);
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		printf(_("Checksums enabled in cluster\n"));
+	}
 
 	return 0;
 }
diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_verify_checksums/t/002_actions.pl
index 5250b5a728..0dba764a23 100644
--- a/src/bin/pg_verify_checksums/t/002_actions.pl
+++ b/src/bin/pg_verify_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 45;
+use Test::More tests => 50;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -38,8 +38,8 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_verify_checksums',  '-D', $pgdata,
-		'-r', $relfilenode_corrupted],
+	command_ok(['pg_verify_checksums',  '-A', 'verify', '-D', $pgdata, '-r',
+			   $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
 	# Time to create some corruption
@@ -49,15 +49,16 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
-								$relfilenode_corrupted],
+	$node->command_checks_all([ 'pg_verify_checksums', '-A', 'verify', '-D',
+							  $pgdata, '-r', $relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_verify_checksums', '-A', 'verify', '-D',
+							  $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,22 +68,22 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_verify_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_verify_checksums', '-A', 'verify', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
 	return;
 }
 
-# Initialize node with checksums enabled.
+# Initialize node with checksums disabled.
 my $node = get_new_node('node_checksum');
-$node->init(extra => ['--data-checksums']);
+$node->init();
 my $pgdata = $node->data_dir;
 
-# Control file should know that checksums are enabled.
+# Control file should know that checksums are disabled.
 command_like(['pg_controldata', $pgdata],
-	     qr/Data page checksum version:.*1/,
-		 'checksums enabled in control file');
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
 
 # These are correct but empty files, so they should pass through.
 append_to_file "$pgdata/global/99999", "";
@@ -100,13 +101,27 @@ append_to_file "$pgdata/global/pgsql_tmp_123", "foo";
 mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
+# Enable checksums
+command_ok(['pg_verify_checksums', '-A', 'enable', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
 # Checksums pass on a newly-created cluster
-command_ok(['pg_verify_checksums',  '-D', $pgdata],
+command_ok(['pg_verify_checksums', '-A', 'verify', '-D', $pgdata],
 		   "succeeds with offline cluster");
 
+# Specific relation files cannot be requested when action is disable
+command_fails(['pg_verify_checksums', '-A', 'disable', '-r', '1234', '-D',
+			  $pgdata],
+			  "fails when relfilenodes are requested and action is not verify");
+
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_verify_checksums',  '-D', $pgdata],
+command_fails(['pg_verify_checksums', '-A', 'verify', '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -133,7 +148,8 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_verify_checksums', '-A', 'verify', '-D',
+						  $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
#2Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#1)
Re: Offline enabling/disabling of data checksums

On Fri, Dec 21, 2018 at 09:16:16PM +0100, Michael Banck wrote:

It adds an (now mandatory) --action parameter that takes either verify,
enable or disable as argument.

There are two discussion points which deserve attention here:
1) Do we want to rename pg_verify_checksums to something else, like
pg_checksums? I would like it a lot if we did a simple renaming of the
tool, which should be the first step taken.
2) Which kind of interface do we want to use? When I did my own
flavor of pg_checksums, I used an --action switch able to use the
following values:
- enable
- disable
- verify
The switch cannot be specified twice (perhaps we could enforce the
last value as other binaries do in the tree, not sure if that's
adapted here). A second type of interface is to use one switch per
action. For both interfaces, if no action is specified then the tool
fails. Vote is open.

This is basically meant as a stop-gap measure in case online activation
of checksums won't make it for v12, but maybe it is independently
useful?

I think that this is independently useful. I have this stuff as part of an
upgrade workflow where the user is ready to accept some extra one-time
offline time so that checksums are enabled.

Things I have not done so far:

1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
only verify checksums.

Check. That sounds right to me.

2. Rename the scan_* functions (Michael renamed them to operate_file and
operate_directory), but I am not sure it is worth it.

The renaming makes sense, as scan implies only reading while enabling
checksums causes a write.

3. Once that patch is in, there would be a way to disable checksums so
there'd be a case to also change the initdb default to enabled, but that
requires further discussion (and maybe another round of benchmarks).

Perhaps, that's unrelated to this thread though. I am not sure that
all users would be ready to pay the extra cost of checksums enabled by
default.
--
Michael

#3Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#2)
Re: Offline enabling/disabling of data checksums

Hi,

I have added it to the commitfest now:

https://commitfest.postgresql.org/21/1944/

On Sat, Dec 22, 2018 at 08:28:34AM +0900, Michael Paquier wrote:

On Fri, Dec 21, 2018 at 09:16:16PM +0100, Michael Banck wrote:

It adds an (now mandatory) --action parameter that takes either verify,
enable or disable as argument.

There are two discussion points which deserve attention here:
1) Do we want to rename pg_verify_checksums to something else, like
pg_checksums? I would like it a lot if we did a simple renaming of the
tool, which should be the first step taken.

I am for it, but don't mind whether it's before or afterwards, your
call.

2) Which kind of interface do we want to use? When I did my own
flavor of pg_checksums, I used an --action switch able to use the
following values:
- enable
- disable
- verify
The switch cannot be specified twice (perhaps we could enforce the
last value as other binaries do in the tree, not sure if that's
adapted here). A second type of interface is to use one switch per
action. For both interfaces, if no action is specified then the tool
fails. Vote is open.

Even though my fork has the separate switches, I like the --action one.
On the other hand, it is a bit more typing as you always have to spell
out the action (is there precedent for also accepting incomplete option
arguments like 'v', 'e', 'd'?).

This is basically meant as a stop-gap measure in case online activation
of checksums won't make it for v12, but maybe it is independently
useful?

I think that this is independently useful. I have this stuff as part of an
upgrade workflow where the user is ready to accept some extra one-time
offline time so that checksums are enabled.

OK; we have also used that at clients - if the instance has a size of
less than a few dozen GBs, enabling checksums during a routine minor
upgrade restart is not delaying things much.

2. Rename the scan_* functions (Michael renamed them to operate_file and
operate_directory), but I am not sure it is worth it.

The renaming makes sense, as scan implies only reading while enabling
checksums causes a write.

Ok, will do in the next version.

Michael


#4Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#3)
Re: Offline enabling/disabling of data checksums

On Sat, Dec 22, 2018 at 02:42:55PM +0100, Michael Banck wrote:

On Sat, Dec 22, 2018 at 08:28:34AM +0900, Michael Paquier wrote:

There are two discussion points which deserve attention here:
1) Do we want to rename pg_verify_checksums to something else, like
pg_checksums? I would like it a lot if we did a simple renaming of the
tool, which should be the first step taken.

I am for it, but don't mind whether it's before or afterwards, your
call.

Doing the renaming after would be a bit weird logically, as we would
finish with a point in time in the tree where pg_verify_checksums is
able to do something else than just verifying checksums.

Even though my fork has the separate switches, I like the --action one.
On the other hand, it is a bit more typing as you always have to spell
out the action (is there precedent for also accepting incomplete option
arguments like 'v', 'e', 'd'?).

Yes, there is a bit of that in psql for example for formats. Not sure
that we should take this road for a checksumming tool though. If a
new option is added which takes the first letter then we would have
incompatibility issues. That's unlikely to happen, still that feels
uneasy.
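
To make the compatibility concern concrete, here is a minimal sketch of
prefix matching for action names. It is not taken from psql or from either
patch: match_prefix() and the return codes are hypothetical, and the rule
assumed here is that an abbreviation is accepted only if it is an exact name
or a prefix of exactly one name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes for this sketch. */
#define MATCH_NONE		(-1)
#define MATCH_AMBIGUOUS	(-2)

/*
 * Return the index of the name that "arg" abbreviates, MATCH_NONE if no
 * name starts with "arg", or MATCH_AMBIGUOUS if several names do.
 */
int
match_prefix(const char *arg, const char *const *names, size_t n)
{
	size_t		len;
	int			found = MATCH_NONE;

	assert(arg != NULL && names != NULL);

	len = strlen(arg);
	if (len == 0)
		return MATCH_NONE;

	for (size_t i = 0; i < n; i++)
	{
		if (strncmp(arg, names[i], len) != 0)
			continue;			/* not a prefix of this name */
		if (names[i][len] == '\0')
			return (int) i;		/* exact match always wins */
		/* a second prefix match makes the abbreviation ambiguous */
		found = (found == MATCH_NONE) ? (int) i : MATCH_AMBIGUOUS;
	}
	return found;
}
```

With the three actions proposed so far, "d" resolves to "disable". If a
hypothetical fourth action such as "dump" were added later, the same "d"
would become ambiguous, which is the kind of incompatibility described
above.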
--
Michael

#5Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#2)
Re: Offline enabling/disabling of data checksums

On Fri, Dec 21, 2018 at 6:28 PM Michael Paquier <michael@paquier.xyz> wrote:

2) Which kind of interface do we want to use? When I did my own
flavor of pg_checksums, I used an --action switch able to use the
following values:
- enable
- disable
- verify
The switch cannot be specified twice (perhaps we could enforce the
last value as other binaries do in the tree, not sure if that's
adapted here). A second type of interface is to use one switch per
action. For both interfaces, if no action is specified then the tool
fails. Vote is open.

I vote for separate switches. Using the same switch with an argument
seems like it adds typing for no real gain.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#1)
Re: Offline enabling/disabling of data checksums

Hallo Michael,

It adds an (now mandatory) --action parameter that takes either verify,
enable or disable as argument.

I'd rather have explicit switches for verify, enable & disable, and verify
would be the default if none is provided.

This is basically meant as a stop-gap measure in case online activation
of checksums won't make it for v12, but maybe it is independently
useful?

I would say yes.

1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
only verify checksums.

I'd agree to rename the tool as "pg_checksums".

2. Rename the scan_* functions (Michael renamed them to operate_file and
operate_directory), but I am not sure it is worth it.

Hmmm. The file is indeed scanned, and "operate" is kind of very fuzzy.

3. Once that patch is in, there would be a way to disable checksums so
there'd be a case to also change the initdb default to enabled, but that
requires further discussion (and maybe another round of benchmarks).

My 0.02€ is that data safety should come first, thus checksums should be
enabled by default.

About the patch: applies, compiles, "make check" ok.

There is no documentation.

In "scan_file", I would open RW only for enable, but keep RO for verify.

Also, the full page is rewritten... would it make sense to only overwrite
the checksum part itself?
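
Both points can be sketched together. This is only an illustration under
stated assumptions, not code from the patch: open_relation_file() and
write_checksum_only() are hypothetical helpers, the block size is assumed to
be 8192 bytes, and the checksum is assumed to be a 16-bit field at byte
offset 8 of each page (right after an 8-byte LSN).

```c
#define _XOPEN_SOURCE 700		/* for pwrite() */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumptions of this sketch, not values read from a real cluster. */
#define SKETCH_BLCKSZ			8192
#define SKETCH_CHECKSUM_OFFSET	8	/* assumed: follows the 8-byte LSN */

/* Open read-write only when checksums are being enabled. */
int
open_relation_file(const char *path, bool enabling)
{
	assert(path != NULL);
	return open(path, enabling ? O_RDWR : O_RDONLY);
}

/*
 * Overwrite only the checksum field of block "blockno", leaving the rest
 * of the page untouched.  The value is written in native byte order.
 */
bool
write_checksum_only(int fd, uint32_t blockno, uint16_t csum)
{
	off_t		pos = (off_t) blockno * SKETCH_BLCKSZ + SKETCH_CHECKSUM_OFFSET;

	return pwrite(fd, &csum, sizeof(csum), pos) == (ssize_t) sizeof(csum);
}
```

Using pwrite() also avoids the seek-back step, since the file position is
not changed. Whether writing two bytes instead of the whole block is
measurably faster is not something this sketch establishes.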

It seems that the control file is unlinked and then rewritten. If the
rewriting fails, or the command is interrupted, the user has a problem.

Could the control file be simply opened RW? Else, I would suggest to
rename (eg add .tmp), write the new one, then unlink the old one, so that
recovering the old state in case of problem is possible.
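
A closely related variant can be sketched as follows: instead of renaming
the old file away, write the new contents to a temporary file and rename it
over the old one, so that the old file stays in place until the new one is
complete. This is a generic illustration, not the patch's code;
replace_file_atomically() and the ".tmp" suffix are hypothetical, and a
production version would also fsync the containing directory.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Replace "path" with "len" bytes from "data".  The old file is left
 * untouched unless the new contents were fully written and synced.
 */
bool
replace_file_atomically(const char *path, const void *data, size_t len)
{
	char		tmppath[1024];
	int			n;
	int			fd;

	assert(path != NULL && data != NULL);

	n = snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
	if (n < 0 || (size_t) n >= sizeof(tmppath))
		return false;			/* path too long for this sketch */

	fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return false;

	/* Write and flush the new contents before touching the old file. */
	if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
	{
		close(fd);
		unlink(tmppath);
		return false;
	}
	if (close(fd) != 0)
	{
		unlink(tmppath);
		return false;
	}

	/* rename() replaces the target in one step on POSIX file systems. */
	if (rename(tmppath, path) != 0)
	{
		unlink(tmppath);
		return false;
	}
	return true;
}
```

If the process is interrupted before the rename, the original file is still
intact and only a stale temporary file is left behind.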

For enable/disable, while the command is running, it should mark the
cluster as opened to prevent an unwanted database start. I do not see
where this is done.

--
Fabien.

#7Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#6)
Re: Offline enabling/disabling of data checksums

On Wed, Dec 26, 2018 at 07:43:17PM +0100, Fabien COELHO wrote:

It adds an (now mandatory) --action parameter that takes either verify,
enable or disable as argument.

I'd rather have explicit switches for verify, enable & disable, and verify
would be the default if none is provided.

Okay, noted for the separate switches. But I don't agree with the
point of assuming that --verify should be enforced if no switches are
defined. That feels like a trap for newcomers to this tool.

For enable/disable, while the command is running, it should mark the cluster
as opened to prevent an unwanted database start. I do not see where this is
done.

You have pretty much the same class of problems if you attempt to
start a cluster on which pg_rewind or the existing pg_verify_checksums
is run after these have scanned the control file to make sure that
they work on a cleanly-stopped instance. In short, this is a deal
between code simplicity and trying to have the tool outsmart users in
the way users use the tool. I tend to prefer keeping the code simple
and not worry about cases where the users mess up with their
application, as there are many more things which could go wrong.
--
Michael

#8Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#7)
Re: Offline enabling/disabling of data checksums

Bonjour Michaël,

It adds an (now mandatory) --action parameter that takes either verify,
enable or disable as argument.

I'd rather have explicit switches for verify, enable & disable, and verify
would be the default if none is provided.

Okay, noted for the separate switches. But I don't agree with the
point of assuming that --verify should be enforced if no switches are
defined. That feels like a trap for newcomers to this tool.

Hmmm. It does something safe and useful, especially if it also works
online (patch pending), and the initial tool only does checking. However,
I'd be okay for no default.

For enable/disable, while the command is running, it should mark the cluster
as opened to prevent an unwanted database start. I do not see where this is
done.

You have pretty much the same class of problems if you attempt to
start a cluster on which pg_rewind or the existing pg_verify_checksums
is run after these have scanned the control file to make sure that
they work on a cleanly-stopped instance. In short, this is a deal
between code simplicity and trying to have the tool outsmart users in
the way users use the tool. I tend to prefer keeping the code simple
and not worry about cases where the users mess up with their
application, as there are many more things which could go wrong.

Hmmm. I do not buy the comparison.

A verify that fails is not a big problem, you can run it again. If
pg_rewind fails, you can probably run it again as well, the source is
probably still consistent even if it has changed, and too bad for the
target side, but it was scheduled to be overwritten anyway.

However, a tool which overwrites files behind the back of a running server
is a recipe for data loss, so I think it should take much more care, i.e.
set the server state into some specific safe state.

About code simplicity: probably there is, or there should be, a
change-the-state function somewhere, because quite a few tools could use
it?

--
Fabien.

#9Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#7)
Re: Offline enabling/disabling of data checksums

On Thu, Dec 27, 2018 at 2:15 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 26, 2018 at 07:43:17PM +0100, Fabien COELHO wrote:

It adds an (now mandatory) --action parameter that takes either verify,
enable or disable as argument.

I'd rather have explicit switches for verify, enable & disable, and
verify would be the default if none is provided.

Okay, noted for the separate switches. But I don't agree with the
point of assuming that --verify should be enforced if no switches are
defined. That feels like a trap for newcomers to this tool.

Defaulting to the choice that makes no actual changes to the data surely is
the safe choice, and not a trap :)

That said, this would probably be our first tool where you switch it
between readonly and rewrite mode with just a switch, wouldn't it? All other
tools are either read-only or read/write at the *tool* level, not the
switch level.

That in itself would be an argument for making it a separate tool. But not
a very strong one I think, I prefer the single-tool-renamed approach as
well.

There's plenty enough precedent for the "separate switches and a default
behaviour if none is specified" in other tools though, and I don't think
that's generally considered a trap.

So count me in the camp for separate switches and default to verify. If one
didn't mean that, it's only a quick Ctrl-C away with no damage done.
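
The interface voted for here can be sketched with getopt_long. The switch
names (--check, --enable, --disable), the ChecksumMode enum and parse_mode()
are hypothetical; the thread had not settled on names at this point. The
sketch defaults to the read-only mode and rejects conflicting switches.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
	MODE_CHECK,					/* default: read-only verification */
	MODE_ENABLE,
	MODE_DISABLE,
	MODE_INVALID				/* unknown or conflicting switches */
} ChecksumMode;

/* Hypothetical switch names for this sketch. */
static const struct option mode_options[] = {
	{"check", no_argument, NULL, 'c'},
	{"enable", no_argument, NULL, 'e'},
	{"disable", no_argument, NULL, 'd'},
	{NULL, 0, NULL, 0}
};

ChecksumMode
parse_mode(int argc, char *argv[])
{
	ChecksumMode mode = MODE_CHECK;
	bool		seen = false;
	bool		bad = false;
	int			c;

	assert(argc >= 1 && argv != NULL);

	optind = 1;					/* restart scanning on every call */
	opterr = 0;					/* keep the sketch quiet on errors */

	while ((c = getopt_long(argc, argv, "ced", mode_options, NULL)) != -1)
	{
		ChecksumMode requested;

		switch (c)
		{
			case 'c':
				requested = MODE_CHECK;
				break;
			case 'e':
				requested = MODE_ENABLE;
				break;
			case 'd':
				requested = MODE_DISABLE;
				break;
			default:
				bad = true;		/* unknown switch */
				continue;
		}

		/* Two different actions on one command line are an error. */
		if (seen && requested != mode)
			bad = true;
		mode = requested;
		seen = true;
	}

	return bad ? MODE_INVALID : mode;
}
```

Running without any switch yields MODE_CHECK, so a mistaken invocation
makes no changes to the data.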

For enable/disable, while the command is running, it should mark the cluster
as opened to prevent an unwanted database start. I do not see where this is
done.

You have pretty much the same class of problems if you attempt to
start a cluster on which pg_rewind or the existing pg_verify_checksums
is run after these have scanned the control file to make sure that
they work on a cleanly-stopped instance. In short, this is a deal
between code simplicity and trying to have the tool outsmart users in
the way users use the tool. I tend to prefer keeping the code simple
and not worry about cases where the users mess up with their
application, as there are many more things which could go wrong.

I think it comes down to what the outcome is. If we're going to end up with
a corrupt database (e.g. one where checksums aren't set everywhere but they
are marked as such in pg_control) then it's not acceptable. If the only
outcome is the tool gives an error that's not an error and if re-run it's
fine, then it's a different story.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#10Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#2)
Re: Offline enabling/disabling of data checksums

On Sat, Dec 22, 2018 at 12:28 AM Michael Paquier <michael@paquier.xyz>
wrote:

On Fri, Dec 21, 2018 at 09:16:16PM +0100, Michael Banck wrote:

I think that this is independently useful. I have this stuff as part of an
upgrade workflow where the user is ready to accept some extra one-time
offline time so that checksums get enabled.

Very much so, IMHO.

Things I have not done so far:

1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
only verify checksums.

Check. That sounds right to me.

Should we double-check with packagers that this won't cause a problem?
Though the fact that it's done in a major release should make it perfectly
fine I think -- and it's a smaller change than when we did all those
xlog->wal changes...

3. Once that patch is in, there would be a way to disable checksums so
there'd be a case to also change the initdb default to enabled, but that
requires further discussion (and maybe another round of benchmarks).

Perhaps, that's unrelated to this thread though. I am not sure that
all users would be ready to pay the extra cost of checksums enabled by
default.

I'd be a strong +1 for changing the default once we have a painless way to
turn them off.

It remains super-cheap to turn them off (stop the database, run one command,
start it again). So those people that aren't willing to pay the overhead of
checksums can very cheaply get away from it.

It's a lot more expensive to turn them on once your database has grown to
some size (definitely in offline mode, but also in an online mode when we
get that one in).

Plus, the majority of people *should* want them on :) We don't run with say
synchronous_commit=off by default either to make it easier on those that
don't want to pay the overhead of full data safety :P (I know it's not a
direct match, but you get the idea)

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#11Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Magnus Hagander (#9)
Re: Offline enabling/disabling of data checksums

For enable/disable, while the command is running, it should mark the
cluster as opened to prevent an unwanted database start. I do not see
where this is done.

You have pretty much the same class of problems if you attempt to
start a cluster on which pg_rewind or the existing pg_verify_checksums
is run after these have scanned the control file to make sure that
they work on a cleanly-stopped instance. [...]

I think it comes down to what the outcome is. If we're going to end up with
a corrupt database (e.g. one where checksums aren't set everywhere but they
are marked as such in pg_control) then it's not acceptable. If the only
outcome is the tool gives an error that's not an error and if re-run it's
fine, then it's a different story.

ISTM that such an outcome is indeed a risk, as a starting postgres could
update already checksummed pages without putting a checksum. It could be
even worse, although with a (very) low probability, with updates
overwritten on a race condition between the processes. In any case, no
error would be reported before much later, with invalid checksums or
inconsistent data, or undetected forgotten committed data.

--
Fabien.

#12Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#10)
Re: Offline enabling/disabling of data checksums

On 12/27/18 11:43 AM, Magnus Hagander wrote:

On Sat, Dec 22, 2018 at 12:28 AM Michael Paquier <michael@paquier.xyz
<mailto:michael@paquier.xyz>> wrote:

On Fri, Dec 21, 2018 at 09:16:16PM +0100, Michael Banck wrote:

I think that this is independently useful. I have this stuff as part of an
upgrade workflow where the user is ready to accept some extra one-time
offline time so that checksums get enabled.

Very much so, IMHO.

Things I have not done so far:

1. Rename pg_verify_checksums to e.g. pg_checksums as it will no longer
only verify checksums.

Check.  That sounds right to me.

Should we double-check with packagers that this won't cause a problem?
Though the fact that it's done in a major release should make it
perfectly fine I think -- and it's a smaller change than when we did all
those xlog->wal changes...

I think it makes little sense to not rename the tool now. I'm pretty
sure we'd end up doing that sooner or later anyway, and we'll just live
with a misnamed tool until then.

3. Once that patch is in, there would be a way to disable checksums so
there'd be a case to also change the initdb default to enabled, but that
requires further discussion (and maybe another round of benchmarks).

Perhaps, that's unrelated to this thread though.  I am not sure that
all users would be ready to pay the extra cost of checksums enabled by
default.

I'd be a strong +1 for changing the default once we have a painless way
to turn them off.

It remains super-cheap to turn them off (stop the database, run one command,
start it again). So those people that aren't willing to pay the overhead
of checksums can very cheaply get away from it.

It's a lot more expensive to turn them on once your database has grown
to some size (definitely in offline mode, but also in an online mode
when we get that one in).

Plus, the majority of people *should* want them on :) We don't run with
say synchronous_commit=off by default either to make it easier on those
that don't want to pay the overhead of full data safety :P (I know it's
not a direct match, but you get the idea)

I don't know, TBH. I agree making the on/off change cheaper moves
us closer to 'on' by default, because users may disable them if needed. But
it's not the whole story.

If we enable checksums by default, 99% of users will have them enabled.
That means more people will actually observe data corruption cases that
went unnoticed so far. What shall we do with that? We don't have very
good answers to that (tooling, docs) and I'd say "disable checksums" is
not a particularly amazing response in this case :-(

FWIW I don't know what to do about that. We certainly can't prevent the
data corruption, but maybe we could help with fixing it (although that's
bound to be low-level work).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#9)
Re: Offline enabling/disabling of data checksums

On 12/27/18 11:39 AM, Magnus Hagander wrote:

On Thu, Dec 27, 2018 at 2:15 AM Michael Paquier <michael@paquier.xyz
<mailto:michael@paquier.xyz>> wrote:

On Wed, Dec 26, 2018 at 07:43:17PM +0100, Fabien COELHO wrote:

It adds an (now mandatory) --action parameter that takes either verify,
enable or disable as argument.

I'd rather have explicit switches for verify, enable & disable, and verify
would be the default if none is provided.

Okay, noted for the separate switches.  But I don't agree with the
point of assuming that --verify should be enforced if no switches are
defined.  That feels like a trap for newcomers of this tool..

Defaulting to the choice that makes no actual changes to the data surely
is the safe choice, and not a trap :)

That said, this would probably be our first tool where you switch it
between readonly and rewrite mode with just a switch, wouldn't it? All
other tools are either read-only or read/write at the *tool* level, not
the switch level.

Eh? Isn't pg_rewind "modify by default", with a --dry-run switch to run in
read-only mode? So I'm not sure what you mean by "tool level" here.

FWIW I'd prefer sticking to the same approach for this tool too, i.e.
have a "dry-run" switch that makes it read-only. IMHO that's a pretty
common pattern.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#13)
Re: Offline enabling/disabling of data checksums

On Thu, Dec 27, 2018 at 3:54 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 12/27/18 11:39 AM, Magnus Hagander wrote:

On Thu, Dec 27, 2018 at 2:15 AM Michael Paquier <michael@paquier.xyz
<mailto:michael@paquier.xyz>> wrote:

On Wed, Dec 26, 2018 at 07:43:17PM +0100, Fabien COELHO wrote:

It adds an (now mandatory) --action parameter that takes either verify,
enable or disable as argument.

I'd rather have explicit switches for verify, enable & disable, and verify
would be the default if none is provided.

Okay, noted for the separate switches. But I don't agree with the
point of assuming that --verify should be enforced if no switches are
defined. That feels like a trap for newcomers of this tool..

Defaulting to the choice that makes no actual changes to the data surely
is the safe choice, and not a trap :)

That said, this would probably be our first tool where you switch it
between readonly and rewrite mode with just a switch, wouldn't it? All
other tools are either read-only or read/write at the *tool* level, not
the switch level.

Eh? Isn't pg_rewind "modify by default", with a --dry-run switch to run in
read-only mode? So I'm not sure what you mean by "tool level" here.

FWIW I'd prefer sticking to the same approach for this tool too, i.e.
have a "dry-run" switch that makes it read-only. IMHO that's a pretty
common pattern.

That's a different thing.

pg_rewind in dry-run mode does the same thing, except it doesn't actually
do it, it just pretends.

Verifying checksums is not the same as "turn on checksums except don't
actually do it" or "turn off checksums except don't actually do it".

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#15Michael Paquier
michael@paquier.xyz
In reply to: Tomas Vondra (#12)
Re: Offline enabling/disabling of data checksums

On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:

On 12/27/18 11:43 AM, Magnus Hagander wrote:

Should we double-check with packagers that this won't cause a problem?
Though the fact that it's done in a major release should make it
perfectly fine I think -- and it's a smaller change than when we did all
those xlog->wal changes...

I think it makes little sense to not rename the tool now. I'm pretty
sure we'd end up doing that sooner or later anyway, and we'll just live
with a misnamed tool until then.

Do you think that a thread Would on -packagers be more adapted then?

I don't know, TBH. I agree making the on/off change cheaper moves
us closer to 'on' by default, because users may disable them if needed. But
it's not the whole story.

If we enable checksums by default, 99% of users will have them enabled.
That means more people will actually observe data corruption cases that
went unnoticed so far. What shall we do with that? We don't have very
good answers to that (tooling, docs) and I'd say "disable checksums" is
not a particularly amazing response in this case :-(

Enabling data checksums by default is still a couple of steps ahead,
without a way to control them better..

FWIW I don't know what to do about that. We certainly can't prevent the
data corruption, but maybe we could help with fixing it (although that's
bound to be low-level work).

Yes, data checksums are extremely useful to tell people when the
problem is *not* from Postgres, which can be really hard in a large
organization. Knowing about the corrupted page is also useful as you
can look at its contents and look at its bytes before it gets zero'ed
to spot patterns which can help other teams in charge of a lower level
of the application layer.
--
Michael

#16Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Michael Paquier (#15)
Re: Offline enabling/disabling of data checksums

On 12/28/18 12:25 AM, Michael Paquier wrote:

On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:

On 12/27/18 11:43 AM, Magnus Hagander wrote:

Should we double-check with packagers that this won't cause a problem?
Though the fact that it's done in a major release should make it
perfectly fine I think -- and it's a smaller change than when we did all
those xlog->wal changes...

I think it makes little sense to not rename the tool now. I'm pretty
sure we'd end up doing that sooner or later anyway, and we'll just live
with a misnamed tool until then.

Do you think that a thread Would on -packagers be more adapted then?

I'm sorry, but I'm not sure I understand the question. Of course, asking
over at -packagers won't hurt, but my guess is the response will be it's
not a big deal from the packaging perspective.

I don't know, TBH. I agree making the on/off change cheaper moves
us closer to 'on' by default, because users may disable them if needed. But
it's not the whole story.

If we enable checksums by default, 99% of users will have them enabled.
That means more people will actually observe data corruption cases that
went unnoticed so far. What shall we do with that? We don't have very
good answers to that (tooling, docs) and I'd say "disable checksums" is
not a particularly amazing response in this case :-(

Enabling data checksums by default is still a couple of steps ahead,
without a way to control them better..

What do you mean by "control" here? Dealing with checksum failures, or
some additional capabilities?

FWIW I don't know what to do about that. We certainly can't prevent the
data corruption, but maybe we could help with fixing it (although that's
bound to be low-level work).

Yes, data checksums are extremely useful to tell people when the
problem is *not* from Postgres, which can be really hard in a large
organization. Knowing about the corrupted page is also useful as you
can look at its contents and look at its bytes before it gets zero'ed
to spot patterns which can help other teams in charge of a lower level
of the application layer.

I'm not sure data checksums are particularly great evidence. For example
with the recent fsync issues, we might have ended with partial writes
(and thus invalid checksums). The OS might have even told us about the
failure, but we've gracefully ignored it. So I'm afraid data checksums
are not a particularly great proof it's not our fault.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Michael Paquier
michael@paquier.xyz
In reply to: Tomas Vondra (#16)
Re: Offline enabling/disabling of data checksums

On Fri, Dec 28, 2018 at 01:14:05AM +0100, Tomas Vondra wrote:

I'm sorry, but I'm not sure I understand the question. Of course, asking
over at -packagers won't hurt, but my guess is the response will be it's
not a big deal from the packaging perspective.

(The previous email had an extra "Would"... Sorry.)
Let's ask those folks then.

What do you mean by "control" here? Dealing with checksum failures, or
some additional capabilities?

What I am referring to here is the possibility to enable, disable and
check checksums for an online cluster. I am not sure what kind of
tooling for page-level surgery would make sense. Once a checksum
failure is detected, the user knows there is a problem, which mainly
needs a human to look into it.

I'm not sure data checksums are particularly great evidence. For example
with the recent fsync issues, we might have ended with partial writes
(and thus invalid checksums). The OS might have even told us about the
failure, but we've gracefully ignored it. So I'm afraid data checksums
are not a particularly great proof it's not our fault.

Sure, they are not a solution to all problems. Still they give hints
before the problem spreads, and sometimes by looking at one corrupted
page by yourself one can see if the data fetched from disk comes from
Postgres or not (say inspecting the page header with pageinspect,
etc.).
--
Michael

#18Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#16)
Re: Offline enabling/disabling of data checksums

On Fri, Dec 28, 2018 at 1:14 AM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 12/28/18 12:25 AM, Michael Paquier wrote:

On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:

On 12/27/18 11:43 AM, Magnus Hagander wrote:

Should we double-check with packagers that this won't cause a problem?
Though the fact that it's done in a major release should make it
perfectly fine I think -- and it's a smaller change than when we did all
those xlog->wal changes...

I think it makes little sense to not rename the tool now. I'm pretty
sure we'd end up doing that sooner or later anyway, and we'll just live
with a misnamed tool until then.

Do you think that a thread Would on -packagers be more adapted then?

I'm sorry, but I'm not sure I understand the question. Of course, asking
over at -packagers won't hurt, but my guess is the response will be it's
not a big deal from the packaging perspective.

I think a heads-up in the way of "planning to change it, now's the time to
yell" is the reasonable thing.

I don't know, TBH. I agree making the on/off change cheaper moves
us closer to 'on' by default, because users may disable them if needed. But
it's not the whole story.

If we enable checksums by default, 99% of users will have them enabled.
That means more people will actually observe data corruption cases that
went unnoticed so far. What shall we do with that? We don't have very
good answers to that (tooling, docs) and I'd say "disable checksums" is
not a particularly amazing response in this case :-(

Enabling data checksums by default is still a couple of steps ahead,
without a way to control them better..

What do you mean by "control" here? Dealing with checksum failures, or
some additional capabilities?

FWIW I don't know what to do about that. We certainly can't prevent the
data corruption, but maybe we could help with fixing it (although that's
bound to be low-level work).

Yes, data checksums are extremely useful to tell people when the
problem is *not* from Postgres, which can be really hard in a large
organization. Knowing about the corrupted page is also useful as you
can look at its contents and look at its bytes before it gets zero'ed
to spot patterns which can help other teams in charge of a lower level
of the application layer.

I'm not sure data checksums are particularly great evidence. For example
with the recent fsync issues, we might have ended with partial writes
(and thus invalid checksums). The OS might have even told us about the
failure, but we've gracefully ignored it. So I'm afraid data checksums
are not a particularly great proof it's not our fault.

They are great evidence that your data is corrupt. You *want* to know
that your data is corrupt. Even if our best recommendation is "go restore
your backups", you still want to know. Otherwise you are sitting around on
data that's corrupt and you don't know about it.

There are certainly many things we can do to improve the experience. But
not telling people their data is corrupt when it is, isn't one of them.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#19Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Magnus Hagander (#18)
Re: Offline enabling/disabling of data checksums

[...]

I'm not sure data checksums are particularly great evidence. For example
with the recent fsync issues, we might have ended with partial writes
(and thus invalid checksums). The OS might have even told us about the
failure, but we've gracefully ignored it. So I'm afraid data checksums
are not a particularly great proof it's not our fault.

They are great evidence that your data is corrupt. You *want* to know
that your data is corrupt. Even if our best recommendation is "go restore
your backups", you still want to know. Otherwise you are sitting around on
data that's corrupt and you don't know about it.

There are certainly many things we can do to improve the experience. But
not telling people their data is corrupt when it is, isn't one of them.

Yep, anyone should want to know if their database is corrupt, compared to
ignoring the fact.

One reason not to enable it could be if the implementation is not trusted,
i.e. if false positives (a corrupt page detected while the data are okay and
there was only an issue with computing or storing the checksum) can occur.

There is also the performance impact. I did some quick-and-dirty pgbench
simple-update, single-thread performance tests to compare with vs without
checksums. Enabling checksums on these tests seems to induce a 1.4%
performance penalty, although I'm only moderately confident about it given
the standard deviation. At least it is an indication, and it seems to me that
it is consistent with other figures previously reported on the list.

--
Fabien.

#20Michael Paquier
michael@paquier.xyz
In reply to: Magnus Hagander (#18)
Re: Offline enabling/disabling of data checksums

On Fri, Dec 28, 2018 at 10:12:24AM +0100, Magnus Hagander wrote:

I think a heads-up in the way of "planning to change it, now's the time to
yell" is the reasonable thing.

And done.
--
Michael

#21Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#18)
Re: Offline enabling/disabling of data checksums

Hi,

On Fri, Dec 28, 2018 at 10:12 +0100, Magnus Hagander wrote:

On Fri, Dec 28, 2018 at 1:14 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 12/28/18 12:25 AM, Michael Paquier wrote:

On Thu, Dec 27, 2018 at 03:46:48PM +0100, Tomas Vondra wrote:

On 12/27/18 11:43 AM, Magnus Hagander wrote:

Should we double-check with packagers that this won't cause a problem?
Though the fact that it's done in a major release should make it
perfectly fine I think -- and it's a smaller change than when we did all
those xlog->wal changes...

I think it makes little sense to not rename the tool now. I'm pretty
sure we'd end up doing that sooner or later anyway, and we'll just live
with a misnamed tool until then.

Do you think that a thread Would on -packagers be more adapted then?

I'm sorry, but I'm not sure I understand the question. Of course, asking
over at -packagers won't hurt, but my guess is the response will be it's
not a big deal from the packaging perspective.

I think a heads-up in the way of "planning to change it, now's the
time to yell" is the reasonable thing.

Renaming applications shouldn't be a problem unless they have to be
moved from one binary package to another. Assuming all packagers ship
all client and server binaries in one package each (and not e.g. a
dedicated postgresql-11-pg_test_fsync package), this should only be a
matter of updating package metadata.

In any case, it should be identical to the xlog->wal rename.

Michael

--
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Managing Directors: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz

#22Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#21)
Re: Offline enabling/disabling of data checksums

On Sat, Dec 29, 2018 at 11:55:43AM +0100, Michael Banck wrote:

Renaming applications shouldn't be a problem unless they have to be
moved from one binary package to another. Assuming all packagers ship
all client and server binaries in one package each (and not e.g. a
dedicated postgresql-11-pg_test_fsync package), this should only be a
matter of updating package metadata.

In any case, it should be identical to the xlog->wal rename.

I have poked -packagers on the matter and I am seeing no complaints, so
let's move forward with this stuff. From the consensus I am seeing on
the thread, we have been discussing the following points:
1) Rename pg_verify_checksums to pg_checksums.
2) Have separate switches for each action, aka --verify, --enable and
--disable, or a unified --action switch which can take different
values.
3) Do we want to imply --verify by default if no switch is specified?

About 2), folks who have expressed an opinion are:
- Multiple switches: Robert, Fabien, Magnus
- Single --action switch: Michael B, Michael P

About 3), aka --verify implied if no action is specified:
- In favor: Fabien C, Magnus
- Against: Michael P

If I missed what someone said, please feel free to complete with your
votes here.
--
Michael

#23Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#22)
Re: Offline enabling/disabling of data checksums

Hi,

On Tue, Jan 01, 2019 at 11:38 +0900, Michael Paquier wrote:

On Sat, Dec 29, 2018 at 11:55:43AM +0100, Michael Banck wrote:

Renaming applications shouldn't be a problem unless they have to be
moved from one binary package to another. Assuming all packagers ship
all client and server binaries in one package each (and not e.g. a
dedicated postgresql-11-pg_test_fsync package), this should only be a
matter of updating package metadata.

In any case, it should be identical to the xlog->wal rename.

I have poked -packagers on the matter and I am seeing no complaints, so
let's move forward with this stuff. From the consensus I am seeing on
the thread, we have been discussing the following points:
1) Rename pg_verify_checksums to pg_checksums.
2) Have separate switches for each action, aka --verify, --enable and
--disable, or a unified --action switch which can take different
values.
3) Do we want to imply --verify by default if no switch is specified?

About 2), folks who have expressed an opinion are:
- Multiple switches: Robert, Fabien, Magnus
- Single --action switch: Michael B, Michael P

I implemented the multiple switches thing in my branch first anyway and
don't mind a lot either way; I think the consensus goes towards multiple
switches.

About 3), aka --verify implied if no action is specified:
- In favor: Fabien C, Magnus
- Against: Michael P

I think I'm in favor as well.

I wonder whether we (or packagers) could then just ship a
pg_verify_checksums -> pg_checksums symlink for compatibility if we/they
want, as the behaviour would stay the same?

Michael

#24Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#23)
Re: Offline enabling/disabling of data checksums

On Tue, Jan 01, 2019 at 11:42:49AM +0100, Michael Banck wrote:

On Tue, Jan 01, 2019 at 11:38 +0900, Michael Paquier wrote:

About 3), aka --verify implied if no action is specified:
- In favor: Fabien C, Magnus
- Against: Michael P

I think I'm in favor as well.

Okay, it looks to be the direction to take then.

I wonder whether we (or packagers) could then just ship a
pg_verify_checksums -> pg_checksums symlink for compatibility if we/they
want, as the behaviour would stay the same?

In the v10 dev cycle this part has been discarded for the switch from
pg_xlogdump to pg_waldump. I don't think it's worth bothering with in
the build this time either.
--
Michael

#25Stephen Frost
sfrost@snowman.net
In reply to: Tomas Vondra (#12)
Re: Offline enabling/disabling of data checksums

Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

On 12/27/18 11:43 AM, Magnus Hagander wrote:

Plus, the majority of people *should* want them on :) We don't run with
say synchronous_commit=off by default either to make it easier on those
that don't want to pay the overhead of full data safety :P (I know it's
not a direct match, but you get the idea)

+1 to having them on by default, we should have done that a long time
ago.

I don't know, TBH. I agree making the on/off change cheaper moves
us closer to 'on' by default, because users may disable them if needed. But
it's not the whole story.

If we enable checksums by default, 99% of users will have them enabled.

Yes, and they'll then be able to catch data corruption much earlier.
Today, 99% of our users don't have them enabled and have no clue if
their data has been corrupted on disk, or not. That's not good.

That means more people will actually observe data corruption cases that
went unnoticed so far. What shall we do with that? We don't have very
good answers to that (tooling, docs) and I'd say "disable checksums" is
not a particularly amazing response in this case :-(

Now that we've got a number of tools available which will check the
checksums in a running system and raise warnings when failures are found
(pg_basebackup, pgBackRest and I think other backup tools,
pg_checksums...), users will see corruption and have the option to
restore from a backup before those backups expire and they're left
with a corrupt database and backups which also have that corruption.

This ongoing call for specific tooling to do "something" about checksums
is certainly good, but it's not right to say that we don't have existing
documentation; we do, quite a bit of it, and it's all under the heading
of "Backup and Recovery".

FWIW I don't know what to do about that. We certainly can't prevent the
data corruption, but maybe we could help with fixing it (although that's
bound to be low-level work).

There's been some effort to try and automagically correct corrupted
pages but it's certainly not something I'm ready to trust beyond a
"well, this is what it might have been" review. The answer today is to
find a backup which isn't corrupt and restore from it on a known-good
system. If adding explicit documentation to that effect would reduce
your level of concern when it comes to enabling checksums by default,
then I'm happy to do that.

Thanks!

Stephen

#26Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#11)
Re: Offline enabling/disabling of data checksums

Hi,

On Thu, Dec 27, 2018 at 12:26 +0100, Fabien COELHO wrote:

For enable/disable, while the command is running, it should mark the
cluster as opened to prevent an unwanted database start. I do not see
where this is done.

You have pretty much the same class of problems if you attempt to
start a cluster on which pg_rewind or the existing pg_verify_checksums
is run after these have scanned the control file to make sure that
they work on a cleanly-stopped instance. [...]

I think it comes down to what the outcome is. If we're going to end up with
a corrupt database (e.g. one where checksums aren't set everywhere but they
are marked as such in pg_control) then it's not acceptable. If the only
outcome is the tool gives an error that's not an error and if re-run it's
fine, then it's a different story.

ISTM that such an outcome is indeed a risk, as a starting postgres could
update already checksummed pages without putting a checksum. It could be
even worse, although with a (very) low probability, with updates
overwritten on a race condition between the processes. In any case, no
error would be reported before much later, with invalid checksums or
inconsistent data, or undetected forgotten committed data.

One difference between pg_rewind and pg_checksums is that the latter
potentially runs for a longer time (or rather a non-trivial amount of
time, compared to pg_rewind), so the margin of error of another DBA
saying "oh, that DB is down, let me start it again" might be much
higher.

The question is how to reliably do this in an acceptable way? Just
faking a postmaster.pid sounds pretty hackish to me, do you have any
suggestions here?

The alternative would be to document that the database must not be
started while checksums are being enabled, and to treat a violation as
pilot error.

Michael


#27Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#6)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

Hi,

Am Mittwoch, den 26.12.2018, 19:43 +0100 schrieb Fabien COELHO:

It adds an (now mandatory) --action parameter that takes either verify,
enable or disable as argument.

 
I'd rather have explicit switches for verify, enable & disable, and verify 
would be the default if none is provided.

I changed that to the switches -c/--verify (-c for check as -v is taken,
should it be --check as well? I personally like verify better), 
-d/--disable and -e/--enable.

About the patch: applies, compiles, "make check" ok.

There is no documentation.

Yeah, I'll write that once the CLI is settled.

In "scan_file", I would open RW only for enable, but keep RO for verify.

OK, I've changed that.

Also, the full page is rewritten... would it make sense to only overwrite
the checksum part itself?

So just writing the page header? I find that a bit scary and don't
expect much speedup as the OS would write the whole block anyway I
guess? I haven't touched that yet.

It seems that the control file is unlinked and then rewritten. If the
rewriting fails, or the command is interrupted, the user has a problem.

Could the control file be simply opened RW? Otherwise, I would suggest
renaming it (e.g. adding .tmp), writing the new one, then unlinking the
old one, so that the old state can be recovered in case of a problem.

I have mostly taken the pg_rewind code here; if there was a function
that allowed for safe offline changes of the control file, I'd be happy
to use it but I don't think it should be this patch to invent that.

In any case, I have removed the unlink() now (not sure where that came
from), and changed it to open(O_WRONLY) same as in Michael's code and
pg_rewind.
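
A minimal sketch of the write-then-swap idea raised above, for
illustration only. It uses a common variant of the suggestion: the new
contents go to a temporary file first, which is then renamed over the
old one, so an interrupted run leaves the previous file untouched. The
helper name replace_file_via_rename() and the ".tmp" suffix are
assumptions made for this sketch; the V2 patch instead overwrites
pg_control in place.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: replace "path" with "len"
 * bytes from "buf" by writing "<path>.tmp" first and renaming it into place.
 * Returns 0 on success, -1 on failure.
 */
static int
replace_file_via_rename(const char *path, const void *buf, size_t len)
{
	char		tmppath[1024];
	int			fd;

	assert(path != NULL && buf != NULL);

	if (snprintf(tmppath, sizeof(tmppath), "%s.tmp", path) >= (int) sizeof(tmppath))
		return -1;				/* path too long for this sketch */

	fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Write and flush the new contents before they become visible. */
	if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
	{
		close(fd);
		unlink(tmppath);
		return -1;
	}
	if (close(fd) != 0)
	{
		unlink(tmppath);
		return -1;
	}

	/* rename() swaps the new file in for the old one in a single step. */
	if (rename(tmppath, path) != 0)
	{
		unlink(tmppath);
		return -1;
	}
	return 0;
}
```

A fully durable version would also fsync the containing directory after
the rename; that is left out here to keep the sketch short.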

V2 attached.

Michael


Attachments:

offline-activation-of-checksums_V2.patch (text/x-patch; charset=UTF-8)
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index cc6ebb9df0..3ecce3da32 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -1,7 +1,7 @@
 /*
  * pg_verify_checksums
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies/enables/disables page level checksums in an offline cluster
  *
  *	Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -13,15 +13,16 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
+#include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
-
 
 static int64 files = 0;
 static int64 blocks = 0;
@@ -31,16 +32,31 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_ACTION_DISABLE,
+	PG_ACTION_ENABLE,
+	PG_ACTION_VERIFY
+} ChecksumAction;
+
+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static ChecksumAction action = PG_ACTION_VERIFY;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables/disables/verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -A, --action   action to take on the cluster, can be set as\n"));
+	printf(_("                 \"verify\", \"enable\" and \"disable\"\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -80,6 +96,77 @@ skipfile(const char *fn)
 }
 
 static void
+updateControlFile(char *DataDir, ControlFileData *ControlFile)
+{
+	int			fd;
+	char		buffer[PG_CONTROL_FILE_SIZE];
+	char		ControlFilePath[MAXPGPATH];
+
+	Assert(action == PG_ACTION_ENABLE ||
+		   action == PG_ACTION_DISABLE);
+
+	/*
+	 * For good luck, apply the same static assertions as in backend's
+	 * WriteControlFile().
+	 */
+#if PG_VERSION_NUM >= 100000
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
+					 "pg_control is too large for atomic disk writes");
+#endif
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
+					 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
+
+	/* Recalculate CRC of control file */
+	INIT_CRC32C(ControlFile->crc);
+	COMP_CRC32C(ControlFile->crc,
+				(char *) ControlFile,
+				offsetof(ControlFileData, crc));
+	FIN_CRC32C(ControlFile->crc);
+
+	/*
+	 * Write out PG_CONTROL_FILE_SIZE bytes into pg_control by zero-padding
+	 * the excess over sizeof(ControlFileData), to avoid premature EOF related
+	 * errors when reading it.
+	 */
+	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
+	memcpy(buffer, ControlFile, sizeof(ControlFileData));
+
+	snprintf(ControlFilePath, sizeof(ControlFilePath), "%s/%s", DataDir, XLOG_CONTROL_FILE);
+
+	fd = open(ControlFilePath, O_WRONLY | O_CREAT | PG_BINARY,
+			  pg_file_create_mode);
+	if (fd < 0)
+	{
+		fprintf(stderr, _("%s: could not open control file: %s\n"),
+				progname, strerror(errno));
+		exit(1);
+	}
+
+	errno = 0;
+	if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		fprintf(stderr, _("%s: could not write control file: %s\n"),
+				progname, strerror(errno));
+		exit(1);
+	}
+
+	if (fsync(fd) != 0)
+	{
+		fprintf(stderr, _("%s: fsync error: %s\n"), progname, strerror(errno));
+		exit(1);
+	}
+
+	if (close(fd) < 0)
+	{
+		fprintf(stderr, _("%s: could not close control file: %s\n"), progname, strerror(errno));
+		exit(1);
+	}
+}
+
+static void
 scan_file(const char *fn, BlockNumber segmentno)
 {
 	PGAlignedBlock buf;
@@ -87,7 +174,14 @@ scan_file(const char *fn, BlockNumber segmentno)
 	int			f;
 	BlockNumber blockno;
 
-	f = open(fn, O_RDONLY | PG_BINARY, 0);
+	Assert(action == PG_ACTION_ENABLE ||
+		   action == PG_ACTION_VERIFY);
+
+	if (action == PG_ACTION_VERIFY)
+		f = open(fn, O_RDONLY | PG_BINARY, 0);
+	else
+		f = open(fn, O_RDWR | PG_BINARY, 0);
+
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
@@ -117,18 +211,47 @@ scan_file(const char *fn, BlockNumber segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		if (csum != header->pd_checksum)
+		if (action == PG_ACTION_VERIFY)
 		{
-			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
-						progname, fn, blockno, csum, header->pd_checksum);
-			badblocks++;
+			if (csum != header->pd_checksum)
+			{
+				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+					fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
+							progname, fn, blockno, csum, header->pd_checksum);
+				badblocks++;
+			}
+		}
+		else if (action == PG_ACTION_ENABLE)
+		{
+			/* Set checksum in page header */
+			header->pd_checksum = csum;
+
+			/* Seek back to beginning of block */
+			if (lseek(f, -BLCKSZ, SEEK_CUR) < 0)
+			{
+				fprintf(stderr, _("%s: seek failed for block %d in file \"%s\": %s\n"), progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Write block with checksum */
+			if (write(f, buf.data, BLCKSZ) != BLCKSZ)
+			{
+				fprintf(stderr, "%s: could not update checksum of block %d in file \"%s\": %s\n",
+						progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
 		}
 	}
 
 	if (verbose)
-		fprintf(stderr,
-				_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+	{
+		if (action == PG_ACTION_VERIFY)
+			fprintf(stderr,
+					_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+		if (action == PG_ACTION_ENABLE)
+			fprintf(stderr,
+					_("%s: checksums enabled in file \"%s\"\n"), progname, fn);
+	}
 
 	close(f);
 }
@@ -230,7 +353,10 @@ int
 main(int argc, char *argv[])
 {
 	static struct option long_options[] = {
+		{"verify", no_argument, NULL, 'c'},
 		{"pgdata", required_argument, NULL, 'D'},
+		{"disable", no_argument, NULL, 'd'},
+		{"enable", no_argument, NULL, 'e'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -258,10 +384,19 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:r:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
+			case 'c':
+				action = PG_ACTION_VERIFY;
+				break;
+			case 'd':
+				action = PG_ACTION_DISABLE;
+				break;
+			case 'e':
+				action = PG_ACTION_ENABLE;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -282,6 +417,21 @@ main(int argc, char *argv[])
 		}
 	}
 
+	/*
+	 * Don't allow pg_checksums to be run as root, to avoid overwriting the
+	 * ownership of files in the data directory. We need only check for root
+	 * -- any other user won't have sufficient permissions to modify files in
+	 * the data directory.  This does not matter for the "verify" mode, but
+	 * let's be consistent.
+	 */
+#ifndef WIN32
+	if (geteuid() == 0)
+	{
+		fprintf(stderr, _("%s: cannot be executed by \"root\"\n"), progname);
+		exit(1);
+	}
+#endif
+
 	if (DataDir == NULL)
 	{
 		if (optind < argc)
@@ -308,6 +458,16 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Relfilenode checking only works in verify mode */
+	if (action != PG_ACTION_VERIFY &&
+		only_relfilenode)
+	{
+		fprintf(stderr, _("%s: relfilenode option only possible with verify action\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+						progname);
+		exit(1);
+	}
+
 	/* Check if cluster is running */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
@@ -319,29 +479,74 @@ main(int argc, char *argv[])
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
+		fprintf(stderr, _("%s: cluster must be shut down\n"), progname);
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version == 0)
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_VERIFY)
 	{
 		fprintf(stderr, _("%s: data checksums are not enabled in cluster\n"), progname);
 		exit(1);
 	}
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_DISABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
+		exit(1);
+	}
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION &&
+		action == PG_ACTION_ENABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already enabled in cluster.\n"), progname);
+		exit(1);
+	}
 
-	/* Scan all files */
+	/*
+	 * When disabling data checksums, only update the control file and call it
+	 * a day.
+	 */
+	if (action == PG_ACTION_DISABLE)
+	{
+		ControlFile->data_checksum_version = 0;
+		updateControlFile(DataDir, ControlFile);
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		printf(_("Checksums disabled in cluster\n"));
+		return 0;
+	}
+
+	/* Operate on all files */
 	scan_directory(DataDir, "global");
 	scan_directory(DataDir, "base");
 	scan_directory(DataDir, "pg_tblspc");
 
-	printf(_("Checksum scan completed\n"));
-	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Checksum operation completed\n"));
 	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
 	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
-	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+	if (action == PG_ACTION_VERIFY)
+	{
+		printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+		printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 
-	if (badblocks > 0)
+		if (badblocks > 0)
 		return 1;
+	}
+
+	/*
+	 * When enabling checksums, wait until the end the operation has completed
+	 * to do the switch.
+	 */
+	if (action == PG_ACTION_ENABLE)
+	{
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+		updateControlFile(DataDir, ControlFile);
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		printf(_("Checksums enabled in cluster\n"));
+	}
 
 	return 0;
 }
diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_verify_checksums/t/002_actions.pl
index 5250b5a728..af20c60445 100644
--- a/src/bin/pg_verify_checksums/t/002_actions.pl
+++ b/src/bin/pg_verify_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 45;
+use Test::More tests => 59;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -38,8 +38,8 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_verify_checksums',  '-D', $pgdata,
-		'-r', $relfilenode_corrupted],
+	command_ok(['pg_verify_checksums',  '-c', '-D', $pgdata, '-r',
+			   $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
 	# Time to create some corruption
@@ -49,15 +49,15 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
-								$relfilenode_corrupted],
+	$node->command_checks_all([ 'pg_verify_checksums', '-c', '-D', $pgdata,
+							  '-r', $relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_verify_checksums', '-c', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,22 +67,22 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_verify_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_verify_checksums', '-c', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
 	return;
 }
 
-# Initialize node with checksums enabled.
+# Initialize node with checksums disabled.
 my $node = get_new_node('node_checksum');
-$node->init(extra => ['--data-checksums']);
+$node->init();
 my $pgdata = $node->data_dir;
 
-# Control file should know that checksums are enabled.
+# Control file should know that checksums are disabled.
 command_like(['pg_controldata', $pgdata],
-	     qr/Data page checksum version:.*1/,
-		 'checksums enabled in control file');
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
 
 # These are correct but empty files, so they should pass through.
 append_to_file "$pgdata/global/99999", "";
@@ -100,13 +100,49 @@ append_to_file "$pgdata/global/pgsql_tmp_123", "foo";
 mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
+# Enable checksums
+command_ok(['pg_verify_checksums', '-e', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
+# Disable checksums again
+command_ok(['pg_verify_checksums', '-d', '-D', $pgdata],
+		   "checksums successfully disabled in cluster");
+
+# Control file should know that checksums are disabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
+
+# Enable checksums again with long option
+command_ok(['pg_verify_checksums', '--enable', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
 # Checksums pass on a newly-created cluster
-command_ok(['pg_verify_checksums',  '-D', $pgdata],
+command_ok(['pg_verify_checksums', '-c', '-D', $pgdata],
 		   "succeeds with offline cluster");
 
+# Checksums are verified if no other arguments are specified
+command_ok(['pg_verify_checksums', '-D', $pgdata],
+		   "verifies checksums as default action");
+
+# Specific relation files cannot be requested when action is disable
+command_fails(['pg_verify_checksums', '-d', '-r', '1234', '-D',
+			  $pgdata],
+			  "fails when relfilnodes are requested and action is not verify");
+
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_verify_checksums',  '-D', $pgdata],
+command_fails(['pg_verify_checksums', '-c', '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -133,7 +169,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_verify_checksums', '-c', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
#28Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#26)
Re: Offline enabling/disabling of data checksums

One difference between pg_rewind and pg_checksums is that the latter
potentially runs for a longer time (or rather a non-trivial amount of
time, compared to pg_rewind), so the margin of error of another DBA
saying "oh, that DB is down, let me start it again" might be much
higher.

The question is how to reliably do this in an acceptable way? Just
faking a postmaster.pid sounds pretty hackish to me, do you have any
suggestions here?

Adding a new state to ControlFileData which would prevent it from
starting?

--
Fabien.

#29Bernd Helmle
mailings@oopsware.de
In reply to: Fabien COELHO (#28)
Re: Offline enabling/disabling of data checksums

Am Dienstag, den 08.01.2019, 15:09 +0100 schrieb Fabien COELHO:

The question is how to reliably do this in an acceptable way? Just
faking a postmaster.pid sounds pretty hackish to me, do you have any
suggestions here?

Adding a new state to ControlFileData which would prevent it from
starting?

But then you have to make sure the control flag gets cleared in any
case pg_verify_checksums crashes somehow or gets SIGKILL'ed ...

Setting the checksum flag is done after having finished all blocks, so
there is no problem. But we need to set this new flag before and reset
it afterwards, so in between strange things can happen (as the various
calls to exit() within error handling illustrate).

Bernd.

#30Michael Banck
michael.banck@credativ.de
In reply to: Bernd Helmle (#29)
Re: Offline enabling/disabling of data checksums

Am Dienstag, den 08.01.2019, 15:39 +0100 schrieb Bernd Helmle:

Am Dienstag, den 08.01.2019, 15:09 +0100 schrieb Fabien COELHO:

The question is how to reliably do this in an acceptable way? Just
faking a postmaster.pid sounds pretty hackish to me, do you have any
suggestions here?

Adding a new state to ControlFileData which would prevent it from
starting?

But then you have to make sure the control flag gets cleared in any
case pg_verify_checksums crashes somehow or gets SIGKILL'ed ...

Setting the checksum flag is done after having finished all blocks, so
there is no problem. But we need to set this new flag before and reset
it afterwards, so in between strange things can happen (as the various
calls to exit() within error handling illustrate).

It seems writing a note like "pg_checksums is running" into the
postmaster.pid would work, and would give a hopefully useful hint to
somebody trying to start Postgres while pg_checksums is running:

postgres@kohn:~$ echo  "pg_checksums running with pid 1231, cluster disabled" > data/postmaster.pid 
postgres@kohn:~$ pg_ctl -D data -l logfile start
pg_ctl: invalid data in PID file "data/postmaster.pid"
postgres@kohn:~$ echo $?
1
postgres@kohn:~$ 

If the DBA then just simply deletes postmaster.pid and starts over, well
then I call pilot error; though we could in theory change pg_ctl (or
whatever checks postmaster.pid) to emit an even more useful error
message if it encounters a "cluster is locked" keyword in it.

Not sure whether everybody likes that (or is future-proof for that
matter), but I like it better than adding a new field to the control
file, for the reasons Bernd outlined above.
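
As a rough sketch of the idea above, assuming the tool itself would
create the marker file: opening with O_CREAT | O_EXCL fails if
postmaster.pid already exists, and the note text mirrors the echo
command in the transcript. create_marker_pidfile() is a hypothetical
helper, not something the patch contains.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create "<datadir>/postmaster.pid" containing a note
 * instead of a postmaster PID.  O_EXCL makes open() fail with EEXIST if the
 * file already exists (e.g. because a server is running).
 * Returns 0 on success, -1 on failure.
 */
static int
create_marker_pidfile(const char *datadir, const char *toolname)
{
	char		path[1024];
	char		note[256];
	int			fd;
	int			len;

	assert(datadir != NULL && toolname != NULL);

	if (snprintf(path, sizeof(path), "%s/postmaster.pid", datadir) >= (int) sizeof(path))
		return -1;
	len = snprintf(note, sizeof(note),
				   "%s running with pid %ld, cluster disabled\n",
				   toolname, (long) getpid());
	if (len < 0 || len >= (int) sizeof(note))
		return -1;

	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;

	/* Flush the note so it survives a crash of the tool. */
	if (write(fd, note, (size_t) len) != len || fsync(fd) != 0)
	{
		close(fd);
		unlink(path);
		return -1;
	}
	return close(fd);
}
```

The file stays behind if the tool is killed, so someone still has to
remove it by hand in that case.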

Michael


#31Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Bernd Helmle (#29)
Re: Offline enabling/disabling of data checksums

Adding a new state to ControlFileData which would prevent it from
starting?

But then you have to make sure the control flag gets cleared in any
case pg_verify_checksums crashes somehow or gets SIGKILL'ed ...

The usual approach is a restart with some --force option?

Setting the checksum flag is done after having finished all blocks, so
there is no problem.

There is also a problem if the db is started while the checksum is being
enabled.

But we need to set this new flag before and reset it afterwards, so in
between strange things can happen (as the various calls to exit() within
error handling illustrate).

Sure, there is some need for a backup plan if it fails and the control
file is left in a wrong state.

--
Fabien.

#32Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#30)
Re: Offline enabling/disabling of data checksums

Setting the checksum flag is done after having finished all blocks, so
there is no problem. But we need to set this new flag before and reset
it afterwards, so in between strange things can happen (as the various
calls to exit() within error handling illustrate).

It seems writing a note like "pg_checksums is running" into the
postmaster.pid would work, and would give a hopefully useful hint to
somebody trying to start Postgres while pg_checksums is running:

postgres@kohn:~$ echo  "pg_checksums running with pid 1231, cluster disabled" > data/postmaster.pid 
postgres@kohn:~$ pg_ctl -D data -l logfile start
pg_ctl: invalid data in PID file "data/postmaster.pid"
postgres@kohn:~$ echo $?
1
postgres@kohn:~$ 

Looks ok, but I'm unsure how portable it is. What if the server is
started with "postmaster" directly?

If the DBA then just simply deletes postmaster.pid and starts over, well
then I call pilot error; though we could in theory change pg_ctl (or
whatever checks postmaster.pid) to emit an even more useful error
message if it encounters a "cluster is locked" keyword in it.

Not sure whether everybody likes that (or is future-proof for that
matter), but I like it better than adding a new field to the control
file, for the reasons Bernd outlined above.

ISTM that the point of the control file is exactly to tell what the current
status of the cluster is, so it is where this information really belongs?

AFAICS all commands take care of the status in some way to avoid
accidents.

--
Fabien.

#33Bernd Helmle
mailings@oopsware.de
In reply to: Fabien COELHO (#31)
Re: Offline enabling/disabling of data checksums

Am Dienstag, den 08.01.2019, 16:17 +0100 schrieb Fabien COELHO:

Adding a new state to ControlFileData which would prevent it from
starting?

But then you have to make sure the control flag gets cleared in any
case pg_verify_checksums crashes somehow or gets SIGKILL'ed ...

The usual approach is a restart with some --force option?

Setting the checksum flag is done after having finished all blocks, so
there is no problem.

There is also a problem if the db is started while the checksum is being
enabled.

What I mean is that interrupting pg_verify_checksums won't leave
pg_control in a state where starting the cluster won't work without any
further interaction.

Bernd.

#34Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Bernd Helmle (#33)
Re: Offline enabling/disabling of data checksums

But then you have to make sure the control flag gets cleared in any
case pg_verify_checksums crashes somehow or gets SIGKILL'ed ...

The usual approach is a restart with some --force option?

Setting the checksum flag is done after having finished all blocks, so
there is no problem.

There is also a problem if the db is started while the checksum is
being enabled.

What I mean is that interrupting pg_verify_checksums won't leave
pg_control in a state where starting the cluster won't work without any
further interaction.

Yep, I understood that, and agree that a way out is needed, hence the
--force option suggestion.
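
The control-file state idea and the --force escape hatch could look
roughly like this. Everything here is hypothetical: the enum values are
invented for the sketch (the real DBState has no such value), and
--force is read as an option of the checksum tool itself, which the
thread does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states; invented for this sketch only. */
typedef enum
{
	SKETCH_SHUTDOWNED,			/* cleanly stopped, safe to start */
	SKETCH_CHECKSUMS_IN_PROGRESS	/* an offline tool is rewriting pages */
} SketchClusterState;

/* The server refuses to start while the tool owns the cluster. */
static bool
server_may_start(SketchClusterState state)
{
	return state == SKETCH_SHUTDOWNED;
}

/*
 * The tool normally requires a cleanly stopped cluster.  After a crash or
 * SIGKILL the state is still "in progress"; "force" stands in for the
 * suggested --force option that lets the tool run again anyway.
 */
static bool
tool_may_run(SketchClusterState state, bool force)
{
	if (state == SKETCH_CHECKSUMS_IN_PROGRESS)
		return force;
	return true;
}
```

With this split, a crashed run leaves the cluster unstartable until the
tool is re-run with the force flag and completes.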

--
Fabien.

#35Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#27)
Re: Offline enabling/disabling of data checksums

On Tue, Jan 08, 2019 at 01:03:25PM +0100, Michael Banck wrote:

I changed that to the switches -c/--verify (-c for check as -v is taken,
should it be --check as well? I personally like verify better), 
-d/--disable and -e/--enable.

Indeed, we could use --check. "pg_checksums --check" looks repetitive,
but it would make the short option more consistent with the rest.

+   printf(_("  -A, --action   action to take on the cluster, can be set as\n"));
+   printf(_("                 \"verify\", \"enable\" and \"disable\"\n"));
Not reflected yet in the --help portion.

Also, the full page is rewritten... would it make sense to only overwrite
the checksum part itself?

So just writing the page header? I find that a bit scary and don't
expect much speedup as the OS would write the whole block anyway I
guess? I haven't touched that yet.

The OS would write blocks of 4kB out of the 8kB as that's the usual
page size, no? So this could save a lot of I/O.

I have mostly taken the pg_rewind code here; if there was a function
that allowed for safe offline changes of the control file, I'd be happy
to use it but I don't think it should be this patch to invent that.

In any case, I have removed the unlink() now (not sure where that came
from), and changed it to open(O_WRONLY) same as in Michael's code and
pg_rewind.

My own stuff in pg_checksums.c does not have an unlink(), anyway... I
think that there is room for improvement for both pg_rewind and
pg_checksums here. What about refactoring updateControlFile() and
moving it to controldata_utils.c? This centralizes the CRC check,
static assertions, file open and writes into a single place. The
backend has a similar flavor with UpdateControlFile. By combining
both we need some extra "ifdef FRONTEND" for BasicOpenFile and the
wait events, which generates some noise; still, both share a lot. The
backend also includes a fsync() for the control file which happens
when the file is written, but for pg_checksums and pg_rewind we just
do it in one go at the end, so we would need an extra flag to decide
if fsync should happen or not. pg_rewind has partially the right
interface by passing ControlFileData contents as an argument.
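
A rough sketch of what such a shared routine might look like, under
stated assumptions: SketchControlData stands in for ControlFileData,
sketch_crc32c() is a plain bitwise CRC-32C rather than PostgreSQL's CRC
macros, and the function name and do_sync parameter are illustrative
only. It opens the file without O_CREAT, so it only ever updates an
existing file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_CONTROL_FILE_SIZE 8192	/* stand-in for PG_CONTROL_FILE_SIZE */

/* Stand-in for ControlFileData; the CRC must remain the last field. */
typedef struct
{
	uint32_t	data_checksum_version;
	uint32_t	crc;
} SketchControlData;

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t
sketch_crc32c(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t	crc = 0xFFFFFFFF;

	while (len-- > 0)
	{
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78 & (0 - (crc & 1)));
	}
	return crc ^ 0xFFFFFFFF;
}

/*
 * Recompute the CRC, zero-pad to the full file size, and overwrite an
 * existing file.  "do_sync" lets callers that sync the whole data directory
 * later skip the per-file fsync().  Returns 0 on success, -1 on failure.
 */
static int
sketch_update_controlfile(const char *path, SketchControlData *ctl, bool do_sync)
{
	char		buffer[SKETCH_CONTROL_FILE_SIZE];
	int			fd;

	assert(sizeof(SketchControlData) <= SKETCH_CONTROL_FILE_SIZE);

	ctl->crc = sketch_crc32c(ctl, offsetof(SketchControlData, crc));
	memset(buffer, 0, sizeof(buffer));
	memcpy(buffer, ctl, sizeof(*ctl));

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	if (write(fd, buffer, sizeof(buffer)) != (ssize_t) sizeof(buffer) ||
		(do_sync && fsync(fd) != 0))
	{
		close(fd);
		return -1;
	}
	return close(fd);
}
```

A backend-style caller would pass do_sync = true, while tools that call
fsync_pgdata() at the end could pass false.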

V2 attached.

+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
This may look strange, but these are needed because pg_checksums
calls some of the sync-related routines which are defined in fd.c.
Amen.
+   if (fsync(fd) != 0)
+   {
+       fprintf(stderr, _("%s: fsync error: %s\n"), progname, strerror(errno));
+       exit(1);
+   }
No need for that as fsync_pgdata() gets called at the end.
--
Michael
#36Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#27)
Re: Offline enabling/disabling of data checksums

I changed that to the switches -c/--verify (-c for check as -v is taken,
should it be --check as well? I personally like verify better),
-d/--disable and -e/--enable.

I agree that checking the checksum sounds repetitive, but I think that for
consistency --check should be provided.

About the patch: applies, compiles, global & local "make check" are ok.

There is still no documentation.

I think that there is a consensus about renaming the command.

The --help string documents --action, which does not exist anymore.

The code in "updateControlFile" seems to allow creating the file
(O_CREAT). I do not think that it should; it should only apply to an
existing file.

ISTM that some generalized version of this function should be in
"src/common/controldata_utils.c" instead of duplicating it from command to
command (as suggested by Michaël as well).

In "scan_file" verbose output, ISTM that the checksum is more computed
than enabled on the file. It is really enabled at the cluster level in the
end.

Maybe there could be only one open call with a ?: for RO vs RW.
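
The single open() call suggested here could be written as below. The
enum and open_relation_file() are stand-ins for the patch's
ChecksumAction and the code in scan_file(); PG_BINARY is omitted to keep
the sketch self-contained.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Stand-in for the two actions that reach scan_file() in the patch. */
typedef enum
{
	SKETCH_ACTION_ENABLE,
	SKETCH_ACTION_VERIFY
} SketchAction;

/* One open() call: read-only for verify, read-write when enabling. */
static int
open_relation_file(const char *fn, SketchAction action)
{
	assert(fn != NULL);

	return open(fn, action == SKETCH_ACTION_VERIFY ? O_RDONLY : O_RDWR, 0);
}
```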

Non root check: as files are only manipulated RW, ISTM that there is no
reason why the ownership would be changed, so I do not think that this
constraint is useful.

There is kind of a copy paste for enabling/disabling, I'd consider
skipping the scan when not necessary and merge both branches.

Also, the full page is rewritten... would it make sense to only overwrite
the checksum part itself?

So just writing the page header? I find that a bit scary and don't
expect much speedup as the OS would write the whole block anyway I
guess? I haven't touched that yet.

Possibly the OS would write its block size, which is not necessarily the
same as the postgres page size?

It seems that the control file is unlinked and then rewritten. If the
rewriting fails, or the command is interrupted, the user has a problem.

Could the control file simply be opened RW? Otherwise, I would suggest
renaming the old file (e.g. adding .tmp), writing the new one, then
unlinking the old one, so that the old state can be recovered if something
goes wrong.

I have mostly taken the pg_rewind code here; if there was a function
that allowed for safe offline changes of the control file, I'd be happy
to use it but I don't think it should be this patch to invent that.

It is reinventing it somehow by duplicating the stuff anyway. I'd suggest
a separate preparatory patch to do the cleanup.
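
For illustration, a crash-safe replacement of a small file such as
pg_control could be sketched as below. This is only a minimal sketch under
stated assumptions: replace_file_atomically() and the ".tmp" suffix are
hypothetical names, not PostgreSQL APIs, and the sketch uses the common
"write a temporary file, fsync it, then rename() it over the old one"
variant rather than the exact rename/write/unlink ordering suggested above.
It also does not fsync the containing directory, which a real
implementation would probably want to do.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL API): replace "path" with "len"
 * bytes from "buf" so that an interruption leaves either the old or the
 * new file in place.  Returns 0 on success, -1 on failure with errno set.
 */
static int
replace_file_atomically(const char *path, const void *buf, size_t len)
{
	char		tmppath[4096];
	int			fd;
	int			save_errno;

	assert(path != NULL && buf != NULL);

	if (snprintf(tmppath, sizeof(tmppath), "%s.tmp", path) >= (int) sizeof(tmppath))
	{
		errno = ENAMETOOLONG;
		return -1;
	}

	fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	errno = 0;
	if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
	{
		/* a short write that left errno unset is treated as out of space */
		save_errno = (errno != 0) ? errno : ENOSPC;
		close(fd);
		unlink(tmppath);
		errno = save_errno;
		return -1;
	}

	if (close(fd) != 0)
	{
		save_errno = errno;
		unlink(tmppath);
		errno = save_errno;
		return -1;
	}

	/* rename() swaps the new file in; the old one stays intact until then */
	if (rename(tmppath, path) != 0)
	{
		save_errno = errno;
		unlink(tmppath);
		errno = save_errno;
		return -1;
	}

	return 0;
}
```

A caller would pass the full path of the file and the complete new
contents; on failure the previous file is left untouched.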

--
Fabien.

#37Andres Freund
andres@anarazel.de
In reply to: Fabien COELHO (#36)
Re: Offline enabling/disabling of data checksums

Hi,

On 2019-01-09 07:07:17 +0100, Fabien COELHO wrote:

There is still no documentation.

Michael, are you planning to address this? It'd also be useful to state
when you just don't agree with things / don't plan to address them.

Given the docs piece hasn't been addressed, and seems uncontroversial,
I'm marking this patch as returned with feedback. Please resubmit once
ready.

Also, the full page is rewritten... would it make sense to only overwrite
the checksum part itself?

So just writing the page header? I find that a bit scary and don't
expect much speedup as the OS would write the whole block anyway I
guess? I haven't touched that yet.

Possibly the OS would write its block size, which is not necessarily the same
as the postgres page size?

I think it'd be a bad idea to write at a finer granularity. Very commonly
that'll turn a write operation into a read-modify-write (although caching
will often prevent that from being a problem here), and it'll be bad for
flash translation layers.
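
As a rough illustration of the whole-block approach, the sketch below
reads one block, changes only the two checksum bytes in memory, and then
writes the complete block back at the same offset. The names
rewrite_block_checksum(), SKETCH_BLCKSZ and SKETCH_CHECKSUM_OFFSET are
hypothetical and chosen for this example only; the 8192-byte block size
and the byte offset 8 are assumptions standing in for BLCKSZ and the
position of pd_checksum in the page header.

```c
#include <assert.h>
#include <fcntl.h>			/* open() flags for callers */
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLCKSZ			8192	/* assumed block size */
#define SKETCH_CHECKSUM_OFFSET	8		/* assumed offset of the checksum */

/*
 * Hypothetical helper: store "checksum" into block "blockno" of the file
 * open on "fd" by rewriting the whole block, never a partial one.
 * Returns 0 on success, -1 on any seek, read or write failure.
 */
static int
rewrite_block_checksum(int fd, uint32_t blockno, uint16_t checksum)
{
	char		block[SKETCH_BLCKSZ];
	off_t		offset = (off_t) blockno * SKETCH_BLCKSZ;

	assert(fd >= 0);

	/* read the full block */
	if (lseek(fd, offset, SEEK_SET) < 0)
		return -1;
	if (read(fd, block, SKETCH_BLCKSZ) != SKETCH_BLCKSZ)
		return -1;

	/* modify only the checksum bytes in memory */
	memcpy(block + SKETCH_CHECKSUM_OFFSET, &checksum, sizeof(checksum));

	/* seek back and write the full block again */
	if (lseek(fd, offset, SEEK_SET) < 0)
		return -1;
	if (write(fd, block, SKETCH_BLCKSZ) != SKETCH_BLCKSZ)
		return -1;

	return 0;
}
```

The write always covers one aligned block, so the file size and all other
bytes of the block are unchanged.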

Greetings,

Andres Freund

#38Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#36)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

Hi,

sorry for letting this slack.

First off, thanks for the review!

On Wednesday, 2019-01-09 at 07:07 +0100, Fabien COELHO wrote:

I changed that to the switches -c/--verify (-c for check as -v is taken,
should it be --check as well? I personally like verify better), 
-d/--disable and -e/--enable.

I agree that checking the checksum sounds repetitive, but I think that for
consistency --check should be provided.

Ok then. The enum is currently called PG_ACTION_VERIFY, I changed that
to PG_ACTION_CHECK as well.
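
The switch handling described here can be sketched with getopt_long()
roughly as follows. This is a simplified, stand-alone illustration and not
the code from the patch: SketchAction and parse_action() are hypothetical
names, only the three action switches are handled, and "check" is assumed
to be the default when no switch is given.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical stand-in for the action enum discussed above */
typedef enum
{
	SKETCH_ACTION_CHECK,
	SKETCH_ACTION_DISABLE,
	SKETCH_ACTION_ENABLE,
	SKETCH_ACTION_INVALID
} SketchAction;

/*
 * Map -c/--check, -d/--disable and -e/--enable to an action.  The last
 * switch given wins; an unknown option yields SKETCH_ACTION_INVALID.
 */
static SketchAction
parse_action(int argc, char *argv[])
{
	static const struct option long_options[] = {
		{"check", no_argument, NULL, 'c'},
		{"disable", no_argument, NULL, 'd'},
		{"enable", no_argument, NULL, 'e'},
		{NULL, 0, NULL, 0}
	};
	SketchAction action = SKETCH_ACTION_CHECK;	/* assumed default */
	int			c;

	assert(argv != NULL);

	optind = 1;					/* restart scanning at the first argument */
	while ((c = getopt_long(argc, argv, "cde", long_options, NULL)) != -1)
	{
		switch (c)
		{
			case 'c':
				action = SKETCH_ACTION_CHECK;
				break;
			case 'd':
				action = SKETCH_ACTION_DISABLE;
				break;
			case 'e':
				action = SKETCH_ACTION_ENABLE;
				break;
			default:
				return SKETCH_ACTION_INVALID;
		}
	}

	return action;
}
```

Called with the program's argc and argv, the function returns the
requested action, so the rest of the program can branch on a single enum
value.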

About the patch: applies, compiles, global & local "make check" are ok.

There is still no documentation.

I've added that now, though I did that blindly and have not checked the
output yet.

I think that there is a consensus about renaming the command.

I think so as well, but doing that right now will make the patch
difficult to review, so I'd prefer to leave it to the committer to do
that. 

I can submit a patch with the directory/file rename if that is
preferred.

The --help string documents --action, which no longer exists.

Fixed that.

The code in "updateControlFile" seems to allow creating the file
(O_CREAT). I do not think it should; it should only apply to an
existing file.

Removed that.

ISTM that some generalized version of this function should be in
"src/common/controldata_utils.c" instead of duplicating it from command to
command (as suggested by Michaël as well).

Haven't done that yet.

In "scan_file" verbose output, ISTM that the checksum is computed on the
file rather than enabled. It is only really enabled at the cluster level
at the end.

It's certainly not just computed but also written. It's true that it
will only be meaningful if the control file is updated accordingly at
the end, but I don't think the message is really incorrect, so I left it
as-is for now.

Maybe there could be only one open call with a ?: for RO vs RW.

Done that.
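
One detail worth keeping in mind when folding the two open() calls into a
single one with ?: is operator precedence: | binds tighter than ?:, so an
expression of the form "cond ? O_RDWR : O_RDONLY | PG_BINARY" applies the
extra flag only to the read-only branch. PG_BINARY is 0 on non-Windows
platforms, so the difference only shows up where it expands to O_BINARY.
The sketch below demonstrates the parse with hypothetical flag values
(SK_RDONLY, SK_RDWR and SK_BINARY are made up for this example; real
values come from <fcntl.h>).

```c
#include <stdbool.h>

/* Hypothetical flag values, for illustration only */
#define SK_RDONLY	0x0000
#define SK_RDWR		0x0002
#define SK_BINARY	0x8000

/* Parsed as: writable ? SK_RDWR : (SK_RDONLY | SK_BINARY) */
static int
flags_unparenthesized(bool writable)
{
	return writable ? SK_RDWR : SK_RDONLY | SK_BINARY;
}

/* The binary flag is applied to both branches */
static int
flags_parenthesized(bool writable)
{
	return (writable ? SK_RDWR : SK_RDONLY) | SK_BINARY;
}
```

With these values the two functions agree for the read-only case and
differ for the writable case, where only the parenthesized form keeps the
binary flag.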

Non root check: as files are only manipulated RW, ISTM that there is no
reason why the ownership would be changed, so I do not think that this
constraint is useful.

Now that we no longer unlink() pg_control, I believe you are right and I
have removed it.

There is a kind of copy-paste for enabling/disabling; I'd consider
skipping the scan when not necessary and merging both branches.

Done so.

Also, the full page is rewritten... would it make sense to only overwrite
the checksum part itself?

So just writing the page header? I find that a bit scary and don't
expect much speedup as the OS would write the whole block anyway I
guess? I haven't touched that yet.

Possibly the OS would write its block size, which is not necessarily the
same as the postgres page size?

I haven't changed that yet, I think Andres was also of the opinion that
this is not necessary?

It seems that the control file is unlinked and then rewritten. If the
rewriting fails, or the command is interrupted, the user has a problem.

Could the control file be simply opened RW?

I've done that now.

New patch attached.

Michael


Attachments:

offline-activation-of-checksums_V3.patch (text/x-patch; charset=UTF-8)
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
index 905b8f1222..a565cb52ae 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  <refnamediv>
   <refname>pg_verify_checksums</refname>
-  <refpurpose>verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
+  <refpurpose>enable, disable or verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
  </refnamediv>
 
  <refsynopsisdiv>
@@ -25,6 +25,11 @@ PostgreSQL documentation
    <arg rep="repeat" choice="opt"><replaceable class="parameter">option</replaceable></arg>
    <group choice="opt">
     <group choice="opt">
+     <arg choice="plain"><option>--check</option></arg>
+     <arg choice="plain"><option>--disable</option></arg>
+     <arg choice="plain"><option>--enable</option></arg>
+    </group>
+    <group choice="opt">
      <arg choice="plain"><option>-D</option></arg>
      <arg choice="plain"><option>--pgdata</option></arg>
     </group>
@@ -36,10 +41,18 @@ PostgreSQL documentation
  <refsect1 id="r1-app-pg_verify_checksums-1">
   <title>Description</title>
   <para>
-   <command>pg_verify_checksums</command> verifies data checksums in a
-   <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_verify_checksums</application>.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   <command>pg_verify_checksums</command> enable, disable or verifies data
+   checksums in a <productname>PostgreSQL</productname> cluster.  The server
+   must be shut down cleanly before running
+   <application>pg_verify_checksums</application> .
+   The exit status is zero if there are no checksum errors or checksum
+   enabling/disabled was successful, otherwise nonzero.
+  </para>
+
+  <para>
+   While checking or enabling checksums needs to scan or write every file in
+   the cluster, disabling will only update the <filename>pg_control</filename>
+   file.  
   </para>
  </refsect1>
 
@@ -61,6 +74,36 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry>
+      <term><option>-c</option></term>
+      <term><option>--check</option></term>
+      <listitem>
+       <para>
+        Verify checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <term><option>--disable</option></term>
+      <listitem>
+       <para>
+        Disable checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--enable</option></term>
+      <listitem>
+       <para>
+        Enable checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
       <listitem>
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 511262ab5f..07cb5787c5 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -1,11 +1,11 @@
 /*
- * pg_verify_checksums
+ * pg_checksums
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies/enables/disables page level checksums in an offline cluster
  *
  *	Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
- *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ *	src/bin/pg_checksums/pg_checksums.c
  */
 #include "postgres_fe.h"
 
@@ -13,15 +13,16 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
+#include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
-
 
 static int64 files = 0;
 static int64 blocks = 0;
@@ -31,16 +32,33 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_ACTION_CHECK,
+	PG_ACTION_DISABLE,
+	PG_ACTION_ENABLE
+} ChecksumAction;
+
+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static ChecksumAction action = PG_ACTION_CHECK;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables/disables/verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -c, --check            check data checksums\n"));
+	printf(_("  -d, --disable          disable data checksums\n"));
+	printf(_("  -e, --enable           enable data checksums\n"));
+	printf(_("                 \"check\", \"enable\" and \"disable\"\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -80,6 +98,77 @@ skipfile(const char *fn)
 }
 
 static void
+updateControlFile(char *DataDir, ControlFileData *ControlFile)
+{
+	int			fd;
+	char		buffer[PG_CONTROL_FILE_SIZE];
+	char		ControlFilePath[MAXPGPATH];
+
+	Assert(action == PG_ACTION_ENABLE ||
+		   action == PG_ACTION_DISABLE);
+
+	/*
+	 * For good luck, apply the same static assertions as in backend's
+	 * WriteControlFile().
+	 */
+#if PG_VERSION_NUM >= 100000
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
+					 "pg_control is too large for atomic disk writes");
+#endif
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
+					 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
+
+	/* Recalculate CRC of control file */
+	INIT_CRC32C(ControlFile->crc);
+	COMP_CRC32C(ControlFile->crc,
+				(char *) ControlFile,
+				offsetof(ControlFileData, crc));
+	FIN_CRC32C(ControlFile->crc);
+
+	/*
+	 * Write out PG_CONTROL_FILE_SIZE bytes into pg_control by zero-padding
+	 * the excess over sizeof(ControlFileData), to avoid premature EOF related
+	 * errors when reading it.
+	 */
+	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
+	memcpy(buffer, ControlFile, sizeof(ControlFileData));
+
+	snprintf(ControlFilePath, sizeof(ControlFilePath), "%s/%s", DataDir, XLOG_CONTROL_FILE);
+
+	fd = open(ControlFilePath, O_WRONLY | PG_BINARY,
+			  pg_file_create_mode);
+	if (fd < 0)
+	{
+		fprintf(stderr, _("%s: could not open control file: %s\n"),
+				progname, strerror(errno));
+		exit(1);
+	}
+
+	errno = 0;
+	if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		fprintf(stderr, _("%s: could not write control file: %s\n"),
+				progname, strerror(errno));
+		exit(1);
+	}
+
+	if (fsync(fd) != 0)
+	{
+		fprintf(stderr, _("%s: fsync error: %s\n"), progname, strerror(errno));
+		exit(1);
+	}
+
+	if (close(fd) < 0)
+	{
+		fprintf(stderr, _("%s: could not close control file: %s\n"), progname, strerror(errno));
+		exit(1);
+	}
+}
+
+static void
 scan_file(const char *fn, BlockNumber segmentno)
 {
 	PGAlignedBlock buf;
@@ -87,7 +176,11 @@ scan_file(const char *fn, BlockNumber segmentno)
 	int			f;
 	BlockNumber blockno;
 
-	f = open(fn, O_RDONLY | PG_BINARY, 0);
+	Assert(action == PG_ACTION_ENABLE ||
+		   action == PG_ACTION_CHECK);
+
+	f = open(fn, action == PG_ACTION_ENABLE ? O_RDWR : O_RDONLY | PG_BINARY, 0);
+
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
@@ -117,18 +210,47 @@ scan_file(const char *fn, BlockNumber segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		if (csum != header->pd_checksum)
+		if (action == PG_ACTION_CHECK)
 		{
-			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
-						progname, fn, blockno, csum, header->pd_checksum);
-			badblocks++;
+			if (csum != header->pd_checksum)
+			{
+				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+					fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
+							progname, fn, blockno, csum, header->pd_checksum);
+				badblocks++;
+			}
+		}
+		else if (action == PG_ACTION_ENABLE)
+		{
+			/* Set checksum in page header */
+			header->pd_checksum = csum;
+
+			/* Seek back to beginning of block */
+			if (lseek(f, -BLCKSZ, SEEK_CUR) < 0)
+			{
+				fprintf(stderr, _("%s: seek failed for block %d in file \"%s\": %s\n"), progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Write block with checksum */
+			if (write(f, buf.data, BLCKSZ) != BLCKSZ)
+			{
+				fprintf(stderr, "%s: could not update checksum of block %d in file \"%s\": %s\n",
+						progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
 		}
 	}
 
 	if (verbose)
-		fprintf(stderr,
-				_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+	{
+		if (action == PG_ACTION_CHECK)
+			fprintf(stderr,
+					_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+		if (action == PG_ACTION_ENABLE)
+			fprintf(stderr,
+					_("%s: checksums enabled in file \"%s\"\n"), progname, fn);
+	}
 
 	close(f);
 }
@@ -230,17 +352,22 @@ int
 main(int argc, char *argv[])
 {
 	static struct option long_options[] = {
+		{"check", no_argument, NULL, 'c'},
 		{"pgdata", required_argument, NULL, 'D'},
+		{"disable", no_argument, NULL, 'd'},
+		{"enable", no_argument, NULL, 'e'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
 
 	char	   *DataDir = NULL;
+	char		pid_file[MAXPGPATH];
 	int			c;
 	int			option_index;
+	int			pidf;
 	bool		crc_ok;
 
-	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_checksums"));
 
 	progname = get_progname(argv[0]);
 
@@ -253,15 +380,24 @@ main(int argc, char *argv[])
 		}
 		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
 		{
-			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			puts("pg_checksums (PostgreSQL) " PG_VERSION);
 			exit(0);
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:r:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
+			case 'c':
+				action = PG_ACTION_CHECK;
+				break;
+			case 'd':
+				action = PG_ACTION_DISABLE;
+				break;
+			case 'e':
+				action = PG_ACTION_ENABLE;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -308,6 +444,16 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Relfilenode checking only works in check mode */
+	if (action != PG_ACTION_CHECK &&
+		only_relfilenode)
+	{
+		fprintf(stderr, _("%s: relfilenode option only possible with check action\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
 	/* Check if cluster is running */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
@@ -319,29 +465,85 @@ main(int argc, char *argv[])
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
+		fprintf(stderr, _("%s: cluster must be shut down\n"), progname);
+		exit(1);
+	}
+
+	/* Also check for postmaster.pid file */
+	snprintf(pid_file, sizeof(pid_file), "%s/postmaster.pid", DataDir);
+	pidf = open(pid_file, O_RDONLY, 0);
+	if (pidf < 0)
+	{
+		/*
+		 * if the errno is ENOENT, there is no pid file which is what we
+		 * expect.  Otherwise, it exits but we cannot open it so exit with
+		 * failure.
+		 */
+		if (errno != ENOENT)
+		{
+			fprintf(stderr, _("%s: postmaster.pid cannot be opened for reading: %s\n"),
+					progname, strerror(errno));
+			exit(1);
+		}
+	}
+	else
+	{
+		fprintf(stderr, _("%s: postmaster.pid exists, cluster must be shut down\n"), progname);
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version == 0)
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_CHECK)
 	{
 		fprintf(stderr, _("%s: data checksums are not enabled in cluster\n"), progname);
 		exit(1);
 	}
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_DISABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
+		exit(1);
+	}
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION &&
+		action == PG_ACTION_ENABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already enabled in cluster.\n"), progname);
+		exit(1);
+	}
 
-	/* Scan all files */
-	scan_directory(DataDir, "global");
-	scan_directory(DataDir, "base");
-	scan_directory(DataDir, "pg_tblspc");
+	if (action == PG_ACTION_CHECK || action == PG_ACTION_ENABLE)
+	{
+		/* Operate on all files */
+		scan_directory(DataDir, "global");
+		scan_directory(DataDir, "base");
+		scan_directory(DataDir, "pg_tblspc");
+
+		printf(_("Checksum operation completed\n"));
+		printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
+		printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
+		if (action == PG_ACTION_CHECK)
+		{
+			printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 
-	printf(_("Checksum scan completed\n"));
-	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
-	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
-	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
-	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			if (badblocks > 0)
+				return 1;
+		}
+	}
 
-	if (badblocks > 0)
-		return 1;
+	if (action == PG_ACTION_ENABLE || action == PG_ACTION_DISABLE)
+	{
+		/* Update control file */
+		ControlFile->data_checksum_version = action == PG_ACTION_ENABLE ? PG_DATA_CHECKSUM_VERSION : 0;
+		updateControlFile(DataDir, ControlFile);
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		if (action == PG_ACTION_ENABLE)
+			printf(_("Checksums enabled in cluster\n"));
+		else
+			printf(_("Checksums disabled in cluster\n"));
+	}
 
 	return 0;
 }
diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_verify_checksums/t/002_actions.pl
index 5250b5a728..af20c60445 100644
--- a/src/bin/pg_verify_checksums/t/002_actions.pl
+++ b/src/bin/pg_verify_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 45;
+use Test::More tests => 59;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -38,8 +38,8 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_verify_checksums',  '-D', $pgdata,
-		'-r', $relfilenode_corrupted],
+	command_ok(['pg_verify_checksums',  '-c', '-D', $pgdata, '-r',
+			   $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
 	# Time to create some corruption
@@ -49,15 +49,15 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
-								$relfilenode_corrupted],
+	$node->command_checks_all([ 'pg_verify_checksums', '-c', '-D', $pgdata,
+							  '-r', $relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_verify_checksums', '-c', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,22 +67,22 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_verify_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_verify_checksums', '-c', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
 	return;
 }
 
-# Initialize node with checksums enabled.
+# Initialize node with checksums disabled.
 my $node = get_new_node('node_checksum');
-$node->init(extra => ['--data-checksums']);
+$node->init();
 my $pgdata = $node->data_dir;
 
-# Control file should know that checksums are enabled.
+# Control file should know that checksums are disabled.
 command_like(['pg_controldata', $pgdata],
-	     qr/Data page checksum version:.*1/,
-		 'checksums enabled in control file');
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
 
 # These are correct but empty files, so they should pass through.
 append_to_file "$pgdata/global/99999", "";
@@ -100,13 +100,49 @@ append_to_file "$pgdata/global/pgsql_tmp_123", "foo";
 mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
+# Enable checksums
+command_ok(['pg_verify_checksums', '-e', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
+# Disable checksums again
+command_ok(['pg_verify_checksums', '-d', '-D', $pgdata],
+		   "checksums successfully disabled in cluster");
+
+# Control file should know that checksums are disabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
+
+# Enable checksums again with long option
+command_ok(['pg_verify_checksums', '--enable', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
 # Checksums pass on a newly-created cluster
-command_ok(['pg_verify_checksums',  '-D', $pgdata],
+command_ok(['pg_verify_checksums', '-c', '-D', $pgdata],
 		   "succeeds with offline cluster");
 
+# Checksums are verified if no other arguments are specified
+command_ok(['pg_verify_checksums', '-D', $pgdata],
+		   "verifies checksums as default action");
+
+# Specific relation files cannot be requested when action is disable
+command_fails(['pg_verify_checksums', '-d', '-r', '1234', '-D',
+			  $pgdata],
+			  "fails when relfilnodes are requested and action is not verify");
+
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_verify_checksums',  '-D', $pgdata],
+command_fails(['pg_verify_checksums', '-c', '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -133,7 +169,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_verify_checksums', '-c', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
#39Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#38)
Re: Offline enabling/disabling of data checksums

On Sun, Feb 17, 2019 at 07:31:38PM +0100, Michael Banck wrote:

New patch attached.

- * src/bin/pg_verify_checksums/pg_verify_checksums.c
+ * src/bin/pg_checksums/pg_checksums.c
That's lacking a rename, or this comment is incorrect.
+#if PG_VERSION_NUM >= 100000
+   StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
+                    "pg_control is too large for atomic disk writes");
+#endif
This is compiled with only one version of the control file data, so
you don't need that.

Any reason why we don't refactor updateControlFile() into
controldata_utils.c? This duplicates the code, with the exception of
some details.
--
Michael

#40Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#39)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

Hi,

On Tuesday, 2019-02-19 at 14:02 +0900, Michael Paquier wrote:

On Sun, Feb 17, 2019 at 07:31:38PM +0100, Michael Banck wrote:

New patch attached.

- * src/bin/pg_verify_checksums/pg_verify_checksums.c
+ * src/bin/pg_checksums/pg_checksums.c
That's lacking a rename, or this comment is incorrect.

Right, I started the rename, but then backed off pending further
discussion whether I should submit that or whether the committer will
just do it.

I've backed those 4 in-file renames out for now.

+#if PG_VERSION_NUM >= 100000
+   StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
+                    "pg_control is too large for atomic disk writes");
+#endif
This is compiled with only one version of the control file data, so
you don't need that.

Oops, yeah.

Any reason why we don't refactor updateControlFile() into
controldata_utils.c? This duplicates the code, with the exception of
some details.

Ok, I've done that now, and migrated pg_rewind as well. Do you know of
any other programs that might benefit here?

This could/should probably be committed separately beforehand.

New patch attached.

Michael


Attachments:

offline-activation-of-checksums_V4.patch (text/x-patch; charset=UTF-8)
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
index 905b8f1222..a565cb52ae 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  <refnamediv>
   <refname>pg_verify_checksums</refname>
-  <refpurpose>verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
+  <refpurpose>enable, disable or verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
  </refnamediv>
 
  <refsynopsisdiv>
@@ -25,6 +25,11 @@ PostgreSQL documentation
    <arg rep="repeat" choice="opt"><replaceable class="parameter">option</replaceable></arg>
    <group choice="opt">
     <group choice="opt">
+     <arg choice="plain"><option>--check</option></arg>
+     <arg choice="plain"><option>--disable</option></arg>
+     <arg choice="plain"><option>--enable</option></arg>
+    </group>
+    <group choice="opt">
      <arg choice="plain"><option>-D</option></arg>
      <arg choice="plain"><option>--pgdata</option></arg>
     </group>
@@ -36,10 +41,18 @@ PostgreSQL documentation
  <refsect1 id="r1-app-pg_verify_checksums-1">
   <title>Description</title>
   <para>
-   <command>pg_verify_checksums</command> verifies data checksums in a
-   <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_verify_checksums</application>.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   <command>pg_verify_checksums</command> enable, disable or verifies data
+   checksums in a <productname>PostgreSQL</productname> cluster.  The server
+   must be shut down cleanly before running
+   <application>pg_verify_checksums</application> .
+   The exit status is zero if there are no checksum errors or checksum
+   enabling/disabled was successful, otherwise nonzero.
+  </para>
+
+  <para>
+   While checking or enabling checksums needs to scan or write every file in
+   the cluster, disabling will only update the <filename>pg_control</filename>
+   file.  
   </para>
  </refsect1>
 
@@ -61,6 +74,36 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry>
+      <term><option>-c</option></term>
+      <term><option>--check</option></term>
+      <listitem>
+       <para>
+        Verify checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <term><option>--disable</option></term>
+      <listitem>
+       <para>
+        Disable checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--enable</option></term>
+      <listitem>
+       <para>
+        Enable checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
       <listitem>
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index aa753bb315..2420aef870 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -37,7 +37,6 @@ static void createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli,
 
 static void digestControlFile(ControlFileData *ControlFile, char *source,
 				  size_t size);
-static void updateControlFile(ControlFileData *ControlFile);
 static void syncTargetDirectory(void);
 static void sanityChecks(void);
 static void findCommonAncestorTimeline(XLogRecPtr *recptr, int *tliIndex);
@@ -377,7 +376,7 @@ main(int argc, char **argv)
 	ControlFile_new.minRecoveryPoint = endrec;
 	ControlFile_new.minRecoveryPointTLI = endtli;
 	ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
-	updateControlFile(&ControlFile_new);
+	update_controlfile(datadir_target, progname, &ControlFile_new);
 
 	pg_log(PG_PROGRESS, "syncing target data directory\n");
 	syncTargetDirectory();
@@ -667,45 +666,6 @@ digestControlFile(ControlFileData *ControlFile, char *src, size_t size)
 }
 
 /*
- * Update the target's control file.
- */
-static void
-updateControlFile(ControlFileData *ControlFile)
-{
-	char		buffer[PG_CONTROL_FILE_SIZE];
-
-	/*
-	 * For good luck, apply the same static assertions as in backend's
-	 * WriteControlFile().
-	 */
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
-					 "pg_control is too large for atomic disk writes");
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
-					 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
-
-	/* Recalculate CRC of control file */
-	INIT_CRC32C(ControlFile->crc);
-	COMP_CRC32C(ControlFile->crc,
-				(char *) ControlFile,
-				offsetof(ControlFileData, crc));
-	FIN_CRC32C(ControlFile->crc);
-
-	/*
-	 * Write out PG_CONTROL_FILE_SIZE bytes into pg_control by zero-padding
-	 * the excess over sizeof(ControlFileData), to avoid premature EOF related
-	 * errors when reading it.
-	 */
-	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
-	memcpy(buffer, ControlFile, sizeof(ControlFileData));
-
-	open_target_file("global/pg_control", false);
-
-	write_target_range(buffer, 0, PG_CONTROL_FILE_SIZE);
-
-	close_target_file();
-}
-
-/*
  * Sync target data directory to ensure that modifications are safely on disk.
  *
  * We do this once, for the whole data directory, for performance reasons.  At
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 511262ab5f..f75bf9fcf5 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -1,7 +1,7 @@
 /*
  * pg_verify_checksums
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies/enables/disables page level checksums in an offline cluster
  *
  *	Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -13,14 +13,16 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
+#include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
 
 
 static int64 files = 0;
@@ -31,16 +33,33 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_ACTION_CHECK,
+	PG_ACTION_DISABLE,
+	PG_ACTION_ENABLE
+} ChecksumAction;
+
+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static ChecksumAction action = PG_ACTION_CHECK;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables/disables/verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -c, --check            check data checksums\n"));
+	printf(_("  -d, --disable          disable data checksums\n"));
+	printf(_("  -e, --enable           enable data checksums\n"));
+	printf(_("                 \"check\", \"enable\" and \"disable\"\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -87,7 +106,11 @@ scan_file(const char *fn, BlockNumber segmentno)
 	int			f;
 	BlockNumber blockno;
 
-	f = open(fn, O_RDONLY | PG_BINARY, 0);
+	Assert(action == PG_ACTION_ENABLE ||
+		   action == PG_ACTION_CHECK);
+
+	f = open(fn, action == PG_ACTION_ENABLE ? O_RDWR : O_RDONLY | PG_BINARY, 0);
+
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
@@ -117,18 +140,47 @@ scan_file(const char *fn, BlockNumber segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		if (csum != header->pd_checksum)
+		if (action == PG_ACTION_CHECK)
 		{
-			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
-						progname, fn, blockno, csum, header->pd_checksum);
-			badblocks++;
+			if (csum != header->pd_checksum)
+			{
+				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+					fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
+							progname, fn, blockno, csum, header->pd_checksum);
+				badblocks++;
+			}
+		}
+		else if (action == PG_ACTION_ENABLE)
+		{
+			/* Set checksum in page header */
+			header->pd_checksum = csum;
+
+			/* Seek back to beginning of block */
+			if (lseek(f, -BLCKSZ, SEEK_CUR) < 0)
+			{
+				fprintf(stderr, _("%s: seek failed for block %d in file \"%s\": %s\n"), progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Write block with checksum */
+			if (write(f, buf.data, BLCKSZ) != BLCKSZ)
+			{
+				fprintf(stderr, "%s: could not update checksum of block %d in file \"%s\": %s\n",
+						progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
 		}
 	}
 
 	if (verbose)
-		fprintf(stderr,
-				_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+	{
+		if (action == PG_ACTION_CHECK)
+			fprintf(stderr,
+					_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+		if (action == PG_ACTION_ENABLE)
+			fprintf(stderr,
+					_("%s: checksums enabled in file \"%s\"\n"), progname, fn);
+	}
 
 	close(f);
 }
@@ -230,14 +282,19 @@ int
 main(int argc, char *argv[])
 {
 	static struct option long_options[] = {
+		{"check", no_argument, NULL, 'c'},
 		{"pgdata", required_argument, NULL, 'D'},
+		{"disable", no_argument, NULL, 'd'},
+		{"enable", no_argument, NULL, 'e'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
 
 	char	   *DataDir = NULL;
+	char		pid_file[MAXPGPATH];
 	int			c;
 	int			option_index;
+	int			pidf;
 	bool		crc_ok;
 
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
@@ -258,10 +315,19 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:r:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
+			case 'c':
+				action = PG_ACTION_CHECK;
+				break;
+			case 'd':
+				action = PG_ACTION_DISABLE;
+				break;
+			case 'e':
+				action = PG_ACTION_ENABLE;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -308,6 +374,16 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Relfilenode checking only works in check mode */
+	if (action != PG_ACTION_CHECK &&
+		only_relfilenode)
+	{
+		fprintf(stderr, _("%s: relfilenode option only possible with check action\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
 	/* Check if cluster is running */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
@@ -319,29 +395,85 @@ main(int argc, char *argv[])
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
+		fprintf(stderr, _("%s: cluster must be shut down\n"), progname);
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version == 0)
+	/* Also check for postmaster.pid file */
+	snprintf(pid_file, sizeof(pid_file), "%s/postmaster.pid", DataDir);
+	pidf = open(pid_file, O_RDONLY, 0);
+	if (pidf < 0)
+	{
+		/*
+		 * if the errno is ENOENT, there is no pid file which is what we
+		 * expect.  Otherwise, it exits but we cannot open it so exit with
+		 * failure.
+		 */
+		if (errno != ENOENT)
+		{
+			fprintf(stderr, _("%s: postmaster.pid cannot be opened for reading: %s\n"),
+					progname, strerror(errno));
+			exit(1);
+		}
+	}
+	else
+	{
+		fprintf(stderr, _("%s: postmaster.pid exists, cluster must be shut down\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_CHECK)
 	{
 		fprintf(stderr, _("%s: data checksums are not enabled in cluster\n"), progname);
 		exit(1);
 	}
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_DISABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
+		exit(1);
+	}
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION &&
+		action == PG_ACTION_ENABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already enabled in cluster.\n"), progname);
+		exit(1);
+	}
 
-	/* Scan all files */
-	scan_directory(DataDir, "global");
-	scan_directory(DataDir, "base");
-	scan_directory(DataDir, "pg_tblspc");
+	if (action == PG_ACTION_CHECK || action == PG_ACTION_ENABLE)
+	{
+		/* Operate on all files */
+		scan_directory(DataDir, "global");
+		scan_directory(DataDir, "base");
+		scan_directory(DataDir, "pg_tblspc");
+
+		printf(_("Checksum operation completed\n"));
+		printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
+		printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
+		if (action == PG_ACTION_CHECK)
+		{
+			printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 
-	printf(_("Checksum scan completed\n"));
-	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
-	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
-	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
-	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			if (badblocks > 0)
+				return 1;
+		}
+	}
 
-	if (badblocks > 0)
-		return 1;
+	if (action == PG_ACTION_ENABLE || action == PG_ACTION_DISABLE)
+	{
+		/* Update control file */
+		ControlFile->data_checksum_version = action == PG_ACTION_ENABLE ? PG_DATA_CHECKSUM_VERSION : 0;
+		update_controlfile(DataDir, progname, ControlFile);
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		if (action == PG_ACTION_ENABLE)
+			printf(_("Checksums enabled in cluster\n"));
+		else
+			printf(_("Checksums disabled in cluster\n"));
+	}
 
 	return 0;
 }
diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_verify_checksums/t/002_actions.pl
index 74ad5ad723..364c215cba 100644
--- a/src/bin/pg_verify_checksums/t/002_actions.pl
+++ b/src/bin/pg_verify_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 45;
+use Test::More tests => 59;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -38,8 +38,8 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_verify_checksums',  '-D', $pgdata,
-		'-r', $relfilenode_corrupted],
+	command_ok(['pg_verify_checksums',  '-c', '-D', $pgdata, '-r',
+			   $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
 	# Time to create some corruption
@@ -49,15 +49,15 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
-								$relfilenode_corrupted],
+	$node->command_checks_all([ 'pg_verify_checksums', '-c', '-D', $pgdata,
+							  '-r', $relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_verify_checksums', '-c', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,22 +67,22 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_verify_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_verify_checksums', '-c', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
 	return;
 }
 
-# Initialize node with checksums enabled.
+# Initialize node with checksums disabled.
 my $node = get_new_node('node_checksum');
-$node->init(extra => ['--data-checksums']);
+$node->init();
 my $pgdata = $node->data_dir;
 
-# Control file should know that checksums are enabled.
+# Control file should know that checksums are disabled.
 command_like(['pg_controldata', $pgdata],
-	     qr/Data page checksum version:.*1/,
-		 'checksums enabled in control file');
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
 
 # These are correct but empty files, so they should pass through.
 append_to_file "$pgdata/global/99999", "";
@@ -100,13 +100,49 @@ append_to_file "$pgdata/global/pgsql_tmp_123", "foo";
 mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
+# Enable checksums
+command_ok(['pg_verify_checksums', '-e', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
+# Disable checksums again
+command_ok(['pg_verify_checksums', '-d', '-D', $pgdata],
+		   "checksums successfully disabled in cluster");
+
+# Control file should know that checksums are disabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
+
+# Enable checksums again with long option
+command_ok(['pg_verify_checksums', '--enable', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
 # Checksums pass on a newly-created cluster
-command_ok(['pg_verify_checksums',  '-D', $pgdata],
+command_ok(['pg_verify_checksums', '-c', '-D', $pgdata],
 		   "succeeds with offline cluster");
 
+# Checksums are verified if no other arguments are specified
+command_ok(['pg_verify_checksums', '-D', $pgdata],
+		   "verifies checksums as default action");
+
+# Specific relation files cannot be requested when action is disable
+command_fails(['pg_verify_checksums', '-d', '-r', '1234', '-D',
+			  $pgdata],
+			  "fails when relfilnodes are requested and action is not verify");
+
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_verify_checksums',  '-D', $pgdata],
+command_fails(['pg_verify_checksums', '-c', '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -133,7 +169,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_verify_checksums', '-c', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index abfe7065f5..3d9b190dff 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -24,12 +24,14 @@
 #include <sys/stat.h>
 #include <fcntl.h>
 
+#include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
 #include "port/pg_crc32c.h"
 
 /*
- * get_controlfile(char *DataDir, const char *progname, bool *crc_ok_p)
+ * get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p)
  *
  * Get controlfile values.  The result is returned as a palloc'd copy of the
  * control file data.
@@ -120,3 +122,75 @@ get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p)
 
 	return ControlFile;
 }
+
+/*
+ * update_controlfile(const char *DataDir, const char *progname,
+ *                    ControlFileData *ControlFile)
+ *
+ * Update controlfile values with the content of ControlFile.
+ */
+void
+update_controlfile(const char *DataDir, const char *progname, ControlFileData *ControlFile)
+{
+	int			fd;
+	char		buffer[PG_CONTROL_FILE_SIZE];
+	char		ControlFilePath[MAXPGPATH];
+
+	/*
+	 * For good luck, apply the same static assertions as in backend's
+	 * WriteControlFile().
+	 */
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
+					 "pg_control is too large for atomic disk writes");
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
+					 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
+
+	/* Recalculate CRC of control file */
+	INIT_CRC32C(ControlFile->crc);
+	COMP_CRC32C(ControlFile->crc,
+				(char *) ControlFile,
+				offsetof(ControlFileData, crc));
+	FIN_CRC32C(ControlFile->crc);
+
+	/*
+	 * Write out PG_CONTROL_FILE_SIZE bytes into pg_control by zero-padding
+	 * the excess over sizeof(ControlFileData), to avoid premature EOF related
+	 * errors when reading it.
+	 */
+	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
+	memcpy(buffer, ControlFile, sizeof(ControlFileData));
+
+	snprintf(ControlFilePath, sizeof(ControlFilePath), "%s/%s", DataDir, XLOG_CONTROL_FILE);
+
+	fd = open(ControlFilePath, O_WRONLY | PG_BINARY,
+			  pg_file_create_mode);
+	if (fd < 0)
+	{
+		fprintf(stderr, _("%s: could not open control file: %s\n"),
+				progname, strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+
+	errno = 0;
+	if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+		fprintf(stderr, _("%s: could not write control file: %s\n"),
+				progname, strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+
+	if (fsync(fd) != 0)
+	{
+		fprintf(stderr, _("%s: fsync error: %s\n"), progname, strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+
+	if (close(fd) < 0)
+	{
+		fprintf(stderr, _("%s: could not close control file: %s\n"), progname, strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+}
diff --git a/src/include/common/controldata_utils.h b/src/include/common/controldata_utils.h
index 0ffa2000fc..0308f131e0 100644
--- a/src/include/common/controldata_utils.h
+++ b/src/include/common/controldata_utils.h
@@ -13,5 +13,6 @@
 #include "catalog/pg_control.h"
 
 extern ControlFileData *get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p);
+extern void update_controlfile(const char *DataDir, const char *progname, ControlFileData *ControlFile);
 
 #endif							/* COMMON_CONTROLDATA_UTILS_H */
#41Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#40)
Re: Offline enabling/disabling of data checksums

Hallo Michael,

- * src/bin/pg_verify_checksums/pg_verify_checksums.c
+ * src/bin/pg_checksums/pg_checksums.c
That's lacking a rename, or this comment is incorrect.

Right, I started the rename, but then backed off pending further
discussion whether I should submit that or whether the committer will
just do it.

ISTM that there is an all-clear for renaming.

The renaming implies quite a few changes (eg in the documentation,
makefiles, tests) which warrants a review, so it should be a patch. Also,
ISTM that the renaming only makes sense when adding the enable/disable
feature, so I'd say that it belongs to this patch. Opinions?

About v4:

Patch applies cleanly, compiles, global & local "make check" are ok.

Doc: "enable, disable or verifies" -> "checks, enables or disables"?
Spurious space: "> ." -> ">.".

As checksums are now checked, the doc could use "check" instead of
"verify", especially if there is a rename and the "verify" word
disappears.

I'd be less terse when documenting actions, eg: "Disable checksums" ->
"Disable checksums on cluster."

Doc should state that checking is the default action, eg "Check checksums
on cluster. This is the default action."

Help string could say that -c is the default action. There is a spurious
help line remaining from the previous "--action" implementation.

open: I'm positively unsure about ?: priority over |, and probably not the
only one, so I'd add parentheses around the former.
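
To illustrate the precedence concern, here is a minimal sketch. In C, `|` binds tighter than `?:`, so the expression in the v4 patch parses as `cond ? O_RDWR : (O_RDONLY | PG_BINARY)`, and the binary flag is dropped on the read-write path. `DEMO_BINARY` and both function names are hypothetical stand-ins chosen for the demonstration; `DEMO_BINARY` is given a nonzero value because PG_BINARY is 0 on most Unix platforms, where the difference would be invisible.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical nonzero stand-in for PG_BINARY, for demonstration only. */
#define DEMO_BINARY 0x10000

/* Same shape as the expression in the v4 patch: "|" applies to O_RDONLY only. */
static int
flags_as_written(bool enable)
{
    return enable ? O_RDWR : O_RDONLY | DEMO_BINARY;
}

/* With parentheses around the conditional, the flag applies to both modes. */
static int
flags_parenthesized(bool enable)
{
    int flags = (enable ? O_RDWR : O_RDONLY) | DEMO_BINARY;

    /* The binary flag must be present regardless of the access mode. */
    assert((flags & DEMO_BINARY) != 0);
    return flags;
}
```

The two functions agree when `enable` is false and differ only when it is true, which is exactly the case the enable action exercises.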

I'm at odds with the "postmaster.pid" check, which would not prevent an
issue if a cluster is started with "postmaster". I still think that the
enabling-in-progress should be stored in the cluster state.

ISTM that the cluster read/update cycle should somehow lock the control
file being modified. However, other commands do not seem to do anything
about it.

I do not think that enabling if already enabled, or disabling if already
disabled, should exit(1); I think it is a no-op and should simply exit(0).

About tests: I'd run a check on a disabled cluster to check that the
command fails because disabled.

--
Fabien.

#42Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#41)
Re: Offline enabling/disabling of data checksums

On Wed, Feb 27, 2019 at 07:59:31AM +0100, Fabien COELHO wrote:

The renaming implies quite a few changes (eg in the documentation,
makefiles, tests) which warrants a review, so it should be a patch. Also,
ISTM that the renaming only makes sense when adding the enable/disable
feature, so I'd say that it belongs to this patch. Opinions?

I would think that the rename should happen first, but it is possible
to make git diffs less noisy as well for files copied, so merging
things is technically doable.

About tests: I'd run a check on a disabled cluster to check that the command
fails because disabled.

While I look at that... If you could split the refactoring into a
separate, first, patch as well..
--
Michael

#43Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#41)
4 attachment(s)
Re: Offline enabling/disabling of data checksums

On Wed, Feb 27, 2019 at 07:59:31AM +0100, Fabien COELHO wrote:

Hallo Michael,

Okay, let's move on with these patches!

The renaming implies quite a few changes (eg in the documentation,
makefiles, tests) which warrants a review, so it should be a patch. Also,
ISTM that the renaming only makes sense when adding the enable/disable
feature, so I'd say that it belongs to this patch. Opinions?

I have worked on the last v4 sent by Michael B, ending up with the
attached after review, and addressed the last points raised by Fabien.
I was rather unhappy with a couple of things in what was proposed, so
I ended up modifying quite a few areas. The patch set is now split in
the way I think is suitable for commit, with the refactoring and
renaming separated from the actual feature:
- 0001 is a patch to refactor the routine for the control file
update. I have made it backend-aware, and we ought to be careful with
error handling, use of fds and such, something that v4 was not very
careful about.
- 0002 renames pg_verify_checksums to pg_checksums with a
straight-forward switch. Docs as well as all references to
pg_verify_checksums are updated.
- 0003 adds the new options --check, --enable and --disable, with
--check being the default as discussed.
- 0004 adds a -N/--no-sync which I think is nice for consistency with
other tools. That's also useful for the tests, and was not discussed
until now on this thread.
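
As a rough illustration of the option surface described for 0003 and 0004, the following sketch parses --check, --enable, --disable and -N/--no-sync with getopt_long(), defaulting to check. It is not the code from the attached patches: the `Action` enum, the `Options` struct and `parse_options()` are hypothetical names, and the short letters -c/-d/-e are carried over from the v4 patch.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ACTION_CHECK, ACTION_ENABLE, ACTION_DISABLE } Action;

typedef struct
{
    Action action;   /* defaults to ACTION_CHECK */
    bool   do_sync;  /* cleared by -N/--no-sync */
} Options;

/* Returns 0 on success, -1 on an unrecognized option. */
static int
parse_options(int argc, char *argv[], Options *opts)
{
    static const struct option long_options[] = {
        {"check", no_argument, NULL, 'c'},
        {"disable", no_argument, NULL, 'd'},
        {"enable", no_argument, NULL, 'e'},
        {"no-sync", no_argument, NULL, 'N'},
        {NULL, 0, NULL, 0}
    };
    int c;

    assert(opts != NULL);
    opts->action = ACTION_CHECK;    /* check is the default action */
    opts->do_sync = true;
    optind = 1;     /* restart scanning; some BSDs also need optreset = 1 */
    opterr = 0;     /* let the caller report errors */

    while ((c = getopt_long(argc, argv, "cdeN", long_options, NULL)) != -1)
    {
        switch (c)
        {
            case 'c': opts->action = ACTION_CHECK;   break;
            case 'd': opts->action = ACTION_DISABLE; break;
            case 'e': opts->action = ACTION_ENABLE;  break;
            case 'N': opts->do_sync = false;         break;
            default:  return -1;
        }
    }
    return 0;
}
```

In this sketch the last action option on the command line wins, as in the v4 switch statement; whether conflicting actions should instead be rejected is a design choice the sketch does not settle.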

Patch applies cleanly, compiles, global & local "make check" are ok.

Doc: "enable, disable or verifies" -> "checks, enables or disables"?
Spurious space: "> ." -> ">.".

As checksums are now checked, the doc could use "check" instead of "verify",
especially if there is a rename and the "verify" word disappears.

That makes sense. I have fixed these, and simplified the docs a bit
to get a simpler page.

I'd be less terse when documenting actions, eg: "Disable checksums" ->
"Disable checksums on cluster."

The former is fine in my opinion.

Doc should state that checking is the default action, eg "Check checksums on
cluster. This is the default action."

Check.

Help string could say that -c is the default action. There is a spurious
help line remaining from the previous "--action" implementation.

Yeah, we should. Added.

open: I'm positively unsure about ?: priority over |, and probably not the
only one, so I'd add parentheses around the former.

Yes, I agree that the current code is hard to decipher. So I reworked
it with a separate variable in scan_file, and added extra parentheses
in the part which updates the control file.

I'm at odds with the "postmaster.pid" check, which would not prevent an
issue if a cluster is started with "postmaster". I still think that the
enabling-in-progress should be stored in the cluster state.

ISTM that the cluster read/update cycle should somehow lock the control file
being modified. However, other commands do not seem to do anything about it.

I am still not on board for adding more complexity in this area, at
least not for this stuff and for the purpose of this thread, because
this has been possible to various degrees, in various configurations,
for ages, and not only for checksums. Also, I think that we have yet
to see users complain about that. Here are some scenarios where this
can happen:
- A partially written base backup. pg_basebackup limits this risk, but
it is still possible to end up with a partially-written data folder.
And base backups have been around for many years.
- pg_rewind: the tool has been in the tree since 9.5, and has actually
been available on GitHub since 9.3.

I do not think that enabling if already enabled, or disabling if already
disabled, should exit(1); I think it is a no-op and should simply exit(0).

We already issue exit(1) when attempting to verify checksums on a
cluster where they are disabled. So I agree with Michael B's point of
issuing an error in such cases. I also think that this makes handling
easier for operators.

About tests: I'd run a check on a disabled cluster to check that the command
fails because disabled.

Makes sense. Added. We need a test also for the case of successive
runs with --enable.

Here are also some notes from my side.
- There was no need to complicate the synopsis of the docs.
- usage() still included references to --action, and the indentation
was a bit too long at the top.
- There were no tests for disabling checksums, so I have added some.
- We should check that the combination of --enable and -r fails.
- Tests use only long options, that's better for readability.
- Improved comments in tests.
- Better to check for "data_checksum_version > 0" if --enable is
used. That's more portable long-term if more checksum versions are
added.
- The check on postmaster.pid is actually not necessary as we already
know that the cluster has been shut down cleanly per the state of the
control file. I agree that there could be a small race condition
here, and we could discuss that in a different thread if need be as
such things could be improved for other frontend tools as well. For
now I am taking the most simple approach.
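
The precondition checks discussed above (error out when already enabled or already disabled, reject -r with anything but check, and compare with "data_checksum_version > 0" rather than a specific version) could be sketched as below. This is an illustration under stated assumptions, not the code in the attached patches: `validate_action()` and the `Action` enum are hypothetical names, and the messages are borrowed from the v4 patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ACTION_CHECK, ACTION_ENABLE, ACTION_DISABLE } Action;

/*
 * Returns NULL when the requested action is acceptable, otherwise a
 * message describing why the tool should exit with a nonzero status.
 */
static const char *
validate_action(Action action, uint32_t data_checksum_version,
                bool has_relfilenode)
{
    assert(action == ACTION_CHECK ||
           action == ACTION_ENABLE ||
           action == ACTION_DISABLE);

    /* Restricting the scan to one relfilenode only makes sense for check. */
    if (action != ACTION_CHECK && has_relfilenode)
        return "relfilenode option only possible with check action";

    if (data_checksum_version == 0 && action == ACTION_CHECK)
        return "data checksums are not enabled in cluster";

    if (data_checksum_version == 0 && action == ACTION_DISABLE)
        return "data checksums are already disabled in cluster";

    /* "> 0" instead of "== 1" keeps working if more versions are added. */
    if (data_checksum_version > 0 && action == ACTION_ENABLE)
        return "data checksums are already enabled in cluster";

    return NULL;
}
```

A caller would print the returned message and exit(1) when it is non-NULL, which matches the existing behavior of erroring out when checking a cluster with checksums disabled.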

(Still need to indent the patches before commit, but that's a nit.)
--
Michael

Attachments:

0001-Refactor-routine-for-update-of-control-file.patchtext/x-diff; charset=us-asciiDownload
From 35b089e29f78704f40b0cbc1d2a912e1eb649869 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 11 Mar 2019 11:17:07 +0900
Subject: [PATCH 1/4] Refactor routine for update of control file

This adds a new routine to src/common/ which is compatible with both the
frontend and backend code able to update a control file's contents.
This is now getting used only by pg_rewind, and some upcoming patches
for offline checksums will make use of it.

Author: Michael Banck, Michael Paquier
Reviewed-by: Fabien Coelho
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 src/bin/pg_rewind/pg_rewind.c          | 43 +-----------
 src/common/controldata_utils.c         | 93 ++++++++++++++++++++++++++
 src/include/common/controldata_utils.h |  6 +-
 3 files changed, 100 insertions(+), 42 deletions(-)

diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index aa753bb315..7f1d6bf48a 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -24,6 +24,7 @@
 #include "access/xlog_internal.h"
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
 #include "common/file_perm.h"
 #include "common/file_utils.h"
 #include "common/restricted_token.h"
@@ -37,7 +38,6 @@ static void createBackupLabel(XLogRecPtr startpoint, TimeLineID starttli,
 
 static void digestControlFile(ControlFileData *ControlFile, char *source,
 				  size_t size);
-static void updateControlFile(ControlFileData *ControlFile);
 static void syncTargetDirectory(void);
 static void sanityChecks(void);
 static void findCommonAncestorTimeline(XLogRecPtr *recptr, int *tliIndex);
@@ -377,7 +377,7 @@ main(int argc, char **argv)
 	ControlFile_new.minRecoveryPoint = endrec;
 	ControlFile_new.minRecoveryPointTLI = endtli;
 	ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
-	updateControlFile(&ControlFile_new);
+	update_controlfile(datadir_target, progname, &ControlFile_new);
 
 	pg_log(PG_PROGRESS, "syncing target data directory\n");
 	syncTargetDirectory();
@@ -666,45 +666,6 @@ digestControlFile(ControlFileData *ControlFile, char *src, size_t size)
 	checkControlFile(ControlFile);
 }
 
-/*
- * Update the target's control file.
- */
-static void
-updateControlFile(ControlFileData *ControlFile)
-{
-	char		buffer[PG_CONTROL_FILE_SIZE];
-
-	/*
-	 * For good luck, apply the same static assertions as in backend's
-	 * WriteControlFile().
-	 */
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
-					 "pg_control is too large for atomic disk writes");
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
-					 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
-
-	/* Recalculate CRC of control file */
-	INIT_CRC32C(ControlFile->crc);
-	COMP_CRC32C(ControlFile->crc,
-				(char *) ControlFile,
-				offsetof(ControlFileData, crc));
-	FIN_CRC32C(ControlFile->crc);
-
-	/*
-	 * Write out PG_CONTROL_FILE_SIZE bytes into pg_control by zero-padding
-	 * the excess over sizeof(ControlFileData), to avoid premature EOF related
-	 * errors when reading it.
-	 */
-	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
-	memcpy(buffer, ControlFile, sizeof(ControlFileData));
-
-	open_target_file("global/pg_control", false);
-
-	write_target_range(buffer, 0, PG_CONTROL_FILE_SIZE);
-
-	close_target_file();
-}
-
 /*
  * Sync target data directory to ensure that modifications are safely on disk.
  *
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 6289a4343a..1e44f36765 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -24,8 +24,10 @@
 #include <sys/stat.h>
 #include <fcntl.h>
 
+#include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
 #include "port/pg_crc32c.h"
 #ifndef FRONTEND
 #include "storage/fd.h"
@@ -137,3 +139,94 @@ get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p)
 
 	return ControlFile;
 }
+
+/*
+ * update_controlfile
+ *
+ * Update controlfile values with the contents given by caller.  The
+ * contents to write are included in "ControlFile".  Note that it is up
+ * to the caller to fsync the updated file.
+ */
+void
+update_controlfile(const char *DataDir, const char *progname,
+				   ControlFileData *ControlFile)
+{
+	int			fd;
+	char		buffer[PG_CONTROL_FILE_SIZE];
+	char		ControlFilePath[MAXPGPATH];
+
+	/*
+	 * Apply the same static assertions as in backend's WriteControlFile().
+	 */
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
+					 "pg_control is too large for atomic disk writes");
+	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
+					 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
+
+	/* Recalculate CRC of control file */
+	INIT_CRC32C(ControlFile->crc);
+	COMP_CRC32C(ControlFile->crc,
+				(char *) ControlFile,
+				offsetof(ControlFileData, crc));
+	FIN_CRC32C(ControlFile->crc);
+
+	/*
+	 * Write out PG_CONTROL_FILE_SIZE bytes into pg_control by zero-padding
+	 * the excess over sizeof(ControlFileData), to avoid premature EOF related
+	 * errors when reading it.
+	 */
+	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
+	memcpy(buffer, ControlFile, sizeof(ControlFileData));
+
+	snprintf(ControlFilePath, sizeof(ControlFilePath), "%s/%s", DataDir, XLOG_CONTROL_FILE);
+
+#ifndef FRONTEND
+	if ((fd = OpenTransientFile(ControlFilePath, O_WRONLY | PG_BINARY)) == -1)
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not open file \"%s\": %m",
+						ControlFilePath)));
+#else
+	if ((fd = open(ControlFilePath, O_WRONLY | PG_BINARY,
+				   pg_file_create_mode)) == -1)
+	{
+		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
+				progname, ControlFilePath, strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+#endif
+
+	errno = 0;
+	if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
+	{
+		/* if write didn't set errno, assume problem is no disk space */
+		if (errno == 0)
+			errno = ENOSPC;
+
+#ifndef FRONTEND
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not write file \"%s\": %m",
+						ControlFilePath)));
+#else
+		fprintf(stderr, _("%s: could not write \"%s\": %s\n"),
+				progname, ControlFilePath, strerror(errno));
+		exit(EXIT_FAILURE);
+#endif
+	}
+
+#ifndef FRONTEND
+	if (CloseTransientFile(fd))
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m",
+						ControlFilePath)));
+#else
+	if (close(fd) < 0)
+	{
+		fprintf(stderr, _("%s: could not close file \"%s\": %s\n"),
+				progname, ControlFilePath, strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+#endif
+}
diff --git a/src/include/common/controldata_utils.h b/src/include/common/controldata_utils.h
index 0ffa2000fc..95317ebacf 100644
--- a/src/include/common/controldata_utils.h
+++ b/src/include/common/controldata_utils.h
@@ -12,6 +12,10 @@
 
 #include "catalog/pg_control.h"
 
-extern ControlFileData *get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p);
+extern ControlFileData *get_controlfile(const char *DataDir,
+										const char *progname,
+										bool *crc_ok_p);
+extern void update_controlfile(const char *DataDir, const char *progname,
+							   ControlFileData *ControlFile);
 
 #endif							/* COMMON_CONTROLDATA_UTILS_H */
-- 
2.20.1
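
Editorial sketch (not part of the patch): update_controlfile() above follows a three-step pattern. It recomputes the CRC over every field that precedes the crc member, copies the struct into a zero-filled buffer of the full on-disk size, and writes that whole buffer. The standalone C below illustrates only that pattern. DemoControl, DEMO_FILE_SIZE, demo_checksum(), demo_serialize() and demo_verify() are hypothetical names invented for this sketch, and the toy FNV-1a hash merely stands in for CRC-32C; none of this is PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PG_CONTROL_FILE_SIZE. */
#define DEMO_FILE_SIZE 64

/* Hypothetical stand-in for ControlFileData: payload first, checksum last. */
typedef struct DemoControl
{
	uint32_t	version;
	uint32_t	checksum_version;
	uint32_t	crc;			/* covers all bytes before this field */
} DemoControl;

/* Same idea as the StaticAssertStmt() calls in the patch. */
_Static_assert(sizeof(DemoControl) <= DEMO_FILE_SIZE,
			   "struct must fit in the on-disk file size");

/* Toy FNV-1a hash, used here only as a placeholder for CRC-32C. */
static uint32_t
demo_checksum(const unsigned char *data, size_t len)
{
	uint32_t	h = 2166136261u;

	for (size_t i = 0; i < len; i++)
	{
		h ^= data[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Recompute the checksum over the bytes before "crc", then build the
 * zero-padded image that would be written to disk in one write() call.
 */
static void
demo_serialize(DemoControl *ctl, unsigned char buffer[DEMO_FILE_SIZE])
{
	ctl->crc = demo_checksum((const unsigned char *) ctl,
							 offsetof(DemoControl, crc));
	memset(buffer, 0, DEMO_FILE_SIZE);
	memcpy(buffer, ctl, sizeof(DemoControl));
}

/* Reader side: returns 1 if the stored checksum matches the payload. */
static int
demo_verify(const unsigned char buffer[DEMO_FILE_SIZE])
{
	DemoControl ctl;

	memcpy(&ctl, buffer, sizeof(ctl));
	return ctl.crc == demo_checksum((const unsigned char *) &ctl,
									offsetof(DemoControl, crc));
}
```

Padding to the full file size means a reader that always requests DEMO_FILE_SIZE bytes never hits a premature EOF, which is the reason the patch comment gives for zero-padding pg_control.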

0002-Rename-pg_verify_checksums-to-pg_checksums.patch (text/x-diff; charset=us-ascii)
From 494b9967c3c6dc58c248a46ebc093f1061565e7e Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 11 Mar 2019 12:43:18 +0900
Subject: [PATCH 2/4] Rename pg_verify_checksums to pg_checksums

The current name is too generic and focuses only on verifying checksums.
More options to control checksums for an offline cluster are going to be
added.  Documentation as well as all past references to the tool are
updated.

Author: Michael Paquier
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/allfiles.sgml                |  2 +-
 ...erify_checksums.sgml => pg_checksums.sgml} | 24 +++++++++----------
 doc/src/sgml/reference.sgml                   |  2 +-
 src/backend/replication/basebackup.c          |  2 +-
 src/bin/Makefile                              |  2 +-
 src/bin/initdb/t/001_initdb.pl                |  8 +++----
 src/bin/pg_checksums/.gitignore               |  3 +++
 .../Makefile                                  | 20 ++++++++--------
 src/bin/pg_checksums/nls.mk                   |  4 ++++
 .../pg_checksums.c}                           | 17 +++++++------
 src/bin/pg_checksums/t/001_basic.pl           |  8 +++++++
 .../t/002_actions.pl                          | 18 +++++++-------
 src/bin/pg_verify_checksums/.gitignore        |  3 ---
 src/bin/pg_verify_checksums/nls.mk            |  4 ----
 src/bin/pg_verify_checksums/t/001_basic.pl    |  8 -------
 15 files changed, 64 insertions(+), 61 deletions(-)
 rename doc/src/sgml/ref/{pg_verify_checksums.sgml => pg_checksums.sgml} (79%)
 create mode 100644 src/bin/pg_checksums/.gitignore
 rename src/bin/{pg_verify_checksums => pg_checksums}/Makefile (53%)
 create mode 100644 src/bin/pg_checksums/nls.mk
 rename src/bin/{pg_verify_checksums/pg_verify_checksums.c => pg_checksums/pg_checksums.c} (94%)
 create mode 100644 src/bin/pg_checksums/t/001_basic.pl
 rename src/bin/{pg_verify_checksums => pg_checksums}/t/002_actions.pl (89%)
 delete mode 100644 src/bin/pg_verify_checksums/.gitignore
 delete mode 100644 src/bin/pg_verify_checksums/nls.mk
 delete mode 100644 src/bin/pg_verify_checksums/t/001_basic.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index c81c87ef41..f10d42ed84 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -199,6 +199,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgarchivecleanup   SYSTEM "pgarchivecleanup.sgml">
 <!ENTITY pgBasebackup       SYSTEM "pg_basebackup.sgml">
 <!ENTITY pgbench            SYSTEM "pgbench.sgml">
+<!ENTITY pgChecksums        SYSTEM "pg_checksums.sgml">
 <!ENTITY pgConfig           SYSTEM "pg_config-ref.sgml">
 <!ENTITY pgControldata      SYSTEM "pg_controldata.sgml">
 <!ENTITY pgCtl              SYSTEM "pg_ctl-ref.sgml">
@@ -210,7 +211,6 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
-<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
similarity index 79%
rename from doc/src/sgml/ref/pg_verify_checksums.sgml
rename to doc/src/sgml/ref/pg_checksums.sgml
index 905b8f1222..6eec88afab 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -1,27 +1,27 @@
 <!--
-doc/src/sgml/ref/pg_verify_checksums.sgml
+doc/src/sgml/ref/pg_checksums.sgml
 PostgreSQL documentation
 -->
 
-<refentry id="pgverifychecksums">
- <indexterm zone="pgverifychecksums">
-  <primary>pg_verify_checksums</primary>
+<refentry id="pgchecksums">
+ <indexterm zone="pgchecksums">
+  <primary>pg_checksums</primary>
  </indexterm>
 
  <refmeta>
-  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <refentrytitle><application>pg_checksums</application></refentrytitle>
   <manvolnum>1</manvolnum>
   <refmiscinfo>Application</refmiscinfo>
  </refmeta>
 
  <refnamediv>
-  <refname>pg_verify_checksums</refname>
+  <refname>pg_checksums</refname>
   <refpurpose>verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
  </refnamediv>
 
  <refsynopsisdiv>
   <cmdsynopsis>
-   <command>pg_verify_checksums</command>
+   <command>pg_checksums</command>
    <arg rep="repeat" choice="opt"><replaceable class="parameter">option</replaceable></arg>
    <group choice="opt">
     <group choice="opt">
@@ -33,12 +33,12 @@ PostgreSQL documentation
   </cmdsynopsis>
  </refsynopsisdiv>
 
- <refsect1 id="r1-app-pg_verify_checksums-1">
+ <refsect1 id="r1-app-pg_checksums-1">
   <title>Description</title>
   <para>
-   <command>pg_verify_checksums</command> verifies data checksums in a
+   <command>pg_checksums</command> verifies data checksums in a
    <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_verify_checksums</application>.
+   down cleanly before running <application>pg_checksums</application>.
    The exit status is zero if there are no checksum errors, otherwise nonzero.
   </para>
  </refsect1>
@@ -84,7 +84,7 @@ PostgreSQL documentation
        <term><option>--version</option></term>
        <listitem>
        <para>
-        Print the <application>pg_verify_checksums</application> version and exit.
+        Print the <application>pg_checksums</application> version and exit.
        </para>
        </listitem>
      </varlistentry>
@@ -94,7 +94,7 @@ PostgreSQL documentation
       <term><option>--help</option></term>
        <listitem>
         <para>
-         Show help about <application>pg_verify_checksums</application> command line
+         Show help about <application>pg_checksums</application> command line
          arguments, and exit.
         </para>
        </listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index db4f4167e3..cef09dd38b 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -276,6 +276,7 @@
 
    &initdb;
    &pgarchivecleanup;
+   &pgChecksums;
    &pgControldata;
    &pgCtl;
    &pgResetwal;
@@ -283,7 +284,6 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
-   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 6c324a6661..537f09e342 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -190,7 +190,7 @@ static const char *excludeFiles[] =
 /*
  * List of files excluded from checksum validation.
  *
- * Note: this list should be kept in sync with what pg_verify_checksums.c
+ * Note: this list should be kept in sync with what pg_checksums.c
  * includes.
  */
 static const char *const noChecksumFiles[] = {
diff --git a/src/bin/Makefile b/src/bin/Makefile
index c66bfa887e..903e58121f 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -17,6 +17,7 @@ SUBDIRS = \
 	initdb \
 	pg_archivecleanup \
 	pg_basebackup \
+	pg_checksums \
 	pg_config \
 	pg_controldata \
 	pg_ctl \
@@ -26,7 +27,6 @@ SUBDIRS = \
 	pg_test_fsync \
 	pg_test_timing \
 	pg_upgrade \
-	pg_verify_checksums \
 	pg_waldump \
 	pgbench \
 	psql \
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 759779adb2..8dfcd8752a 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -63,12 +63,12 @@ mkdir $datadir;
 command_like(['pg_controldata', $datadir],
 			 qr/Data page checksum version:.*0/,
 			 'checksums are disabled in control file');
-# pg_verify_checksums fails with checksums disabled by default.  This is
-# not part of the tests included in pg_verify_checksums to save from
+# pg_checksums fails with checksums disabled by default.  This is
+# not part of the tests included in pg_checksums to save from
 # the creation of an extra instance.
 command_fails(
-	[ 'pg_verify_checksums', '-D', $datadir],
-	"pg_verify_checksums fails with data checksum disabled");
+	[ 'pg_checksums', '-D', $datadir],
+	"pg_checksums fails with data checksum disabled");
 
 command_ok([ 'initdb', '-S', $datadir ], 'sync only');
 command_fails([ 'initdb', $datadir ], 'existing data directory');
diff --git a/src/bin/pg_checksums/.gitignore b/src/bin/pg_checksums/.gitignore
new file mode 100644
index 0000000000..7888625094
--- /dev/null
+++ b/src/bin/pg_checksums/.gitignore
@@ -0,0 +1,3 @@
+/pg_checksums
+
+/tmp_check/
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_checksums/Makefile
similarity index 53%
rename from src/bin/pg_verify_checksums/Makefile
rename to src/bin/pg_checksums/Makefile
index ab6d3ea9e2..278b7a0f2e 100644
--- a/src/bin/pg_verify_checksums/Makefile
+++ b/src/bin/pg_checksums/Makefile
@@ -1,38 +1,38 @@
 #-------------------------------------------------------------------------
 #
-# Makefile for src/bin/pg_verify_checksums
+# Makefile for src/bin/pg_checksums
 #
 # Copyright (c) 1998-2019, PostgreSQL Global Development Group
 #
-# src/bin/pg_verify_checksums/Makefile
+# src/bin/pg_checksums/Makefile
 #
 #-------------------------------------------------------------------------
 
-PGFILEDESC = "pg_verify_checksums - verify data checksums in an offline cluster"
+PGFILEDESC = "pg_checksums - verify data checksums in an offline cluster"
 PGAPPICON=win32
 
-subdir = src/bin/pg_verify_checksums
+subdir = src/bin/pg_checksums
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS= pg_verify_checksums.o $(WIN32RES)
+OBJS= pg_checksums.o $(WIN32RES)
 
-all: pg_verify_checksums
+all: pg_checksums
 
-pg_verify_checksums: $(OBJS) | submake-libpgport
+pg_checksums: $(OBJS) | submake-libpgport
 	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
 install: all installdirs
-	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+	$(INSTALL_PROGRAM) pg_checksums$(X) '$(DESTDIR)$(bindir)/pg_checksums$(X)'
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
 
 uninstall:
-	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+	rm -f '$(DESTDIR)$(bindir)/pg_checksums$(X)'
 
 clean distclean maintainer-clean:
-	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -f pg_checksums$(X) $(OBJS)
 	rm -rf tmp_check
 
 check:
diff --git a/src/bin/pg_checksums/nls.mk b/src/bin/pg_checksums/nls.mk
new file mode 100644
index 0000000000..2748b18ef7
--- /dev/null
+++ b/src/bin/pg_checksums/nls.mk
@@ -0,0 +1,4 @@
+# src/bin/pg_checksums/nls.mk
+CATALOG_NAME     = pg_checksums
+AVAIL_LANGUAGES  =
+GETTEXT_FILES    = pg_checksums.c
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_checksums/pg_checksums.c
similarity index 94%
rename from src/bin/pg_verify_checksums/pg_verify_checksums.c
rename to src/bin/pg_checksums/pg_checksums.c
index 511262ab5f..f95e39f31e 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -1,12 +1,15 @@
-/*
- * pg_verify_checksums
+/*-------------------------------------------------------------------------
+ * pg_checksums.c
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies page level checksums in an offline cluster.
  *
- *	Copyright (c) 2010-2019, PostgreSQL Global Development Group
+ * Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
- *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ * IDENTIFICATION
+ *	  src/bin/pg_checksums/pg_checksums.c
+ *-------------------------------------------------------------------------
  */
+
 #include "postgres_fe.h"
 
 #include <dirent.h>
@@ -240,7 +243,7 @@ main(int argc, char *argv[])
 	int			option_index;
 	bool		crc_ok;
 
-	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_checksums"));
 
 	progname = get_progname(argv[0]);
 
@@ -253,7 +256,7 @@ main(int argc, char *argv[])
 		}
 		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
 		{
-			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			puts("pg_checksums (PostgreSQL) " PG_VERSION);
 			exit(0);
 		}
 	}
diff --git a/src/bin/pg_checksums/t/001_basic.pl b/src/bin/pg_checksums/t/001_basic.pl
new file mode 100644
index 0000000000..4334c80606
--- /dev/null
+++ b/src/bin/pg_checksums/t/001_basic.pl
@@ -0,0 +1,8 @@
+use strict;
+use warnings;
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_checksums');
+program_version_ok('pg_checksums');
+program_options_handling_ok('pg_checksums');
diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
similarity index 89%
rename from src/bin/pg_verify_checksums/t/002_actions.pl
rename to src/bin/pg_checksums/t/002_actions.pl
index 74ad5ad723..97284e8930 100644
--- a/src/bin/pg_verify_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -1,4 +1,4 @@
-# Do basic sanity checks supported by pg_verify_checksums using
+# Do basic sanity checks supported by pg_checksums using
 # an initialized cluster.
 
 use strict;
@@ -38,7 +38,7 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_verify_checksums',  '-D', $pgdata,
+	command_ok(['pg_checksums',  '-D', $pgdata,
 		'-r', $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
@@ -49,7 +49,7 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
+	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata, '-r',
 								$relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
@@ -57,7 +57,7 @@ sub check_relation_corruption
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,7 +67,7 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_verify_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_checksums', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
@@ -101,12 +101,12 @@ mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
 # Checksums pass on a newly-created cluster
-command_ok(['pg_verify_checksums',  '-D', $pgdata],
+command_ok(['pg_checksums',  '-D', $pgdata],
 		   "succeeds with offline cluster");
 
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_verify_checksums',  '-D', $pgdata],
+command_fails(['pg_checksums',  '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -121,7 +121,7 @@ $node->safe_psql('postgres',
 	"CREATE TABLESPACE ts_corrupt LOCATION '$tablespace_dir';");
 check_relation_corruption($node, 'corrupt2', 'ts_corrupt');
 
-# Utility routine to check that pg_verify_checksums is able to detect
+# Utility routine to check that pg_checksums is able to detect
 # correctly-named relation files filled with some corrupted data.
 sub fail_corrupt
 {
@@ -133,7 +133,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
deleted file mode 100644
index 0e5e569a54..0000000000
--- a/src/bin/pg_verify_checksums/.gitignore
+++ /dev/null
@@ -1,3 +0,0 @@
-/pg_verify_checksums
-
-/tmp_check/
diff --git a/src/bin/pg_verify_checksums/nls.mk b/src/bin/pg_verify_checksums/nls.mk
deleted file mode 100644
index 893efaf0f0..0000000000
--- a/src/bin/pg_verify_checksums/nls.mk
+++ /dev/null
@@ -1,4 +0,0 @@
-# src/bin/pg_verify_checksums/nls.mk
-CATALOG_NAME     = pg_verify_checksums
-AVAIL_LANGUAGES  =
-GETTEXT_FILES    = pg_verify_checksums.c
diff --git a/src/bin/pg_verify_checksums/t/001_basic.pl b/src/bin/pg_verify_checksums/t/001_basic.pl
deleted file mode 100644
index 1fa2e12db2..0000000000
--- a/src/bin/pg_verify_checksums/t/001_basic.pl
+++ /dev/null
@@ -1,8 +0,0 @@
-use strict;
-use warnings;
-use TestLib;
-use Test::More tests => 8;
-
-program_help_ok('pg_verify_checksums');
-program_version_ok('pg_verify_checksums');
-program_options_handling_ok('pg_verify_checksums');
-- 
2.20.1
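
Editorial sketch (not part of the patches): patch 3/4, which follows, enables checksums by reading each block, storing the computed checksum in the page header, seeking back by one block and writing the block over itself. The standalone C below shows that read, seek back, rewrite loop on a plain file, assuming a POSIX system. DEMO_BLCKSZ, demo_block_checksum(), demo_enable_checksums() and demo_count_bad_blocks() are hypothetical names for this sketch. The toy checksum is not pg_checksum_page(): it ignores the block number and keeps the checksum in the first two bytes of the block.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical block size; PostgreSQL uses BLCKSZ. */
#define DEMO_BLCKSZ 512

/* Toy 16-bit checksum over the block, skipping the 2-byte checksum slot. */
static uint16_t
demo_block_checksum(const unsigned char *block)
{
	uint32_t	sum = 0;

	for (size_t i = 2; i < DEMO_BLCKSZ; i++)
		sum = sum * 31 + block[i];
	return (uint16_t) (sum ^ (sum >> 16));
}

/*
 * Stamp every block of the file with its checksum, rewriting in place.
 * Returns the number of blocks written, or -1 on any error.
 */
static long
demo_enable_checksums(const char *path)
{
	unsigned char block[DEMO_BLCKSZ];
	long		nblocks = 0;
	int			fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;

	for (;;)
	{
		ssize_t		r = read(fd, block, DEMO_BLCKSZ);
		uint16_t	csum;

		if (r == 0)
			break;				/* clean end of file */
		if (r != DEMO_BLCKSZ)
		{
			close(fd);
			return -1;			/* read error or partial block */
		}

		/* Set checksum in the block header */
		csum = demo_block_checksum(block);
		memcpy(block, &csum, sizeof(csum));

		/* Seek back to the beginning of the block and overwrite it */
		if (lseek(fd, -(off_t) DEMO_BLCKSZ, SEEK_CUR) < 0 ||
			write(fd, block, DEMO_BLCKSZ) != DEMO_BLCKSZ)
		{
			close(fd);
			return -1;
		}
		nblocks++;
	}

	return (close(fd) == 0) ? nblocks : -1;
}

/* Count blocks whose stored checksum does not match; -1 on error. */
static long
demo_count_bad_blocks(const char *path)
{
	unsigned char block[DEMO_BLCKSZ];
	long		bad = 0;
	ssize_t		r;
	int			fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	while ((r = read(fd, block, DEMO_BLCKSZ)) == DEMO_BLCKSZ)
	{
		uint16_t	stored;

		memcpy(&stored, block, sizeof(stored));
		if (stored != demo_block_checksum(block))
			bad++;
	}
	close(fd);
	return (r == 0) ? bad : -1;
}
```

As in the patch, the sketch does not fsync the file itself; the patch leaves durability to a single fsync_pgdata() call after the control file has been updated.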

0003-Add-options-to-enable-and-disable-checksums-in-pg_ch.patch (text/x-diff; charset=us-ascii)
From d86b8a1fa87f0244d8eaaffe21de2290e7732d76 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 11 Mar 2019 13:45:44 +0900
Subject: [PATCH 3/4] Add options to enable and disable checksums in
 pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When running --enable or --disable, the data folder gets fsync'd for
durability.  If no mode is specified in the options, then --check is
used for compatibility with older versions of pg_verify_checksums (now
renamed to pg_checksums in v12).

Author: Michael Banck
Reviewed-by: Fabien Coelho, Michael Paquier
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    |  50 +++++++-
 src/bin/pg_checksums/pg_checksums.c   | 171 ++++++++++++++++++++++----
 src/bin/pg_checksums/t/002_actions.pl |  76 +++++++++---
 3 files changed, 251 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6eec88afab..89f2f2b0e9 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  <refnamediv>
   <refname>pg_checksums</refname>
-  <refpurpose>verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
+  <refpurpose>enable, disable or check data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
  </refnamediv>
 
  <refsynopsisdiv>
@@ -36,10 +36,19 @@ PostgreSQL documentation
  <refsect1 id="r1-app-pg_checksums-1">
   <title>Description</title>
   <para>
-   <command>pg_checksums</command> verifies data checksums in a
-   <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_checksums</application>.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   <command>pg_checksums</command> checks, enables or disables data
+   checksums in a <productname>PostgreSQL</productname> cluster.  The server
+   must be shut down cleanly before running
+   <application>pg_checksums</application>. The exit status is zero if there
+   are no checksum errors when checking them, and nonzero if at least one
+   checksum failure is detected. If enabling or disabling checksums, the
+   exit status is nonzero if the operation failed.
+  </para>
+
+  <para>
+   While checking or enabling checksums needs to scan or write every file in
+   the cluster, disabling will only update the file
+   <filename>pg_control</filename>.
   </para>
  </refsect1>
 
@@ -60,6 +69,37 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-c</option></term>
+      <term><option>--check</option></term>
+      <listitem>
+       <para>
+        Checks checksums. This is the default mode if nothing else is
+        specified.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <term><option>--disable</option></term>
+      <listitem>
+       <para>
+        Disables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--enable</option></term>
+      <listitem>
+       <para>
+        Enables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index f95e39f31e..366f397e3f 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  * pg_checksums.c
  *
- * Verifies page level checksums in an offline cluster.
+ * Checks, enables or disables page level checksums in an offline cluster
  *
  * Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -16,14 +16,15 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
-#include "catalog/pg_control.h"
+#include "access/xlog_internal.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
 
 
 static int64 files = 0;
@@ -34,16 +35,40 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_ACTION_CHECK,
+	PG_ACTION_DISABLE,
+	PG_ACTION_ENABLE
+} ChecksumAction;
+
+/*
+ * Filename components.
+ *
+ * XXX: fd.h is not declared here as frontend side code is not able to
+ * interact with the backend-side definitions for the various fsync
+ * wrappers.
+ */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static ChecksumAction action = PG_ACTION_CHECK;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables, disables or verifies data checksums in a PostgreSQL\n"), progname);
+	printf(_("database cluster.\n\n"));
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -c, --check            check data checksums.  This is the default\n"));
+	printf(_("                         mode if nothing is specified.\n"));
+	printf(_("  -d, --disable          disable data checksums\n"));
+	printf(_("  -e, --enable           enable data checksums\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -89,8 +114,14 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	int			flags;
+
+	Assert(action == PG_ACTION_ENABLE ||
+		   action == PG_ACTION_CHECK);
+
+	flags = (action == PG_ACTION_ENABLE) ? O_RDWR : O_RDONLY;
+	f = open(fn, PG_BINARY | flags, 0);
 
-	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
@@ -120,18 +151,47 @@ scan_file(const char *fn, BlockNumber segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		if (csum != header->pd_checksum)
+		if (action == PG_ACTION_CHECK)
 		{
-			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
-						progname, fn, blockno, csum, header->pd_checksum);
-			badblocks++;
+			if (csum != header->pd_checksum)
+			{
+				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+					fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
+							progname, fn, blockno, csum, header->pd_checksum);
+				badblocks++;
+			}
+		}
+		else if (action == PG_ACTION_ENABLE)
+		{
+			/* Set checksum in page header */
+			header->pd_checksum = csum;
+
+			/* Seek back to beginning of block */
+			if (lseek(f, -BLCKSZ, SEEK_CUR) < 0)
+			{
+				fprintf(stderr, _("%s: seek failed for block %u in file \"%s\": %s\n"), progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Write block with checksum */
+			if (write(f, buf.data, BLCKSZ) != BLCKSZ)
+			{
+				fprintf(stderr, _("%s: could not update checksum of block %u in file \"%s\": %s\n"),
+						progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
 		}
 	}
 
 	if (verbose)
-		fprintf(stderr,
-				_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+	{
+		if (action == PG_ACTION_CHECK)
+			fprintf(stderr,
+					_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+		if (action == PG_ACTION_ENABLE)
+			fprintf(stderr,
+					_("%s: checksums enabled in file \"%s\"\n"), progname, fn);
+	}
 
 	close(f);
 }
@@ -233,7 +293,10 @@ int
 main(int argc, char *argv[])
 {
 	static struct option long_options[] = {
+		{"check", no_argument, NULL, 'c'},
 		{"pgdata", required_argument, NULL, 'D'},
+		{"disable", no_argument, NULL, 'd'},
+		{"enable", no_argument, NULL, 'e'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -261,10 +324,19 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:r:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
+			case 'c':
+				action = PG_ACTION_CHECK;
+				break;
+			case 'd':
+				action = PG_ACTION_DISABLE;
+				break;
+			case 'e':
+				action = PG_ACTION_ENABLE;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -311,6 +383,15 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Relfilenode checking only works in --check mode */
+	if (action != PG_ACTION_CHECK && only_relfilenode)
+	{
+		fprintf(stderr, _("%s: relfilenode option only possible with --check\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
 	/* Check if cluster is running */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
@@ -322,29 +403,67 @@ main(int argc, char *argv[])
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
+		fprintf(stderr, _("%s: cluster must be shut down\n"), progname);
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version == 0)
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_CHECK)
 	{
 		fprintf(stderr, _("%s: data checksums are not enabled in cluster\n"), progname);
 		exit(1);
 	}
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_DISABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
+		exit(1);
+	}
+	if (ControlFile->data_checksum_version > 0 &&
+		action == PG_ACTION_ENABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already enabled in cluster.\n"), progname);
+		exit(1);
+	}
 
-	/* Scan all files */
-	scan_directory(DataDir, "global");
-	scan_directory(DataDir, "base");
-	scan_directory(DataDir, "pg_tblspc");
+	/* Operate on all files if checking or enabling checksums */
+	if (action == PG_ACTION_CHECK || action == PG_ACTION_ENABLE)
+	{
+		scan_directory(DataDir, "global");
+		scan_directory(DataDir, "base");
+		scan_directory(DataDir, "pg_tblspc");
 
-	printf(_("Checksum scan completed\n"));
-	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
-	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
-	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
-	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+		printf(_("Checksum operation completed\n"));
+		printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
+		printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
+		if (action == PG_ACTION_CHECK)
+		{
+			printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 
-	if (badblocks > 0)
-		return 1;
+			if (badblocks > 0)
+				return 1;
+		}
+	}
+
+	/*
+	 * Finally update the control file, flushing the data directory at the
+	 * end.
+	 */
+	if (action == PG_ACTION_ENABLE || action == PG_ACTION_DISABLE)
+	{
+		/* Update control file */
+		ControlFile->data_checksum_version =
+			(action == PG_ACTION_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
+		update_controlfile(DataDir, progname, ControlFile);
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		if (action == PG_ACTION_ENABLE)
+			printf(_("Checksums enabled in cluster\n"));
+		else
+			printf(_("Checksums disabled in cluster\n"));
+	}
 
 	return 0;
 }
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 97284e8930..58be8d5cb6 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 45;
+use Test::More tests => 62;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -38,8 +38,8 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_checksums',  '-D', $pgdata,
-		'-r', $relfilenode_corrupted],
+	command_ok(['pg_checksums',  '--check', '-D', $pgdata, '-r',
+			   $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
 	# Time to create some corruption
@@ -49,15 +49,15 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata, '-r',
-								$relfilenode_corrupted],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata,
+							  '-r', $relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,22 +67,22 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_checksums', '--check', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
 	return;
 }
 
-# Initialize node with checksums enabled.
+# Initialize node with checksums disabled.
 my $node = get_new_node('node_checksum');
-$node->init(extra => ['--data-checksums']);
+$node->init();
 my $pgdata = $node->data_dir;
 
-# Control file should know that checksums are enabled.
+# Control file should know that checksums are disabled.
 command_like(['pg_controldata', $pgdata],
-	     qr/Data page checksum version:.*1/,
-		 'checksums enabled in control file');
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
 
 # These are correct but empty files, so they should pass through.
 append_to_file "$pgdata/global/99999", "";
@@ -100,13 +100,59 @@ append_to_file "$pgdata/global/pgsql_tmp_123", "foo";
 mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
+# Enable checksums.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+	   "checksums successfully enabled in cluster");
+
+# Successive attempt to enable checksums fails.
+command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+	      "enabling checksums fails if already enabled");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+	     'checksums enabled in control file');
+
+# Disable checksums again.
+command_ok(['pg_checksums', '--disable', '-D', $pgdata],
+	   "checksums successfully disabled in cluster");
+
+# Successive attempt to disable checksums fails.
+command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+	      "disabling checksums fails if already disabled");
+
+# Control file should know that checksums are disabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
+
+# Enable checksums again for follow-up tests.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
 # Checksums pass on a newly-created cluster
-command_ok(['pg_checksums',  '-D', $pgdata],
+command_ok(['pg_checksums', '--check', '-D', $pgdata],
 		   "succeeds with offline cluster");
 
+# Checksums are verified if no other arguments are specified
+command_ok(['pg_checksums', '-D', $pgdata],
+		   "verifies checksums as default action");
+
+# Specific relation files cannot be requested when action is --disable
+# or --enable.
+command_fails(['pg_checksums', '--disable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilnodes are requested and action is --disable");
+command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilnodes are requested and action is --disable");
+
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_checksums',  '-D', $pgdata],
+command_fails(['pg_checksums', '--check', '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -133,7 +179,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
-- 
2.20.1

0004-Add-option-N-no-sync-to-pg_checksums.patch (text/x-diff; charset=us-ascii)
From bd3a64fea68d67eb349e12cf222fd86919419bd1 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 11 Mar 2019 13:46:09 +0900
Subject: [PATCH 4/4] Add option -N/--no-sync to pg_checksums

This is an option consistent with what pg_dump, pg_rewind and
pg_basebackup provide.  It is useful for reducing the I/O effort when
testing things, and is not meant to be used in a production environment.

Author: Michael Paquier
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    | 16 ++++++++++++++++
 src/bin/pg_checksums/pg_checksums.c   | 11 +++++++++--
 src/bin/pg_checksums/t/002_actions.pl | 10 +++++-----
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 89f2f2b0e9..2819335224 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -100,6 +100,22 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-N</option></term>
+      <term><option>--no-sync</option></term>
+      <listitem>
+       <para>
+        By default, <command>pg_checksums</command> will wait for all files
+        to be written safely to disk.  This option causes
+        <command>pg_checksums</command> to return without waiting, which is
+        faster, but means that a subsequent operating system crash can leave
+        the updated data folder corrupt.  Generally, this option is useful
+        for testing but should not be used on a production installation.
+        This option has no effect when using <literal>--check</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 366f397e3f..36f66ba872 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -33,6 +33,7 @@ static int64 badblocks = 0;
 static ControlFileData *ControlFile;
 
 static char *only_relfilenode = NULL;
+static bool do_sync = true;
 static bool verbose = false;
 
 typedef enum
@@ -69,6 +70,7 @@ usage(void)
 	printf(_("                         mode if nothing is specified.\n"));
 	printf(_("  -d, --disable          disable data checksums\n"));
 	printf(_("  -e, --enable           enable data checksums\n"));
+	printf(_("  -N, --no-sync          do not wait for changes to be written safely to disk\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -297,6 +299,7 @@ main(int argc, char *argv[])
 		{"pgdata", required_argument, NULL, 'D'},
 		{"disable", no_argument, NULL, 'd'},
 		{"enable", no_argument, NULL, 'e'},
+		{"no-sync", no_argument, NULL, 'N'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -324,7 +327,7 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:deNr:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -337,6 +340,9 @@ main(int argc, char *argv[])
 			case 'e':
 				action = PG_ACTION_ENABLE;
 				break;
+			case 'N':
+				do_sync = false;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -456,7 +462,8 @@ main(int argc, char *argv[])
 		ControlFile->data_checksum_version =
 			(action == PG_ACTION_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
 		update_controlfile(DataDir, progname, ControlFile);
-		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (do_sync)
+			fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
 		if (verbose)
 			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 		if (action == PG_ACTION_ENABLE)
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 58be8d5cb6..ff9bb70040 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -101,11 +101,11 @@ mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
 # Enable checksums.
-command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+command_ok(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 	   "checksums successfully enabled in cluster");
 
 # Successive attempt to enable checksums fails.
-command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+command_fails(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 	      "enabling checksums fails if already enabled");
 
 # Control file should know that checksums are enabled.
@@ -113,12 +113,12 @@ command_like(['pg_controldata', $pgdata],
 	     qr/Data page checksum version:.*1/,
 	     'checksums enabled in control file');
 
-# Disable checksums again.
+# Disable checksums again.  Flush result here as that should be cheap.
 command_ok(['pg_checksums', '--disable', '-D', $pgdata],
 	   "checksums successfully disabled in cluster");
 
 # Successive attempt to disable checksums fails.
-command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+command_fails(['pg_checksums', '--disable', '--no-sync', '-D', $pgdata],
 	      "disabling checksums fails if already disabled");
 
 # Control file should know that checksums are disabled.
@@ -127,7 +127,7 @@ command_like(['pg_controldata', $pgdata],
 		 'checksums disabled in control file');
 
 # Enable checksums again for follow-up tests.
-command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+command_ok(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 		   "checksums successfully enabled in cluster");
 
 # Control file should know that checksums are enabled.
-- 
2.20.1
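
The precondition checks that the patches above add to main() can be summarized in a small standalone sketch. This is only an illustration, not code from the patch set: the names ChecksumAction, reject_reason and needs_scan are hypothetical, and checksum_version stands in for ControlFile->data_checksum_version (0 meaning disabled).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's action enum. */
typedef enum
{
	ACTION_CHECK,
	ACTION_ENABLE,
	ACTION_DISABLE
} ChecksumAction;

/*
 * Return NULL if the action may proceed, or a short reason if it must be
 * rejected.  The order follows the patch: the relfilenode restriction is
 * tested first, then the state recorded in the control file.
 */
const char *
reject_reason(ChecksumAction action, unsigned int checksum_version,
			  bool has_relfilenode)
{
	if (action != ACTION_CHECK && has_relfilenode)
		return "relfilenode option only possible with --check";
	if (checksum_version == 0 && action == ACTION_CHECK)
		return "data checksums are not enabled in cluster";
	if (checksum_version == 0 && action == ACTION_DISABLE)
		return "data checksums are already disabled in cluster";
	if (checksum_version > 0 && action == ACTION_ENABLE)
		return "data checksums are already enabled in cluster";
	return NULL;
}

/* Only checking and enabling need to walk the relation files. */
bool
needs_scan(ChecksumAction action)
{
	return action == ACTION_CHECK || action == ACTION_ENABLE;
}
```

With this shape, disabling is the cheap path: it skips the scan and only rewrites the control file, which matches the test comment about flushing being cheap for --disable.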

#44Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#43)
Re: Offline enabling/disabling of data checksums

Hi Michael,

On Monday, 11.03.2019 at 13:53 +0900, Michael Paquier wrote:

On Wed, Feb 27, 2019 at 07:59:31AM +0100, Fabien COELHO wrote:

Hallo Michael,

Okay, let's move on with these patches!

Wow cool. I was going to go back to these and split them up similar to
how you did it now that the online verification patch seems to be
done/stuck for v12, but great that you beat me to it.

I had a quick look over the patch and your changes and it LGTM.

Michael


#45Michael Banck
michael.banck@credativ.de
In reply to: Michael Banck (#44)
Re: Offline enabling/disabling of data checksums

Hi,

On Monday, 11.03.2019 at 11:11 +0100, Michael Banck wrote:

I had a quick look over the patch and your changes and it LGTM.

One thing: you (Michael) should be co-author for patch #3 as I took some
of your code from
https://github.com/michaelpq/pg_plugins/tree/master/pg_checksums

Michael


#46Sergei Kornilov
In reply to: Michael Paquier (#43)
Re: Offline enabling/disabling of data checksums

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

Hello

I reviewed the latest patch set. I have one big question: is pg_checksums safe for cross-version operations, even with the update_controlfile call? Currently I am able to enable checksums in a pg11 cluster with pg_checksums compiled on HEAD. Is this expected? I didn't notice any version-specific check in the code.

And a few small notes:

<command>pg_checksums</command> checks, enables or disables data checksums

Maybe <application> would be better than <command>?

+	printf(_("%s enables, disables or verifies data checksums in a PostgreSQL\n"), progname);
+	printf(_("database cluster.\n\n"));

I doubt this is good line formatting for translation purposes.

+	printf(_("  -c, --check            check data checksums.  This is the default\n"));
+	printf(_("                         mode if nothing is specified.\n"));

Same here. For example, pg_basebackup uses a different multiline style:

	printf(_("  -r, --max-rate=RATE    maximum transfer rate to transfer data directory\n"
			 "                         (in kB/s, or use suffix \"k\" or \"M\")\n"));
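
To illustrate the point about translation, here is a minimal sketch of that style applied to the --check help text. It assumes a stand-in `_()` macro that returns its argument unchanged (the real one wraps gettext), and check_option_help is a hypothetical helper name. Because the compiler concatenates adjacent string literals, both lines reach `_()` as one message, so a translator sees the whole option description at once instead of two fragments.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the gettext wrapper; here it returns its argument unchanged. */
#define _(msgid) (msgid)

/*
 * Adjacent string literals are merged into a single string at compile
 * time, so the two source lines below form one translatable message.
 */
const char *
check_option_help(void)
{
	return _("  -c, --check            check data checksums.  This is the default\n"
			 "                         mode if nothing is specified.\n");
}

/* Print the help entry the way usage() would. */
void
print_check_option_help(void)
{
	fputs(check_option_help(), stdout);
}
```

With two separate printf(_()) calls, each fragment becomes its own catalog entry, and the second one is a sentence tail with no context.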

+command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilnodes are requested and action is --disable");

action is "--enable" here ;-)

if (badblocks > 0)
return 1;

Small question: why return 1 instead of exit(1)?

<refentry id="pgchecksums">
<indexterm zone="pgchecksums">

How about using "app-pgchecksums", similar to other applications?

regards, Sergei

#47Michael Paquier
michael@paquier.xyz
In reply to: Sergei Kornilov (#46)
4 attachment(s)
Re: Offline enabling/disabling of data checksums

On Mon, Mar 11, 2019 at 02:11:11PM +0000, Sergei Kornilov wrote:

I reviewed the latest patch set.

Thanks, I have committed the refactoring of src/common/ as a first
step.

I have one big question: is pg_checksums
safe for cross-version operations, even with the update_controlfile
call? Currently I am able to enable checksums in a pg11 cluster with
pg_checksums compiled on HEAD. Is this expected? I didn't notice any
version-specific check in the code.

This depends on the version of the control file, and it happens that
we don't check for it, so that's a good catch on your side. Not
doing the check is a bad idea, as ControlFileData should be compatible
between the binary and the data read. I am attaching a fresh 0001
which should be back-patched down to v11 as a bug fix. An advantage
of that, which is similar to pg_rewind, is that if the control file
version does not change between major versions, then the tool can
still be used across them. And the data folder layout is unlikely to
change.
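
The compatibility rule described here can be sketched in isolation as follows. This is a simplified illustration under stated assumptions: SketchControlFile is a hypothetical two-field stand-in for ControlFileData, and SKETCH_CONTROL_VERSION is a placeholder value rather than the real PG_CONTROL_VERSION from the PostgreSQL headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for illustration; the real constant is PG_CONTROL_VERSION. */
#define SKETCH_CONTROL_VERSION 1201

/* Hypothetical, heavily reduced stand-in for ControlFileData. */
typedef struct SketchControlFile
{
	uint32_t	pg_control_version;
	uint32_t	data_checksum_version;
} SketchControlFile;

/*
 * The binary may only interpret (and later rewrite) a control file whose
 * layout version matches the one it was compiled against.
 */
bool
control_file_is_compatible(const SketchControlFile *cf)
{
	return cf->pg_control_version == SKETCH_CONTROL_VERSION;
}
```

The check keys on the control file version rather than on the server major version, which is why a tool built this way keeps working across releases that leave the control file format untouched.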

<command>pg_checksums</command> checks, enables or disables data checksums

Maybe <application> would be better than <command>?

Fixed, as part of the renaming patch.

+	printf(_("%s enables, disables or verifies data checksums in a PostgreSQL\n"), progname);
+	printf(_("database cluster.\n\n"));

I doubt this is good line formatting for translation purposes.

+	printf(_("  -c, --check            check data checksums.  This is the default\n"));
+	printf(_("                         mode if nothing is specified.\n"));

Same here. For example, pg_basebackup uses a different multiline style:

Oh, good points. I forgot about that point of view.

+command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilnodes are requested and action is --disable");

action is "--enable" here ;-)

s/relfilnodes/relfilenodes/ while on it.

if (badblocks > 0)
return 1;

Small question: why return 1 instead of exit(1)?

OK, let's fix that on the way as part of the renaming.

<refentry id="pgchecksums">
<indexterm zone="pgchecksums">

How about using "app-pgchecksums", similar to other applications?

Yes, I was wondering about that one when doing the renaming, but did
not bother much about consistency. Anyway, switched; you are right.

Attached is an updated patch set, minus the refactoring for
src/common/.
--
Michael

Attachments:

v2-0001-Ensure-version-compatibility-of-pg_verify_checksu.patch (text/x-diff; charset=us-ascii)
From ef3e4f9b0dbff7deaaf19cb303f71c7883193a72 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 12 Mar 2019 10:43:47 +0900
Subject: [PATCH v2 1/4] Ensure version compatibility of pg_verify_checksums

pg_verify_checksums performs a read of the control file, and the data it
fetches should be from a data folder compatible with the major version
of Postgres the binary has been compiled with.

Reported-by: Sergei Kornilov
Author: Michael Paquier
Discussion: https://postgr.es/m/155231347133.16480.11453587097036807558.pgcf@coridan.postgresql.org
Backpatch-through: 11, where the tool has been introduced.
---
 src/bin/pg_verify_checksums/pg_verify_checksums.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 511262ab5f..4c7c055b31 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -316,6 +316,13 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	if (ControlFile->pg_control_version != PG_CONTROL_VERSION)
+	{
+		fprintf(stderr, _("%s: cluster is not compatible with this version of pg_verify_checksums\n"),
+				progname);
+		exit(1);
+	}
+
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-- 
2.20.1

v2-0002-Rename-pg_verify_checksums-to-pg_checksums.patch (text/x-diff; charset=us-ascii)
From deee88e2d95f69b360580fbdd5300d2c34cda58b Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 12 Mar 2019 11:02:05 +0900
Subject: [PATCH v2 2/4] Rename pg_verify_checksums to pg_checksums

The current tool name is too restrictive and focuses only on verifying
checksums.  As more options to control checksums for an offline cluster
are planned to be added, switch to a more generic name.  Documentation
as well as all past references to the tool are updated.

Author: Michael Paquier
Reviewed-by: Michael Banck, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/allfiles.sgml                |  2 +-
 ...erify_checksums.sgml => pg_checksums.sgml} | 24 +++++++++----------
 doc/src/sgml/reference.sgml                   |  2 +-
 src/backend/replication/basebackup.c          |  2 +-
 src/bin/Makefile                              |  2 +-
 src/bin/initdb/t/001_initdb.pl                |  8 +++----
 src/bin/pg_checksums/.gitignore               |  3 +++
 .../Makefile                                  | 20 ++++++++--------
 src/bin/pg_checksums/nls.mk                   |  4 ++++
 .../pg_checksums.c}                           | 20 +++++++++-------
 src/bin/pg_checksums/t/001_basic.pl           |  8 +++++++
 .../t/002_actions.pl                          | 18 +++++++-------
 src/bin/pg_verify_checksums/.gitignore        |  3 ---
 src/bin/pg_verify_checksums/nls.mk            |  4 ----
 src/bin/pg_verify_checksums/t/001_basic.pl    |  8 -------
 15 files changed, 66 insertions(+), 62 deletions(-)
 rename doc/src/sgml/ref/{pg_verify_checksums.sgml => pg_checksums.sgml} (79%)
 create mode 100644 src/bin/pg_checksums/.gitignore
 rename src/bin/{pg_verify_checksums => pg_checksums}/Makefile (53%)
 create mode 100644 src/bin/pg_checksums/nls.mk
 rename src/bin/{pg_verify_checksums/pg_verify_checksums.c => pg_checksums/pg_checksums.c} (94%)
 create mode 100644 src/bin/pg_checksums/t/001_basic.pl
 rename src/bin/{pg_verify_checksums => pg_checksums}/t/002_actions.pl (89%)
 delete mode 100644 src/bin/pg_verify_checksums/.gitignore
 delete mode 100644 src/bin/pg_verify_checksums/nls.mk
 delete mode 100644 src/bin/pg_verify_checksums/t/001_basic.pl

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index c81c87ef41..8d91f3529e 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -199,6 +199,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgarchivecleanup   SYSTEM "pgarchivecleanup.sgml">
 <!ENTITY pgBasebackup       SYSTEM "pg_basebackup.sgml">
 <!ENTITY pgbench            SYSTEM "pgbench.sgml">
+<!ENTITY pgChecksums        SYSTEM "pg_checksums.sgml">
 <!ENTITY pgConfig           SYSTEM "pg_config-ref.sgml">
 <!ENTITY pgControldata      SYSTEM "pg_controldata.sgml">
 <!ENTITY pgCtl              SYSTEM "pg_ctl-ref.sgml">
@@ -210,7 +211,6 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
-<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
similarity index 79%
rename from doc/src/sgml/ref/pg_verify_checksums.sgml
rename to doc/src/sgml/ref/pg_checksums.sgml
index 905b8f1222..6a47dda683 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -1,27 +1,27 @@
 <!--
-doc/src/sgml/ref/pg_verify_checksums.sgml
+doc/src/sgml/ref/pg_checksums.sgml
 PostgreSQL documentation
 -->
 
-<refentry id="pgverifychecksums">
- <indexterm zone="pgverifychecksums">
-  <primary>pg_verify_checksums</primary>
+<refentry id="app-pgchecksums">
+ <indexterm zone="app-pgchecksums">
+  <primary>pg_checksums</primary>
  </indexterm>
 
  <refmeta>
-  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <refentrytitle><application>pg_checksums</application></refentrytitle>
   <manvolnum>1</manvolnum>
   <refmiscinfo>Application</refmiscinfo>
  </refmeta>
 
  <refnamediv>
-  <refname>pg_verify_checksums</refname>
+  <refname>pg_checksums</refname>
   <refpurpose>verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
  </refnamediv>
 
  <refsynopsisdiv>
   <cmdsynopsis>
-   <command>pg_verify_checksums</command>
+   <command>pg_checksums</command>
    <arg rep="repeat" choice="opt"><replaceable class="parameter">option</replaceable></arg>
    <group choice="opt">
     <group choice="opt">
@@ -33,12 +33,12 @@ PostgreSQL documentation
   </cmdsynopsis>
  </refsynopsisdiv>
 
- <refsect1 id="r1-app-pg_verify_checksums-1">
+ <refsect1 id="r1-app-pg_checksums-1">
   <title>Description</title>
   <para>
-   <command>pg_verify_checksums</command> verifies data checksums in a
+   <application>pg_checksums</application> verifies data checksums in a
    <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_verify_checksums</application>.
+   down cleanly before running <application>pg_checksums</application>.
    The exit status is zero if there are no checksum errors, otherwise nonzero.
   </para>
  </refsect1>
@@ -84,7 +84,7 @@ PostgreSQL documentation
        <term><option>--version</option></term>
        <listitem>
        <para>
-        Print the <application>pg_verify_checksums</application> version and exit.
+        Print the <application>pg_checksums</application> version and exit.
        </para>
        </listitem>
      </varlistentry>
@@ -94,7 +94,7 @@ PostgreSQL documentation
       <term><option>--help</option></term>
        <listitem>
         <para>
-         Show help about <application>pg_verify_checksums</application> command line
+         Show help about <application>pg_checksums</application> command line
          arguments, and exit.
         </para>
        </listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index db4f4167e3..cef09dd38b 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -276,6 +276,7 @@
 
    &initdb;
    &pgarchivecleanup;
+   &pgChecksums;
    &pgControldata;
    &pgCtl;
    &pgResetwal;
@@ -283,7 +284,6 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
-   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 6c324a6661..537f09e342 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -190,7 +190,7 @@ static const char *excludeFiles[] =
 /*
  * List of files excluded from checksum validation.
  *
- * Note: this list should be kept in sync with what pg_verify_checksums.c
+ * Note: this list should be kept in sync with what pg_checksums.c
  * includes.
  */
 static const char *const noChecksumFiles[] = {
diff --git a/src/bin/Makefile b/src/bin/Makefile
index c66bfa887e..903e58121f 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -17,6 +17,7 @@ SUBDIRS = \
 	initdb \
 	pg_archivecleanup \
 	pg_basebackup \
+	pg_checksums \
 	pg_config \
 	pg_controldata \
 	pg_ctl \
@@ -26,7 +27,6 @@ SUBDIRS = \
 	pg_test_fsync \
 	pg_test_timing \
 	pg_upgrade \
-	pg_verify_checksums \
 	pg_waldump \
 	pgbench \
 	psql \
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 759779adb2..8dfcd8752a 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -63,12 +63,12 @@ mkdir $datadir;
 command_like(['pg_controldata', $datadir],
 			 qr/Data page checksum version:.*0/,
 			 'checksums are disabled in control file');
-# pg_verify_checksums fails with checksums disabled by default.  This is
-# not part of the tests included in pg_verify_checksums to save from
+# pg_checksums fails with checksums disabled by default.  This is
+# not part of the tests included in pg_checksums to save from
 # the creation of an extra instance.
 command_fails(
-	[ 'pg_verify_checksums', '-D', $datadir],
-	"pg_verify_checksums fails with data checksum disabled");
+	[ 'pg_checksums', '-D', $datadir],
+	"pg_checksums fails with data checksum disabled");
 
 command_ok([ 'initdb', '-S', $datadir ], 'sync only');
 command_fails([ 'initdb', $datadir ], 'existing data directory');
diff --git a/src/bin/pg_checksums/.gitignore b/src/bin/pg_checksums/.gitignore
new file mode 100644
index 0000000000..7888625094
--- /dev/null
+++ b/src/bin/pg_checksums/.gitignore
@@ -0,0 +1,3 @@
+/pg_checksums
+
+/tmp_check/
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_checksums/Makefile
similarity index 53%
rename from src/bin/pg_verify_checksums/Makefile
rename to src/bin/pg_checksums/Makefile
index ab6d3ea9e2..278b7a0f2e 100644
--- a/src/bin/pg_verify_checksums/Makefile
+++ b/src/bin/pg_checksums/Makefile
@@ -1,38 +1,38 @@
 #-------------------------------------------------------------------------
 #
-# Makefile for src/bin/pg_verify_checksums
+# Makefile for src/bin/pg_checksums
 #
 # Copyright (c) 1998-2019, PostgreSQL Global Development Group
 #
-# src/bin/pg_verify_checksums/Makefile
+# src/bin/pg_checksums/Makefile
 #
 #-------------------------------------------------------------------------
 
-PGFILEDESC = "pg_verify_checksums - verify data checksums in an offline cluster"
+PGFILEDESC = "pg_checksums - verify data checksums in an offline cluster"
 PGAPPICON=win32
 
-subdir = src/bin/pg_verify_checksums
+subdir = src/bin/pg_checksums
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS= pg_verify_checksums.o $(WIN32RES)
+OBJS= pg_checksums.o $(WIN32RES)
 
-all: pg_verify_checksums
+all: pg_checksums
 
-pg_verify_checksums: $(OBJS) | submake-libpgport
+pg_checksums: $(OBJS) | submake-libpgport
 	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
 
 install: all installdirs
-	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+	$(INSTALL_PROGRAM) pg_checksums$(X) '$(DESTDIR)$(bindir)/pg_checksums$(X)'
 
 installdirs:
 	$(MKDIR_P) '$(DESTDIR)$(bindir)'
 
 uninstall:
-	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+	rm -f '$(DESTDIR)$(bindir)/pg_checksums$(X)'
 
 clean distclean maintainer-clean:
-	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -f pg_checksums$(X) $(OBJS)
 	rm -rf tmp_check
 
 check:
diff --git a/src/bin/pg_checksums/nls.mk b/src/bin/pg_checksums/nls.mk
new file mode 100644
index 0000000000..2748b18ef7
--- /dev/null
+++ b/src/bin/pg_checksums/nls.mk
@@ -0,0 +1,4 @@
+# src/bin/pg_checksums/nls.mk
+CATALOG_NAME     = pg_checksums
+AVAIL_LANGUAGES  =
+GETTEXT_FILES    = pg_checksums.c
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_checksums/pg_checksums.c
similarity index 94%
rename from src/bin/pg_verify_checksums/pg_verify_checksums.c
rename to src/bin/pg_checksums/pg_checksums.c
index 4c7c055b31..6571c34211 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -1,12 +1,16 @@
-/*
- * pg_verify_checksums
+/*-------------------------------------------------------------------------
  *
- * Verifies page level checksums in an offline cluster
+ * pg_checksums.c
+ *	  Verifies page level checksums in an offline cluster.
  *
- *	Copyright (c) 2010-2019, PostgreSQL Global Development Group
+ * Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
- *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ * IDENTIFICATION
+ *	  src/bin/pg_checksums/pg_checksums.c
+ *
+ *-------------------------------------------------------------------------
  */
+
 #include "postgres_fe.h"
 
 #include <dirent.h>
@@ -240,7 +244,7 @@ main(int argc, char *argv[])
 	int			option_index;
 	bool		crc_ok;
 
-	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_checksums"));
 
 	progname = get_progname(argv[0]);
 
@@ -253,7 +257,7 @@ main(int argc, char *argv[])
 		}
 		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
 		{
-			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			puts("pg_checksums (PostgreSQL) " PG_VERSION);
 			exit(0);
 		}
 	}
@@ -318,7 +322,7 @@ main(int argc, char *argv[])
 
 	if (ControlFile->pg_control_version != PG_CONTROL_VERSION)
 	{
-		fprintf(stderr, _("%s: cluster is not compatible with this version of pg_verify_checksums\n"),
+		fprintf(stderr, _("%s: cluster is not compatible with this version of pg_checksums\n"),
 				progname);
 		exit(1);
 	}
diff --git a/src/bin/pg_checksums/t/001_basic.pl b/src/bin/pg_checksums/t/001_basic.pl
new file mode 100644
index 0000000000..4334c80606
--- /dev/null
+++ b/src/bin/pg_checksums/t/001_basic.pl
@@ -0,0 +1,8 @@
+use strict;
+use warnings;
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_checksums');
+program_version_ok('pg_checksums');
+program_options_handling_ok('pg_checksums');
diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
similarity index 89%
rename from src/bin/pg_verify_checksums/t/002_actions.pl
rename to src/bin/pg_checksums/t/002_actions.pl
index 74ad5ad723..97284e8930 100644
--- a/src/bin/pg_verify_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -1,4 +1,4 @@
-# Do basic sanity checks supported by pg_verify_checksums using
+# Do basic sanity checks supported by pg_checksums using
 # an initialized cluster.
 
 use strict;
@@ -38,7 +38,7 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_verify_checksums',  '-D', $pgdata,
+	command_ok(['pg_checksums',  '-D', $pgdata,
 		'-r', $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
@@ -49,7 +49,7 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
+	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata, '-r',
 								$relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
@@ -57,7 +57,7 @@ sub check_relation_corruption
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,7 +67,7 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_verify_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_checksums', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
@@ -101,12 +101,12 @@ mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
 # Checksums pass on a newly-created cluster
-command_ok(['pg_verify_checksums',  '-D', $pgdata],
+command_ok(['pg_checksums',  '-D', $pgdata],
 		   "succeeds with offline cluster");
 
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_verify_checksums',  '-D', $pgdata],
+command_fails(['pg_checksums',  '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -121,7 +121,7 @@ $node->safe_psql('postgres',
 	"CREATE TABLESPACE ts_corrupt LOCATION '$tablespace_dir';");
 check_relation_corruption($node, 'corrupt2', 'ts_corrupt');
 
-# Utility routine to check that pg_verify_checksums is able to detect
+# Utility routine to check that pg_checksums is able to detect
 # correctly-named relation files filled with some corrupted data.
 sub fail_corrupt
 {
@@ -133,7 +133,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
deleted file mode 100644
index 0e5e569a54..0000000000
--- a/src/bin/pg_verify_checksums/.gitignore
+++ /dev/null
@@ -1,3 +0,0 @@
-/pg_verify_checksums
-
-/tmp_check/
diff --git a/src/bin/pg_verify_checksums/nls.mk b/src/bin/pg_verify_checksums/nls.mk
deleted file mode 100644
index 893efaf0f0..0000000000
--- a/src/bin/pg_verify_checksums/nls.mk
+++ /dev/null
@@ -1,4 +0,0 @@
-# src/bin/pg_verify_checksums/nls.mk
-CATALOG_NAME     = pg_verify_checksums
-AVAIL_LANGUAGES  =
-GETTEXT_FILES    = pg_verify_checksums.c
diff --git a/src/bin/pg_verify_checksums/t/001_basic.pl b/src/bin/pg_verify_checksums/t/001_basic.pl
deleted file mode 100644
index 1fa2e12db2..0000000000
--- a/src/bin/pg_verify_checksums/t/001_basic.pl
+++ /dev/null
@@ -1,8 +0,0 @@
-use strict;
-use warnings;
-use TestLib;
-use Test::More tests => 8;
-
-program_help_ok('pg_verify_checksums');
-program_version_ok('pg_verify_checksums');
-program_options_handling_ok('pg_verify_checksums');
-- 
2.20.1

v2-0003-Add-options-to-enable-and-disable-checksums-in-pg.patch (text/x-diff; charset=us-ascii)
From f07d248aa82d4838d23c45238ce2147d31b8727d Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 12 Mar 2019 11:12:03 +0900
Subject: [PATCH v2 3/4] Add options to enable and disable checksums in
 pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When running --enable or --disable, the data folder gets fsync'd for
durability.  If no mode is specified in the options, then --check is
used for compatibility with older versions of pg_verify_checksums (now
renamed to pg_checksums in v12).

Author: Michael Banck
Reviewed-by: Fabien Coelho, Michael Paquier
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    |  50 +++++++-
 src/bin/pg_checksums/pg_checksums.c   | 171 ++++++++++++++++++++++----
 src/bin/pg_checksums/t/002_actions.pl |  76 +++++++++---
 3 files changed, 251 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..776f7be477 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  <refnamediv>
   <refname>pg_checksums</refname>
-  <refpurpose>verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
+  <refpurpose>enable, disable or check data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
  </refnamediv>
 
  <refsynopsisdiv>
@@ -36,10 +36,19 @@ PostgreSQL documentation
  <refsect1 id="r1-app-pg_checksums-1">
   <title>Description</title>
   <para>
-   <application>pg_checksums</application> verifies data checksums in a
-   <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_checksums</application>.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   <application>pg_checksums</application> checks, enables or disables data
+   checksums in a <productname>PostgreSQL</productname> cluster.  The server
+   must be shut down cleanly before running
+   <application>pg_checksums</application>. The exit status is zero if there
+   are no checksum errors when checking them, and nonzero if at least one
+   checksum failure is detected. If enabling or disabling checksums, the
+   exit status is nonzero if the operation failed.
+  </para>
+
+  <para>
+   While checking or enabling checksums needs to scan or write every file in
+   the cluster, disabling will only update the file
+   <filename>pg_control</filename>.
   </para>
  </refsect1>
 
@@ -60,6 +69,37 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-c</option></term>
+      <term><option>--check</option></term>
+      <listitem>
+       <para>
+        Checks checksums. This is the default mode if nothing else is
+        specified.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <term><option>--disable</option></term>
+      <listitem>
+       <para>
+        Disables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--enable</option></term>
+      <listitem>
+       <para>
+        Enables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 6571c34211..7d9c44c361 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -1,7 +1,8 @@
 /*-------------------------------------------------------------------------
  *
  * pg_checksums.c
- *	  Verifies page level checksums in an offline cluster.
+ *	  Checks, enables or disables page level checksums for an offline
+ *	  cluster
  *
  * Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -17,14 +18,15 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
-#include "catalog/pg_control.h"
+#include "access/xlog_internal.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
 
 
 static int64 files = 0;
@@ -35,16 +37,39 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_ACTION_CHECK,
+	PG_ACTION_DISABLE,
+	PG_ACTION_ENABLE
+} ChecksumAction;
+
+/*
+ * Filename components.
+ *
+ * XXX: fd.h is not declared here as frontend side code is not able to
+ * interact with the backend-side definitions for the various fsync
+ * wrappers.
+ */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static ChecksumAction action = PG_ACTION_CHECK;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables, disables or verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -c, --check            check data checksums\n"));
+	printf(_("                         This is the default mode if nothing is specified.\n"));
+	printf(_("  -d, --disable          disable data checksums\n"));
+	printf(_("  -e, --enable           enable data checksums\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -90,8 +115,14 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	int			flags;
+
+	Assert(action == PG_ACTION_ENABLE ||
+		   action == PG_ACTION_CHECK);
+
+	flags = (action == PG_ACTION_ENABLE) ? O_RDWR : O_RDONLY;
+	f = open(fn, PG_BINARY | flags, 0);
 
-	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
@@ -121,18 +152,47 @@ scan_file(const char *fn, BlockNumber segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		if (csum != header->pd_checksum)
+		if (action == PG_ACTION_CHECK)
 		{
-			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
-						progname, fn, blockno, csum, header->pd_checksum);
-			badblocks++;
+			if (csum != header->pd_checksum)
+			{
+				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+					fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
+							progname, fn, blockno, csum, header->pd_checksum);
+				badblocks++;
+			}
+		}
+		else if (action == PG_ACTION_ENABLE)
+		{
+			/* Set checksum in page header */
+			header->pd_checksum = csum;
+
+			/* Seek back to beginning of block */
+			if (lseek(f, -BLCKSZ, SEEK_CUR) < 0)
+			{
+				fprintf(stderr, _("%s: seek failed for block %u in file \"%s\": %s\n"), progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Write block with checksum */
+			if (write(f, buf.data, BLCKSZ) != BLCKSZ)
+			{
+				fprintf(stderr, _("%s: could not update checksum of block %u in file \"%s\": %s\n"),
+						progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
 		}
 	}
 
 	if (verbose)
-		fprintf(stderr,
-				_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+	{
+		if (action == PG_ACTION_CHECK)
+			fprintf(stderr,
+					_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+		if (action == PG_ACTION_ENABLE)
+			fprintf(stderr,
+					_("%s: checksums enabled in file \"%s\"\n"), progname, fn);
+	}
 
 	close(f);
 }
@@ -234,7 +294,10 @@ int
 main(int argc, char *argv[])
 {
 	static struct option long_options[] = {
+		{"check", no_argument, NULL, 'c'},
 		{"pgdata", required_argument, NULL, 'D'},
+		{"disable", no_argument, NULL, 'd'},
+		{"enable", no_argument, NULL, 'e'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -262,10 +325,19 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:r:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
+			case 'c':
+				action = PG_ACTION_CHECK;
+				break;
+			case 'd':
+				action = PG_ACTION_DISABLE;
+				break;
+			case 'e':
+				action = PG_ACTION_ENABLE;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -312,6 +384,15 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Relfilenode checking only works in --check mode */
+	if (action != PG_ACTION_CHECK && only_relfilenode)
+	{
+		fprintf(stderr, _("%s: relfilenode option only possible with --check\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
 	/* Check if cluster is running */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
@@ -330,29 +411,67 @@ main(int argc, char *argv[])
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
+		fprintf(stderr, _("%s: cluster must be shut down\n"), progname);
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version == 0)
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_CHECK)
 	{
 		fprintf(stderr, _("%s: data checksums are not enabled in cluster\n"), progname);
 		exit(1);
 	}
+	if (ControlFile->data_checksum_version == 0 &&
+		action == PG_ACTION_DISABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
+		exit(1);
+	}
+	if (ControlFile->data_checksum_version > 0 &&
+		action == PG_ACTION_ENABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already enabled in cluster.\n"), progname);
+		exit(1);
+	}
 
-	/* Scan all files */
-	scan_directory(DataDir, "global");
-	scan_directory(DataDir, "base");
-	scan_directory(DataDir, "pg_tblspc");
+	/* Operate on all files if checking or enabling checksums */
+	if (action == PG_ACTION_CHECK || action == PG_ACTION_ENABLE)
+	{
+		scan_directory(DataDir, "global");
+		scan_directory(DataDir, "base");
+		scan_directory(DataDir, "pg_tblspc");
 
-	printf(_("Checksum scan completed\n"));
-	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
-	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
-	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
-	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+		printf(_("Checksum operation completed\n"));
+		printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
+		printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
+		if (action == PG_ACTION_CHECK)
+		{
+			printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 
-	if (badblocks > 0)
-		return 1;
+			if (badblocks > 0)
+				return 1;
+		}
+	}
+
+	/*
+	 * Finally update the control file, flushing the data directory at the
+	 * end.
+	 */
+	if (action == PG_ACTION_ENABLE || action == PG_ACTION_DISABLE)
+	{
+		/* Update control file */
+		ControlFile->data_checksum_version =
+			(action == PG_ACTION_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
+		update_controlfile(DataDir, progname, ControlFile);
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		if (action == PG_ACTION_ENABLE)
+			printf(_("Checksums enabled in cluster\n"));
+		else
+			printf(_("Checksums disabled in cluster\n"));
+	}
 
 	return 0;
 }
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 97284e8930..3ab18a6b89 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 45;
+use Test::More tests => 62;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -38,8 +38,8 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_checksums',  '-D', $pgdata,
-		'-r', $relfilenode_corrupted],
+	command_ok(['pg_checksums',  '--check', '-D', $pgdata, '-r',
+			   $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
 	# Time to create some corruption
@@ -49,15 +49,15 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata, '-r',
-								$relfilenode_corrupted],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata,
+							  '-r', $relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,22 +67,22 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_checksums', '--check', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
 	return;
 }
 
-# Initialize node with checksums enabled.
+# Initialize node with checksums disabled.
 my $node = get_new_node('node_checksum');
-$node->init(extra => ['--data-checksums']);
+$node->init();
 my $pgdata = $node->data_dir;
 
-# Control file should know that checksums are enabled.
+# Control file should know that checksums are disabled.
 command_like(['pg_controldata', $pgdata],
-	     qr/Data page checksum version:.*1/,
-		 'checksums enabled in control file');
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
 
 # These are correct but empty files, so they should pass through.
 append_to_file "$pgdata/global/99999", "";
@@ -100,13 +100,59 @@ append_to_file "$pgdata/global/pgsql_tmp_123", "foo";
 mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
+# Enable checksums.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+	   "checksums successfully enabled in cluster");
+
+# Successive attempt to enable checksums fails.
+command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+	      "enabling checksums fails if already enabled");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+	     'checksums enabled in control file');
+
+# Disable checksums again.
+command_ok(['pg_checksums', '--disable', '-D', $pgdata],
+	   "checksums successfully disabled in cluster");
+
+# Successive attempt to disable checksums fails.
+command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+	      "disabling checksums fails if already disabled");
+
+# Control file should know that checksums are disabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
+
+# Enable checksums again for follow-up tests.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
 # Checksums pass on a newly-created cluster
-command_ok(['pg_checksums',  '-D', $pgdata],
+command_ok(['pg_checksums', '--check', '-D', $pgdata],
 		   "succeeds with offline cluster");
 
+# Checksums are verified if no other arguments are specified
+command_ok(['pg_checksums', '-D', $pgdata],
+		   "verifies checksums as default action");
+
+# Specific relation files cannot be requested when action is --disable
+# or --enable.
+command_fails(['pg_checksums', '--disable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --disable");
+command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --enable");
+
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_checksums',  '-D', $pgdata],
+command_fails(['pg_checksums', '--check', '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -133,7 +179,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
-- 
2.20.1
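
The per-block work that --enable performs in the 0003 patch above (read a
block, compute its checksum, seek back, write the block in place) can be
sketched in standalone C.  This is only a minimal illustration under stated
assumptions, not the patch's code: toy_checksum() is a made-up stand-in for
pg_checksum_page(), TOY_BLCKSZ and toy_enable_checksums() are hypothetical
names, and the checksum is stored in the first two bytes of each block
rather than in a real page header.  The sketch also leaves out what the real
tool does around this loop (skipping new pages, walking directories,
updating pg_control).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define TOY_BLCKSZ 8192			/* assumed block size for this sketch */

/*
 * Toy checksum over one block, mixed with the block number.  This is NOT
 * pg_checksum_page(); it skips the first two bytes, where the sketch
 * stores the result.
 */
uint16_t
toy_checksum(const unsigned char *page, uint32_t blkno)
{
	uint32_t	sum = blkno;

	for (size_t i = 2; i < TOY_BLCKSZ; i++)
		sum = sum * 31 + page[i];
	return (uint16_t) (sum ^ (sum >> 16));
}

/*
 * Rewrite every block of "path" with its toy checksum.  Returns the number
 * of blocks rewritten, or -1 on any I/O error or short read.
 */
long
toy_enable_checksums(const char *path)
{
	unsigned char buf[TOY_BLCKSZ];
	long		blkno = 0;
	int			fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;

	for (;;)
	{
		ssize_t		r = read(fd, buf, TOY_BLCKSZ);
		uint16_t	csum;

		if (r == 0)
			break;				/* end of file */
		if (r != TOY_BLCKSZ)
		{
			close(fd);
			return -1;			/* read error or partial block */
		}

		/* Set the checksum in the block image */
		csum = toy_checksum(buf, (uint32_t) blkno);
		memcpy(buf, &csum, sizeof(csum));

		/* Seek back to the beginning of the block, then overwrite it */
		if (lseek(fd, -(off_t) TOY_BLCKSZ, SEEK_CUR) < 0 ||
			write(fd, buf, TOY_BLCKSZ) != TOY_BLCKSZ)
		{
			close(fd);
			return -1;
		}
		blkno++;
	}

	/* Flush to disk before reporting success */
	if (fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	close(fd);
	return blkno;
}
```

Calling toy_enable_checksums() on a file made of whole blocks returns the
block count; a trailing partial block is treated as an error here, which is
a simplification chosen for the sketch.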

v2-0004-Add-option-N-no-sync-to-pg_checksums.patch (text/x-diff; charset=us-ascii)
From 9ac23f54948f697d7b8591e4880dab354718790e Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 11 Mar 2019 13:46:09 +0900
Subject: [PATCH v2 4/4] Add option -N/--no-sync to pg_checksums

This option is consistent with what pg_dump, pg_rewind and
pg_basebackup provide.  It is useful for reducing the I/O effort when
testing things, and is not meant to be used in a production environment.

Author: Michael Paquier
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    | 16 ++++++++++++++++
 src/bin/pg_checksums/pg_checksums.c   | 11 +++++++++--
 src/bin/pg_checksums/t/002_actions.pl | 10 +++++-----
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 776f7be477..c3ccbf4eb7 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -100,6 +100,22 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-N</option></term>
+      <term><option>--no-sync</option></term>
+      <listitem>
+       <para>
+        By default, <command>pg_checksums</command> will wait for all files
+        to be written safely to disk.  This option causes
+        <command>pg_checksums</command> to return without waiting, which is
+        faster, but means that a subsequent operating system crash can leave
+        the updated data folder corrupt.  Generally, this option is useful
+        for testing but should not be used on a production installation.
+        This option has no effect when using <literal>--check</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 7d9c44c361..8edbeefc91 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -35,6 +35,7 @@ static int64 badblocks = 0;
 static ControlFileData *ControlFile;
 
 static char *only_relfilenode = NULL;
+static bool do_sync = true;
 static bool verbose = false;
 
 typedef enum
@@ -70,6 +71,7 @@ usage(void)
 	printf(_("                         This is the default mode if nothing is specified.\n"));
 	printf(_("  -d, --disable          disable data checksums\n"));
 	printf(_("  -e, --enable           enable data checksums\n"));
+	printf(_("  -N, --no-sync          do not wait for changes to be written safely to disk\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -298,6 +300,7 @@ main(int argc, char *argv[])
 		{"pgdata", required_argument, NULL, 'D'},
 		{"disable", no_argument, NULL, 'd'},
 		{"enable", no_argument, NULL, 'e'},
+		{"no-sync", no_argument, NULL, 'N'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -325,7 +328,7 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:deNr:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -338,6 +341,9 @@ main(int argc, char *argv[])
 			case 'e':
 				action = PG_ACTION_ENABLE;
 				break;
+			case 'N':
+				do_sync = false;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -464,7 +470,8 @@ main(int argc, char *argv[])
 		ControlFile->data_checksum_version =
 			(action == PG_ACTION_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
 		update_controlfile(DataDir, progname, ControlFile);
-		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (do_sync)
+			fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
 		if (verbose)
 			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 		if (action == PG_ACTION_ENABLE)
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 3ab18a6b89..41575c5245 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -101,11 +101,11 @@ mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
 # Enable checksums.
-command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+command_ok(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 	   "checksums successfully enabled in cluster");
 
 # Successive attempt to enable checksums fails.
-command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+command_fails(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 	      "enabling checksums fails if already enabled");
 
 # Control file should know that checksums are enabled.
@@ -113,12 +113,12 @@ command_like(['pg_controldata', $pgdata],
 	     qr/Data page checksum version:.*1/,
 	     'checksums enabled in control file');
 
-# Disable checksums again.
+# Disable checksums again.  Flush result here as that should be cheap.
 command_ok(['pg_checksums', '--disable', '-D', $pgdata],
 	   "checksums successfully disabled in cluster");
 
 # Successive attempt to disable checksums fails.
-command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+command_fails(['pg_checksums', '--disable', '--no-sync', '-D', $pgdata],
 	      "disabling checksums fails if already disabled");
 
 # Control file should know that checksums are disabled.
@@ -127,7 +127,7 @@ command_like(['pg_controldata', $pgdata],
 		 'checksums disabled in control file');
 
 # Enable checksums again for follow-up tests.
-command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+command_ok(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 		   "checksums successfully enabled in cluster");
 
 # Control file should know that checksums are enabled.
-- 
2.20.1
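
The precondition checks that the 0003 patch adds to main() (the -r option
only with --check, no --check or --disable on a cluster without checksums,
no --enable on a cluster that already has them) can be condensed into one
function.  This is a sketch for illustration only: the Action enum and
action_rejection_reason() are hypothetical names that do not appear in the
patch, and the check that the cluster is shut down is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the three modes added by the patch */
typedef enum
{
	ACTION_CHECK,
	ACTION_DISABLE,
	ACTION_ENABLE
} Action;

/*
 * Return a message explaining why the requested action cannot run, or
 * NULL if it may proceed.  checksum_version plays the role of
 * ControlFile->data_checksum_version (0 means checksums are disabled).
 */
const char *
action_rejection_reason(Action action, unsigned checksum_version,
						bool have_relfilenode)
{
	if (action != ACTION_CHECK && have_relfilenode)
		return "relfilenode option only possible with --check";
	if (checksum_version == 0 && action == ACTION_CHECK)
		return "data checksums are not enabled in cluster";
	if (checksum_version == 0 && action == ACTION_DISABLE)
		return "data checksums are already disabled in cluster";
	if (checksum_version > 0 && action == ACTION_ENABLE)
		return "data checksums are already enabled in cluster";
	return NULL;
}
```

The order matches the patch: the option conflict is reported before the
control file state is considered.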

#48Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#45)
Re: Offline enabling/disabling of data checksums

On Mon, Mar 11, 2019 at 11:19:50AM +0100, Michael Banck wrote:

One thing: you (Michael) should be co-author for patch #3 as I took some
of your code from
https://github.com/michaelpq/pg_plugins/tree/master/pg_checksums

OK, thanks for the notice. I was not sure as we actually developed
the same fork.
--
Michael

In reply to: Michael Paquier (#47)
Re: Offline enabling/disabling of data checksums

Hi

Not doing the check is a bad idea as ControlFileData should be compatible
between the binary and the data read. I am attaching a fresh 0001
which should be back-patched down to v11 as a bug fix.

Looks fine. Do we need to add a few words to the documentation?

 if (badblocks > 0)
        return 1;

 Small question: why return 1 instead of exit(1)?

OK, let's fix that on the way as part of the renaming.

This was not changed?

I have no new notes after reading the updated patchset.

regards, Sergei

#50Michael Banck
michael.banck@credativ.de
In reply to: Sergei Kornilov (#46)
Re: Offline enabling/disabling of data checksums

Hi,

On Monday, 11.03.2019 at 14:11 +0000, Sergei Kornilov wrote:

if (badblocks > 0)
return 1;

Small question: why return 1 instead of exit(1)?

I have a feeling it is project policy to return 0 from main(), and
exit(1) if a program aborts with an error.

In the above case, the program finishes more-or-less as intended (no
abort), but due to errors found on the way, does not return with 0.

I don't mind either way and probably exit(1) makes more sense, but I
wanted to explain why it is like that.
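
To make the distinction concrete, here is a minimal C sketch of the check-mode exit status. It is an illustration only, not the pg_checksums source: `check_exit_status` is a hypothetical helper, and the bad-block count is passed in as a parameter rather than read from the tool's global counter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map the outcome of a completed checksum scan to a
 * process exit status.  The scan itself ran to completion (no abort), but
 * any bad block still has to be reported as a nonzero status.
 */
static int
check_exit_status(int64_t badblocks)
{
	assert(badblocks >= 0);		/* a negative count would be a caller bug */
	return (badblocks > 0) ? 1 : 0;
}
```

Inside `main()`, `return check_exit_status(badblocks);` and `exit(check_exit_status(badblocks));` hand the same status to the caller, so the choice between them is one of consistency with other tools rather than behavior.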

Michael

In reply to: Michael Banck (#50)
Re: Offline enabling/disabling of data checksums

Hello

Thank you for the explanation. I thought so.

PS: I am not sure for now about the patch status in the CF app, so I did not change the status.

regards, Sergei

#52Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#50)
Re: Offline enabling/disabling of data checksums

On Tue, Mar 12, 2019 at 11:13:46AM +0100, Michael Banck wrote:

I have a feeling it is project policy to return 0 from main(), and
exit(1) if a program aborts with an error.

Yes, it does not matter much in practice, but other tools just don't
do that. Note that changing it can be actually annoying for a
backpatch if we don't have the --enable/--disable part, because git is
actually smart enough to detect the file renaming across branches as
far as I tried, but as we are refactoring this code anyway for
--enable and --disable let's just do it, that's cleaner.
--
Michael

#53Fabien COELHO
fabien.coelho@mines-paristech.fr
In reply to: Michael Paquier (#43)
Re: Offline enabling/disabling of data checksums

Bonjour Michaël,

Here is a partial review:

- 0001 is a patch to refactor the routine for the control file
update. I have made it backend-aware, and we ought to be careful with
error handling, use of fds and such, something that v4 was not very
careful about.

This refactoring patch is ok for me: applies, compiles, check is ok.

However, am I right in thinking that the change should propagate to other
tools which manipulate the control file, eg pg_resetwal, postmaster… So
that there would be only one shared API to update the control file?

- 0002 renames pg_verify_checksums to pg_checksums with a
straight-forward switch. Docs as well as all references to
pg_verify_checksums are updated.

Looks ok to me. Applies, compiles, checks are ok. Doc build is ok.

I'm wondering whether there should be something done so that the
inter-release documentation navigation works? Should the section keep the
former name? Is it managed by hand somewhere else? Maybe it would require
to keep the refsect1 id, or to duplicate it, not sure.

In "doc/src/sgml/ref/allfiles.sgml" there seems to be a habit to align on
the SYSTEM keyword, which is not followed by the patch.

- 0003 adds the new options --check, --enable and --disable, with
--check being the default as discussed.

Looks like the patch I already reviewed, but I'll look at it in details
later.

"If enabling or disabling checksums, the exit status is nonzero if the
operation failed."

However:

  +       if (ControlFile->data_checksum_version == 0 &&
  +               action == PG_ACTION_DISABLE)
  +       {
  +               fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
  +               exit(1);
  +       }

This seems contradictory to me: you want to disable checksums, and they are
already disabled, so nothing is needed. How does that qualify as a
"failed" operation?

Further review will come later.

- 0004 adds a -N/--no-sync which I think is nice for consistency with
other tools. That's also useful for the tests, and was not discussed
until now on this thread.

Indeed. I do not immediately see the use case where no syncing would be a
good idea. I can see why it would be a bad idea. So I'm not sure of the
concept.

--
Fabien.

#54Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#53)
Re: Offline enabling/disabling of data checksums

On Tue, Mar 12, 2019 at 10:08:19PM +0100, Fabien COELHO wrote:

This refactoring patch is ok for me: applies, compiles, check is ok.
However, am I right in thinking that the change should propagate to other
tools which manipulate the control file, eg pg_resetwal, postmaster… So that
there would be only one shared API to update the control file?

Yes, that would be nice, for now I have focused. For pg_resetwal yes
we could do it easily. Would you like to send a patch?

I'm wondering whether there should be something done so that the
inter-release documentation navigation works? Should the section keep the
former name? Is it managed by hand somewhere else? Maybe it would require to
keep the refsect1 id, or to duplicate it, not sure.

When it came to the renaming of pg_receivexlog to pg_receivewal, we
did not actually do anything in the core code, and let the magic
happen on pgsql-www. I have also pinged pgsql-packagers about the
renaming and it is not really an issue on their side. So I have
committed the renaming to pg_checksums as well. So now remains only
the new options.

In "doc/src/sgml/ref/allfiles.sgml" there seems to be a habit to align on
the SYSTEM keyword, which is not followed by the patch.

Sure. I sent an updated patch to actually fix that, and also address
a couple of other side things I noticed on the way like the top
refentry in the docs or the header format at the top of
pg_checksums.c as we are tweaking the area anyway.

This seems contradictory to me: you want to disable checksums, and they are
already disabled, so nothing is needed. How does that qualify as a
"failed" operation?

If the operation is automated, then the caller can react properly when
multiple attempts are made. Of course, I am fine to tune things one
way or the other depending on the opinion of the crowd here. From the
opinions gathered, I can see that (Michael * 2) prefer failing with
exit(1), while (Fabien * 1) would like to just do exit(0).
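
The two behaviors under discussion can be compared in a small C sketch. Everything here is hypothetical and for illustration only: the `ChecksumMode` enum, the `precheck_status` helper and its `strict` flag are not part of the patch, which hard-codes the strict behavior.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	MODE_CHECK,
	MODE_DISABLE,
	MODE_ENABLE
} ChecksumMode;

/*
 * Hypothetical pre-check.  checksum_version is 0 when checksums are off
 * and greater than 0 when they are on, as in the control file.  Returns
 * the exit status to use if the tool should stop now, or -1 if it should
 * go on and do the work.
 */
static int
precheck_status(unsigned int checksum_version, ChecksumMode mode, bool strict)
{
	bool		enabled = (checksum_version > 0);

	if (mode == MODE_CHECK)
		return enabled ? -1 : 1;	/* nothing to verify: always an error */

	/* Only the two state-changing modes remain at this point. */
	assert(mode == MODE_ENABLE || mode == MODE_DISABLE);

	if (mode == MODE_ENABLE && enabled)
		return strict ? 1 : 0;		/* already enabled */
	if (mode == MODE_DISABLE && !enabled)
		return strict ? 1 : 0;		/* already disabled */

	return -1;						/* state differs from target: proceed */
}
```

With `strict` set, a repeated `--enable` or `--disable` is reported as a failure that an automated caller can detect; without it, the repeated request is treated as a successful no-op.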

Further review will come later.

Thanks, Fabien!

Indeed. I do not immediately see the use case where no syncing would be a
good idea. I can see why it would be a bad idea. So I'm not sure of the
concept.

To reduce the load on the buildfarm I think this one is worth it. Otherwise
we end up fsyncing the data folder a couple of times, which would make
the small-ish buildfarm machines suffer more than they need to.
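
As an illustration of what such a switch trades away, here is a minimal C sketch of a write that is flushed only on request. It assumes a POSIX system; `write_file_maybe_sync` is a hypothetical single-file helper, whereas the patch calls fsync_pgdata() on the whole data directory.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a buffer to a file and flush it to stable
 * storage only when do_sync is true.  Returns 0 on success, -1 on error.
 */
static int
write_file_maybe_sync(const char *path, const void *buf, size_t len,
					  bool do_sync)
{
	int			fd;
	int			rc = 0;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	if (write(fd, buf, len) != (ssize_t) len)
		rc = -1;				/* short or failed write */
	else if (do_sync && fsync(fd) != 0)
		rc = -1;				/* data may not have reached the disk */

	if (close(fd) != 0)
		rc = -1;

	return rc;
}
```

With `do_sync` false the call returns as soon as the kernel has accepted the data, which is cheaper for throwaway test clusters but gives no durability guarantee after an operating system crash.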

I am going to send a rebased patch set of the remaining things at the
top of the discussion as well.
--
Michael

#55Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#52)
2 attachment(s)
Re: Offline enabling/disabling of data checksums

On Tue, Mar 12, 2019 at 09:44:03PM +0900, Michael Paquier wrote:

Yes, it does not matter much in practice, but other tools just don't
do that. Note that changing it can be actually annoying for a
backpatch if we don't have the --enable/--disable part, because git is
actually smart enough to detect the file renaming across branches as
far as I tried, but as we are refactoring this code anyway for
--enable and --disable let's just do it, that's cleaner.

Okay, please find attached a rebased patch set. I have committed 0001
which adds version checks for the control file, and the renaming
part 0002. What remains now is the addition of the new options, and
--no-sync. The "return 1" stuff has been switched to exit(1) while on
it, and is part of 0003.

Now the set of patches is:
- 0001, add --enable and --disable. I have tweaked the patch a bit so
that "action" is replaced by "mode", which is more consistent with other
tools like pg_ctl. pgindent was also complaining about one of the
new enum structures.
- 0002, add --no-sync.

Thanks,
--
Michael

Attachments:

v3-0001-Add-options-to-enable-and-disable-checksums-in-pg.patchtext/x-diff; charset=us-asciiDownload
From e8118b7063a1a615dfc24f376ab3998cda67330a Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 13 Mar 2019 11:12:53 +0900
Subject: [PATCH v3 1/2] Add options to enable and disable checksums in
 pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When running --enable or --disable, the data folder gets fsync'd for
durability.  If no mode is specified in the options, then --check is
used for compatibility with older versions of pg_verify_checksums (now
renamed to pg_checksums in v12).

Author: Michael Banck
Reviewed-by: Fabien Coelho, Michael Paquier
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    |  50 +++++++-
 src/bin/pg_checksums/pg_checksums.c   | 171 ++++++++++++++++++++++----
 src/bin/pg_checksums/t/002_actions.pl |  76 +++++++++---
 src/tools/pgindent/typedefs.list      |   1 +
 4 files changed, 252 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..776f7be477 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  <refnamediv>
   <refname>pg_checksums</refname>
-  <refpurpose>verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
+  <refpurpose>enable, disable or check data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
  </refnamediv>
 
  <refsynopsisdiv>
@@ -36,10 +36,19 @@ PostgreSQL documentation
  <refsect1 id="r1-app-pg_checksums-1">
   <title>Description</title>
   <para>
-   <application>pg_checksums</application> verifies data checksums in a
-   <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_checksums</application>.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   <application>pg_checksums</application> checks, enables or disables data
+   checksums in a <productname>PostgreSQL</productname> cluster.  The server
+   must be shut down cleanly before running
+   <application>pg_checksums</application>. The exit status is zero if there
+   are no checksum errors when checking them, and nonzero if at least one
+   checksum failure is detected. If enabling or disabling checksums, the
+   exit status is nonzero if the operation failed.
+  </para>
+
+  <para>
+   While checking or enabling checksums needs to scan or write every file in
+   the cluster, disabling will only update the file
+   <filename>pg_control</filename>.
   </para>
  </refsect1>
 
@@ -60,6 +69,37 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-c</option></term>
+      <term><option>--check</option></term>
+      <listitem>
+       <para>
+        Checks checksums. This is the default mode if nothing else is
+        specified.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <term><option>--disable</option></term>
+      <listitem>
+       <para>
+        Disables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--enable</option></term>
+      <listitem>
+       <para>
+        Enables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 6571c34211..a7c39ac99a 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -1,7 +1,8 @@
 /*-------------------------------------------------------------------------
  *
  * pg_checksums.c
- *	  Verifies page level checksums in an offline cluster.
+ *	  Checks, enables or disables page level checksums for an offline
+ *	  cluster
  *
  * Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -17,14 +18,15 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
-#include "catalog/pg_control.h"
+#include "access/xlog_internal.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
 
 
 static int64 files = 0;
@@ -35,16 +37,39 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_MODE_CHECK,
+	PG_MODE_DISABLE,
+	PG_MODE_ENABLE
+} PgChecksumMode;
+
+/*
+ * Filename components.
+ *
+ * XXX: fd.h is not declared here as frontend side code is not able to
+ * interact with the backend-side definitions for the various fsync
+ * wrappers.
+ */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static PgChecksumMode mode = PG_MODE_CHECK;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables, disables or verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -c, --check            check data checksums\n"));
+	printf(_("                         This is the default mode if nothing is specified.\n"));
+	printf(_("  -d, --disable          disable data checksums\n"));
+	printf(_("  -e, --enable           enable data checksums\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -90,8 +115,14 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	int			flags;
+
+	Assert(mode == PG_MODE_ENABLE ||
+		   mode == PG_MODE_CHECK);
+
+	flags = (mode == PG_MODE_ENABLE) ? O_RDWR : O_RDONLY;
+	f = open(fn, PG_BINARY | flags, 0);
 
-	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
@@ -121,18 +152,47 @@ scan_file(const char *fn, BlockNumber segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		if (csum != header->pd_checksum)
+		if (mode == PG_MODE_CHECK)
 		{
-			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
-						progname, fn, blockno, csum, header->pd_checksum);
-			badblocks++;
+			if (csum != header->pd_checksum)
+			{
+				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+					fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
+							progname, fn, blockno, csum, header->pd_checksum);
+				badblocks++;
+			}
+		}
+		else if (mode == PG_MODE_ENABLE)
+		{
+			/* Set checksum in page header */
+			header->pd_checksum = csum;
+
+			/* Seek back to beginning of block */
+			if (lseek(f, -BLCKSZ, SEEK_CUR) < 0)
+			{
+				fprintf(stderr, _("%s: seek failed for block %d in file \"%s\": %s\n"), progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Write block with checksum */
+			if (write(f, buf.data, BLCKSZ) != BLCKSZ)
+			{
+				fprintf(stderr, "%s: could not update checksum of block %d in file \"%s\": %s\n",
+						progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
 		}
 	}
 
 	if (verbose)
-		fprintf(stderr,
-				_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+	{
+		if (mode == PG_MODE_CHECK)
+			fprintf(stderr,
+					_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+		if (mode == PG_MODE_ENABLE)
+			fprintf(stderr,
+					_("%s: checksums enabled in file \"%s\"\n"), progname, fn);
+	}
 
 	close(f);
 }
@@ -234,7 +294,10 @@ int
 main(int argc, char *argv[])
 {
 	static struct option long_options[] = {
+		{"check", no_argument, NULL, 'c'},
 		{"pgdata", required_argument, NULL, 'D'},
+		{"disable", no_argument, NULL, 'd'},
+		{"enable", no_argument, NULL, 'e'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -262,10 +325,19 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:r:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
+			case 'c':
+				mode = PG_MODE_CHECK;
+				break;
+			case 'd':
+				mode = PG_MODE_DISABLE;
+				break;
+			case 'e':
+				mode = PG_MODE_ENABLE;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -312,6 +384,15 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Relfilenode checking only works in --check mode */
+	if (mode != PG_MODE_CHECK && only_relfilenode)
+	{
+		fprintf(stderr, _("%s: relfilenode option only possible with --check\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
 	/* Check if cluster is running */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
@@ -330,29 +411,67 @@ main(int argc, char *argv[])
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
+		fprintf(stderr, _("%s: cluster must be shut down\n"), progname);
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version == 0)
+	if (ControlFile->data_checksum_version == 0 &&
+		mode == PG_MODE_CHECK)
 	{
 		fprintf(stderr, _("%s: data checksums are not enabled in cluster\n"), progname);
 		exit(1);
 	}
+	if (ControlFile->data_checksum_version == 0 &&
+		mode == PG_MODE_DISABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
+		exit(1);
+	}
+	if (ControlFile->data_checksum_version > 0 &&
+		mode == PG_MODE_ENABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already enabled in cluster.\n"), progname);
+		exit(1);
+	}
 
-	/* Scan all files */
-	scan_directory(DataDir, "global");
-	scan_directory(DataDir, "base");
-	scan_directory(DataDir, "pg_tblspc");
+	/* Operate on all files if checking or enabling checksums */
+	if (mode == PG_MODE_CHECK || mode == PG_MODE_ENABLE)
+	{
+		scan_directory(DataDir, "global");
+		scan_directory(DataDir, "base");
+		scan_directory(DataDir, "pg_tblspc");
 
-	printf(_("Checksum scan completed\n"));
-	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
-	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
-	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
-	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+		printf(_("Checksum operation completed\n"));
+		printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
+		printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
+		if (mode == PG_MODE_CHECK)
+		{
+			printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 
-	if (badblocks > 0)
-		return 1;
+			if (badblocks > 0)
+				exit(1);
+		}
+	}
+
+	/*
+	 * Finally update the control file, flushing the data directory at the
+	 * end.
+	 */
+	if (mode == PG_MODE_ENABLE || mode == PG_MODE_DISABLE)
+	{
+		/* Update control file */
+		ControlFile->data_checksum_version =
+			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
+		update_controlfile(DataDir, progname, ControlFile);
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		if (mode == PG_MODE_ENABLE)
+			printf(_("Checksums enabled in cluster\n"));
+		else
+			printf(_("Checksums disabled in cluster\n"));
+	}
 
 	return 0;
 }
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 97284e8930..3ab18a6b89 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 45;
+use Test::More tests => 62;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -38,8 +38,8 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_checksums',  '-D', $pgdata,
-		'-r', $relfilenode_corrupted],
+	command_ok(['pg_checksums',  '--check', '-D', $pgdata, '-r',
+			   $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
 	# Time to create some corruption
@@ -49,15 +49,15 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata, '-r',
-								$relfilenode_corrupted],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata,
+							  '-r', $relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,22 +67,22 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_checksums', '--check', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
 	return;
 }
 
-# Initialize node with checksums enabled.
+# Initialize node with checksums disabled.
 my $node = get_new_node('node_checksum');
-$node->init(extra => ['--data-checksums']);
+$node->init();
 my $pgdata = $node->data_dir;
 
-# Control file should know that checksums are enabled.
+# Control file should know that checksums are disabled.
 command_like(['pg_controldata', $pgdata],
-	     qr/Data page checksum version:.*1/,
-		 'checksums enabled in control file');
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
 
 # These are correct but empty files, so they should pass through.
 append_to_file "$pgdata/global/99999", "";
@@ -100,13 +100,59 @@ append_to_file "$pgdata/global/pgsql_tmp_123", "foo";
 mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
+# Enable checksums.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+	   "checksums successfully enabled in cluster");
+
+# Successive attempt to enable checksums fails.
+command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+	      "enabling checksums fails if already enabled");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+	     'checksums enabled in control file');
+
+# Disable checksums again.
+command_ok(['pg_checksums', '--disable', '-D', $pgdata],
+	   "checksums successfully disabled in cluster");
+
+# Successive attempt to disable checksums fails.
+command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+	      "disabling checksums fails if already disabled");
+
+# Control file should know that checksums are disabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
+
+# Enable checksums again for follow-up tests.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
 # Checksums pass on a newly-created cluster
-command_ok(['pg_checksums',  '-D', $pgdata],
+command_ok(['pg_checksums', '--check', '-D', $pgdata],
 		   "succeeds with offline cluster");
 
+# Checksums are verified if no other arguments are specified
+command_ok(['pg_checksums', '-D', $pgdata],
+		   "verifies checksums as default action");
+
+# Specific relation files cannot be requested when action is --disable
+# or --enable.
+command_fails(['pg_checksums', '--disable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --disable");
+command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --enable");
+
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_checksums',  '-D', $pgdata],
+command_fails(['pg_checksums', '--check', '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -133,7 +179,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b821df9e71..e86fecb849 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1697,6 +1697,7 @@ PgBenchExprType
 PgBenchFunction
 PgBenchValue
 PgBenchValueType
+PgChecksumMode
 PgFdwAnalyzeState
 PgFdwDirectModifyState
 PgFdwModifyState
-- 
2.20.1

v3-0002-Add-option-N-no-sync-to-pg_checksums.patchtext/x-diff; charset=us-asciiDownload
From c3d6ec7349792f1c94071533d4bfd38f8856a8e1 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 13 Mar 2019 11:13:47 +0900
Subject: [PATCH v3 2/2] Add option -N/--no-sync to pg_checksums

This is an option consistent with what pg_dump, pg_rewind and
pg_basebackup provide which is useful for leveraging the I/O effort when
testing things, not to be used in a production environment.

Author: Michael Paquier
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    | 16 ++++++++++++++++
 src/bin/pg_checksums/pg_checksums.c   | 11 +++++++++--
 src/bin/pg_checksums/t/002_actions.pl | 10 +++++-----
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 776f7be477..c3ccbf4eb7 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -100,6 +100,22 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-N</option></term>
+      <term><option>--no-sync</option></term>
+      <listitem>
+       <para>
+        By default, <command>pg_checksums</command> will wait for all files
+        to be written safely to disk.  This option causes
+        <command>pg_checksums</command> to return without waiting, which is
+        faster, but means that a subsequent operating system crash can leave
+        the updated data folder corrupt.  Generally, this option is useful
+        for testing but should not be used on a production installation.
+        This option has no effect when using <literal>--check</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index a7c39ac99a..0e464299f3 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -35,6 +35,7 @@ static int64 badblocks = 0;
 static ControlFileData *ControlFile;
 
 static char *only_relfilenode = NULL;
+static bool do_sync = true;
 static bool verbose = false;
 
 typedef enum
@@ -70,6 +71,7 @@ usage(void)
 	printf(_("                         This is the default mode if nothing is specified.\n"));
 	printf(_("  -d, --disable          disable data checksums\n"));
 	printf(_("  -e, --enable           enable data checksums\n"));
+	printf(_("  -N, --no-sync          do not wait for changes to be written safely to disk\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -298,6 +300,7 @@ main(int argc, char *argv[])
 		{"pgdata", required_argument, NULL, 'D'},
 		{"disable", no_argument, NULL, 'd'},
 		{"enable", no_argument, NULL, 'e'},
+		{"no-sync", no_argument, NULL, 'N'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -325,7 +328,7 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:deNr:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -338,6 +341,9 @@ main(int argc, char *argv[])
 			case 'e':
 				mode = PG_MODE_ENABLE;
 				break;
+			case 'N':
+				do_sync = false;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -464,7 +470,8 @@ main(int argc, char *argv[])
 		ControlFile->data_checksum_version =
 			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
 		update_controlfile(DataDir, progname, ControlFile);
-		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (do_sync)
+			fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
 		if (verbose)
 			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 		if (mode == PG_MODE_ENABLE)
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 3ab18a6b89..41575c5245 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -101,11 +101,11 @@ mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
 # Enable checksums.
-command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+command_ok(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 	   "checksums successfully enabled in cluster");
 
 # Successive attempt to enable checksums fails.
-command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+command_fails(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 	      "enabling checksums fails if already enabled");
 
 # Control file should know that checksums are enabled.
@@ -113,12 +113,12 @@ command_like(['pg_controldata', $pgdata],
 	     qr/Data page checksum version:.*1/,
 	     'checksums enabled in control file');
 
-# Disable checksums again.
+# Disable checksums again.  Flush result here as that should be cheap.
 command_ok(['pg_checksums', '--disable', '-D', $pgdata],
 	   "checksums successfully disabled in cluster");
 
 # Successive attempt to disable checksums fails.
-command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+command_fails(['pg_checksums', '--disable', '--no-sync', '-D', $pgdata],
 	      "disabling checksums fails if already disabled");
 
 # Control file should know that checksums are disabled.
@@ -127,7 +127,7 @@ command_like(['pg_controldata', $pgdata],
 		 'checksums disabled in control file');
 
 # Enable checksums again for follow-up tests.
-command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+command_ok(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 		   "checksums successfully enabled in cluster");
 
 # Control file should know that checksums are enabled.
-- 
2.20.1

#56Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#54)
Re: Offline enabling/disabling of data checksums

Bonjour Michaël,

Yes, that would be nice, for now I have focused. For pg_resetwal yes
we could do it easily. Would you like to send a patch?

I probably can do that before next Monday. I'll prioritize reviewing the
latest instance of this patch, though.

This seems contradictory to me: you want to disable checksums, and they are
already disabled, so nothing is needed. How does that qualify as a
"failed" operation?

If the operation is automated, then a proper reaction can be done if
multiple attempts are done. Of course, I am fine to tune things one
way or the other depending on the opinion of the crowd here. From the
opinions gathered, I can see that (Michael * 2) prefer failing with
exit(1), while (Fabien * 1) would like to just do exit(0).

Yep, that sums it up:-).

Indeed. I do not immediately see the use case where no syncing would be a
good idea. I can see why it would be a bad idea. So I'm not sure of the
concept.

To leverage the buildfarm effort I think this one is worth it. Otherwise we
end up fsyncing the data folder a couple of times, which would make
the small-ish buildfarm machines suffer more than they need.

Ok for the particular use-case, provided that the documentation is very
clear about the risks, which is the case, so fine with me wrt the
feature.

--
Fabien.

#57Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#56)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 07:18:32AM +0100, Fabien COELHO wrote:

I probably can do that before next Monday. I'll prioritize reviewing the
latest instance of this patch, though.

Thanks. The core code of the feature has not really changed with the
last reviews, except for the tweaks in the variable names and I think
that it's in a rather committable shape.
--
Michael

#58Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#55)
Re: Offline enabling/disabling of data checksums

Michaël-san,

Now the set of patches is:
- 0001, add --enable and --disable. I have tweaked a bit the patch so
as "action" is replaced by "mode" which is more consistent with other
tools like pg_ctl. pg_indent was also complaining about one of the
new enum structures.

Patch applies cleanly, compiles, various make check ok, doc build ok.

I'm still at odds with the exit(1) behavior when there is nothing to do.

If this behavior is kept, I think that the documentation needs to be
improved because "failed" does not describe a no-op-was-needed to me.

"""
If enabling or disabling checksums, the exit status is nonzero if the
operation failed.
"""

Maybe: "... if the operation failed or the requested setting is already
active." would at least describe clearly the implemented behavior.

  +       printf(_("  -c, --check            check data checksums\n"));
  +       printf(_("                         This is the default mode if nothing is specified.\n"));

I'm not sure of the punctuation logic on the help line: the first sentence
does not end with a ".". I could not find an instance of this style in
other help on pg commands. I'd suggest "check data checksums (default)"
would work around and be more in line with other commands help.

I see a significant locking issue, which I discussed on other threads
without convincing anyone. I could do the following things:

I slowed down pg_checksums by adding a 0.1s sleep when scanning a new
file, then started a "pg_checksums --enable" on a stopped cluster, then
started the cluster while the enabling was in progress, then connected and
updated data. Hmmm. Then I stopped while the slow enabling was still in
progress. Then I could also run a fast pg_checksums --enable in parallel,
overtaking the first one... then when the fast one finished, I started the
cluster again. When the slow one finished, it overwrote the control file,
I had a running cluster with a control file which did not say so, so I
could disable the checksum. Hmmm again. Ok, I could not generate an
inconsistent state because on stopping the cluster the control file is
overwritten with the initial state from the point of view of the postmaster,
but it does not look good.

I do not think it is a good thing that two commands can write to the data
directory at the same time, really.

About fsync-ing: ISTM that it is possible that the control file is written
to disk while data are still not written, so a failure in between would
leave the cluster with an inconsistent state. I think that it should fsync
the data *then* update the control file and fsync again on that one.

- 0002, add --no-sync.

Patch applies cleanly, compiles, various make checks are ok, doc build ok.

Fine with me.

--
Fabien.

#59Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#58)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:

I'm not sure of the punctuation logic on the help line: the first sentence
does not end with a ".". I could not find an instance of this style in other
help on pg commands. I'd suggest "check data checksums (default)" would work
around and be more in line with other commands help.

Good idea, let's do that.

I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
then started a "pg_checksums --enable" on a stopped cluster, then started
the cluster while the enabling was in progress, then connected and updated
data.

Well, yes, don't do that. You can get into the same class of problems
while running pg_rewind, pg_basebackup or even pg_resetwal once the
initial control file check is done for each one of these tools.

I do not think it is a good thing that two commands can write to the data
directory at the same time, really.

We don't prevent a pg_resetwal and a pg_basebackup from running in
parallel either. That would be... Interesting.

About fsync-ing: ISTM that it is possible that the control file is written
to disk while data are still not written, so a failure in between would
leave the cluster with an inconsistent state. I think that it should fsync
the data *then* update the control file and fsync again on that one.

If --disable is used, the control file gets updated at the end without
doing anything else. If the host crashes, it could be possible that
the control file has checksums enabled or disabled. If the state is
disabled, then well it succeeded. If the state is enabled, then the
control file is still correct, because all the other blocks still have
checksums set.

If --enable is used, we fsync the whole data directory after writing
all the blocks and updating the control file at the end. The case you
are referring to here is in fsync_pgdata(), not pg_checksums actually,
because you could reach the same state after a simple initdb. It
could be possible to reach a state where the control file has
checksums enabled and some blocks are not correctly synced, still you
would notice rather quickly if the server is in an incorrect state at
the follow-up startup.
--
Michael

#60Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#59)
Re: Offline enabling/disabling of data checksums

I do not think it is a good thing that two commands can write to the data
directory at the same time, really.

We don't prevent either a pg_resetwal and a pg_basebackup to run in
parallel. That would be... Interesting.

Yep, I'm trying again to suggest that this kind of thing should be
prevented. It seems that I'm pretty unconvincing.

About fsync-ing: ISTM that it is possible that the control file is written
to disk while data are still not written, so a failure in between would
leave the cluster with an inconsistent state. I think that it should fsync
the data *then* update the control file and fsync again on that one.

if --enable is used, we fsync the whole data directory after writing
all the blocks and updating the control file at the end. [...]
It could be possible to reach a state where the control file has
checksums enabled and some blocks are not correctly synced, still you
would notice rather quickly if the server is in an incorrect state at
the follow-up startup.

Yep. That is the issue I think is preventable by fsyncing updated data
*then* writing & syncing the control file, and that should be done by
pg_checksums.
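
The ordering under discussion (flush the rewritten data first, then write and flush the control file) can be sketched in plain POSIX C. This is only an illustration under assumptions: write_and_sync() and enable_with_ordering() are hypothetical helpers, the file contents are placeholders, and none of this is the pg_checksums code or the attached patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf to path and fsync it before returning.
 * Returns 0 on success, -1 on any error.
 */
static int
write_and_sync(const char *path, const void *buf, size_t len)
{
	int			fd;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}

/*
 * Hypothetical two-step sequence: the data is made durable first, and
 * the control file is only written (and synced) once that succeeded.
 * The strings stand in for real page and control file contents.
 */
static int
enable_with_ordering(const char *data_path, const char *control_path)
{
	static const char page[] = "page-with-checksum";
	static const char flag[] = "checksums=1";

	if (write_and_sync(data_path, page, sizeof(page) - 1) != 0)
		return -1;				/* control file is left untouched */
	return write_and_sync(control_path, flag, sizeof(flag) - 1);
}
```

With this order, a crash before the second step leaves a control file that still says "disabled", which stays consistent with partially rewritten data. A real tool would additionally have to fsync the containing directories, which this sketch leaves out.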

--
Fabien.

#61Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#60)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 10:44:03AM +0100, Fabien COELHO wrote:

Yep. That is the issue I think is preventable by fsyncing updated data
*then* writing & syncing the control file, and that should be done by
pg_checksums.

Well, pg_rewind works similarly: control file gets updated and then
the whole data directory gets flushed. In my opinion, the take here
is that we log something after the sync of the whole data folder is
done, so that in the event of a crash an operator can make sure that
everything has happened.
--
Michael

#62Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#59)
Re: Offline enabling/disabling of data checksums

Am Mittwoch, den 13.03.2019, 18:31 +0900 schrieb Michael Paquier:

On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:

I'm not sure of the punctuation logic on the help line: the first sentence
does not end with a ".". I could not find an instance of this style in other
help on pg commands. I'd suggest "check data checksums (default)" would work
around and be more in line with other commands help.

Good idea, let's do that.

I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
then started a "pg_checksums --enable" on a stopped cluster, then started
the cluster while the enabling was in progress, then connected and updated
data.

Well, yes, don't do that. You can get into the same class of problems
while running pg_rewind, pg_basebackup or even pg_resetwal once the
initial control file check is done for each one of these tools.

I do not think it is a good thing that two commands can write to the data
directory at the same time, really.

We don't prevent either a pg_resetwal and a pg_basebackup to run in
parallel. That would be... Interesting.

But does pg_basebackup actually change the primary's data directory? I
don't think so, so that does not seem to be a problem.

pg_rewind and pg_resetwal are (TTBOMK) pretty quick operations, while
pg_checksums can potentially run for hours, so I see the point of taking
extra care here.

On the other hand, two pg_checksums running in parallel also seem not
much of a problem as the cluster is offline anyway.

What is much more of a footgun is one DBA starting pg_checksums --enable
on a 1TB cluster, then going for lunch, and then the other DBA wondering
why the DB is down and starting the instance again.

We read the control file on pg_checksums' startup, so once pg_checksums
finishes it'll write the old checkpoint LSN into pg_control (along with
the updated checksum version). This is pilot error, but I think we
should try to guard against it.

I propose we re-read the control file for the enable case after we
finished operating on all files and (i) check the instance is still
offline and (ii) update the checksums version from there. That should be
a small but worthwhile change that could be done anyway.

Another option would be to add a new feature which reliably blocks an
instance from starting up due to maintenance - either a new control file
field, some message in postmaster.pid (like "pg_checksums maintenance in
progress") that would prevent pg_ctl or postgres/postmaster from
starting up like 'FATAL:  bogus data in lock file "postmaster.pid":
"pg_checksums in progress' or some other trigger file.

About fsync-ing: ISTM that it is possible that the control file is written
to disk while data are still not written, so a failure in between would
leave the cluster with an inconsistent state. I think that it should fsync
the data *then* update the control file and fsync again on that one.

if --enable is used, we fsync the whole data directory after writing
all the blocks and updating the control file at the end. The case you
are referring to here is in fsync_pgdata(), not pg_checksums actually,
because you could reach the same state after a simple initdb.

But in the initdb case you don't have any valuable data in the instance
yet.

It
could be possible to reach a state where the control file has
checksums enabled and some blocks are not correctly synced, still you
would notice rather quickly if the server is in an incorrect state at
the follow-up startup.

Would you? I think I'm with Fabien on this one and it seems worthwhile
to run fsync_pgdata() before and after update_controlfile() - the second
one should be really quick anyway. 

Also, I suggest to maybe add a notice in verbose mode that we are
syncing the data directory - otherwise the user might wonder what's
going on at 100% done, though I haven't seen a large delay in my tests
so far.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

#63Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#61)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

Hello,

Yep. That is the issue I think is preventable by fsyncing updated data
*then* writing & syncing the control file, and that should be done by
pg_checksums.

Well, pg_rewind works similarly: control file gets updated and then
the whole data directory gets flushed.

So it is basically prone to the same potential issue?

In my opinion, the take here is that we log something after the sync of
the whole data folder is done, so as in the event of a crash an operator
can make sure that everything has happened.

I do not understand. I'm basically only suggesting to reorder 3 lines and
add an fsync so that this potential problem goes away, see attached poc
(which does not compile because pg_fsync is in the backend only; it does
work with plain fsync on Linux, but I'm unsure of the portability, so
pg_fsync should probably be moved to port or something).

--
Fabien.

Attachments:

checksum-fsync-reorder.patchtext/x-diff; name=checksum-fsync-reorder.patchDownload
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index a7c39ac99a..1d7dd52ad0 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -454,17 +454,16 @@ main(int argc, char *argv[])
 		}
 	}
 
-	/*
-	 * Finally update the control file, flushing the data directory at the
-	 * end.
-	 */
+	/* Flush the data directory and update the control file */
 	if (mode == PG_MODE_ENABLE || mode == PG_MODE_DISABLE)
 	{
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+
 		/* Update control file */
 		ControlFile->data_checksum_version =
 			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
 		update_controlfile(DataDir, progname, ControlFile);
-		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+
 		if (verbose)
 			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 		if (mode == PG_MODE_ENABLE)
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 71e67a2eda..0f599826e0 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -145,8 +145,8 @@ get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p)
  *
  * Update controlfile values with the contents given by caller.  The
  * contents to write are included in "ControlFile".  Note that it is up
- * to the caller to fsync the updated file, and to properly lock
- * ControlFileLock when calling this routine in the backend.
+ * to the caller to properly lock ControlFileLock when calling this
+ * routine in the backend.
  */
 void
 update_controlfile(const char *DataDir, const char *progname,
@@ -216,6 +216,20 @@ update_controlfile(const char *DataDir, const char *progname,
 #endif
 	}
 
+	if (pg_fsync(fd) != 0)
+	{
+#ifndef FRONTEND
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync file \"%s\": %m",
+						ControlFilePath)));
+#else
+		fprintf(stderr, _("%s: could not fsync \"%s\": %s\n"),
+				progname, ControlFilePath, strerror(errno));
+		exit(EXIT_FAILURE);
+#endif
+	}
+
 #ifndef FRONTEND
 	if (CloseTransientFile(fd))
 		ereport(PANIC,
#64Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#62)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 11:41 AM Michael Banck <michael.banck@credativ.de>
wrote:

Am Mittwoch, den 13.03.2019, 18:31 +0900 schrieb Michael Paquier:

On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:

I'm not sure of the punctuation logic on the help line: the first sentence
does not end with a ".". I could not find an instance of this style in other
help on pg commands. I'd suggest "check data checksums (default)" would work
around and be more in line with other commands help.

Good idea, let's do that.

I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
then started a "pg_checksums --enable" on a stopped cluster, then started
the cluster while the enabling was in progress, then connected and updated
data.

Well, yes, don't do that. You can get into the same class of problems
while running pg_rewind, pg_basebackup or even pg_resetwal once the
initial control file check is done for each one of these tools.

I do not think it is a good thing that two commands can write to the data
directory at the same time, really.

We don't prevent either a pg_resetwal and a pg_basebackup to run in
parallel. That would be... Interesting.

But does pg_basebackup actually change the primary's data directory? I
don't think so, so that does not seem to be a problem.

pg_rewind and pg_resetwal are (TTBOMK) pretty quick operations, while
pg_checksums can potentially run for hours, so I see the point of taking
extra care here.

On the other hand, two pg_checksums running in parallel also seem not
much of a problem as the cluster is offline anyway.

What is much more of a footgun is one DBA starting pg_checksums --enable
on a 1TB cluster, then going for lunch, and then the other DBA wondering
why the DB is down and starting the instance again.

We read the control file on pg_checksums' startup, so once pg_checksums
finishes it'll write the old checkpoint LSN into pg_control (along with
the updated checksum version). This is pilot error, but I think we
should try to guard against it.

I propose we re-read the control file for the enable case after we
finished operating on all files and (i) check the instance is still
offline and (ii) update the checksums version from there. That should be
a small but worthwhile change that could be done anyway.

In (i) you need to also check that it's not offline *again*. Somebody could
start *and* stop the database while pg_checksums is running. But checking
the time field should hopefully be enough for that?
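
To illustrate why the state alone is not enough and the time field helps, here is a small sketch. It assumes a simplified, hypothetical stand-in (SketchControl) for the two control file fields involved; it is not the real ControlFileData layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, reduced view of the control file for this sketch only. */
typedef enum
{
	SKETCH_SHUTDOWNED,
	SKETCH_IN_PRODUCTION
} SketchState;

typedef struct
{
	SketchState state;
	time_t		time;			/* time of last control file update */
} SketchControl;

/*
 * Returns true only if the cluster is still shut down AND the control
 * file was not rewritten since "before" was read.  A start followed by
 * a stop leaves the state at "shut down" but changes the time.
 */
static bool
cluster_untouched(const SketchControl *before, const SketchControl *after)
{
	assert(before != NULL && after != NULL);

	if (after->state != SKETCH_SHUTDOWNED)
		return false;
	return after->time == before->time;
}
```

The start-and-stop case is the one where state looks unchanged while the time differs, so both comparisons are needed.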

Another option would be to add a new feature which reliably blocks an
instance from starting up due to maintenance - either a new control file
field, some message in postmaster.pid (like "pg_checksums maintenance in
progress") that would prevent pg_ctl or postgres/postmaster from
starting up like 'FATAL: bogus data in lock file "postmaster.pid":
"pg_checksums in progress' or some other trigger file.

Instead of overloading yet another thing on postmaster.pid, it might be
better to just have a separate file that if it exists, blocks startup with
a message defined as the content of that file?
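
A rough sketch of what such a check could look like, assuming a hypothetical helper startup_blocked() and a block file whose first line is the message to show. No such file or function exists in PostgreSQL today; this only illustrates the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical check: if block_path exists, copy its first line into
 * "reason" (without the trailing newline) and return true, meaning
 * startup should be refused with that message.
 */
static bool
startup_blocked(const char *block_path, char *reason, size_t reason_len)
{
	FILE	   *fp;

	assert(block_path != NULL && reason != NULL && reason_len > 0);

	reason[0] = '\0';
	fp = fopen(block_path, "r");
	if (fp == NULL)
		return false;			/* no readable block file: may proceed */

	if (fgets(reason, (int) reason_len, fp) != NULL)
		reason[strcspn(reason, "\n")] = '\0';
	fclose(fp);
	return true;
}
```

A tool would create the file before starting its work and remove it when done; overriding a stale block would then just be a matter of deleting the file.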

It could be possible to reach a state where the control file has
checksums enabled and some blocks are not correctly synced, still you
would notice rather quickly if the server is in an incorrect state at
the follow-up startup.

Would you? I think I'm with Fabien on this one and it seems worthwhile
to run fsync_pgdata() before and after update_controlfile() - the second
one should be really quick anyway.

Also, I suggest to maybe add a notice in verbose mode that we are
syncing the data directory - otherwise the user might wonder what's
going on at 100% done, though I haven't seen a large delay in my tests
so far.

Seems like a good idea -- there certainly could be a substantial delay
there depending on data size and underlying storage.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#65Sergei Kornilov
sk@zsrv.org
In reply to: Michael Paquier (#61)
Re: Offline enabling/disabling of data checksums

Hi

One new question from me: how about replication?
Case: primary + replica. We shut down the primary and enable checksums, and the replica reports "started streaming WAL from primary" without any issue. Now I have a primary with checksums, but a replica without.
Or: a cluster with checksums, then checksums are disabled on the primary, but the standby still thinks checksums are enabled.

Also, do we support ./configure --with-blocksize=(not equal to 8)? make check on HEAD fails for me. If we support this, I think we need to check that the BLCKSZ compiled into pg_checksums matches the one used in PGDATA.

regards, Sergei

#66Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Banck (#62)
Re: Offline enabling/disabling of data checksums

Hallo Michael,

I propose we re-read the control file for the enable case after we
finished operating on all files and (i) check the instance is still
offline and (ii) update the checksums version from there. That should be
a small but worthwhile change that could be done anyway.

That looks like a simple but mostly effective guard.

Another option would be to add a new feature which reliably blocks an
instance from starting up due to maintenance - either a new control file
field, some message in postmaster.pid (like "pg_checksums maintenance in
progress") that would prevent pg_ctl or postgres/postmaster from
starting up like 'FATAL:  bogus data in lock file "postmaster.pid":
"pg_checksums in progress' or some other trigger file.

I think that a clear cluster-locking can-be-overriden-if-needed
shared-between-commands mechanism would be a good thing (tm), although it
requires some work.

My initial suggestion was to update the control file with an appropriate
state, eg some general "admin command in progress", but I understood that
it is rejected, and for another of your patches it seems that the
"postmaster.pid" file is the right approach. Fine with me, the point is
that it should be effective and consistent across all relevant commands.

A good point about the "postmaster.pid" trick, when it does not contain
the postmaster pid, is that overriding is as simple as "rm postmaster.pid".

It could be possible to reach a state where the control file has
checksums enabled and some blocks are not correctly synced, still you
would notice rather quickly if the server is in an incorrect state at
the follow-up startup.

Would you? I think I'm with Fabien on this one and it seems worthwhile
to run fsync_pgdata() before and after update_controlfile() - the second
one should be really quick anyway.

Note that fsync_pgdata is kind of heavy, it recurses everywhere. I think
that a simple fsync on the control file only is enough.

Also, I suggest to maybe add a notice in verbose mode that we are
syncing the data directory - otherwise the user might wonder what's
going on at 100% done, though I haven't seen a large delay in my tests
so far.

I agree, as it might not be cheap.

--
Fabien.

#67Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#64)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

Hi,

Am Mittwoch, den 13.03.2019, 11:47 +0100 schrieb Magnus Hagander:

On Wed, Mar 13, 2019 at 11:41 AM Michael Banck <michael.banck@credativ.de> wrote:

I propose we re-read the control file for the enable case after we
finished operating on all files and (i) check the instance is still
offline and (ii) update the checksums version from there. That should be
a small but worthwhile change that could be done anyway.

In (i) you need to also check that it's not offline *again*. Somebody
could start *and* stop the database while pg_checksums is running. But
that should hopefully be enough to check the time field?

Good point.

Also, I suggest to maybe add a notice in verbose mode that we are
syncing the data directory - otherwise the user might wonder what's
going on at 100% done, though I haven't seen a large delay in my tests
so far.

Seems like a good idea -- there certainly could be a substantial delay
there depending on data size and underlying storage.

The attached patch should do the above, on top of Michael's last
patchset.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachments:

0001-Guard-against-concurrent-cluster-changes-when-enabli.patchtext/x-patch; charset=UTF-8; name=0001-Guard-against-concurrent-cluster-changes-when-enabli.patchDownload
From 731c48d21b0adfa74e7ce1e05b2d8f1146b9e3cf Mon Sep 17 00:00:00 2001
From: Michael Banck <mbanck@debian.org>
Date: Wed, 13 Mar 2019 12:05:34 +0100
Subject: [PATCH] Guard against concurrent cluster changes when enabling
 checksums.

Re-read the control file after operating on all files in order to check whether
the instance is still shutdown and the control file still has the same
modification timestamp.

In passing, add a note about syncing the data directory in verbose mode.
---
 src/bin/pg_checksums/pg_checksums.c | 38 ++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0e464299f3..afca6cf027 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -309,6 +309,7 @@ main(int argc, char *argv[])
 	int			c;
 	int			option_index;
 	bool		crc_ok;
+	pg_time_t	controlfile_last_updated;
 
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_checksums"));
 
@@ -399,7 +400,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	/* Check if cluster is running */
+	/* Get control file data */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
 	{
@@ -414,6 +415,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Check if cluster is running */
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
@@ -440,6 +442,9 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Save time of last control file modification */
+	controlfile_last_updated = ControlFile->time;
+
 	/* Operate on all files if checking or enabling checksums */
 	if (mode == PG_MODE_CHECK || mode == PG_MODE_ENABLE)
 	{
@@ -461,17 +466,44 @@ main(int argc, char *argv[])
 	}
 
 	/*
-	 * Finally update the control file, flushing the data directory at the
-	 * end.
+	 * Finally update the control file after checking the cluster is still
+	 * offline and its control file has not changed, flushing the data
+	 * directory at the end.
 	 */
 	if (mode == PG_MODE_ENABLE || mode == PG_MODE_DISABLE)
 	{
+		ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+		if (!crc_ok)
+		{
+			fprintf(stderr, _("%s: pg_control CRC value is incorrect\n"), progname);
+			exit(1);
+		}
+
+		/* Check if cluster is running */
+		if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+		{
+			fprintf(stderr, _("%s: cluster no longer shut down\n"), progname);
+			exit(1);
+		}
+
+		/* Check if control file has changed */
+		if (controlfile_last_updated != ControlFile->time)
+		{
+			fprintf(stderr, _("%s: control file has changed since startup\n"), progname);
+			exit(1);
+		}
+
 		/* Update control file */
 		ControlFile->data_checksum_version =
 			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
 		update_controlfile(DataDir, progname, ControlFile);
 		if (do_sync)
+		{
+			if (verbose && mode == PG_MODE_ENABLE)
+				printf(_("Syncing data directory\n"));
 			fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		}
 		if (verbose)
 			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 		if (mode == PG_MODE_ENABLE)
-- 
2.11.0

#68Magnus Hagander
magnus@hagander.net
In reply to: Sergei Kornilov (#65)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov <sk@zsrv.org> wrote:

Hi

One new question from me: how about replication?
Case: primary+replica, we shut down primary and enable checksum, and
"started streaming WAL from primary" without any issue. I have master with
checksums, but replica without.
Or cluster with checksums, then disable checksums on primary, but standby
think we have checksums.

Enabling or disabling the checksums offline on the master quite clearly
requires a rebuild of the standby, there is no other way (this is one of
the reasons for the online enabling in that patch, so I still hope we can
get that done -- but not for this version).

You would have the same with PITR backups for example. And especially if
you have some tool that does block or segment level differential.

Of course, we have to make sure that this actually fails.

I wonder if we should bring out the big hammer and actually change the
system id in pg_control when checksums are enabled/disabled by this tool?
That should make it clear to any tool that it's changed.

Also we support ./configure --with-blocksize=(not equals 8)? make check on
HEAD fails for me. If we support this - i think we need recheck BLCKSZ
between compiled pg_checksum and used in PGDATA

You mean if the backend and pg_checksums is built with different blocksize?
Yeah, that sounds like something which is a cheap check and should be done.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#69Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#68)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

Hi,

Am Mittwoch, den 13.03.2019, 12:24 +0100 schrieb Magnus Hagander:

Also we support ./configure --with-blocksize=(not equals 8)? make
check on HEAD fails for me. If we support this - i think we need
recheck BLCKSZ between compiled pg_checksum and used in PGDATA

You mean if the backend and pg_checksums is built with different
blocksize? Yeah, that sounds like something which is a cheap check and
should be done. 

I've been doing that in my pg_checksums fork for a while (as it is further
removed from the Postgres binaries) but yeah, we should check for that as
well in pg_checksums, see attached patch.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachments:

0001-Check-data-directory-block-size.patchtext/x-patch; charset=UTF-8; name=0001-Check-data-directory-block-size.patchDownload
From f942136e09cd54b1032c7c5d9b4f3305e7dc043f Mon Sep 17 00:00:00 2001
From: Michael Banck <mbanck@debian.org>
Date: Wed, 13 Mar 2019 12:27:44 +0100
Subject: [PATCH] Check data directory block size

---
 src/bin/pg_checksums/pg_checksums.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index afca6cf027..dfd522ca6a 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -442,6 +442,17 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/*
+	 * Check that the PGDATA blocksize is the same as the one pg_checksums
+	 * was compiled against (BLCKSZ).
+	 */
+	if (ControlFile->blcksz != BLCKSZ)
+	{
+		fprintf(stderr, _("%s: data directory block size %d is different to compiled-in block size %d.\n"),
+				progname, ControlFile->blcksz, BLCKSZ);
+		exit(1);
+	}
+
 	/* Save time of last control file modification */
 	controlfile_last_updated = ControlFile->time;
 
-- 
2.11.0

#70Sergei Kornilov
sk@zsrv.org
In reply to: Magnus Hagander (#68)
Re: Offline enabling/disabling of data checksums

Hi

One new question from me: how about replication?
Case: primary+replica, we shut down primary and enable checksum, and "started streaming WAL from primary" without any issue. I have master with checksums, but replica without.
Or cluster with checksums, then disable checksums on primary, but standby think we have checksums.

Enabling or disabling the checksums offline on the master quite clearly requires a rebuild of the standby, there is no other way (this is one of the reasons for the online enabling in that patch, so I still hope we can get that done -- but not for this version).

I mean this should at least be documented.
Changing the system id... maybe that is reasonable.

Also we support ./configure --with-blocksize=(not equals 8)? make check on HEAD fails for me. If we support this - i think we need recheck BLCKSZ between compiled pg_checksum and used in PGDATA

You mean if the backend and pg_checksums is built with different blocksize? Yeah, that sounds like something which is a cheap check and should be done.

Yep

regards, Sergei

#71Sergei Kornilov
sk@zsrv.org
In reply to: Michael Banck (#69)
Re: Offline enabling/disabling of data checksums

Hi,

Also we support ./configure --with-blocksize=(not equals 8)? make
check on HEAD fails for me. If we support this - i think we need
recheck BLCKSZ between compiled pg_checksum and used in PGDATA

You mean if the backend and pg_checksums is built with different
blocksize? Yeah, that sounds like something which is a cheap check and
should be done.

I've been doing that in my pg_checksums fork for a while (as it is further
removed from the Postgres binaries) but yeah, we should check for that in
pg_checksums as well, see attached patch.

Seems good. And I think we need to backpatch this check to pg11, similar to the cross-version compatibility checks.

regards, Sergei

#72Magnus Hagander
magnus@hagander.net
In reply to: Sergei Kornilov (#70)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 12:40 PM Sergei Kornilov <sk@zsrv.org> wrote:

Hi

One new question from me: how about replication?
Case: primary+replica, we shut down primary and enable checksum, and
"started streaming WAL from primary" without any issue. I have master with
checksums, but replica without.
Or cluster with checksums, then disable checksums on primary, but
standby think we have checksums.

Enabling or disabling the checksums offline on the master quite clearly
requires a rebuild of the standby, there is no other way (this is one of
the reasons for the online enabling in that patch, so I still hope we can
get that done -- but not for this version).

I mean this should be at least documented.
Change system id... Maybe is reasonable

I think this is dangerous enough that it needs to be enforced and not
documented.

Most people who care about checksums are also going to be having either
replication or backup...

Also we support ./configure --with-blocksize=(not equals 8)? make check
on HEAD fails for me. If we support this - i think we need recheck BLCKSZ
between compiled pg_checksum and used in PGDATA

You mean if the backend and pg_checksums is built with different
blocksize? Yeah, that sounds like something which is a cheap check and
should be done.

Yep

This one I could more easily live with being only a documented problem
rather than an enforced one, but it also seems very simple to enforce.

--
Magnus Hagander

#73Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#67)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote:

The attached patch should do the above, on top of Michael's last
patchset.

What you are doing here looks like a good defense in itself.
--
Michael

#74Michael Paquier
michael@paquier.xyz
In reply to: Sergei Kornilov (#71)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 02:43:39PM +0300, Sergei Kornilov wrote:

Seems good. And I think we need to backpatch this check to pg11, similar
to the cross-version compatibility checks

Good point raised, a backpatch looks appropriate. It would be nice to get
to something more dynamic, but pg_checksum_block() uses BLCKSZ
directly :(
--
Michael

#75Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#68)
Re: Offline enabling/disabling of data checksums

Hi,

On Wednesday, 13.03.2019, at 12:24 +0100, Magnus Hagander wrote:

On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov <sk@zsrv.org> wrote:

One new question from me: how about replication?
Case: primary+replica, we shut down primary and enable checksum, and
"started streaming WAL from primary" without any issue. I have
master with checksums, but replica without.
Or cluster with checksums, then disable checksums on primary, but
standby think we have checksums.

Enabling or disabling the checksums offline on the master quite
clearly requires a rebuild of the standby, there is no other way

What about shutting down both and running pg_checksums --enable on the
standby as well?

Michael


#76Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#72)
Re: Offline enabling/disabling of data checksums

Hi,

On Wednesday, 13.03.2019, at 12:43 +0100, Magnus Hagander wrote:

I think this is dangerous enough that it needs to be enforced and not
documented.

Changing the cluster ID might have some other side-effects, I think
there are several cloud-native 3rd party solutions that use the cluster
ID as some kind of unique identifier for an instance. It might not be an
issue in practise, but then again, it might break other stuff down the
road.

Another possibility would be to extend the replication protocol's
IDENTIFY_SYSTEM command to also report the checksum version so that the
standby can check against the local control file on startup. But I am
not sure we can easily extend IDENTIFY_SYSTEM this way nor whether we
should for this rather corner-casey thing?

Michael


#77Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#75)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 4:46 PM Michael Banck <michael.banck@credativ.de>
wrote:

Hi,

On Wednesday, 13.03.2019, at 12:24 +0100, Magnus Hagander wrote:

On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov <sk@zsrv.org> wrote:

One new question from me: how about replication?
Case: primary+replica, we shut down primary and enable checksum, and
"started streaming WAL from primary" without any issue. I have
master with checksums, but replica without.
Or cluster with checksums, then disable checksums on primary, but
standby think we have checksums.

Enabling or disabling the checksums offline on the master quite
clearly requires a rebuild of the standby, there is no other way

What about shutting down both and running pg_checksums --enable on the
standby as well?

That sounds pretty fragile to me. But if we can prove that the user has
done things in the right order, sure. But how can we do that in an offline
process? What if the user just quickly restarted the primary node after the
standby had been shut down? We'll need to somehow validate it across the
nodes.
--
Magnus Hagander

#78Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#76)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 4:51 PM Michael Banck <michael.banck@credativ.de>
wrote:

Hi,

On Wednesday, 13.03.2019, at 12:43 +0100, Magnus Hagander wrote:

I think this is dangerous enough that it needs to be enforced and not
documented.

Changing the cluster ID might have some other side-effects, I think
there are several cloud-native 3rd party solutions that use the cluster
ID as some kind of unique identifier for an instance. It might not be an
issue in practise, but then again, it might break other stuff down the
road.

Well, whatever we do they have to update, right? If we're not changing it,
then we're basically saying that it's (systemid, checksums) that is the
identifier of the cluster, not just systemid. They'd have to go around and
check each node individually for the configuration and not just use
systemid anyway, so what's the actual win?

Another possibility would be to extend the replication protocol's
IDENTIFY_SYSTEM command to also report the checksum version so that the
standby can check against the local control file on startup. But I am
not sure we can easily extend IDENTIFY_SYSTEM this way nor whether we
should for this rather corner-casey thing?

We could, but is it really a win in those scenarios? Vs just making the
systemid different? With systemid being different it's obvious that
something needs to be done. If it's not, then at best, if we check it in
the standby startup, the standby won't start. But people can still end up
with things like unusable/corrupt backups, for example.

--
Magnus Hagander

#79Michael Paquier
michael@paquier.xyz
In reply to: Sergei Kornilov (#71)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 02:43:39PM +0300, Sergei Kornilov wrote:

Seems good. And I think we need to backpatch this check to pg11, similar
to the cross-version compatibility checks

+    fprintf(stderr, _("%s: data directory block size %d is different to compiled-in block size %d.\n"),
+            progname, ControlFile->blcksz, BLCKSZ);
The error message looks grammatically a bit weird to me.  What about
the following?  Say:
"database block size of %u is different from supported block size of
%u."
Better ideas are welcome.

Please note that those integers are unsigned, by the way.
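
As a rough illustration of the wording and the unsigned format specifiers suggested above, a standalone sketch might look like the following. The helper name format_blcksz_error and the COMPILED_BLCKSZ constant are assumptions made up for this example; in pg_checksums the two values would come from ControlFile->blcksz and BLCKSZ.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for BLCKSZ; hypothetical value chosen for this example. */
#define COMPILED_BLCKSZ 8192u

/*
 * Hypothetical helper: returns 0 if the block sizes match, otherwise
 * writes the proposed error message into buf and returns -1.
 */
static int
format_blcksz_error(uint32_t control_blcksz, char *buf, size_t buflen)
{
	if (control_blcksz == COMPILED_BLCKSZ)
		return 0;

	/* Both values are unsigned, hence %u rather than %d. */
	snprintf(buf, buflen,
			 "database block size of %u is different from supported block size of %u.",
			 (unsigned int) control_blcksz, (unsigned int) COMPILED_BLCKSZ);
	return -1;
}
```

A caller would print the buffer to stderr and exit when the helper returns -1.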
--
Michael

#80Michael Paquier
michael@paquier.xyz
In reply to: Magnus Hagander (#68)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 12:24:21PM +0100, Magnus Hagander wrote:

Enabling or disabling the checksums offline on the master quite clearly
requires a rebuild of the standby, there is no other way (this is one of
the reasons for the online enabling in that patch, so I still hope we can
get that done -- but not for this version).

I am curious to understand why this would require a rebuild of the
standby. Technically FPWs don't update the checksum of a page when it
is WAL-logged, so even if a primary and a standby don't agree on the
checksum configuration, it is the timing where pages are flushed in
the local instance which counts for checksum correctness.

You mean if the backend and pg_checksums is built with different blocksize?
Yeah, that sounds like something which is a cheap check and should be done.

Yes, we should check for that: the checksum calculation uses BLCKSZ as
a hardcoded value, so a mismatch would cause computation failures. It
would be possible to lift this restriction if we made the block
size an argument of the checksum calculation though.
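
To make the idea of passing the block size as an argument concrete, here is a minimal sketch. It is NOT PostgreSQL's checksum algorithm: a plain FNV-1a hash stands in for the real computation, and the name sketch_checksum_block and its signature are hypothetical. It only shows the block size arriving at run time instead of through the compile-time BLCKSZ.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the real pg_checksum_block(): hash blcksz bytes
 * of a page with FNV-1a, mix in the block number, and fold to 16 bits.
 */
static uint16_t
sketch_checksum_block(const unsigned char *page, size_t blcksz, uint32_t blkno)
{
	uint32_t	hash = 2166136261u;	/* FNV-1a 32-bit offset basis */
	size_t		i;

	for (i = 0; i < blcksz; i++)
	{
		hash ^= page[i];
		hash *= 16777619u;			/* FNV-1a 32-bit prime */
	}

	/* Mix in the block number so identical pages at other offsets differ. */
	hash ^= blkno;

	/* Fold to the range 1..65535 so that zero is never produced. */
	return (uint16_t) ((hash % 65535u) + 1u);
}
```

With such a signature the tool could pass the block size read from the control file, at the cost of giving up whatever the compiler gains from a constant loop bound.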
--
Michael

#81Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#73)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 13, 2019 at 08:56:33PM +0900, Michael Paquier wrote:

On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote:

The attached patch should do the above, on top of Michael's last
patchset.

What you are doing here looks like a good defense in itself.

More thoughts on that, and here is a short summary of the thread.

+       /* Check if control file has changed */
+       if (controlfile_last_updated != ControlFile->time)
+       {
+           fprintf(stderr, _("%s: control file has changed since startup\n"), progname);
+           exit(1);
+       }
Actually, under the conditions discussed on this thread, where Postgres
is started in parallel with pg_checksums, imagine the following
scenario:
- pg_checksums starts to enable checksums, it reads a block to
calculate its checksum, then calculates it.
- Postgres has been started in parallel, writes the same block.
- pg_checksums finishes the block calculation, writes back the block
it has just read.
- Postgres stops, some data is lost.
- At the end of pg_checksums, we complain that the control file has
been updated since the start of pg_checksums.
I think that we should be way more noisy about this error message and
document properly that Postgres should not be started while checksums
are being enabled.  Basically, I guess that it should mention that there is
a risk of corruption because of this parallel operation.

Hence, based on what I could read on this thread, we'd like to have
the following things added to the core patch set:
1) When enabling checksums, fsync the data folder. Then update the
control file, and finally fsync the control file (+ flush of the
parent folder to make the whole durable). This way a host crash never
actually means that we finish in an inconsistent state.
2) When checksums are disabled, update the control file, then fsync
it + its parent folder.
3) Add tracking of the control file data, and complain loudly before
trying to update the file if there are any inconsistencies found.
4) Document with a big-fat-red warning that postgres should not be
worked on while the tool is enabling or disabling checksums.
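
The ordering in points 1) to 3) can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the patch itself: the helper names are hypothetical, the recursive fsync of the whole data folder from point 1) is assumed to have been done by the caller, and the control file is treated as an opaque buffer.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: fsync a file or directory by path, 0 on success. */
static int
sketch_fsync_path(const char *path)
{
	int			fd = open(path, O_RDONLY);
	int			rc;

	if (fd < 0)
		return -1;
	rc = fsync(fd);
	close(fd);
	return rc;
}

/*
 * Hypothetical helper: check that the control file did not change (point 3),
 * write and fsync it, then fsync its parent directory (points 1 and 2).
 * Flushing the data files themselves is assumed to have happened already.
 */
static int
sketch_write_control_durably(const char *dir, const char *name,
							 const void *data, size_t len,
							 long time_at_start, long time_now)
{
	char		path[1024];
	int			n;
	int			fd;

	/* Complain before touching anything if the control file changed. */
	if (time_at_start != time_now)
	{
		fprintf(stderr, "control file has changed since startup\n");
		return -1;
	}

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t) n >= sizeof(path))
		return -1;

	fd = open(path, O_WRONLY | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	if (close(fd) != 0)
		return -1;

	/* Flush the parent directory so the directory entry is durable too. */
	return sketch_fsync_path(dir);
}
```

The point of the ordering is that the control file only claims "checksums enabled" after every data page carrying a checksum has reached disk.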

There is a part discussed about standbys and primaries that do not have
the same checksum settings, but I commented on that in [1].

There is a secondary bug fix to prevent the tool from running if the data
folder has been initialized with a block size different from what
pg_checksums has been compiled with in [2]. The patch looks good,
though the error message could be better as far as I can see.

[1]: /messages/by-id/20190314002342.GC3493@paquier.xyz
[2]: /messages/by-id/20190313224742.GA3493@paquier.xyz

Am I missing something?
--
Michael

#82Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#78)
Re: Offline enabling/disabling of data checksums

Hi,

On Wednesday, 13.03.2019, at 17:54 +0100, Magnus Hagander wrote:

On Wed, Mar 13, 2019 at 4:51 PM Michael Banck <michael.banck@credativ.de> wrote:

On Wednesday, 13.03.2019, at 12:43 +0100, Magnus Hagander wrote:

I think this is dangerous enough that it needs to be enforced and not
documented.

Changing the cluster ID might have some other side-effects, I think
there are several cloud-native 3rd party solutions that use the cluster
ID as some kind of unique identifier for an instance. It might not be an
issue in practise, but then again, it might break other stuff down the
road.

Well, whatever we do they have to update, right?

Yeah, but I am saying their orchestrators might get confused about where
the old instance went and what this new thing with a totally different
systemid is and lose the connection between the two. 

Maybe that is a feature and not a bug.

Michael


#83Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#80)
Re: Offline enabling/disabling of data checksums

On Thu, Mar 14, 2019 at 1:23 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Mar 13, 2019 at 12:24:21PM +0100, Magnus Hagander wrote:

Enabling or disabling the checksums offline on the master quite clearly
requires a rebuild of the standby, there is no other way (this is one of
the reasons for the online enabling in that patch, so I still hope we can
get that done -- but not for this version).

I am curious to understand why this would require a rebuild of the
standby. Technically FPWs don't update the checksum of a page when it
is WAL-logged, so even if a primary and a standby don't agree on the
checksum configuration, it is the timing where pages are flushed in
the local instance which counts for checksum correctness.

Are you suggesting we should support running with a master with checksums
on and a standby with checksums off in the same cluster? That seems.. Very
fragile.

--
Magnus Hagander

#84Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#81)
Re: Offline enabling/disabling of data checksums

On Thu, Mar 14, 2019 at 5:39 AM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Mar 13, 2019 at 08:56:33PM +0900, Michael Paquier wrote:

On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote:

The attached patch should do the above, on top of Michael's last
patchset.

What you are doing here looks like a good defense in itself.

More thoughts on that, and here is a short summary of the thread.

+       /* Check if control file has changed */
+       if (controlfile_last_updated != ControlFile->time)
+       {
+           fprintf(stderr, _("%s: control file has changed since startup\n"), progname);
+           exit(1);
+       }
Actually, under the conditions discussed on this thread, where Postgres
is started in parallel with pg_checksums, imagine the following
scenario:
- pg_checksums starts to enable checksums, it reads a block to
calculate its checksum, then calculates it.
- Postgres has been started in parallel, writes the same block.
- pg_checksums finishes the block calculation, writes back the block
it has just read.
- Postgres stops, some data is lost.
- At the end of pg_checksums, we complain that the control file has
been updated since the start of pg_checksums.
I think that we should be way more noisy about this error message and
document properly that Postgres should not be started while checksums
are being enabled.  Basically, I guess that it should mention that there is
a risk of corruption because of this parallel operation.

Hence, based on what I could read on this thread, we'd like to have
the following things added to the core patch set:
1) When enabling checksums, fsync the data folder. Then update the
control file, and finally fsync the control file (+ flush of the
parent folder to make the whole durable). This way a host crash never
actually means that we finish in an inconsistent state.
2) When checksums are disabled, update the control file, then fsync
it + its parent folder.
3) Add tracking of the control file data, and complain loudly before
trying to update the file if there are any inconsistencies found.
4) Document with a big-fat-red warning that postgres should not be
worked on while the tool is enabling or disabling checksums.

Given that the failure is data corruption, I don't think big fat warning is
enough. We should really make it impossible to start up the postmaster by
mistake during the checksum generation. People don't read the documentation
until it's too late. And it might not even be under their control - some
automated tool might go in and try to start postgres, and boom, corruption.

One big-hammer method could be similar to what pg_upgrade does --
temporarily rename away the controlfile so postgresql can't start, and when
done, put it back.

--
Magnus Hagander

#85Christoph Berg
myon@debian.org
In reply to: Magnus Hagander (#83)
Re: Offline enabling/disabling of data checksums

Re: Magnus Hagander 2019-03-14 <CABUevEx7QZLOjWDvwTdm1VM+mjsDm7=ZmB8qck7nDmcHEY5O5g@mail.gmail.com>

Are you suggesting we should support running with a master with checksums
on and a standby with checksums off in the same cluster? That seems.. Very
fragile.

The case "shut down master and standby, run pg_checksums on both, and
start them again" should be supported. That seems safe to do, and a
real-world use case.

Changing the system id to a random number would complicate this.

(Horrible idea: maybe just adding 1 (= checksum version) to the system
id would work?)

Christoph

#86Magnus Hagander
magnus@hagander.net
In reply to: Christoph Berg (#85)
Re: Offline enabling/disabling of data checksums

On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg <myon@debian.org> wrote:

Re: Magnus Hagander 2019-03-14 <CABUevEx7QZLOjWDvwTdm1VM+mjsDm7=ZmB8qck7nDmcHEY5O5g@mail.gmail.com>

Are you suggesting we should support running with a master with checksums
on and a standby with checksums off in the same cluster? That seems.. Very
fragile.

The case "shut down master and standby, run pg_checksums on both, and
start them again" should be supported. That seems safe to do, and a
real-world use case.

I can agree with that, if we can declare it safe. You might need some way
to ensure it was shut down cleanly on both sides, I'm guessing.

Changing the system id to a random number would complicate this.

(Horrible idea: maybe just adding 1 (= checksum version) to the system
id would work?)

Or any other way of changing the systemid in a predictable way would also
work, right? As long as it's done the same on both sides. And that way it
would look different to any system that *doesn't* know what it means, which
is probably a good thing.
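
For illustration only, a predictable change along the lines of the "adding 1 (= checksum version)" idea could look like this. The function name is hypothetical, and subtracting on disable is an assumption added here so that the change is reversible; nothing in this thread has settled on a scheme, or on changing the identifier at all.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: derive the new system identifier only from values
 * that primary and standby already share, so both arrive at the same
 * result independently. Unsigned arithmetic wraps around by definition.
 */
static uint64_t
sketch_bump_system_id(uint64_t system_id, uint32_t checksum_version,
					  bool enabling)
{
	if (enabling)
		return system_id + checksum_version;
	return system_id - checksum_version;
}
```

Run on both nodes with the same inputs, this gives matching identifiers, while a tool that only remembers the old value sees a different cluster.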

--
Magnus Hagander

#87Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#84)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

Hi,

On Thursday, 14.03.2019, at 15:26 +0100, Magnus Hagander wrote:

Given that the failure is data corruption, I don't think big fat
warning is enough. We should really make it impossible to start up the
postmaster by mistake during the checksum generation. People don't
read the documentation until it's too late. And it might not even be
under their control - some automated tool might go in and try to start
postgres, and boom, corruption.

I guess you're right.

One big-hammer method could be similar to what pg_upgrade does --
temporarily rename away the controlfile so postgresql can't start, and
when done, put it back.

That sounds like a good solution to me. I've made PoC patch for that,
see attached.

The only question is whether pg_checksums should try to move pg_control
back (i) on failure (ii) when interrupted?
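
For the interrupted case, the mechanics could be sketched as follows. This is only an illustration, not part of the attached patch: the names are hypothetical, the two path buffers are assumed to be filled in before pg_control is renamed away, and whether restoring the file is the right policy at all is exactly the open question. The handler restricts itself to rename() and _exit(), which POSIX lists as async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical globals, set up before the control file is renamed away. */
static char sketch_old_path[1024];	/* e.g. .../global/pg_control */
static char sketch_new_path[1024];	/* e.g. the *_in_progress name */

/*
 * Move the control file back into place. rename() simply fails when the
 * temporary file does not exist, which the handler below ignores.
 */
static int
sketch_restore_control_file(void)
{
	return rename(sketch_new_path, sketch_old_path);
}

/* Signal handler: restore the control file, then exit immediately. */
static void
sketch_on_interrupt(int signo)
{
	(void) signo;
	(void) sketch_restore_control_file();
	_exit(1);
}

/* Install the handler for SIGINT and SIGTERM; returns 0 on success. */
static int
sketch_install_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sketch_on_interrupt;
	sigemptyset(&sa.sa_mask);

	if (sigaction(SIGINT, &sa, NULL) != 0)
		return -1;
	return sigaction(SIGTERM, &sa, NULL);
}
```

Since the checksum version in the control file is only updated at the very end, a file restored this way would still say that checksums are disabled.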

Michael


Attachments:

0001-Rename-away-pg_control-while-enabling-checksums.patchtext/x-patch; charset=UTF-8; name=0001-Rename-away-pg_control-while-enabling-checksums.patchDownload
From 9b1471afee485a96c32e393989ca4eb0aed52f88 Mon Sep 17 00:00:00 2001
From: Michael Banck <mbanck@debian.org>
Date: Thu, 14 Mar 2019 16:02:49 +0100
Subject: [PATCH] Rename away pg_control while enabling checksums.

In order to prevent the cluster from being accidentally started during the
operation, the pg_control file is renamed to
pg_control.pg_checksums_in_progress during the operation and renamed back at
the end.
---
 doc/src/sgml/ref/pg_checksums.sgml  |  5 +++--
 src/bin/pg_checksums/pg_checksums.c | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c3ccbf4eb7..91dfd3515a 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -48,8 +48,9 @@ PostgreSQL documentation
   <para>
    While checking or enabling checksums needs to scan or write every file in
    the cluster, disabling will only update the file
-   <filename>pg_control</filename>.
-  </para>
+   <filename>pg_control</filename>. During enabling, the <filename>pg_control</filename>
+   file is renamed in order to prevent accidentally starting the cluster.
+  </para>
  </refsect1>
 
  <refsect1>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 852ddafc94..590fb925e8 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -55,6 +55,12 @@ typedef enum
 #define PG_TEMP_FILES_DIR "pgsql_tmp"
 #define PG_TEMP_FILE_PREFIX "pgsql_tmp"
 
+#ifndef WIN32
+#define pg_mv_file			rename
+#else
+#define pg_mv_file			pgrename
+#endif
+
 static PgChecksumMode mode = PG_MODE_CHECK;
 
 static const char *progname;
@@ -88,6 +94,7 @@ usage(void)
  */
 static const char *const skip[] = {
 	"pg_control",
+	"pg_control.pg_checksums_in_progress",
 	"pg_filenode.map",
 	"pg_internal.init",
 	"PG_VERSION",
@@ -306,6 +313,8 @@ main(int argc, char *argv[])
 	};
 
 	char	   *DataDir = NULL;
+	char		old_controlfile_path[MAXPGPATH];
+	char		new_controlfile_path[MAXPGPATH];
 	int			c;
 	int			option_index;
 	bool		crc_ok;
@@ -440,6 +449,23 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/*
+	 * Disable cluster by renaming pg_control when enabling checksums so
+	 * that it cannot be started by accident during the operation
+	 */
+	if (mode == PG_MODE_ENABLE)
+	{
+		snprintf(old_controlfile_path, sizeof(old_controlfile_path),
+				 "%s/global/pg_control", DataDir);
+		snprintf(new_controlfile_path, sizeof(new_controlfile_path),
+				 "%s/global/pg_control.pg_checksums_in_progress", DataDir);
+		if (pg_mv_file(old_controlfile_path, new_controlfile_path) != 0)
+		{
+			fprintf(stderr, _("%s: unable to rename %s to %s.\n"), progname, old_controlfile_path, new_controlfile_path);
+			exit(1);
+		}
+	}
+
 	/* Operate on all files if checking or enabling checksums */
 	if (mode == PG_MODE_CHECK || mode == PG_MODE_ENABLE)
 	{
@@ -466,6 +492,16 @@ main(int argc, char *argv[])
 	 */
 	if (mode == PG_MODE_ENABLE || mode == PG_MODE_DISABLE)
 	{
+
+		/* Move back pg_control */
+		if (mode == PG_MODE_ENABLE)
+		{
+			if (pg_mv_file(new_controlfile_path, old_controlfile_path) != 0)
+			{
+				fprintf(stderr, _("%s: unable to rename %s to %s.\n"), progname, new_controlfile_path, old_controlfile_path);
+				exit(1);
+			}
+		}
 		/* Update control file */
 		ControlFile->data_checksum_version =
 			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
-- 
2.11.0

#88Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#86)
Re: Offline enabling/disabling of data checksums

Hi,

On Thursday, 14.03.2019, at 15:32 +0100, Magnus Hagander wrote:

On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg <myon@debian.org> wrote:

Re: Magnus Hagander 2019-03-14 <CABUevEx7QZLOjWDvwTdm1VM+mjsDm7=ZmB8qck7nDmcHEY5O5g@mail.gmail.com>

Are you suggesting we should support running with a master with checksums
on and a standby with checksums off in the same cluster? That seems.. Very
fragile.

The case "shut down master and standby, run pg_checksums on both, and
start them again" should be supported. That seems safe to do, and a
real-world use case.

I can agree with that, if we can declare it safe. You might need some
way to ensure it was shut down cleanly on both sides, I'm guessing. 

Changing the system id to a random number would complicate this.

(Horrible idea: maybe just adding 1 (= checksum version) to the system
id would work?)

Or any other way of changing the systemid in a predictable way would
also work, right? As long as it's done the same on both sides. And
that way it would look different to any system that *doesn't* know
what it means, which is probably a good thing.

If we change the system identifier, we'll have to reset the WAL as well
or otherwise we'll get "PANIC:  could not locate a valid checkpoint
record" on startup. So even if we do it predictably on both primary and
standby I guess the standby would need to be re-cloned?

So I think an option that skips that for people who know what they are
doing with the streaming replication setup would be required, should we
decide to bump the system identifier.

Michael


#89Michael Paquier
michael@paquier.xyz
In reply to: Magnus Hagander (#83)
Re: Offline enabling/disabling of data checksums

On Thu, Mar 14, 2019 at 03:23:59PM +0100, Magnus Hagander wrote:

Are you suggesting we should support running with a master with checksums
on and a standby with checksums off in the same cluster? That seems.. Very
fragile.

Well, calling it supported is too strong a term for that. What I
am saying is that the problems you are pointing out are not as bad as
you seem to think they are, as long as an operator does not copy on-disk
pages from one node to the other one. Knowing that checksums apply
only to pages flushed on disk on a local node, everything going
through WAL for availability is actually able to work fine:
- PITR
- archive recovery.
- streaming replication.
Reading the code I understand that. I have as well done some tests
with a primary/standby configuration to convince myself, using pgbench
on both nodes (read-write for the primary, read-only on the standby),
with checkpoint (or restart point) triggered on each node every 20s.
If one node has checksum enabled and the other checksum disabled, then
I am not seeing any inconsistency.

However, anything that does a physical copy of pages can easily mess
things up if one node has checksums disabled and the other has them
enabled. One such tool is pg_rewind. If the promoted standby has
checksums disabled (becoming the source), and the old master to rewind
has checksums enabled, then the rewind would likely copy pages whose
checksums are not set correctly, resulting in incorrect checksums on
the old master.
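
To make the failure mode concrete, here is a toy sketch. Everything in it is an assumption for illustration: `ToyPage`, `toy_checksum` and the other `toy_*` names are hypothetical, and the hash is a simple FNV-1a-style fold, not PostgreSQL's page checksum algorithm. It only shows that a page flushed by a node without checksums fails verification once it is physically copied to a node that expects them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Toy page: a 16-bit checksum slot followed by payload bytes. */
typedef struct
{
	uint16_t	checksum;
	uint8_t		payload[TOY_PAGE_SIZE - sizeof(uint16_t)];
} ToyPage;

/*
 * Toy checksum (FNV-1a-style, folded to 16 bits), NOT PostgreSQL's
 * algorithm.  The result is always in 1..65535, so a slot that was never
 * stamped (0) can never verify.
 */
static uint16_t
toy_checksum(const ToyPage *page, uint32_t blkno)
{
	uint32_t	h = 2166136261u ^ blkno;
	size_t		i;

	for (i = 0; i < sizeof(page->payload); i++)
	{
		h ^= page->payload[i];
		h *= 16777619u;
	}
	return (uint16_t) ((h % 65535u) + 1);
}

/* Flush a page: the checksum slot is only stamped if checksums are on. */
static void
toy_flush_page(ToyPage *page, uint32_t blkno, bool checksums_enabled)
{
	if (checksums_enabled)
		page->checksum = toy_checksum(page, blkno);
}

/* Verification as a checksum-enabled node would do it when reading. */
static bool
toy_page_is_valid(const ToyPage *page, uint32_t blkno)
{
	return page->checksum == toy_checksum(page, blkno);
}

/* Physical copy between nodes, bypassing WAL, as a block-copying tool does. */
static void
toy_copy_page(ToyPage *dst, const ToyPage *src)
{
	memcpy(dst, src, sizeof(ToyPage));
}
```

A zeroed page flushed with `checksums_enabled = false` keeps a checksum slot of 0, so `toy_page_is_valid()` returns false for its copy on the target; the same page flushed with checksums enabled verifies fine after the copy.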

So yes, it is easy to mess things up, but this does not apply to all
configurations. The suggestion from Christoph to enable checksums on
both nodes separately would work. Personally, I find the suggestion to
update the system ID after enabling or disabling checksums
over-engineered, for the reasons given in the first part of this email
(it is technically doable to enable checksums with minimal downtime
and a failover). My recommendation would be to document that when
checksums are enabled on one instance in a cluster, they should be
enabled on all instances, as a mismatch could cause problems with any
tool performing a physical copy of relation files or blocks.
--
Michael

#90 Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#87)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

On Thu, Mar 14, 2019 at 04:26:20PM +0100, Michael Banck wrote:

On Thursday, 14.03.2019 at 15:26 +0100, Magnus Hagander wrote:

One big-hammer method could be similar to what pg_upgrade does --
temporarily rename away the controlfile so postgresql can't start, and
when done, put it back.

That sounds like a good solution to me. I've made a PoC patch for that,
see attached.

Indeed. I did not know this trick from pg_upgrade. We could just use
the same approach.

The only question is whether pg_checksums should try to move pg_control
back (i) on failure (ii) when interrupted?

Yes, we should have a callback on SIGINT and SIGTERM here which just
moves the control file back into place if the temporary one exists. I
have been able to grab some time to incorporate the feedback gathered
on this thread; please find attached a new version of the patch to
add --enable/--disable. The main changes are:
- When enabling checksums, first fsync the data directory, then update
and flush the control file and its parent folder at the end, as Fabien
has reported.
- When disabling checksums, only work on the control file, as Fabien
has also reported.
- Rename the control file when beginning the enabling operation, with
a callback to rename the file back if the operation is interrupted.
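
As a rough, standalone illustration of the rename-and-restore idea in the last item (not the code from the attached patch): the sketch below uses only POSIX calls, and the names `setup_control_file_guard`, `hide_control_file`, `restore_control_file` and `cleanup_on_signal` are hypothetical. The handler is deliberately limited to `access()`, `rename()` and `_exit()`, which POSIX lists as async-signal-safe.

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Paths are built once, before any signal can arrive. */
static char control_path[1024];
static char control_path_temp[1024];

/* Move the control file aside so the server cannot be started. */
static int
hide_control_file(void)
{
	return rename(control_path, control_path_temp);
}

/* Put the control file back if the temporary one exists; 0 on success. */
static int
restore_control_file(void)
{
	if (access(control_path_temp, F_OK) != 0)
		return 0;				/* nothing to restore */
	return rename(control_path_temp, control_path);
}

/* SIGINT/SIGTERM callback: restore the control file, then exit. */
static void
cleanup_on_signal(int signum)
{
	(void) signum;
	restore_control_file();
	_exit(1);
}

/* Hypothetical setup step: compute both paths and install the handlers. */
static void
setup_control_file_guard(const char *datadir)
{
	snprintf(control_path, sizeof(control_path),
			 "%s/global/pg_control", datadir);
	snprintf(control_path_temp, sizeof(control_path_temp),
			 "%s/global/pg_control.pg_checksums_in_progress", datadir);
	signal(SIGINT, cleanup_on_signal);
	signal(SIGTERM, cleanup_on_signal);
}
```

A caller would run `setup_control_file_guard(datadir)`, then `hide_control_file()`, do the long rewrite, and finally `restore_control_file()` before touching the control file contents.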

Does this make sense?
--
Michael

Attachments:

0001-Add-options-to-enable-and-disable-checksums-in-pg_ch.patch (text/x-diff; charset=us-ascii)
From 2ebb032e7bea22829396e88ff9cc1b52f1b754d4 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 13 Mar 2019 11:12:53 +0900
Subject: [PATCH] Add options to enable and disable checksums in pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When using --disable, only the control file is updated and then
flushed.  When using --enable, the process gets more complicated as the
operation can be long:
- Rename the control file to a temporary name, to prevent a parallel
startup of Postgres.
- Scan all files and update their checksums.
- Rename back the control file.
- Flush the data directory.
- Update the control file and then flush it, to make the change
durable.
If the operation is interrupted, the control file gets moved back in
place.

If no mode is specified in the options, then --check is used for
compatibility with older versions of pg_verify_checksums (now renamed to
pg_checksums in v12).

Author: Michael Banck, Michael Paquier
Reviewed-by: Fabien Coelho, Magnus Hagander, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    |  54 ++++-
 src/bin/pg_checksums/pg_checksums.c   | 287 +++++++++++++++++++++++---
 src/bin/pg_checksums/t/002_actions.pl |  76 +++++--
 src/tools/pgindent/typedefs.list      |   1 +
 4 files changed, 370 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..dc41553bc4 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  <refnamediv>
   <refname>pg_checksums</refname>
-  <refpurpose>verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
+  <refpurpose>enable, disable or check data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
  </refnamediv>
 
  <refsynopsisdiv>
@@ -36,11 +36,24 @@ PostgreSQL documentation
  <refsect1 id="r1-app-pg_checksums-1">
   <title>Description</title>
   <para>
-   <application>pg_checksums</application> verifies data checksums in a
-   <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_checksums</application>.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   <application>pg_checksums</application> checks, enables or disables data
+   checksums in a <productname>PostgreSQL</productname> cluster.  The server
+   must be shut down cleanly before running
+   <application>pg_checksums</application>. The exit status is zero if there
+   are no checksum errors when checking them, and nonzero if at least one
+   checksum failure is detected. If enabling or disabling checksums, the
+   exit status is nonzero if the operation failed.
   </para>
+
+  <para>
+   Checking checksums requires scanning every file holding them in the data
+   folder.  Disabling checksums requires only an update of the file
+   <filename>pg_control</filename>.  Enabling checksums first renames
+   the file <filename>pg_control</filename> to
+   <filename>pg_control.pg_checksums_in_progress</filename> to prevent
+   a parallel startup of the cluster, then it updates all files with
+   checksums, and it finishes by renaming and updating
+   <filename>pg_control</filename> to mark checksums as enabled.
  </refsect1>
 
  <refsect1>
@@ -60,6 +73,37 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-c</option></term>
+      <term><option>--check</option></term>
+      <listitem>
+       <para>
+        Checks checksums. This is the default mode if nothing else is
+        specified.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <term><option>--disable</option></term>
+      <listitem>
+       <para>
+        Disables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--enable</option></term>
+      <listitem>
+       <para>
+        Enables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 5d4083fa9f..b98299a292 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -1,7 +1,8 @@
 /*-------------------------------------------------------------------------
  *
  * pg_checksums.c
- *	  Verifies page level checksums in an offline cluster.
+ *	  Checks, enables or disables page level checksums for an offline
+ *	  cluster
  *
  * Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -16,35 +17,78 @@
 #include <dirent.h>
 #include <sys/stat.h>
 #include <unistd.h>
+#include <signal.h>
 
-#include "catalog/pg_control.h"
+#include "access/xlog_internal.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
 
 
 static int64 files = 0;
 static int64 blocks = 0;
 static int64 badblocks = 0;
 static ControlFileData *ControlFile;
-
+static char *DataDir = NULL;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
+static char controlfile_path[MAXPGPATH];
+static char controlfile_path_temp[MAXPGPATH];
+
+
+typedef enum
+{
+	PG_MODE_CHECK,
+	PG_MODE_DISABLE,
+	PG_MODE_ENABLE
+} PgChecksumMode;
+
+/*
+ * Filename components.
+ *
+ * XXX: fd.h is not declared here as frontend side code is not able to
+ * interact with the backend-side definitions for the various fsync
+ * wrappers.
+ */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+/*
+ * Locations of persistent and temporary control files.  The control
+ * file gets renamed into a temporary location when enabling checksums
+ * to prevent a parallel startup of Postgres.
+ */
+#define CONTROL_FILE_PATH		"global/pg_control"
+#define CONTROL_FILE_PATH_TEMP	CONTROL_FILE_PATH ".pg_checksums_in_progress"
+
+
+#ifndef WIN32
+#define pg_mv_file			rename
+#else
+#define pg_mv_file			pgrename
+#endif
+
+static PgChecksumMode mode = PG_MODE_CHECK;
 
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables, disables or verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -c, --check            check data checksums\n"));
+	printf(_("                         This is the default mode if nothing is specified.\n"));
+	printf(_("  -d, --disable          disable data checksums\n"));
+	printf(_("  -e, --enable           enable data checksums\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -54,6 +98,32 @@ usage(void)
 	printf(_("Report bugs to <pgsql-bugs@lists.postgresql.org>.\n"));
 }
 
+/*
+ * Clean up the temporary control file when enabling checksums in the
+ * event of an interruption.
+ */
+static void
+signal_cleanup(int signum)
+{
+	/* nothing to do if there is no temporary control file */
+	if (access(controlfile_path_temp, F_OK) != 0)
+		exit(signum);
+
+	if (pg_mv_file(controlfile_path_temp, controlfile_path))
+	{
+		fprintf(stderr, _("%s: could not rename file \"%s\" to \"%s\": %s\n"),
+				progname, controlfile_path_temp, controlfile_path,
+				strerror(errno));
+		exit(1);
+	}
+
+	if (fsync_fname(controlfile_path, false, progname) != 0 ||
+		fsync_parent_path(controlfile_path, progname))
+		exit(1);
+
+	exit(signum);
+}
+
 /*
  * List of files excluded from checksum validation.
  *
@@ -61,6 +131,7 @@ usage(void)
  */
 static const char *const skip[] = {
 	"pg_control",
+	"pg_control.pg_checksums_in_progress",
 	"pg_filenode.map",
 	"pg_internal.init",
 	"PG_VERSION",
@@ -90,8 +161,14 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	int			flags;
+
+	Assert(mode == PG_MODE_ENABLE ||
+		   mode == PG_MODE_CHECK);
+
+	flags = (mode == PG_MODE_ENABLE) ? O_RDWR : O_RDONLY;
+	f = open(fn, PG_BINARY | flags, 0);
 
-	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
@@ -121,18 +198,47 @@ scan_file(const char *fn, BlockNumber segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		if (csum != header->pd_checksum)
+		if (mode == PG_MODE_CHECK)
 		{
-			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
-						progname, fn, blockno, csum, header->pd_checksum);
-			badblocks++;
+			if (csum != header->pd_checksum)
+			{
+				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+					fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
+							progname, fn, blockno, csum, header->pd_checksum);
+				badblocks++;
+			}
+		}
+		else if (mode == PG_MODE_ENABLE)
+		{
+			/* Set checksum in page header */
+			header->pd_checksum = csum;
+
+			/* Seek back to beginning of block */
+			if (lseek(f, -BLCKSZ, SEEK_CUR) < 0)
+			{
+				fprintf(stderr, _("%s: seek failed for block %u in file \"%s\": %s\n"), progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Write block with checksum */
+			if (write(f, buf.data, BLCKSZ) != BLCKSZ)
+			{
+				fprintf(stderr, _("%s: could not update checksum of block %u in file \"%s\": %s\n"),
+						progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
 		}
 	}
 
 	if (verbose)
-		fprintf(stderr,
-				_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+	{
+		if (mode == PG_MODE_CHECK)
+			fprintf(stderr,
+					_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+		if (mode == PG_MODE_ENABLE)
+			fprintf(stderr,
+					_("%s: checksums enabled in file \"%s\"\n"), progname, fn);
+	}
 
 	close(f);
 }
@@ -234,12 +340,14 @@ int
 main(int argc, char *argv[])
 {
 	static struct option long_options[] = {
+		{"check", no_argument, NULL, 'c'},
 		{"pgdata", required_argument, NULL, 'D'},
+		{"disable", no_argument, NULL, 'd'},
+		{"enable", no_argument, NULL, 'e'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
 
-	char	   *DataDir = NULL;
 	int			c;
 	int			option_index;
 	bool		crc_ok;
@@ -262,10 +370,19 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:r:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
+			case 'c':
+				mode = PG_MODE_CHECK;
+				break;
+			case 'd':
+				mode = PG_MODE_DISABLE;
+				break;
+			case 'e':
+				mode = PG_MODE_ENABLE;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -312,6 +429,15 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Relfilenode checking only works in --check mode */
+	if (mode != PG_MODE_CHECK && only_relfilenode)
+	{
+		fprintf(stderr, _("%s: relfilenode option only possible with --check\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
 	/* Check if cluster is running */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
@@ -330,29 +456,134 @@ main(int argc, char *argv[])
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
+		fprintf(stderr, _("%s: cluster must be shut down\n"), progname);
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version == 0)
+	if (ControlFile->data_checksum_version == 0 &&
+		mode == PG_MODE_CHECK)
 	{
 		fprintf(stderr, _("%s: data checksums are not enabled in cluster\n"), progname);
 		exit(1);
 	}
+	if (ControlFile->data_checksum_version == 0 &&
+		mode == PG_MODE_DISABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
+		exit(1);
+	}
+	if (ControlFile->data_checksum_version > 0 &&
+		mode == PG_MODE_ENABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already enabled in cluster.\n"), progname);
+		exit(1);
+	}
 
-	/* Scan all files */
-	scan_directory(DataDir, "global");
-	scan_directory(DataDir, "base");
-	scan_directory(DataDir, "pg_tblspc");
+	/*
+	 * Allocate the control file paths here, as this gets used in various
+	 * phases.
+	 */
+	snprintf(controlfile_path, sizeof(controlfile_path),
+			 "%s/%s", DataDir, CONTROL_FILE_PATH);
+	snprintf(controlfile_path_temp, sizeof(controlfile_path_temp),
+			 "%s/%s", DataDir, CONTROL_FILE_PATH_TEMP);
 
-	printf(_("Checksum scan completed\n"));
-	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
-	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
-	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
-	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+	/* Prevent leaving behind any intermediate state */
+	pqsignal(SIGINT, signal_cleanup);
+	pqsignal(SIGTERM, signal_cleanup);
 
-	if (badblocks > 0)
-		return 1;
+	/*
+	 * The operation is good to move on with all the sanity checks done.
+	 * Enabling checksums can take a long time as all the files need to
+	 * be scanned and rewritten.  Hence, first, prevent any parallel startup
+	 * of the instance by renaming the control file when enabling checksums
+	 * so that it cannot be started by accident during the operation.
+	 */
+	if (mode == PG_MODE_ENABLE)
+	{
+		printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
+				controlfile_path_temp);
+		if (pg_mv_file(controlfile_path, controlfile_path_temp) != 0)
+		{
+			fprintf(stderr, _("%s: could not rename file \"%s\" to \"%s\": %s\n"),
+					progname, controlfile_path, controlfile_path_temp,
+					strerror(errno));
+			exit(1);
+		}
+	}
+
+	/* Operate on all files if checking or enabling checksums */
+	if (mode == PG_MODE_CHECK || mode == PG_MODE_ENABLE)
+	{
+		scan_directory(DataDir, "global");
+		scan_directory(DataDir, "base");
+		scan_directory(DataDir, "pg_tblspc");
+
+		printf(_("Checksum operation completed\n"));
+		printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
+		printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
+		if (mode == PG_MODE_CHECK)
+		{
+			printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+
+			if (badblocks > 0)
+				exit(1);
+		}
+	}
+
+	/*
+	 * Now that enabling data checksums is done, first put the control
+	 * file back in place and then flush the data directory.  The control
+	 * file is updated and flushed in a follow-up step so that the data
+	 * folder is never left in an inconsistent state should a crash happen
+	 * in-between.
+	 */
+	if (mode == PG_MODE_ENABLE)
+	{
+		printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
+				controlfile_path);
+		if (pg_mv_file(controlfile_path_temp, controlfile_path) != 0)
+		{
+			fprintf(stderr, _("%s: could not rename file \"%s\" to \"%s\": %s\n"),
+					progname, controlfile_path_temp, controlfile_path,
+					strerror(errno));
+			exit(1);
+		}
+
+		printf(_("Syncing data folder\n"));
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+	}
+
+	/*
+	 * Finally update and flush the control file.
+	 */
+	if (mode == PG_MODE_ENABLE || mode == PG_MODE_DISABLE)
+	{
+		/* Update the control file */
+		printf(_("Updating control file\n"));
+		ControlFile->data_checksum_version =
+			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
+		update_controlfile(DataDir, progname, ControlFile);
+
+		/*
+		 * Flush the control file and its parent path to make the change
+		 * durable.
+		 */
+		if (fsync_fname(controlfile_path, false, progname) != 0 ||
+			fsync_parent_path(controlfile_path, progname) != 0)
+		{
+			/* errors are already logged on failure */
+			exit(1);
+		}
+
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		if (mode == PG_MODE_ENABLE)
+			printf(_("Checksums enabled in cluster\n"));
+		else
+			printf(_("Checksums disabled in cluster\n"));
+	}
 
 	return 0;
 }
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 97284e8930..3ab18a6b89 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 45;
+use Test::More tests => 62;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -38,8 +38,8 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_checksums',  '-D', $pgdata,
-		'-r', $relfilenode_corrupted],
+	command_ok(['pg_checksums',  '--check', '-D', $pgdata, '-r',
+			   $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
 	# Time to create some corruption
@@ -49,15 +49,15 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata, '-r',
-								$relfilenode_corrupted],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata,
+							  '-r', $relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,22 +67,22 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_checksums', '--check', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
 	return;
 }
 
-# Initialize node with checksums enabled.
+# Initialize node with checksums disabled.
 my $node = get_new_node('node_checksum');
-$node->init(extra => ['--data-checksums']);
+$node->init();
 my $pgdata = $node->data_dir;
 
-# Control file should know that checksums are enabled.
+# Control file should know that checksums are disabled.
 command_like(['pg_controldata', $pgdata],
-	     qr/Data page checksum version:.*1/,
-		 'checksums enabled in control file');
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
 
 # These are correct but empty files, so they should pass through.
 append_to_file "$pgdata/global/99999", "";
@@ -100,13 +100,59 @@ append_to_file "$pgdata/global/pgsql_tmp_123", "foo";
 mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
+# Enable checksums.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+	   "checksums successfully enabled in cluster");
+
+# Successive attempt to enable checksums fails.
+command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+	      "enabling checksums fails if already enabled");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+	     'checksums enabled in control file');
+
+# Disable checksums again.
+command_ok(['pg_checksums', '--disable', '-D', $pgdata],
+	   "checksums successfully disabled in cluster");
+
+# Successive attempt to disable checksums fails.
+command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+	      "disabling checksums fails if already disabled");
+
+# Control file should know that checksums are disabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
+
+# Enable checksums again for follow-up tests.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
 # Checksums pass on a newly-created cluster
-command_ok(['pg_checksums',  '-D', $pgdata],
+command_ok(['pg_checksums', '--check', '-D', $pgdata],
 		   "succeeds with offline cluster");
 
+# Checksums are verified if no other arguments are specified
+command_ok(['pg_checksums', '-D', $pgdata],
+		   "verifies checksums as default action");
+
+# Specific relation files cannot be requested when action is --disable
+# or --enable.
+command_fails(['pg_checksums', '--disable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --disable");
+command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --enable");
+
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_checksums',  '-D', $pgdata],
+command_fails(['pg_checksums', '--check', '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -133,7 +179,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b821df9e71..e86fecb849 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1697,6 +1697,7 @@ PgBenchExprType
 PgBenchFunction
 PgBenchValue
 PgBenchValueType
+PgChecksumMode
 PgFdwAnalyzeState
 PgFdwDirectModifyState
 PgFdwModifyState
-- 
2.20.1

#91 Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#90)
Re: Offline enabling/disabling of data checksums

On Fri, Mar 15, 2019 at 11:50:27AM +0900, Michael Paquier wrote:

- Rename the control file when beginning the enabling operation, with
a callback to rename the file back if the operation is interrupted.

Does this make sense?

Just before I forget... Please note that this handles interruptions
but not failures. The assumption is that after a failure, the leftover
temporary control file shows that the system was working on its
checksums, which is useful for debugging in my opinion.
--
Michael

#92 Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#90)
Re: Offline enabling/disabling of data checksums

Hi,

On Friday, 15.03.2019 at 11:50 +0900, Michael Paquier wrote:

On Thu, Mar 14, 2019 at 04:26:20PM +0100, Michael Banck wrote:

On Thursday, 14.03.2019 at 15:26 +0100, Magnus Hagander wrote:

One big-hammer method could be similar to what pg_upgrade does --
temporarily rename away the controlfile so postgresql can't start, and
when done, put it back.

That sounds like a good solution to me. I've made PoC patch for that,
see attached.

Indeed. I did not know this trick from pg_upgrade. We could just use
the same.

The only question is whether pg_checksums should try to move pg_control
back (i) on failure (ii) when interrupted?

Yes, we should have a callback on SIGINT and SIGTERM here which just
moves back in place the control file if the temporary one exists. I
have been able to grab some time to incorporate the feedback gathered
on this thread, and please find attached a new version of the patch to
add --enable/--disable.

Thanks!

One thing stood out to me while quickly looking over it:

+		/*
+		 * Flush the control file and its parent path to make the change
+		 * durable.
+		 */
+		if (fsync_fname(controlfile_path, false, progname) != 0 ||
+			fsync_parent_path(controlfile_path, progname) != 0)
+		{
+			/* errors are already logged on failure */
+			exit(1);
+		}

ISTM this would not run fsync_parent_path() unless the first fsync fails
which is not the intended use. I guess we need two ifs here?
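
Worth double-checking against C's evaluation rules before changing it: `||` evaluates its right operand only when the left operand is false. A minimal sketch with stand-in functions (`fake_fsync_fname` and `fake_fsync_parent_path` are hypothetical placeholders, not the real helpers) that counts how often the second call runs:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for fsync_fname() and fsync_parent_path(). */
static int	first_result = 0;	/* what the first flush returns */
static int	parent_calls = 0;	/* how often the second flush ran */

static int
fake_fsync_fname(void)
{
	return first_result;
}

static int
fake_fsync_parent_path(void)
{
	parent_calls++;
	return 0;
}

/* Same shape as the condition quoted above; returns true on failure. */
static bool
flush_combined(void)
{
	return fake_fsync_fname() != 0 || fake_fsync_parent_path() != 0;
}

/* The "two ifs" variant; returns true on failure. */
static bool
flush_two_ifs(void)
{
	if (fake_fsync_fname() != 0)
		return true;
	if (fake_fsync_parent_path() != 0)
		return true;
	return false;
}
```

With `first_result = 0` (the first flush succeeded) both variants call the parent-path flush once; with a nonzero `first_result` both skip it and report failure. In this sketch the two forms behave the same, so splitting into two ifs is mainly a readability choice.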

Michael


#93 Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#92)
Re: Offline enabling/disabling of data checksums

On Fri, Mar 15, 2019 at 09:04:51AM +0100, Michael Banck wrote:

ISTM this would not run fsync_parent_path() unless the first fsync fails
which is not the intended use. I guess we need two ifs here?

Yes, let's do that. Let's see if others have input to offer about the
patch. This thread is gathering attention, which is good.
--
Michael

#94 Magnus Hagander
magnus@hagander.net
In reply to: Michael Paquier (#89)
Re: Offline enabling/disabling of data checksums

On Fri, Mar 15, 2019 at 1:49 AM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Mar 14, 2019 at 03:23:59PM +0100, Magnus Hagander wrote:

Are you suggesting we should support running with a master with checksums
on and a standby with checksums off in the same cluster? That seems.. Very
fragile.

Well, "supported" is too strong a term for that. What I am saying is
that the problems you are pointing out are not as bad as you seem to
think, as long as an operator does not copy on-disk pages from one node
to the other. Since checksums apply only to pages flushed to disk on
the local node, everything that goes through WAL for availability
works fine:
- PITR
- archive recovery.
- streaming replication.
That is my understanding from reading the code. I have also run some
tests with a primary/standby configuration to convince myself, using
pgbench on both nodes (read-write on the primary, read-only on the
standby), with a checkpoint (or restart point) triggered on each node
every 20s. With checksums enabled on one node and disabled on the
other, I am not seeing any inconsistency.

However, anything that does a physical copy of pages can easily mess
things up if one node has checksums disabled and the other has them
enabled. One such tool is pg_rewind. If the promoted standby has
checksums disabled (becoming the source), and the old master to rewind
has checksums enabled, then the rewind would likely copy pages whose
checksums are not set correctly, resulting in incorrect checksums on
the old master.

So yes, it is easy to mess things up, but this does not apply to all
configurations. The suggestion from Christoph to enable checksums on
both nodes separately would work. Personally, I find the suggestion to
update the system ID after enabling or disabling checksums
over-engineered, for the reasons given in the first part of this email
(it is technically doable to enable checksums with minimal downtime
and a failover). My recommendation would be to document that when
checksums are enabled on one instance in a cluster, they should be
enabled on all instances, as a mismatch could cause problems with any
tool performing a physical copy of relation files or blocks.

As I said, that's a big hammer. I'm all for having a better solution. But I
don't think it's acceptable not to have *any* defense against it, given how
bad the corruption it can lead to is.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#95 Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#88)
Re: Offline enabling/disabling of data checksums

On Thu, Mar 14, 2019 at 4:54 PM Michael Banck <michael.banck@credativ.de> wrote:

Hi,

On Thursday, 14.03.2019 at 15:32 +0100, Magnus Hagander wrote:

On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg <myon@debian.org> wrote:

Re: Magnus Hagander 2019-03-14 <CABUevEx7QZLOjWDvwTdm1VM+mjsDm7=ZmB8qck7nDmcHEY5O5g@mail.gmail.com>

Are you suggesting we should support running with a master with checksums
on and a standby with checksums off in the same cluster? That seems.. Very
fragile.

The case "shut down master and standby, run pg_checksums on both, and
start them again" should be supported. That seems safe to do, and a
real-world use case.

I can agree with that, if we can declare it safe. You might need some
way to ensure it was shut down cleanly on both sides, I'm guessing.

Changing the system id to a random number would complicate this.

(Horrible idea: maybe just adding 1 (= checksum version) to the system
id would work?)

Or any other way of changing the systemid in a predictable way would
also work, right? As long as it's done the same on both sides. And
that way it would look different to any system that *doesn't* know
what it means, which is probably a good thing.

If we change the system identifier, we'll have to reset the WAL as well
or otherwise we'll get "PANIC: could not locate a valid checkpoint
record" on startup. So even if we do it predictably on both primary and
standby I guess the standby would need to be re-cloned?

So I think an option that skips that for people who know what they are
doing with the streaming replication setup would be required, should we
decide to bump the system identifier.

Ugh. I did not think of that one. But yes, the main idea there would be
that if you turn on checksums on the primary then you have to re-clone all
standbys. That's what happens if we change the system identifier -- that's
why it's the "big hammer method".

But yeah, an option to avoid it could be one way to deal with it. If we
could find some safer way to handle it that'd be better, but otherwise
changing the sysid by default and having an option to turn it off could be
one way to deal with it.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#96Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#87)
Re: Offline enabling/disabling of data checksums

On Thu, Mar 14, 2019 at 4:26 PM Michael Banck <michael.banck@credativ.de>
wrote:

Hi,

On Thursday, 14 March 2019 at 15:26 +0100, Magnus Hagander wrote:

Given that the failure is data corruption, I don't think big fat
warning is enough. We should really make it impossible to start up the
postmaster by mistake during the checksum generation. People don't
read the documentation until it's too late. And it might not even be
under their control - some automated tool might go in and try to start
postgres, and boom, corruption.

I guess you're right.

One big-hammer method could be similar to what pg_upgrade does --
temporarily rename away the controlfile so postgresql can't start, and
when done, put it back.

That sounds like a good solution to me. I've made PoC patch for that,
see attached.

The downside with this method is we can't get a nice error message during
the attempted startup. But it should at least be safe, which is the most
important part. And at least it's clear what's happening once you list the
files and see the name of the temporary one.

//Magnus
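
A minimal sketch of the rename-away idea discussed above, in C. It is not taken from the PoC patch: the helper name set_controlfile_hidden and the ".pg_checksums_in_progress" suffix are hypothetical, chosen only for illustration. A real tool would presumably also fsync the containing directory so that the rename survives a crash.

```c
#include <stdio.h>

/* Hypothetical suffix for the parked control file; not from the patch. */
#define TEMP_SUFFIX ".pg_checksums_in_progress"

/*
 * Park the control file under a temporary name (hidden = 1) so that the
 * postmaster cannot find it, or move it back into place (hidden = 0).
 * Returns 0 on success and -1 on failure.
 */
int
set_controlfile_hidden(const char *controlfile_path, int hidden)
{
	char		temp_path[1024];
	int			len;

	len = snprintf(temp_path, sizeof(temp_path), "%s%s",
				   controlfile_path, TEMP_SUFFIX);
	if (len < 0 || (size_t) len >= sizeof(temp_path))
		return -1;				/* path too long for the buffer */

	if (hidden)
		return rename(controlfile_path, temp_path);
	return rename(temp_path, controlfile_path);
}
```

While the file is parked, listing the directory shows the temporary name, which matches the point made above: there is no nice error message at startup, but the state is visible on disk.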

#97Michael Paquier
michael@paquier.xyz
In reply to: Magnus Hagander (#94)
Re: Offline enabling/disabling of data checksums

On Fri, Mar 15, 2019 at 09:52:11AM +0100, Magnus Hagander wrote:

As I said, that's a big hammer. I'm all for having a better solution. But I
don't think it's acceptable not to have *any* defense against it, given how
bad corruption it can lead to.

Hm... It looks like my arguments are not convincing enough. I am not
really convinced that there is any need to make that the default, nor
does it make much sense to embed that directly into pg_checksums,
because it is just an extra step equivalent to calling pg_resetwal,
and we know that this tool has the awesome reputation of causing more
harm than anything else. At least I would like to have an option
to *not* update the system identifier, so that the cases I mentioned
remain supported: it then becomes possible to enable checksums on a
primary with only a failover, as long as page copies are not directly
involved and all operations go through WAL. And that would be
quite nice.
--
Michael

#98Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#68)
Re: Offline enabling/disabling of data checksums

Hi,

On Wednesday, 13 March 2019 at 12:24 +0100, Magnus Hagander wrote:

On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov <sk@zsrv.org> wrote:

One new question from me: how about replication?
Case: primary+replica, we shut down primary and enable checksum, and
"started streaming WAL from primary" without any issue. I have
master with checksums, but replica without.
Or cluster with checksums, then disable checksums on primary, but
standby thinks we have checksums.

Enabling or disabling the checksums offline on the master quite
clearly requires a rebuild of the standby, there is no other way (this
is one of the reasons for the online enabling in that patch, so I
still hope we can get that done -- but not for this version).

You would have the same with PITR backups for example.

I'd like to get back to PITR. 

I thought about this a bit and actually I think PITR might be fine in
the sense that if you enabled or disabled checksums on the cluster after
the last basebackup and then do PITR with the available WAL beyond that,
you would get a working cluster, just with the checksum state the cluster
had at the time of the basebackup. I think that would be entirely
acceptable, so long as nothing else breaks?

I made some quick tests and did see no errors, but maybe I am missing
something?

And especially if you have some tool that does block or segment level
differential.

This might be the case, but not sure if PostgreSQL core must worry about
it? Obviously the documentation must be made explicit about these kinds
of cases.

Michael

--
Michael Banck

#99Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#90)
Re: Offline enabling/disabling of data checksums

Hi,

On Friday, 15 March 2019 at 11:50 +0900, Michael Paquier wrote:

I have been able to grab some time to incorporate the feedback gathered
on this thread, and please find attached a new version of the patch to
add --enable/--disable.

Some more feedback:

1. There's a typo in line 578 which makes it fail to compile:

|src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use in this function)
| }y

2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
with the controlfile_path?

3. There's (I think) leftover debug output in the following places:

|+ printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
|+ controlfile_path_temp);

|+ printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
|+ controlfile_path);

|+ printf(_("Syncing data folder\n"));

(that one is debatable, we are mentioning this only in verbose mode in
pg_basebackup but pg_checksums is more chatty anyway, so probably fine).

|+ printf(_("Updating control file\n"));

Besides the syncing message (which is user-relevant because they might
wonder what is taking so long), the others seem to be implementation
details we don't need to tell the user about.

Michael

--
Michael Banck

#100Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#99)
Re: Offline enabling/disabling of data checksums

On Fri, Mar 15, 2019 at 12:54:01PM +0100, Michael Banck wrote:

1. There's a typo in line 578 which makes it fail to compile:

|src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use in this function)
| }y

I am wondering where you got this one. My local branch does not have
it, and the patch I sent does not seem to have it either.

2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
with the controlfile_path?

PG_MODE_DISABLE needs controlfile_path as well. We could make the
cleanup only available when using --enable, the code just looked
simpler in its current shape. I think it's just simpler to set
everything unconditionally. This code may become more complicated in
the future.

3. There's (I think) leftover debug output in the following places:

|+ printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
|+ controlfile_path_temp);

|+ printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
|+ controlfile_path);

|+ printf(_("Syncing data folder\n"));

(that one is debatable, we are mentioning this only in verbose mode in
pg_basebackup but pg_checksums is more chatty anyway, so probably
fine).

This is wanted. Many folks have been complaining on this thread about
crashes and such, surely we want logs about what happens :)

|+ printf(_("Updating control file\n"));

Besides the syncing message (which is user-relevant because they might
wonder what is taking so long), the others seem to be implementation
details we don't need to tell the user about.

Perhaps having them under --verbose makes more sense?
--
Michael

#101Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#100)
Re: Offline enabling/disabling of data checksums

Hi,

On Friday, 15 March 2019 at 21:23 +0900, Michael Paquier wrote:

On Fri, Mar 15, 2019 at 12:54:01PM +0100, Michael Banck wrote:

1. There's a typo in line 578 which makes it fail to compile:

src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use in this function)
}y

I am wondering where you got this one. My local branch does not have
it, and the patch I sent does not seem to have it either.

Mea culpa, I must've fat fingered something in the editor before
applying your patch, sorry. I should've double-checked.

2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
with the controlfile_path?

PG_MODE_DISABLE needs controlfile_path as well. We could make the
cleanup only available when using --enable, the code just looked
simpler in its current shape. I think it's just simpler to set
everything unconditionally. This code may become more complicated in
the future.

Ok.

3. There's (I think) leftover debug output in the following places:

+		printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
+				controlfile_path_temp);
+		printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
+				controlfile_path);
+		printf(_("Syncing data folder\n"));

(that one is debatable, we are mentioning this only in verbose mode in
pg_basebackup but pg_checksums is more chatty anyway, so probably
fine).

This is wanted. Many folks have been complaining on this thread about
crashes and such, surely we want logs about what happens :)

+ printf(_("Updating control file\n"));

Besides the syncing message (which is user-relevant because they might
wonder what is taking so long), the others seem to be implementation
details we don't need to tell the user about.

Perhaps having them under --verbose makes more sense?

Well if we think it is essential in order to tell the user what happened
in the case of an error, it shouldn't be verbose I guess.

Michael

--
Michael Banck

#102Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#54)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

Bonjour Michaël-san,

Yes, that would be nice, for now I have focused. For pg_resetwal yes
we could do it easily. Would you like to send a patch?

Here is a proposal for "pg_resetwal".

The implementation basically removes a lot of copy paste and calls the
new update_controlfile function instead. I like removing useless code:-)

The resetwal implementation was doing a rm/create cycle, which was leaving
a small window for losing the controlfile. Not neat.

I do not see the value of *not* fsyncing the control file when writing it,
as it is by definition very precious, so I added a fsync. The server side
branch uses the backend available "pg_fsync", which complies with server
settings there and can do nothing if fsync is disabled.

Maybe the two changes could be committed separately.
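
The in-place rewrite with an unconditional fsync described above can be sketched roughly as follows. This is a POSIX-only illustration, not the code from the attached patch: the function name overwrite_in_place and the SKETCH_FILE_SIZE constant are assumptions made for the example, and error reporting is reduced to perror().

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumed fixed on-disk size, for illustration only. */
#define SKETCH_FILE_SIZE 8192

/*
 * Overwrite an existing file in place with "len" bytes of payload,
 * zero-padded to SKETCH_FILE_SIZE, then fsync it.  The file is never
 * unlinked, so there is no window in which it does not exist.
 * Returns 0 on success and -1 on failure.
 */
int
overwrite_in_place(const char *path, const void *payload, size_t len)
{
	char		buffer[SKETCH_FILE_SIZE];
	int			fd;

	if (len > sizeof(buffer))
		return -1;

	/* Zero-pad so the file always has the same on-disk size. */
	memset(buffer, 0, sizeof(buffer));
	memcpy(buffer, payload, len);

	/* No O_CREAT and no unlink(): the file must already exist. */
	fd = open(path, O_WRONLY);
	if (fd < 0)
	{
		perror("open");
		return -1;
	}

	if (write(fd, buffer, sizeof(buffer)) != (ssize_t) sizeof(buffer) ||
		fsync(fd) != 0)
	{
		perror("write/fsync");
		close(fd);
		return -1;
	}

	return close(fd);
}
```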

--
Fabien.

Attachments:

controlfile-update-1.patch (text/x-diff)
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 2af8713216..dd085e16ab 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -918,18 +918,6 @@ PrintNewControlValues(void)
 static void
 RewriteControlFile(void)
 {
-	int			fd;
-	char		buffer[PG_CONTROL_FILE_SIZE];	/* need not be aligned */
-
-	/*
-	 * For good luck, apply the same static assertions as in backend's
-	 * WriteControlFile().
-	 */
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
-					 "pg_control is too large for atomic disk writes");
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
-					 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
-
 	/*
 	 * Adjust fields as needed to force an empty XLOG starting at
 	 * newXlogSegNo.
@@ -961,53 +949,7 @@ RewriteControlFile(void)
 	ControlFile.max_prepared_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
-	/* Contents are protected with a CRC */
-	INIT_CRC32C(ControlFile.crc);
-	COMP_CRC32C(ControlFile.crc,
-				(char *) &ControlFile,
-				offsetof(ControlFileData, crc));
-	FIN_CRC32C(ControlFile.crc);
-
-	/*
-	 * We write out PG_CONTROL_FILE_SIZE bytes into pg_control, zero-padding
-	 * the excess over sizeof(ControlFileData).  This reduces the odds of
-	 * premature-EOF errors when reading pg_control.  We'll still fail when we
-	 * check the contents of the file, but hopefully with a more specific
-	 * error than "couldn't read pg_control".
-	 */
-	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
-	memcpy(buffer, &ControlFile, sizeof(ControlFileData));
-
-	unlink(XLOG_CONTROL_FILE);
-
-	fd = open(XLOG_CONTROL_FILE,
-			  O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
-			  pg_file_create_mode);
-	if (fd < 0)
-	{
-		fprintf(stderr, _("%s: could not create pg_control file: %s\n"),
-				progname, strerror(errno));
-		exit(1);
-	}
-
-	errno = 0;
-	if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
-		fprintf(stderr, _("%s: could not write pg_control file: %s\n"),
-				progname, strerror(errno));
-		exit(1);
-	}
-
-	if (fsync(fd) != 0)
-	{
-		fprintf(stderr, _("%s: fsync error: %s\n"), progname, strerror(errno));
-		exit(1);
-	}
-
-	close(fd);
+	update_controlfile(XLOG_CONTROL_FILE, progname, &ControlFile);
 }
 
 
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 71e67a2eda..78ce8b020f 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -144,9 +144,9 @@ get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p)
  * update_controlfile()
  *
  * Update controlfile values with the contents given by caller.  The
- * contents to write are included in "ControlFile".  Note that it is up
- * to the caller to fsync the updated file, and to properly lock
- * ControlFileLock when calling this routine in the backend.
+ * contents to write are included in "ControlFile".  Note that it is up
+ * to the caller to properly lock ControlFileLock when calling this
+ * routine in the backend.
  */
 void
 update_controlfile(const char *DataDir, const char *progname,
@@ -216,6 +216,21 @@ update_controlfile(const char *DataDir, const char *progname,
 #endif
 	}
 
+#ifndef FRONTEND
+	if (pg_fsync(fd) != 0)
+		ereport(PANIC,
+				(errcode_for_file_access(),
+				 errmsg("could not fsync file \"%s\": %m",
+						ControlFilePath)));
+#else
+	if (fsync(fd) != 0)
+	{
+		fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+				progname, ControlFilePath, strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+#endif
+
 #ifndef FRONTEND
 	if (CloseTransientFile(fd))
 		ereport(PANIC,
#103Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#102)
Re: Offline enabling/disabling of data checksums

On Sun, Mar 17, 2019 at 10:01:20AM +0100, Fabien COELHO wrote:

The implementation basically removes a lot of copy paste and calls the new
update_controlfile function instead. I like removing useless code:-)

Yes, I spent something like 10 minutes looking at that code yesterday
and I agree that removing the control file to recreate it is not
really necessary, even if the window between its removal and
recreation is short.

I do not see the value of *not* fsyncing the control file when writing it,
as it is by definition very precious, so I added a fsync. The server side
branch uses the backend available "pg_fsync", which complies with server
settings there and can do nothing if fsync is disabled.

The issue here is that trying to embed directly the fsync routines
from file_utils.c into pg_resetwal.c messes up the inclusions because
pg_resetwal.c includes backend-side includes, which themselves touch
fd.h :(

In short your approach avoids some extra mess with the include
dependencies.

Maybe the two changes could be committed separately.

I was thinking about this one, and for pg_rewind we don't care about
the fsync of the control file because the full data folder gets
fsync'd afterwards and in the event of a crash in the middle of a
rewind the target data folder is surely not something to use, but we
do for pg_checksums, and we do for pg_resetwal. Even if there is the
argument that usually callers of update_controlfile() would care a
lot about the control file and fsync it, I think that we need some
control over whether we do the fsync or not, because many tools have a
--no-sync option and that should be fully respected. So while your
patch is on a good track, I would suggest doing the following things to
complete it:
- Add an extra argument bits16 to update_controlfile to pass a set of
optional flags, with NOSYNC being the only and current value. The
default is to flush the file.
- Move the wait event calls WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE and
WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE to controldata_utils.c.
- And then delete UpdateControlFile() in xlog.c, and use
update_controlfile() instead to remove even more code. The version in
xlog.c uses BasicOpenFile(), so we should also use that in
update_controlfile() instead of OpenTransientFile(). As any errors
result in a PANIC we don't care about leaking fds.
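
The optional-flags argument suggested in the first item could look something like the sketch below. The names here (bits16_sketch, SKETCH_OPT_NONE, SKETCH_OPT_NOSYNC, should_sync) are stand-ins invented for this example rather than identifiers from PostgreSQL. The one detail worth showing is the parenthesization of the bit test.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t bits16_sketch;	/* stand-in for a 16-bit flags type */

#define SKETCH_OPT_NONE		0x00	/* default: flush the file */
#define SKETCH_OPT_NOSYNC	0x01	/* caller asked to skip the flush */

/*
 * Return true when the file should be flushed, i.e. when the NOSYNC
 * bit is not set.
 *
 * The parentheses matter: in C, "==" binds tighter than "&", so
 * "options & SKETCH_OPT_NOSYNC == 0" would parse as
 * "options & (SKETCH_OPT_NOSYNC == 0)", which is always 0.
 */
bool
should_sync(bits16_sketch options)
{
	return (options & SKETCH_OPT_NOSYNC) == 0;
}
```

With a single option, a plain boolean would carry the same information; a bitmask only pays off if more flags are expected later.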
--
Michael

#104Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#103)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

Michaël-san,

The issue here is that trying to embed directly the fsync routines
from file_utils.c into pg_resetwal.c messes up the inclusions because
pg_resetwal.c includes backend-side includes, which themselves touch
fd.h :(

In short your approach avoids some extra mess with the include
dependencies.

I could remove the two "catalog/" includes from pg_resetwal, I assume that
you meant these ones.

Maybe the two changes could be committed separately.

I was thinking about this one, and for pg_rewind we don't care about
the fsync of the control file because the full data folder gets
fsync'd afterwards and in the event of a crash in the middle of a
rewind the target data folder is surely not something to use, but we
do for pg_checksums, and we do for pg_resetwal. Even if there is the
argument that usually callers of update_controlfile() would care a
lot about the control file and fsync it, I think that we need some
control over whether we do the fsync or not, because many tools have a
--no-sync option and that should be fully respected.

So while your patch is on a good track, I would suggest doing the
following things to complete it:

- Add an extra argument bits16 to update_controlfile to pass a set of
optional flags, with NOSYNC being the only and current value. The
default is to flush the file.

Hmmm. I just did that, but what about just a boolean? What other options
could be required? Maybe some locking/checking?

- Move the wait event calls WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE and
WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE to controldata_utils.c.

Done.

- And then delete UpdateControlFile() in xlog.c, and use
update_controlfile() instead to remove even more code.

I was keeping that one for another patch because it touches the backend
code, but it makes sense to do that in one go for consistency.

I kept the initial no-parameter function which calls the new one with 4
parameters, though, because it looks more homogeneous this way in the
backend code. This is debatable.

The version in xlog.c uses BasicOpenFile(), so we should also use that
in update_controlfile() instead of OpenTransientFile(). As any errors
result in a PANIC we don't care about leaking fds.

Done.

Attached is an update.

--
Fabien.

Attachments:

controlfile-update-2.patch (text/x-diff)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 54d3c558c6..7f782e255c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "common/controldata_utils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -4757,48 +4758,7 @@ ReadControlFile(void)
 void
 UpdateControlFile(void)
 {
-	int			fd;
-
-	INIT_CRC32C(ControlFile->crc);
-	COMP_CRC32C(ControlFile->crc,
-				(char *) ControlFile,
-				offsetof(ControlFileData, crc));
-	FIN_CRC32C(ControlFile->crc);
-
-	fd = BasicOpenFile(XLOG_CONTROL_FILE,
-					   O_RDWR | PG_BINARY);
-	if (fd < 0)
-		ereport(PANIC,
-				(errcode_for_file_access(),
-				 errmsg("could not open file \"%s\": %m", XLOG_CONTROL_FILE)));
-
-	errno = 0;
-	pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE);
-	if (write(fd, ControlFile, sizeof(ControlFileData)) != sizeof(ControlFileData))
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
-		ereport(PANIC,
-				(errcode_for_file_access(),
-				 errmsg("could not write to file \"%s\": %m",
-						XLOG_CONTROL_FILE)));
-	}
-	pgstat_report_wait_end();
-
-	pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE);
-	if (pg_fsync(fd) != 0)
-		ereport(PANIC,
-				(errcode_for_file_access(),
-				 errmsg("could not fsync file \"%s\": %m",
-						XLOG_CONTROL_FILE)));
-	pgstat_report_wait_end();
-
-	if (close(fd))
-		ereport(PANIC,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m",
-						XLOG_CONTROL_FILE)));
+	update_controlfile(".", XLOG_CONTROL_FILE, ControlFile, CF_OPT_NONE);
 }
 
 /*
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 2af8713216..334903711e 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -49,8 +49,7 @@
 #include "access/multixact.h"
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
-#include "catalog/catversion.h"
-#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
 #include "common/fe_memutils.h"
 #include "common/file_perm.h"
 #include "common/restricted_token.h"
@@ -918,18 +917,6 @@ PrintNewControlValues(void)
 static void
 RewriteControlFile(void)
 {
-	int			fd;
-	char		buffer[PG_CONTROL_FILE_SIZE];	/* need not be aligned */
-
-	/*
-	 * For good luck, apply the same static assertions as in backend's
-	 * WriteControlFile().
-	 */
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_MAX_SAFE_SIZE,
-					 "pg_control is too large for atomic disk writes");
-	StaticAssertStmt(sizeof(ControlFileData) <= PG_CONTROL_FILE_SIZE,
-					 "sizeof(ControlFileData) exceeds PG_CONTROL_FILE_SIZE");
-
 	/*
 	 * Adjust fields as needed to force an empty XLOG starting at
 	 * newXlogSegNo.
@@ -961,53 +948,7 @@ RewriteControlFile(void)
 	ControlFile.max_prepared_xacts = 0;
 	ControlFile.max_locks_per_xact = 64;
 
-	/* Contents are protected with a CRC */
-	INIT_CRC32C(ControlFile.crc);
-	COMP_CRC32C(ControlFile.crc,
-				(char *) &ControlFile,
-				offsetof(ControlFileData, crc));
-	FIN_CRC32C(ControlFile.crc);
-
-	/*
-	 * We write out PG_CONTROL_FILE_SIZE bytes into pg_control, zero-padding
-	 * the excess over sizeof(ControlFileData).  This reduces the odds of
-	 * premature-EOF errors when reading pg_control.  We'll still fail when we
-	 * check the contents of the file, but hopefully with a more specific
-	 * error than "couldn't read pg_control".
-	 */
-	memset(buffer, 0, PG_CONTROL_FILE_SIZE);
-	memcpy(buffer, &ControlFile, sizeof(ControlFileData));
-
-	unlink(XLOG_CONTROL_FILE);
-
-	fd = open(XLOG_CONTROL_FILE,
-			  O_RDWR | O_CREAT | O_EXCL | PG_BINARY,
-			  pg_file_create_mode);
-	if (fd < 0)
-	{
-		fprintf(stderr, _("%s: could not create pg_control file: %s\n"),
-				progname, strerror(errno));
-		exit(1);
-	}
-
-	errno = 0;
-	if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
-	{
-		/* if write didn't set errno, assume problem is no disk space */
-		if (errno == 0)
-			errno = ENOSPC;
-		fprintf(stderr, _("%s: could not write pg_control file: %s\n"),
-				progname, strerror(errno));
-		exit(1);
-	}
-
-	if (fsync(fd) != 0)
-	{
-		fprintf(stderr, _("%s: fsync error: %s\n"), progname, strerror(errno));
-		exit(1);
-	}
-
-	close(fd);
+	update_controlfile(XLOG_CONTROL_FILE, progname, &ControlFile, CF_OPT_NONE);
 }
 
 
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 7f1d6bf48a..d1908912ff 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -377,7 +377,7 @@ main(int argc, char **argv)
 	ControlFile_new.minRecoveryPoint = endrec;
 	ControlFile_new.minRecoveryPointTLI = endtli;
 	ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
-	update_controlfile(datadir_target, progname, &ControlFile_new);
+	update_controlfile(datadir_target, progname, &ControlFile_new, CF_OPT_NONE);
 
 	pg_log(PG_PROGRESS, "syncing target data directory\n");
 	syncTargetDirectory();
diff --git a/src/common/controldata_utils.c b/src/common/controldata_utils.c
index 71e67a2eda..65d9acc88e 100644
--- a/src/common/controldata_utils.c
+++ b/src/common/controldata_utils.c
@@ -31,6 +31,7 @@
 #include "port/pg_crc32c.h"
 #ifndef FRONTEND
 #include "storage/fd.h"
+#include "pgstat.h"
 #endif
 
 /*
@@ -144,13 +145,13 @@ get_controlfile(const char *DataDir, const char *progname, bool *crc_ok_p)
  * update_controlfile()
  *
  * Update controlfile values with the contents given by caller.  The
- * contents to write are included in "ControlFile".  Note that it is up
- * to the caller to fsync the updated file, and to properly lock
- * ControlFileLock when calling this routine in the backend.
+ * contents to write are included in "ControlFile".  Note that it is up
+ * to the caller to properly lock ControlFileLock when calling this
+ * routine in the backend.
  */
 void
 update_controlfile(const char *DataDir, const char *progname,
-				   ControlFileData *ControlFile)
+				   ControlFileData *ControlFile, bits16 options)
 {
 	int			fd;
 	char		buffer[PG_CONTROL_FILE_SIZE];
@@ -182,7 +183,7 @@ update_controlfile(const char *DataDir, const char *progname,
 	snprintf(ControlFilePath, sizeof(ControlFilePath), "%s/%s", DataDir, XLOG_CONTROL_FILE);
 
 #ifndef FRONTEND
-	if ((fd = OpenTransientFile(ControlFilePath, O_WRONLY | PG_BINARY)) == -1)
+	if ((fd = BasicOpenFile(ControlFilePath, O_RDWR | PG_BINARY)) < 0)
 		ereport(PANIC,
 				(errcode_for_file_access(),
 				 errmsg("could not open file \"%s\": %m",
@@ -198,6 +199,9 @@ update_controlfile(const char *DataDir, const char *progname,
 #endif
 
 	errno = 0;
+#ifndef FRONTEND
+	pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE);
+#endif
 	if (write(fd, buffer, PG_CONTROL_FILE_SIZE) != PG_CONTROL_FILE_SIZE)
 	{
 		/* if write didn't set errno, assume problem is no disk space */
@@ -215,19 +219,34 @@ update_controlfile(const char *DataDir, const char *progname,
 		exit(EXIT_FAILURE);
 #endif
 	}
+#ifndef FRONTEND
+	pgstat_report_wait_end();
+#endif
 
+	if ((options & CF_OPT_NOSYNC) == 0)
+	{
 #ifndef FRONTEND
-	if (CloseTransientFile(fd))
-		ereport(PANIC,
-				(errcode_for_file_access(),
-				 errmsg("could not close file \"%s\": %m",
-						ControlFilePath)));
+		pgstat_report_wait_start(WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE);
+		if (pg_fsync(fd) != 0)
+			ereport(PANIC,
+					(errcode_for_file_access(),
+					 errmsg("could not fsync file \"%s\": %m",
+							ControlFilePath)));
+		pgstat_report_wait_end();
 #else
+		if (fsync(fd) != 0)
+		{
+			fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+					progname, ControlFilePath, strerror(errno));
+			exit(EXIT_FAILURE);
+		}
+#endif
+	}
+
 	if (close(fd) < 0)
 	{
 		fprintf(stderr, _("%s: could not close file \"%s\": %s\n"),
 				progname, ControlFilePath, strerror(errno));
 		exit(EXIT_FAILURE);
 	}
-#endif
 }
diff --git a/src/include/common/controldata_utils.h b/src/include/common/controldata_utils.h
index 95317ebacf..054454b319 100644
--- a/src/include/common/controldata_utils.h
+++ b/src/include/common/controldata_utils.h
@@ -12,10 +12,15 @@
 
 #include "catalog/pg_control.h"
 
+enum controlfile_update_bitmask_options {
+	CF_OPT_NONE = 0x00,
+	CF_OPT_NOSYNC = 0x01
+};
+
 extern ControlFileData *get_controlfile(const char *DataDir,
 										const char *progname,
 										bool *crc_ok_p);
 extern void update_controlfile(const char *DataDir, const char *progname,
-							   ControlFileData *ControlFile);
+							   ControlFileData *ControlFile, bits16 options);
 
 #endif							/* COMMON_CONTROLDATA_UTILS_H */
#105Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#104)
Re: Offline enabling/disabling of data checksums

On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote:

I could remove the two "catalog/" includes from pg_resetwal, I assume that
you meant these ones.

Not exactly. What I meant is that if you try to directly call
fsync_fname and fsync_parent_path from file_utils.h, then you get into
trouble because of xlog.h. Sure, you can also remove the ones you
removed.

Hmmm. I just did that, but what about just a boolean? What other options
could be required? Maybe some locking/checking?

It is already expected from the caller to properly take
ControlFileLock. Note I tend to worry too much about the
extensibility of published APIs these days as well, so perhaps just a
boolean would be fine, please let me reconsider that after some sleep,
and it is not like the contents of this routine are going to become
much more complicated either, except potentially to control the flags on
open(). :p

I kept the initial no-parameter function which calls the new one with 4
parameters, though, because it looks more homogeneous this way in the
backend code. This is debatable.

True, this actually makes back-patching a bit easier, and there are 13
calls of UpdateControlFile().

Attached is an update.

Thanks, I'll take a look at that tomorrow. You have one error at the
end of update_controlfile(), where close() could issue a frontend-like
error for the backend, calling exit() on the way. That's not good.
(No need to send a new patch, I'll fix it myself.)
--
Michael

#106Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#105)
Re: Offline enabling/disabling of data checksums

You have one error at the end of update_controlfile(), where close()
could issue a frontend-like error for the backend, calling exit() on the
way. That's not good. (No need to send a new patch, I'll fix it
myself.)

Indeed. I meant to merge the "if (close(fd))", but ended up merging the error
generation as well.

--
Fabien

#107Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#104)
Re: Offline enabling/disabling of data checksums

On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote:

I kept the initial no-parameter function which calls the new one with 4
parameters, though, because it looks more homogeneous this way in the
backend code. This is debatable.

From a compatibility point of view, your position actually makes
sense, at least to me after sleeping on it: UpdateControlFile is not
static, and there are interactions with the other local routines that
read and write the control file because of the variable ControlFile at
the top of xlog.c. So I have kept the original interface, which is now
only a wrapper around the new routine.
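
For readers following along, the wrapper arrangement can be sketched as
below. This is only an illustration under assumptions, not the committed
code: ControlFileSketch, update_controlfile_sketch(),
UpdateControlFileSketch() and the two counters are hypothetical stand-ins
for ControlFileData, update_controlfile(), UpdateControlFile() and the real
file I/O, and the progname argument is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ControlFileData; only one field is kept. */
typedef struct ControlFileSketch
{
    unsigned int data_checksum_version;
} ControlFileSketch;

/* Stand-in for the file-level ControlFile variable at the top of xlog.c. */
static ControlFileSketch control_file_state = {0};
static const char *data_dir_sketch = "/path/to/pgdata"; /* assumed path */

/* Counters replacing real I/O so that the sketch has no side effects. */
static int write_requests = 0;
static int sync_requests = 0;

/*
 * New routine with the explicit interface: the caller passes the data
 * directory, the contents to write, and whether to flush.
 */
static void
update_controlfile_sketch(const char *data_dir, ControlFileSketch *cf,
                          bool do_sync)
{
    assert(data_dir != NULL && cf != NULL);

    write_requests++;       /* the real routine would write the file here */
    if (do_sync)
        sync_requests++;    /* and flush it only when asked to */
}

/*
 * Original no-argument interface, kept so that existing callers are
 * unchanged.  It only forwards the file-level state to the new routine.
 */
static void
UpdateControlFileSketch(void)
{
    update_controlfile_sketch(data_dir_sketch, &control_file_state, true);
}
```

Existing callers keep calling the no-argument function, while a frontend
tool that wants to skip the flush would call the new routine with
do_sync set to false.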

Attached is an update.

Thanks, I have committed the patch after fixing a couple of things.
After considering the interface, I have switched to a single boolean,
as I could not actually imagine what kind of fancy features this
could be extended with. If I am wrong, let's adjust it later
on. Here are my notes about the fixes:
- pg_resetwal got broken because the path to the control file was
incorrect. Running tests of pg_upgrade or the TAP tests of
pg_resetwal showed the failure.
- The previously-mentioned problem with close() in the new routine is
fixed.
- Header comments at the top of update_controlfile were a bit messed
up (s/Not/Note/, missed an "up" as well).
- pg_rewind was issuing a flush of the control file even if --no-sync
was used.
- Nit: incorrect header order in controldata_utils.c. I have kept the
backend-only includes grouped though.
--
Michael

#108Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#107)
Re: Offline enabling/disabling of data checksums

Bonjour Michaël,

Here are my notes about the fixes:

Thanks for the fixes.

- pg_resetwal got broken because the path to the control file was
incorrect. Running tests of pg_upgrade or the TAP tests of
pg_resetwal showed the failure.

Hmmm… I thought I had done that with "make check-world":-(

- pg_rewind was issuing a flush of the control file even if --no-sync
was used.

Indeed, I missed this one.

- Nit: incorrect header order in controldata_utils.c. I have kept the
backend-only includes grouped though.

I'll pay attention to that the next time.

Thanks for the push.

--
Fabien.

#109Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#101)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

On Fri, Mar 15, 2019 at 01:37:27PM +0100, Michael Banck wrote:

On Friday, 15.03.2019 at 21:23 +0900, Michael Paquier wrote:

Perhaps having them under --verbose makes more sense?

Well if we think it is essential in order to tell the user what happened
in the case of an error, it shouldn't be verbose I guess.

I would still keep them to be honest. I don't know, if others find
the tool too chatty we could always rework that part and tune it.

Please find attached an updated patch set, which I have rebased on
top of my recent commits refactoring the control file updates.
While reviewing, I found a problem in the docs (a <para> markup was
missing previously). There was also a problem with the parent path
fsync, which caused an interruption to not return the correct error
code; we should actually just use durable_rename() in this case (if
--no-sync gets in, then pg_mv_file() should be used, of course).
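
As a rough illustration of the distinction drawn here, the following
POSIX-only sketch picks between a plain rename and a rename followed by
flushes, depending on a do_sync flag. The names fsync_path_sketch() and
move_control_file_sketch() are hypothetical, and the durable branch is a
simplified stand-in: the real durable_rename() does more work than this.

```c
#include <assert.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Flush one file or directory by path.  Returns 0 on success, -1 on error. */
static int
fsync_path_sketch(const char *path, bool is_dir)
{
    /* Directories can only be opened read-only. */
    int fd = open(path, is_dir ? O_RDONLY : O_RDWR);

    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Rename "from" to "to".  With do_sync, also flush the renamed file and
 * its parent directory so that the rename survives a crash; without it
 * (the --no-sync case), only the plain rename is done.
 */
static int
move_control_file_sketch(const char *from, const char *to, bool do_sync)
{
    char parent[1024];

    assert(from != NULL && to != NULL);

    if (rename(from, to) != 0)
        return -1;
    if (!do_sync)
        return 0;

    if (fsync_path_sketch(to, false) != 0)
        return -1;

    /* dirname() may modify its argument, so work on a copy. */
    if (strlen(to) >= sizeof(parent))
        return -1;
    strcpy(parent, to);
    return fsync_path_sketch(dirname(parent), true);
}
```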

I have also been thinking about what we could add in the
documentation, so this version adds a draft to describe the cases
where enabling checksums can lead to corruption when involving
multiple nodes in a cluster and tools doing physical copy of relation
blocks.

I have not done the --no-sync part yet on purpose, as that will most
likely conflict based on the feedback received for this version.
--
Michael

Attachments:

0001-Add-options-to-enable-and-disable-checksums-in-pg_ch.patchtext/x-diff; charset=us-asciiDownload
From a85112d87ec4bc4b00d22c105b9958a2c70c3758 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 18 Mar 2019 17:12:15 +0900
Subject: [PATCH] Add options to enable and disable checksums in pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When using --disable, only the control file is updated and then
flushed.  When using --enable, the process gets more complicated as the
operation can be long:
- Rename the control file to a temporary name, to prevent a parallel
startup of Postgres.
- Scan all files and update their checksums.
- Rename back the control file.
- Flush the data directory.
- Update the control file and then flush it, to make the change
durable.
If the operation is interrupted, the control file gets moved back in
place.

If no mode is specified in the options, then --check is used for
compatibility with older versions of pg_verify_checksums (now renamed to
pg_checksums in v12).

Author: Michael Banck, Michael Paquier
Reviewed-by: Fabien Coelho, Magnus Hagander, Sergei Kornilov
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    |  72 ++++++-
 src/bin/pg_checksums/pg_checksums.c   | 280 +++++++++++++++++++++++---
 src/bin/pg_checksums/t/002_actions.pl |  76 +++++--
 src/tools/pgindent/typedefs.list      |   1 +
 4 files changed, 381 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..a7f4ef1024 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  <refnamediv>
   <refname>pg_checksums</refname>
-  <refpurpose>verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
+  <refpurpose>enable, disable or check data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
  </refnamediv>
 
  <refsynopsisdiv>
@@ -36,10 +36,24 @@ PostgreSQL documentation
  <refsect1 id="r1-app-pg_checksums-1">
   <title>Description</title>
   <para>
-   <application>pg_checksums</application> verifies data checksums in a
-   <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_checksums</application>.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   <application>pg_checksums</application> checks, enables or disables data
+   checksums in a <productname>PostgreSQL</productname> cluster.  The server
+   must be shut down cleanly before running
+   <application>pg_checksums</application>. The exit status is zero if there
+   are no checksum errors when checking them, and nonzero if at least one
+   checksum failure is detected. If enabling or disabling checksums, the
+   exit status is nonzero if the operation failed.
+  </para>
+
+  <para>
+   Checking checksums requires to scan every file holding them in the data
+   folder.  Disabling checksums requires only an update of the file
+   <filename>pg_control</filename>.  Enabling checksums first renames
+   the file <filename>pg_control</filename> to
+   <filename>pg_control.pg_checksums_in_progress</filename> to prevent
+   a parallel startup of the cluster, then it updates all files with
+   checksums, and it finishes by renaming and updating
+   <filename>pg_control</filename> to mark checksums as enabled.
   </para>
  </refsect1>
 
@@ -60,6 +74,37 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-c</option></term>
+      <term><option>--check</option></term>
+      <listitem>
+       <para>
+        Checks checksums. This is the default mode if nothing else is
+        specified.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <term><option>--disable</option></term>
+      <listitem>
+       <para>
+        Disables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--enable</option></term>
+      <listitem>
+       <para>
+        Enables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
@@ -119,4 +164,21 @@ PostgreSQL documentation
    </varlistentry>
   </variablelist>
  </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+   When disabling or enabling checksums in a cluster of multiple instances,
+   it is recommended to stop all the instances of the cluster before doing
+   the switch to all the instances consistently.  When using a cluster with
+   tools which perform direct copies of relation file blocks (for example
+   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
+   lead to page corruptions in the shape of incorrect checksums if the
+   operation is not done consistently across all nodes.  Destroying all
+   the standbys in a cluster first, enabling or disabling checksums on
+   the primary and finally recreate the cluster nodes from scratch is
+   also safe.
+  </para>
+ </refsect1>
+
 </refentry>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index b7ebc11017..b30ddababb 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -1,7 +1,8 @@
 /*-------------------------------------------------------------------------
  *
  * pg_checksums.c
- *	  Verifies page level checksums in an offline cluster.
+ *	  Checks, enables or disables page level checksums for an offline
+ *	  cluster
  *
  * Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -14,37 +15,80 @@
 #include "postgres_fe.h"
 
 #include <dirent.h>
+#include <signal.h>
 #include <sys/stat.h>
 #include <unistd.h>
 
-#include "catalog/pg_control.h"
+#include "access/xlog_internal.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
 
 
 static int64 files = 0;
 static int64 blocks = 0;
 static int64 badblocks = 0;
 static ControlFileData *ControlFile;
-
+static char *DataDir = NULL;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
+static char controlfile_path[MAXPGPATH];
+static char controlfile_path_temp[MAXPGPATH];
+
+
+typedef enum
+{
+	PG_MODE_CHECK,
+	PG_MODE_DISABLE,
+	PG_MODE_ENABLE
+} PgChecksumMode;
+
+/*
+ * Filename components.
+ *
+ * XXX: fd.h is not declared here as frontend side code is not able to
+ * interact with the backend-side definitions for the various fsync
+ * wrappers.
+ */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+/*
+ * Locations of persistent and temporary control files.  The control
+ * file gets renamed into a temporary location when enabling checksums
+ * to prevent a parallel startup of Postgres.
+ */
+#define CONTROL_FILE_PATH		"global/pg_control"
+#define CONTROL_FILE_PATH_TEMP	CONTROL_FILE_PATH ".pg_checksums_in_progress"
+
+
+#ifndef WIN32
+#define pg_mv_file			rename
+#else
+#define pg_mv_file			pgrename
+#endif
+
+static PgChecksumMode mode = PG_MODE_CHECK;
 
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables, disables or verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -c, --check            check data checksums\n"));
+	printf(_("                         This is the default mode if nothing is specified.\n"));
+	printf(_("  -d, --disable          disable data checksums\n"));
+	printf(_("  -e, --enable           enable data checksums\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -54,6 +98,26 @@ usage(void)
 	printf(_("Report bugs to <pgsql-bugs@lists.postgresql.org>.\n"));
 }
 
+/*
+ * Clean up the temporary control file when enabling checksums in the
+ * event of an interruption.
+ */
+static void
+signal_cleanup(int signum)
+{
+	/* nothing to do if there is no temporary control file */
+	if (access(controlfile_path_temp, F_OK) != 0)
+		exit(signum);
+
+	if (durable_rename(controlfile_path_temp, controlfile_path, progname) != 0)
+	{
+		/* error is already logged on failure */
+		exit(1);
+	}
+
+	exit(signum);
+}
+
 /*
  * List of files excluded from checksum validation.
  *
@@ -61,6 +125,7 @@ usage(void)
  */
 static const char *const skip[] = {
 	"pg_control",
+	"pg_control.pg_checksums_in_progress",
 	"pg_filenode.map",
 	"pg_internal.init",
 	"PG_VERSION",
@@ -90,8 +155,14 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	int			flags;
+
+	Assert(mode == PG_MODE_ENABLE ||
+		   mode == PG_MODE_CHECK);
+
+	flags = (mode == PG_MODE_ENABLE) ? O_RDWR : O_RDONLY;
+	f = open(fn, PG_BINARY | flags, 0);
 
-	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
@@ -121,18 +192,47 @@ scan_file(const char *fn, BlockNumber segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		if (csum != header->pd_checksum)
+		if (mode == PG_MODE_CHECK)
 		{
-			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
-						progname, fn, blockno, csum, header->pd_checksum);
-			badblocks++;
+			if (csum != header->pd_checksum)
+			{
+				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+					fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
+							progname, fn, blockno, csum, header->pd_checksum);
+				badblocks++;
+			}
+		}
+		else if (mode == PG_MODE_ENABLE)
+		{
+			/* Set checksum in page header */
+			header->pd_checksum = csum;
+
+			/* Seek back to beginning of block */
+			if (lseek(f, -BLCKSZ, SEEK_CUR) < 0)
+			{
+				fprintf(stderr, _("%s: seek failed for block %d in file \"%s\": %s\n"), progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Write block with checksum */
+			if (write(f, buf.data, BLCKSZ) != BLCKSZ)
+			{
+				fprintf(stderr, "%s: could not update checksum of block %d in file \"%s\": %s\n",
+						progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
 		}
 	}
 
 	if (verbose)
-		fprintf(stderr,
-				_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+	{
+		if (mode == PG_MODE_CHECK)
+			fprintf(stderr,
+					_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+		if (mode == PG_MODE_ENABLE)
+			fprintf(stderr,
+					_("%s: checksums enabled in file \"%s\"\n"), progname, fn);
+	}
 
 	close(f);
 }
@@ -234,12 +334,14 @@ int
 main(int argc, char *argv[])
 {
 	static struct option long_options[] = {
+		{"check", no_argument, NULL, 'c'},
 		{"pgdata", required_argument, NULL, 'D'},
+		{"disable", no_argument, NULL, 'd'},
+		{"enable", no_argument, NULL, 'e'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
 
-	char	   *DataDir = NULL;
 	int			c;
 	int			option_index;
 	bool		crc_ok;
@@ -262,10 +364,19 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:r:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
+			case 'c':
+				mode = PG_MODE_CHECK;
+				break;
+			case 'd':
+				mode = PG_MODE_DISABLE;
+				break;
+			case 'e':
+				mode = PG_MODE_ENABLE;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -312,6 +423,15 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Relfilenode checking only works in --check mode */
+	if (mode != PG_MODE_CHECK && only_relfilenode)
+	{
+		fprintf(stderr, _("%s: relfilenode option only possible with --check\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
 	/* Check if cluster is running */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
@@ -339,29 +459,133 @@ main(int argc, char *argv[])
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
+		fprintf(stderr, _("%s: cluster must be shut down\n"), progname);
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version == 0)
+	if (ControlFile->data_checksum_version == 0 &&
+		mode == PG_MODE_CHECK)
 	{
 		fprintf(stderr, _("%s: data checksums are not enabled in cluster\n"), progname);
 		exit(1);
 	}
+	if (ControlFile->data_checksum_version == 0 &&
+		mode == PG_MODE_DISABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
+		exit(1);
+	}
+	if (ControlFile->data_checksum_version > 0 &&
+		mode == PG_MODE_ENABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already enabled in cluster.\n"), progname);
+		exit(1);
+	}
 
-	/* Scan all files */
-	scan_directory(DataDir, "global");
-	scan_directory(DataDir, "base");
-	scan_directory(DataDir, "pg_tblspc");
+	/*
+	 * Allocate the control file paths here, as this gets used in various
+	 * phases.
+	 */
+	snprintf(controlfile_path, sizeof(controlfile_path),
+			 "%s/%s", DataDir, CONTROL_FILE_PATH);
+	snprintf(controlfile_path_temp, sizeof(controlfile_path_temp),
+			 "%s/%s", DataDir, CONTROL_FILE_PATH_TEMP);
 
-	printf(_("Checksum scan completed\n"));
-	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
-	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
-	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
-	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+	/* Prevent leaving behind any intermediate state */
+	pqsignal(SIGINT, signal_cleanup);
+	pqsignal(SIGTERM, signal_cleanup);
 
-	if (badblocks > 0)
-		return 1;
+	/*
+	 * The operation is good to move on with all the sanity checks done.
+	 * Enabling checksums can take a long time as all the files need to
+	 * be scanned and rewritten.  Hence, first, prevent any parallel startup
+	 * of the instance by renaming the control file when enabling checksums
+	 * so that it cannot be started by accident during the operation.
+	 */
+	if (mode == PG_MODE_ENABLE)
+	{
+		printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
+				controlfile_path_temp);
+		if (pg_mv_file(controlfile_path, controlfile_path_temp) != 0)
+		{
+			fprintf(stderr, _("%s: could not rename file \"%s\" to \"%s\": %s\n"),
+					progname, controlfile_path, controlfile_path_temp,
+					strerror(errno));
+			exit(1);
+		}
+	}
+
+	/* Operate on all files if checking or enabling checksums */
+	if (mode == PG_MODE_CHECK || mode == PG_MODE_ENABLE)
+	{
+		scan_directory(DataDir, "global");
+		scan_directory(DataDir, "base");
+		scan_directory(DataDir, "pg_tblspc");
+
+		printf(_("Checksum operation completed\n"));
+		printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
+		printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
+		if (mode == PG_MODE_CHECK)
+		{
+			printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+
+			if (badblocks > 0)
+				exit(1);
+		}
+	}
+
+	/*
+	 * Now that enabling data checksums is done, first put the control
+	 * file back in place and then flush the data directory.  The control
+	 * file is updated and flushed in a follow-up step to never have the
+	 * data folder into an inconsistent state should a crash happen
+	 * in-between.
+	 */
+	if (mode == PG_MODE_ENABLE)
+	{
+		printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
+				controlfile_path);
+		if (pg_mv_file(controlfile_path_temp, controlfile_path) != 0)
+		{
+			fprintf(stderr, _("%s: could not rename file \"%s\" to \"%s\": %s\n"),
+					progname, controlfile_path_temp, controlfile_path,
+					strerror(errno));
+			exit(1);
+		}
+
+		printf(_("Syncing data folder\n"));
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+	}
+
+	/*
+	 * Finally update and flush the control file.
+	 */
+	if (mode == PG_MODE_ENABLE || mode == PG_MODE_DISABLE)
+	{
+		printf(_("Updating control file\n"));
+		ControlFile->data_checksum_version =
+			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
+
+		/* Note that this flushes the control file */
+		update_controlfile(DataDir, progname, ControlFile, true);
+
+		/*
+		 * Flush the parent path to make the change durable.
+		 */
+		if (fsync_parent_path(controlfile_path, progname) != 0)
+		{
+			/* error is already logged on failure */
+			exit(1);
+		}
+
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		if (mode == PG_MODE_ENABLE)
+			printf(_("Checksums enabled in cluster\n"));
+		else
+			printf(_("Checksums disabled in cluster\n"));
+	}
 
 	return 0;
 }
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 97284e8930..3ab18a6b89 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 45;
+use Test::More tests => 62;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -38,8 +38,8 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_checksums',  '-D', $pgdata,
-		'-r', $relfilenode_corrupted],
+	command_ok(['pg_checksums',  '--check', '-D', $pgdata, '-r',
+			   $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
 	# Time to create some corruption
@@ -49,15 +49,15 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata, '-r',
-								$relfilenode_corrupted],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata,
+							  '-r', $relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,22 +67,22 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_checksums', '--check', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
 	return;
 }
 
-# Initialize node with checksums enabled.
+# Initialize node with checksums disabled.
 my $node = get_new_node('node_checksum');
-$node->init(extra => ['--data-checksums']);
+$node->init();
 my $pgdata = $node->data_dir;
 
-# Control file should know that checksums are enabled.
+# Control file should know that checksums are disabled.
 command_like(['pg_controldata', $pgdata],
-	     qr/Data page checksum version:.*1/,
-		 'checksums enabled in control file');
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
 
 # These are correct but empty files, so they should pass through.
 append_to_file "$pgdata/global/99999", "";
@@ -100,13 +100,59 @@ append_to_file "$pgdata/global/pgsql_tmp_123", "foo";
 mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
+# Enable checksums.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+	   "checksums successfully enabled in cluster");
+
+# Successive attempt to enable checksums fails.
+command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+	      "enabling checksums fails if already enabled");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+	     'checksums enabled in control file');
+
+# Disable checksums again.
+command_ok(['pg_checksums', '--disable', '-D', $pgdata],
+	   "checksums successfully disabled in cluster");
+
+# Successive attempt to disable checksums fails.
+command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+	      "disabling checksums fails if already disabled");
+
+# Control file should know that checksums are disabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
+
+# Enable checksums again for follow-up tests.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
 # Checksums pass on a newly-created cluster
-command_ok(['pg_checksums',  '-D', $pgdata],
+command_ok(['pg_checksums', '--check', '-D', $pgdata],
 		   "succeeds with offline cluster");
 
+# Checksums are verified if no other arguments are specified
+command_ok(['pg_checksums', '-D', $pgdata],
+		   "verifies checksums as default action");
+
+# Specific relation files cannot be requested when action is --disable
+# or --enable.
+command_fails(['pg_checksums', '--disable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --disable");
+command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --enable");
+
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_checksums',  '-D', $pgdata],
+command_fails(['pg_checksums', '--check', '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -133,7 +179,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b301bce4b1..195b146974 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1710,6 +1710,7 @@ PgBenchExprType
 PgBenchFunction
 PgBenchValue
 PgBenchValueType
+PgChecksumMode
 PgFdwAnalyzeState
 PgFdwDirectModifyState
 PgFdwModifyState
-- 
2.20.1

#110Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#109)
Re: Offline enabling/disabling of data checksums

Bonjour Michaël,

Please find attached an updated patch set, I have rebased that stuff
on top of my recent commits to refactor the control file updates.

Patch applies cleanly, compiles, make check-world seems ok, doc build ok.

It would help if the patch includes a version number. I assume that this
is v7.

Doc looks ok.

Moving the controlfile looks like an effective way to prevent any
concurrent start, as the fs operation is probably atomic and especially if
external tools use the same trick. However, this is not the case yet, e.g.
"pg_resetwal" uses a "postmaster.pid" hack instead. Probably the method
could be unified, possibly with some functions in "controldata_utils.c".

However, I think that there still is a race condition because of the order
in which it is implemented:

pg_checksums reads control file
pg_checksums checks control file contents...
** cluster may be started and the control file updated
pg_checksums moves the (updated) control file
pg_checksums proceeds on a running cluster
pg_checksums moves back the control file
pg_checksums updates the control file contents, overriding updates

I think that the correct way to handle this for enable/disable is:

pg_checksums moves the control file
pg_checksums reads, checks, proceeds, updates
pg_checksums moves back the control file

This probably means extending the update_controlfile function a little
to allow a suffix. No big deal.
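
To make the proposed ordering concrete, here is a small self-contained
sketch in which the file is moved away first, all work is done against
the moved path, and the file is moved back at the end. Everything in it
is an assumption for illustration: with_control_file_moved() and
require_shutdown_flag() are hypothetical names, and the one-byte "state"
file is not the pg_control format. It does not address a process that
already holds the file open before the move.

```c
#include <assert.h>
#include <stdio.h>

/* Callback doing the read/check/proceed/update work on the moved file. */
typedef int (*control_file_op) (const char *moved_path);

/*
 * Move the file out of the way, run the operation against the moved
 * path, then move the file back.  Returns the callback's result, or -1
 * if either rename fails.
 */
static int
with_control_file_moved(const char *path, const char *temp_path,
                        control_file_op operate)
{
    int rc;

    assert(path != NULL && temp_path != NULL && operate != NULL);

    /* Step 1: take the file away before looking at its contents. */
    if (rename(path, temp_path) != 0)
        return -1;

    /* Step 2: read, check, proceed and update using the moved file only. */
    rc = operate(temp_path);

    /* Step 3: put the file back, whatever the outcome of step 2. */
    if (rename(temp_path, path) != 0)
        return -1;

    return rc;
}

/*
 * Example operation (assumed for illustration): treat the first byte of
 * the file as a "cleanly shut down" flag and refuse to go on otherwise.
 */
static int
require_shutdown_flag(const char *moved_path)
{
    FILE *f = fopen(moved_path, "rb");
    int c;

    if (f == NULL)
        return -1;
    c = fgetc(f);
    fclose(f);
    return (c == 'S') ? 0 : -1;
}
```

Because the state is read only after the rename, a server started between
the check and the move can no longer slip in unnoticed through that
particular window.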

Ok, this might not work, because of the following, less likely, race
condition:

postmaster opens control file RW
pg_checksums moves control file, postmaster's open file handle follows
...

So ISTM that we really need some locking to have something clean.

Why not always use "pgrename" instead of the strange pg_mv_file macro?

Help line about --check not simplified as suggested in a prior review,
although you said you would take it into account.

Tests look ok.

--
Fabien.

#111Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#110)
Re: Offline enabling/disabling of data checksums

On Tue, Mar 19, 2019 at 11:48:25AM +0100, Fabien COELHO wrote:

Moving the controlfile looks like an effective way to prevent any concurrent
start, as the fs operation is probably atomic and especially if external
tools use the same trick. However, this is not the case yet, e.g.
"pg_resetwal" uses a "postmaster.pid" hack instead.

pg_upgrade does so. Note that pg_resetwal does not check whether the
PID in the file belongs to a running process either.

Probably the method could be unified, possibly with some functions
in "controldata_utils.c".

Hm. postmaster.pid is just here to make sure that the instance is not
started at all, while we require the instance to be stopped cleanly
with other tools, so in my opinion it is not really consistent to
combine both.

Ok, this might not work, because of the following, less likely, race
condition:

postmaster opens control file RW
pg_checksums moves control file, postmaster's open file handle follows
...

So ISTM that we really need some locking to have something clean.

We are talking about complicating a method which is already fine for
the window where the whole operation runs, as it could take hours to
enable checksums, in order to cover a window of a couple of
instructions. I am not sure that it is worth complicating the code more.

Help line about --check not simplified as suggested in a prior review,
although you said you would take it into account.

Oops, it looks like this got lost because of the successive rebases.
I am sure I updated it at some point. Anyway, thanks for
pointing it out, I got that fixed on my local branch.
--
Michael

#112Michael Banck
michael.banck@credativ.de
In reply to: Fabien COELHO (#110)
Re: Offline enabling/disabling of data checksums

Hi,

On Tuesday, 19.03.2019 at 11:48 +0100, Fabien COELHO wrote:

Moving the controlfile looks like an effective way to prevent any
concurrent start, as the fs operation is probably atomic and especially if
external tools use the same trick. However, this is not the case yet, e.g.
"pg_resetwal" uses a "postmaster.pid" hack instead. Probably the method
could be unified, possibly with some functions in "controlfile_utils.c".

However, I think that there still is a race condition because of the order
in which it is implemented:

pg_checksums reads control file
pg_checksums checks control file contents...
** cluster may be started and the control file updated
pg_checksums moves the (updated) control file
pg_checksums proceeds on a running cluster
pg_checksums moves back the control file
pg_checksums updates the control file contents, overriding updates

I think that the correct way to handle this for enable/disable is:

pg_checksums moves the control file
pg_checksums reads, checks, proceeds, updates
pg_checksums moves back the control file

This probably means extending a little bit the update_controlfile function
to allow a suffix. No big deal.

Ok, this might not work, because of the following, less likely, race
condition:

postmaster opens control file RW
pg_checksums moves control file, postmaster's open file handle follows
...

So ISTM that we really need some locking to have something clean.

I think starting the postmaster during offline maintenance is already
quite some pilot error. As pg_checksums can potentially run for hours
though, I agree it is important to disable the cluster in the meantime.

There's really not a lot going on between pg_checksums reading the
control file and moving it away - what you propose above sounds a bit
like overengineering to me.

If anything, we could include the postmaster.pid check from pg_resetwal
after we have renamed the control file to make absolutely sure that the
cluster is offline. Once the control file is gone and there is no
postmaster.pid, it surely cannot be pg_checksums' problem anymore if a
postmaster is started regardless of maintenance.
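
As a rough illustration of the kind of postmaster.pid check meant here, the sketch below only tests whether a lock file exists and treats its presence as "a server may be running". The function name, return codes and messages are assumptions for illustration, not the pg_resetwal source.

```c
/* Ask for POSIX declarations even under strict -std= modes. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 0 when no lock file exists at lockpath, 1 when one exists, and
 * -1 when the file could not be examined.  The existence test deliberately
 * does not look at the PID stored in the file.
 */
static int
lock_file_exists(const char *lockpath)
{
	int			fd;

	assert(lockpath != NULL);

	fd = open(lockpath, O_RDONLY, 0);
	if (fd < 0)
	{
		if (errno == ENOENT)
			return 0;			/* no lock file: nothing claims the cluster */
		fprintf(stderr, "could not open file \"%s\" for reading: %s\n",
				lockpath, strerror(errno));
		return -1;
	}
	close(fd);

	fprintf(stderr, "lock file \"%s\" exists; is a server running?\n",
			lockpath);
	return 1;
}
```

A tool would call this with the path of postmaster.pid inside the data directory and refuse to proceed on any nonzero result.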

I leave that to Michael to decide whether he thinks the above is
warranted.

I think the more important open issue is what to do about PITR and
streaming replication, see my replies to Magnus elsewhere in the thread.

Why not always use "pgrename" instead of the strange pg_mv_file macro?

pg_upgrade does it the same way; possibly both could be converted to
pgrename, dunno.

Michael


#113Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#111)
Re: Offline enabling/disabling of data checksums

Ok, this might not work, because of the following, less likely, race
condition:

postmaster opens control file RW
pg_checksums moves control file, postmaster's open file handle follows
...

So ISTM that we really need some locking to have something clean.

We are talking about complicating a method which already covers the
window where the whole operation runs, which could take hours when
enabling checksums, in order to close a gap of a couple of instructions.
I am not sure that it is worth complicating the code more.

Hmmm. Possibly. The point is that anything only needs to be implemented
once. The whole point of pg is to have ACID transactional properties, but
it does not achieve that on the controlfile, which I find paradoxical:-)

Now if there is a race condition opportunity, ISTM that it should be as
short as possible. Renaming before manipulating seems safer if other
commands proceed like that as well. Probably, if all pg commands always
rename *THEN* open before doing anything, it could be safe, provided that
the renaming is atomic, which I think is the case.
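
A minimal sketch of the rename-then-open ordering, assuming POSIX rename() semantics. The helper name and the suffix argument are hypothetical, and the sketch ignores the case of a process that already holds the file open.

```c
/* Ask for POSIX declarations even under strict -std= modes. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Claim "path" by renaming it to "path" + "suffix" first, and only then
 * open the renamed file.  Returns an open descriptor on success, or -1
 * with errno set.  A second caller fails in rename() with ENOENT because
 * the original name is already gone.
 */
static int
claim_then_open(const char *path, const char *suffix)
{
	char		claimed[1024];
	int			len;

	assert(path != NULL && suffix != NULL);

	len = snprintf(claimed, sizeof(claimed), "%s%s", path, suffix);
	if (len < 0 || (size_t) len >= sizeof(claimed))
	{
		errno = ENAMETOOLONG;
		return -1;
	}

	/* Step 1: take the file out of its well-known location. */
	if (rename(path, claimed) != 0)
		return -1;

	/* Step 2: only now read or modify it, under the claimed name. */
	return open(claimed, O_RDWR);
}
```

If the claiming tool dies before renaming the file back, the file stays under the claimed name, which is the failure state discussed in the following paragraph.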

That would avoid locking, at the price of a small probability of having a
controlfile in a controlfile.command-that-failed-at-the-wrong-time state.
Maybe it is okay. Maybe there is a need to be able to force the
state back to something to recover from such an unlikely event, but probably
it already exists (e.g. the postmaster could be dead without releasing the
controlfile state).

--
Fabien.

#114Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#109)
Re: Offline enabling/disabling of data checksums

Hi,

On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:

+/*
+ * Locations of persistent and temporary control files.  The control
+ * file gets renamed into a temporary location when enabling checksums
+ * to prevent a parallel startup of Postgres.
+ */
+#define CONTROL_FILE_PATH		"global/pg_control"
+#define CONTROL_FILE_PATH_TEMP	CONTROL_FILE_PATH ".pg_checksums_in_progress"

I think this should be outright rejected. Again, you're making the
control file into something it isn't. And there's no buy-in for this as
far as I can tell outside of Fabien and you. For crying out loud, if the
server crashes during this YOU'VE CORRUPTED THE CLUSTER.

- Andres

#115Michael Banck
michael.banck@credativ.de
In reply to: Andres Freund (#114)
Re: Offline enabling/disabling of data checksums

Hi,

On Tuesday, 19.03.2019 at 08:36 -0700, Andres Freund wrote:

On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:

+/*
+ * Locations of persistent and temporary control files.  The control
+ * file gets renamed into a temporary location when enabling checksums
+ * to prevent a parallel startup of Postgres.
+ */
+#define CONTROL_FILE_PATH		"global/pg_control"
+#define CONTROL_FILE_PATH_TEMP	CONTROL_FILE_PATH ".pg_checksums_in_progress"

I think this should be outright rejected. Again, you're making the
control file into something it isn't. And there's no buy-in for this as
far as I can tell outside of Fabien and you. For crying out loud, if the
server crashes during this YOU'VE CORRUPTED THE CLUSTER.

The cluster is supposed to be offline during this. This is just an
additional precaution so that nobody starts it during the operation -
similar to how pg_upgrade disables the old data directory.

Michael


#116Andres Freund
andres@anarazel.de
In reply to: Michael Banck (#115)
Re: Offline enabling/disabling of data checksums

On 2019-03-19 16:55:12 +0100, Michael Banck wrote:

Hi,

On Tuesday, 19.03.2019 at 08:36 -0700, Andres Freund wrote:

On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:

+/*
+ * Locations of persistent and temporary control files.  The control
+ * file gets renamed into a temporary location when enabling checksums
+ * to prevent a parallel startup of Postgres.
+ */
+#define CONTROL_FILE_PATH		"global/pg_control"
+#define CONTROL_FILE_PATH_TEMP	CONTROL_FILE_PATH ".pg_checksums_in_progress"

I think this should be outright rejected. Again, you're making the
control file into something it isn't. And there's no buy-in for this as
far as I can tell outside of Fabien and you. For crying out loud, if the
server crashes during this YOU'VE CORRUPTED THE CLUSTER.

The cluster is supposed to be offline during this. This is just an
additional precaution so that nobody starts it during the operation -
similar to how pg_upgrade disables the old data directory.

I don't see how that matters. Afterwards the cluster needs low level
surgery to be recovered. That's a) undocumented b) likely to be done
wrongly. This is completely unacceptable *AND UNNECESSARY*.

Greetings,

Andres Freund

#117Michael Banck
michael.banck@credativ.de
In reply to: Andres Freund (#116)
Re: Offline enabling/disabling of data checksums

Hi,

On Tuesday, 19.03.2019 at 09:00 -0700, Andres Freund wrote:

On 2019-03-19 16:55:12 +0100, Michael Banck wrote:

On Tuesday, 19.03.2019 at 08:36 -0700, Andres Freund wrote:

On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:

+/*
+ * Locations of persistent and temporary control files.  The control
+ * file gets renamed into a temporary location when enabling checksums
+ * to prevent a parallel startup of Postgres.
+ */
+#define CONTROL_FILE_PATH		"global/pg_control"
+#define CONTROL_FILE_PATH_TEMP	CONTROL_FILE_PATH ".pg_checksums_in_progress"

I think this should be outright rejected. Again, you're making the
control file into something it isn't. And there's no buy-in for this as
far as I can tell outside of Fabien and you. For crying out loud, if the
server crashes during this YOU'VE CORRUPTED THE CLUSTER.

The cluster is supposed to be offline during this. This is just an
additional precaution so that nobody starts it during the operation -
similar to how pg_upgrade disables the old data directory.

I don't see how that matters. Afterwards the cluster needs low level
surgery to be recovered. That's a) undocumented b) likely to be done
wrongly. This is completely unacceptable *AND UNNECESSARY*.

Can you explain why low level surgery is needed and what that would look
like?

If pg_checksums successfully enables checksums, it will move back the
control file and update the checksum version - the cluster is ready to
be started again unless I am missing something?

If pg_checksums is interrupted by the admin, it will move back the
control file and the cluster is ready to be started again as well.

If pg_checksums aborts with a failure, the admin will have to move back
the control file before starting up the instance again, but I don't
think that counts?

If pg_checksums crashes due to I/O failures or other causes I can see
how possibly the block it was currently writing might need low level
surgery, but in that case we are in the domain of forensics already I
guess and that still does not pertain to the control file?

Sorry for being obtuse, I don't get it.

Michael


#118Andres Freund
andres@anarazel.de
In reply to: Michael Banck (#117)
Re: Offline enabling/disabling of data checksums

Hi,

On 2019-03-19 17:08:17 +0100, Michael Banck wrote:

On Tuesday, 19.03.2019 at 09:00 -0700, Andres Freund wrote:

On 2019-03-19 16:55:12 +0100, Michael Banck wrote:

On Tuesday, 19.03.2019 at 08:36 -0700, Andres Freund wrote:

On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:

+/*
+ * Locations of persistent and temporary control files.  The control
+ * file gets renamed into a temporary location when enabling checksums
+ * to prevent a parallel startup of Postgres.
+ */
+#define CONTROL_FILE_PATH		"global/pg_control"
+#define CONTROL_FILE_PATH_TEMP	CONTROL_FILE_PATH ".pg_checksums_in_progress"

I think this should be outright rejected. Again, you're making the
control file into something it isn't. And there's no buy-in for this as
far as I can tell outside of Fabien and you. For crying out loud, if the
server crashes during this YOU'VE CORRUPTED THE CLUSTER.

The cluster is supposed to be offline during this. This is just an
additional precaution so that nobody starts it during the operation -
similar to how pg_upgrade disables the old data directory.

I don't see how that matters. Afterwards the cluster needs low level
surgery to be recovered. That's a) undocumented b) likely to be done
wrongly. This is completely unacceptable *AND UNNECESSARY*.

Can you explain why low level surgery is needed and what that would look
like?

If pg_checksums successfully enables checksums, it will move back the
control file and update the checksum version - the cluster is ready to
be started again unless I am missing something?

If pg_checksums is interrupted by the admin, it will move back the
control file and the cluster is ready to be started again as well.

If pg_checksums aborts with a failure, the admin will have to move back
the control file before starting up the instance again, but I don't
think that counts?

That absolutely counts. Even a short period would imo be unacceptable,
but this will take *hours* in many clusters. It's completely possible
that the machine crashes while the enabling is in progress.

And after restarting postgres or even pg_checksums, you'll just get a
message that there's no control file. How on earth is a normal user
supposed to recover from that? Now, you could have a check for the
control file under the temporary name, and emit a hint about renaming,
but that has its own dangers (like people renaming it just to start
postgres).

And you're basically adding it because Fabien doesn't like
postmaster.pid and wants to invent another lockout mechanism in this
thread.

Greetings,

Andres Freund

#119Michael Banck
michael.banck@credativ.de
In reply to: Andres Freund (#118)
Re: Offline enabling/disabling of data checksums

Hi,

On Tuesday, 19.03.2019 at 09:13 -0700, Andres Freund wrote:

On 2019-03-19 17:08:17 +0100, Michael Banck wrote:

On Tuesday, 19.03.2019 at 09:00 -0700, Andres Freund wrote:

On 2019-03-19 16:55:12 +0100, Michael Banck wrote:

On Tuesday, 19.03.2019 at 08:36 -0700, Andres Freund wrote:

On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:

+/*
+ * Locations of persistent and temporary control files.  The control
+ * file gets renamed into a temporary location when enabling checksums
+ * to prevent a parallel startup of Postgres.
+ */
+#define CONTROL_FILE_PATH		"global/pg_control"
+#define CONTROL_FILE_PATH_TEMP	CONTROL_FILE_PATH ".pg_checksums_in_progress"

I think this should be outright rejected. Again, you're making the
control file into something it isn't. And there's no buy-in for this as
far as I can tell outside of Fabien and you. For crying out loud, if the
server crashes during this YOU'VE CORRUPTED THE CLUSTER.

The cluster is supposed to be offline during this. This is just an
additional precaution so that nobody starts it during the operation -
similar to how pg_upgrade disables the old data directory.

I don't see how that matters. Afterwards the cluster needs low level
surgery to be recovered. That's a) undocumented b) likely to be done
wrongly. This is completely unacceptable *AND UNNECESSARY*.

Can you explain why low level surgery is needed and what that would look
like?

If pg_checksums successfully enables checksums, it will move back the
control file and update the checksum version - the cluster is ready to
be started again unless I am missing something?

If pg_checksums is interrupted by the admin, it will move back the
control file and the cluster is ready to be started again as well.

If pg_checksums aborts with a failure, the admin will have to move back
the control file before starting up the instance again, but I don't
think that counts?

That absolutely counts. Even a short period would imo be unacceptable,
but this will take *hours* in many clusters. It's completely possible
that the machine crashes while the enabling is in progress.

And after restarting postgres or even pg_checksums, you'll just get a
message that there's no control file. How on earth is a normal user
supposed to recover from that? Now, you could have a check for the
control file under the temporary name, and emit a hint about renaming,
but that has its own dangers (like people renaming it just to start
postgres).

Ok, thanks for explaining. 

I guess if we check for the temporary name in postmaster during startup
if pg_control isn't there then a more generally useful name like
"pg_control.maintenance" should be picked. We could then spit out a nice
error message or hint like "the cluster has been disabled for
maintenance. In order to start it up anyway, rename
pg_control.maintenance to pg_control" or so.
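
A sketch of what such a startup check could look like, purely to illustrate the proposal. The enum, the function name and the exact wording of the message are hypothetical; this is not postmaster code.

```c
/* Ask for POSIX declarations even under strict -std= modes. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <unistd.h>

typedef enum
{
	CONTROL_PRESENT,			/* control file is where it should be */
	CONTROL_IN_MAINTENANCE,		/* only the ".maintenance" copy exists */
	CONTROL_MISSING				/* neither file exists */
} ControlFileStatus;

/*
 * Classify the state of the control file at control_path.  When only the
 * renamed copy is found, print a hint instead of a bare "no control file"
 * error.
 */
static ControlFileStatus
check_control_file(const char *control_path)
{
	char		maint_path[1100];

	assert(control_path != NULL);

	if (access(control_path, F_OK) == 0)
		return CONTROL_PRESENT;

	snprintf(maint_path, sizeof(maint_path), "%s.maintenance", control_path);
	if (access(maint_path, F_OK) == 0)
	{
		fprintf(stderr,
				"the cluster has been disabled for maintenance\n"
				"HINT: In order to start it up anyway, rename \"%s\" to \"%s\".\n",
				maint_path, control_path);
		return CONTROL_IN_MAINTENANCE;
	}

	return CONTROL_MISSING;
}
```

The hint only changes what the user sees; it does not decide whether renaming the file back is actually safe.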

In any case, this would be more of an operational or availability issue
and not a data-loss issue, as I feared from your previous mails.

And you're basically adding it because Fabien doesn't like
postmaster.pid and wants to invent another lockout mechanism in this
thread.

I think the hazard of another DBA (or some automated configuration
management or HA tool for that matter) looking at postmaster.pid,
deciding that it is not a legit file from a running instance, deleting
it and then starting up Postgres while pg_checksums is still at work is
worse than the above scenario, but maybe if we make the content of
postmaster.pid clear enough (like "maintenance in progress"?) it would
be enough of a hint? Or do you have concrete suggestions on how this
should work?

I had the feeling (ab)using postmaster.pid for this would fly less than
using the same scheme as pg_upgrade does, but I'm fine doing it either
way.

Michael


#120Andres Freund
andres@anarazel.de
In reply to: Michael Banck (#119)
Re: Offline enabling/disabling of data checksums

Hi,

On 2019-03-19 17:30:16 +0100, Michael Banck wrote:

On Tuesday, 19.03.2019 at 09:13 -0700, Andres Freund wrote:

On 2019-03-19 17:08:17 +0100, Michael Banck wrote:

On Tuesday, 19.03.2019 at 09:00 -0700, Andres Freund wrote:

On 2019-03-19 16:55:12 +0100, Michael Banck wrote:

On Tuesday, 19.03.2019 at 08:36 -0700, Andres Freund wrote:

On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:

+/*
+ * Locations of persistent and temporary control files.  The control
+ * file gets renamed into a temporary location when enabling checksums
+ * to prevent a parallel startup of Postgres.
+ */
+#define CONTROL_FILE_PATH		"global/pg_control"
+#define CONTROL_FILE_PATH_TEMP	CONTROL_FILE_PATH ".pg_checksums_in_progress"

I think this should be outright rejected. Again, you're making the
control file into something it isn't. And there's no buy-in for this as
far as I can tell outside of Fabien and you. For crying out loud, if the
server crashes during this YOU'VE CORRUPTED THE CLUSTER.

The cluster is supposed to be offline during this. This is just an
additional precaution so that nobody starts it during the operation -
similar to how pg_upgrade disables the old data directory.

I don't see how that matters. Afterwards the cluster needs low level
surgery to be recovered. That's a) undocumented b) likely to be done
wrongly. This is completely unacceptable *AND UNNECESSARY*.

Can you explain why low level surgery is needed and what that would look
like?

If pg_checksums successfully enables checksums, it will move back the
control file and update the checksum version - the cluster is ready to
be started again unless I am missing something?

If pg_checksums is interrupted by the admin, it will move back the
control file and the cluster is ready to be started again as well.

If pg_checksums aborts with a failure, the admin will have to move back
the control file before starting up the instance again, but I don't
think that counts?

That absolutely counts. Even a short period would imo be unacceptable,
but this will take *hours* in many clusters. It's completely possible
that the machine crashes while the enabling is in progress.

And after restarting postgres or even pg_checksums, you'll just get a
message that there's no control file. How on earth is a normal user
supposed to recover from that? Now, you could have a check for the
control file under the temporary name, and emit a hint about renaming,
but that has its own dangers (like people renaming it just to start
postgres).

Ok, thanks for explaining.

I guess if we check for the temporary name in postmaster during startup
if pg_control isn't there then a more generally useful name like
"pg_control.maintenance" should be picked. We could then spit out a nice
error message or hint like "the cluster has been disabled for
maintenance. In order to start it up anyway, rename
pg_control.maintenance to pg_control" or so.

To be very clear: I am going to try to stop any patch with this
mechanism from going into the tree. I think it's an absurdly bad
idea. There'd need to be significant support from a number of other
committers to convince me otherwise.

In any case, this would be more of an operational or availability issue
and not a data-loss issue, as I feared from your previous mails.

It's just about indistinguishable for normal users.

And you're basically adding it because Fabien doesn't like
postmaster.pid and wants to invent another lockout mechanism in this
thread.

I think the hazard of another DBA (or some automated configuration
management or HA tool for that matter) looking at postmaster.pid,
deciding that it is not a legit file from a running instance, deleting
it and then starting up Postgres while pg_checksums is still at work is
worse than the above scenario, but maybe if we make the content of
postmaster.pid clear enough (like "maintenance in progress"?) it would
be enough of a hint?

Err, how would such a tool decide to do that safely? And if it did so,
how would it not cause problems in postgres' normal operation, given
that that postmaster.pid is crucial to prevent two postgres instances
running at the same time?

Or do you have concrete suggestions on how this should work?

Create a postmaster.pid with the pid of pg_checksums. That'll make a
postgres start check whether that pid is still alive. There'd
probably need to be a tiny change to CreateLockFile() to prevent it from
checking whether any shared memory is connected. FWIW, it'd probably
actually be good if pg_checksums (and some other tools), did most if not
all the checks in CreateLockFile().
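
A minimal sketch of that idea, assuming POSIX. Both function names are hypothetical, and the sketch writes only the PID on the first line of the file, leaving out everything else a real postmaster.pid contains and all shared-memory checks.

```c
/* Ask for POSIX declarations even under strict -std= modes. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create lockpath exclusively and record our own PID on its first line.
 * Returns 0 on success, -1 on failure (errno is EEXIST if a lock file is
 * already there).
 */
static int
create_pid_lock(const char *lockpath)
{
	char		buf[32];
	int			fd;
	int			len;

	assert(lockpath != NULL);

	fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;

	len = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
	if (write(fd, buf, (size_t) len) != len)
	{
		int			save_errno = errno;

		close(fd);
		unlink(lockpath);
		errno = save_errno;
		return -1;
	}
	return close(fd);
}

/*
 * Returns 1 if the PID on the first line of lockpath belongs to a live
 * process, 0 if no such process exists, -1 if the file cannot be read.
 */
static int
lock_holder_alive(const char *lockpath)
{
	FILE	   *fp;
	long		pid;
	int			nfields;

	assert(lockpath != NULL);

	fp = fopen(lockpath, "r");
	if (fp == NULL)
		return -1;
	nfields = fscanf(fp, "%ld", &pid);
	fclose(fp);
	if (nfields != 1 || pid <= 0)
		return -1;

	/* Signal 0 performs only the existence and permission checks. */
	if (kill((pid_t) pid, 0) == 0)
		return 1;
	/* EPERM: the process exists but is not ours; ESRCH: it is gone. */
	return (errno == EPERM) ? 1 : 0;
}
```

A starting server that finds the file would refuse to start while lock_holder_alive() returns 1, and could treat a result of 0 as a stale lock left behind by a crashed tool.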

I'm not sure it needs to be this patch's responsibility to come up with
a scheme here at all however. pg_rewind, pg_resetwal, pg_upgrade all
don't really have a lockout mechanism, and it hasn't caused a ton of
problems. I think it'd be good to invent something better, but it can't
be some half-assed approach that'll lead people to think their database
is gone.

I had the feeling (ab)using postmaster.pid for this would fly less than
using the same scheme as pg_upgrade does, but I'm fine doing it either
way.

I don't think pg_upgrade is a valid comparison, given that the goal
there is to permanently disable the cluster. It also emits a hint about
it. And only does so at the end of a run.

Greetings,

Andres Freund

#121Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#120)
Re: Offline enabling/disabling of data checksums

On Tue, Mar 19, 2019 at 09:47:17AM -0700, Andres Freund wrote:

I'm not sure it needs to be this patch's responsibility to come up with
a scheme here at all however. pg_rewind, pg_resetwal, pg_upgrade all
don't really have a lockout mechanism, and it hasn't caused a ton of
problems. I think it'd be good to invent something better, but it can't
be some half-assed approach that'll lead people to think their database
is gone.

Amen. Take it as you wish, but that's actually what I was mentioning
upthread one week ago where I argued that it is a problem, but not a
problem of this patch and that this problem concerns other tools:
/messages/by-id/20190313093150.GE2988@paquier.xyz
And then, my position was overruled by everybody on this thread.
So I am happy to see somebody chiming in and say the same thing.

Honestly, I think that what I sent last week, with a patch in its
simplest form, would be enough:
/messages/by-id/20190313021621.GP13812@paquier.xyz

In short, you keep the main feature with:
- No tweaks with postmaster.pid.
- Rely just on the control file indicating an instance shutdown
cleanly.
- No tweaks with the system ID.
- No renaming of the control file.
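
For illustration, the "rely just on the control file" rule can be sketched as below. The enum and struct are stand-ins modeled on the DBState values in catalog/pg_control.h, prefixed with Demo to make clear they are not the real definitions, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the cluster states recorded in the control file. */
typedef enum
{
	DEMO_DB_STARTUP,
	DEMO_DB_SHUTDOWNED,
	DEMO_DB_SHUTDOWNED_IN_RECOVERY,
	DEMO_DB_SHUTDOWNING,
	DEMO_DB_IN_CRASH_RECOVERY,
	DEMO_DB_IN_ARCHIVE_RECOVERY,
	DEMO_DB_IN_PRODUCTION
} DemoDBState;

/* Stand-in for the one control file field this check needs. */
typedef struct
{
	DemoDBState state;
} DemoControlFile;

/*
 * The tool proceeds only if the control file says the instance was shut
 * down cleanly, either as a primary or while in recovery.
 */
static bool
demo_cluster_cleanly_shut_down(const DemoControlFile *cf)
{
	assert(cf != NULL);
	return cf->state == DEMO_DB_SHUTDOWNED ||
		cf->state == DEMO_DB_SHUTDOWNED_IN_RECOVERY;
}
```

This check involves no lock file and no rename, so it does not detect a server started after the control file was read.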
--
Michael

#122Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#121)
2 attachment(s)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 20, 2019 at 08:09:07AM +0900, Michael Paquier wrote:

In short, you keep the main feature with:
- No tweaks with postmaster.pid.
- Rely just on the control file indicating an instance shutdown
cleanly.
- No tweaks with the system ID.
- No renaming of the control file.

FWIW, the simplest version is just like the attached.
--
Michael

Attachments:

v8-0001-Add-options-to-enable-and-disable-checksums-in-pg.patchtext/x-diff; charset=us-asciiDownload
From c71c1c8d6d5093bcea90b75cbb0c1348a823f08d Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 20 Mar 2019 14:14:33 +0900
Subject: [PATCH v8 1/2] Add options to enable and disable checksums in
 pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When running --enable or --disable, the data folder gets fsync'd for
durability.  If no mode is specified in the options, then --check is
used for compatibility with older versions of pg_verify_checksums (now
renamed to pg_checksums in v12).

Author: Michael Banck
Reviewed-by: Fabien Coelho, Michael Paquier
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    |  50 +++++++-
 src/bin/pg_checksums/pg_checksums.c   | 173 ++++++++++++++++++++++----
 src/bin/pg_checksums/t/002_actions.pl |  76 ++++++++---
 src/tools/pgindent/typedefs.list      |   1 +
 4 files changed, 254 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..32bdb0f5c2 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  <refnamediv>
   <refname>pg_checksums</refname>
-  <refpurpose>verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
+  <refpurpose>enable, disable or check data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
  </refnamediv>
 
  <refsynopsisdiv>
@@ -36,10 +36,19 @@ PostgreSQL documentation
  <refsect1 id="r1-app-pg_checksums-1">
   <title>Description</title>
   <para>
-   <application>pg_checksums</application> verifies data checksums in a
-   <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_checksums</application>.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   <application>pg_checksums</application> checks, enables or disables data
+   checksums in a <productname>PostgreSQL</productname> cluster.  The server
+   must be shut down cleanly before running
+   <application>pg_checksums</application>. The exit status is zero if there
+   are no checksum errors when checking them, and nonzero if at least one
+   checksum failure is detected. If enabling or disabling checksums, the
+   exit status is nonzero if the operation failed.
+  </para>
+
+  <para>
+   While checking or enabling checksums needs to scan or write every file in
+   the cluster, disabling checksums will only update the file
+   <filename>pg_control</filename>.
   </para>
  </refsect1>
 
@@ -60,6 +69,37 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-c</option></term>
+      <term><option>--check</option></term>
+      <listitem>
+       <para>
+        Checks checksums. This is the default mode if nothing else is
+        specified.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <term><option>--disable</option></term>
+      <listitem>
+       <para>
+        Disables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--enable</option></term>
+      <listitem>
+       <para>
+        Enables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index b7ebc11017..339f6ad7f5 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -1,7 +1,8 @@
 /*-------------------------------------------------------------------------
  *
  * pg_checksums.c
- *	  Verifies page level checksums in an offline cluster.
+ *	  Checks, enables or disables page level checksums for an offline
+ *	  cluster
  *
  * Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -17,14 +18,15 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
-#include "catalog/pg_control.h"
+#include "access/xlog_internal.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
 
 
 static int64 files = 0;
@@ -35,16 +37,38 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_MODE_CHECK,
+	PG_MODE_DISABLE,
+	PG_MODE_ENABLE
+} PgChecksumMode;
+
+/*
+ * Filename components.
+ *
+ * XXX: fd.h is not declared here as frontend side code is not able to
+ * interact with the backend-side definitions for the various fsync
+ * wrappers.
+ */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static PgChecksumMode mode = PG_MODE_CHECK;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables, disables or verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -c, --check            check data checksums (default)\n"));
+	printf(_("  -d, --disable          disable data checksums\n"));
+	printf(_("  -e, --enable           enable data checksums\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -90,8 +114,14 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	int			flags;
+
+	Assert(mode == PG_MODE_ENABLE ||
+		   mode == PG_MODE_CHECK);
+
+	flags = (mode == PG_MODE_ENABLE) ? O_RDWR : O_RDONLY;
+	f = open(fn, PG_BINARY | flags, 0);
 
-	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
@@ -121,18 +151,47 @@ scan_file(const char *fn, BlockNumber segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		if (csum != header->pd_checksum)
+		if (mode == PG_MODE_CHECK)
 		{
-			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
-						progname, fn, blockno, csum, header->pd_checksum);
-			badblocks++;
+			if (csum != header->pd_checksum)
+			{
+				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+					fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
+							progname, fn, blockno, csum, header->pd_checksum);
+				badblocks++;
+			}
+		}
+		else if (mode == PG_MODE_ENABLE)
+		{
+			/* Set checksum in page header */
+			header->pd_checksum = csum;
+
+			/* Seek back to beginning of block */
+			if (lseek(f, -BLCKSZ, SEEK_CUR) < 0)
+			{
+				fprintf(stderr, _("%s: seek failed for block %d in file \"%s\": %s\n"), progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Write block with checksum */
+			if (write(f, buf.data, BLCKSZ) != BLCKSZ)
+			{
+				fprintf(stderr, "%s: could not update checksum of block %d in file \"%s\": %s\n",
+						progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
 		}
 	}
 
 	if (verbose)
-		fprintf(stderr,
-				_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+	{
+		if (mode == PG_MODE_CHECK)
+			fprintf(stderr,
+					_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+		if (mode == PG_MODE_ENABLE)
+			fprintf(stderr,
+					_("%s: checksums enabled in file \"%s\"\n"), progname, fn);
+	}
 
 	close(f);
 }
@@ -234,7 +293,10 @@ int
 main(int argc, char *argv[])
 {
 	static struct option long_options[] = {
+		{"check", no_argument, NULL, 'c'},
 		{"pgdata", required_argument, NULL, 'D'},
+		{"disable", no_argument, NULL, 'd'},
+		{"enable", no_argument, NULL, 'e'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -262,10 +324,19 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:r:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
+			case 'c':
+				mode = PG_MODE_CHECK;
+				break;
+			case 'd':
+				mode = PG_MODE_DISABLE;
+				break;
+			case 'e':
+				mode = PG_MODE_ENABLE;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -312,6 +383,15 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Relfilenode checking only works in --check mode */
+	if (mode != PG_MODE_CHECK && only_relfilenode)
+	{
+		fprintf(stderr, _("%s: relfilenode option only possible with --check\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
 	/* Check if cluster is running */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
@@ -339,29 +419,70 @@ main(int argc, char *argv[])
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
+		fprintf(stderr, _("%s: cluster must be shut down\n"), progname);
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version == 0)
+	if (ControlFile->data_checksum_version == 0 &&
+		mode == PG_MODE_CHECK)
 	{
 		fprintf(stderr, _("%s: data checksums are not enabled in cluster\n"), progname);
 		exit(1);
 	}
+	if (ControlFile->data_checksum_version == 0 &&
+		mode == PG_MODE_DISABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
+		exit(1);
+	}
+	if (ControlFile->data_checksum_version > 0 &&
+		mode == PG_MODE_ENABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already enabled in cluster.\n"), progname);
+		exit(1);
+	}
 
-	/* Scan all files */
-	scan_directory(DataDir, "global");
-	scan_directory(DataDir, "base");
-	scan_directory(DataDir, "pg_tblspc");
+	/* Operate on all files if checking or enabling checksums */
+	if (mode == PG_MODE_CHECK || mode == PG_MODE_ENABLE)
+	{
+		scan_directory(DataDir, "global");
+		scan_directory(DataDir, "base");
+		scan_directory(DataDir, "pg_tblspc");
 
-	printf(_("Checksum scan completed\n"));
-	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
-	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
-	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
-	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+		printf(_("Checksum operation completed\n"));
+		printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
+		printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
+		if (mode == PG_MODE_CHECK)
+		{
+			printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 
-	if (badblocks > 0)
-		return 1;
+			if (badblocks > 0)
+				exit(1);
+		}
+	}
+
+	/*
+	 * Finally update the control file, flushing the data directory at the
+	 * end.
+	 */
+	if (mode == PG_MODE_ENABLE || mode == PG_MODE_DISABLE)
+	{
+		/* Update control file */
+		ControlFile->data_checksum_version =
+			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
+
+		printf(_("Syncing data directory\n"));
+		update_controlfile(DataDir, progname, ControlFile, false);
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		if (mode == PG_MODE_ENABLE)
+			printf(_("Checksums enabled in cluster\n"));
+		else
+			printf(_("Checksums disabled in cluster\n"));
+	}
 
 	return 0;
 }
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 97284e8930..3ab18a6b89 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 45;
+use Test::More tests => 62;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -38,8 +38,8 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_checksums',  '-D', $pgdata,
-		'-r', $relfilenode_corrupted],
+	command_ok(['pg_checksums',  '--check', '-D', $pgdata, '-r',
+			   $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
 	# Time to create some corruption
@@ -49,15 +49,15 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata, '-r',
-								$relfilenode_corrupted],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata,
+							  '-r', $relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,22 +67,22 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_checksums', '--check', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
 	return;
 }
 
-# Initialize node with checksums enabled.
+# Initialize node with checksums disabled.
 my $node = get_new_node('node_checksum');
-$node->init(extra => ['--data-checksums']);
+$node->init();
 my $pgdata = $node->data_dir;
 
-# Control file should know that checksums are enabled.
+# Control file should know that checksums are disabled.
 command_like(['pg_controldata', $pgdata],
-	     qr/Data page checksum version:.*1/,
-		 'checksums enabled in control file');
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
 
 # These are correct but empty files, so they should pass through.
 append_to_file "$pgdata/global/99999", "";
@@ -100,13 +100,59 @@ append_to_file "$pgdata/global/pgsql_tmp_123", "foo";
 mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
+# Enable checksums.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+	   "checksums successfully enabled in cluster");
+
+# Successive attempt to enable checksums fails.
+command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+	      "enabling checksums fails if already enabled");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+	     'checksums enabled in control file');
+
+# Disable checksums again.
+command_ok(['pg_checksums', '--disable', '-D', $pgdata],
+	   "checksums successfully disabled in cluster");
+
+# Successive attempt to disable checksums fails.
+command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+	      "disabling checksums fails if already disabled");
+
+# Control file should know that checksums are disabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
+
+# Enable checksums again for follow-up tests.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
 # Checksums pass on a newly-created cluster
-command_ok(['pg_checksums',  '-D', $pgdata],
+command_ok(['pg_checksums', '--check', '-D', $pgdata],
 		   "succeeds with offline cluster");
 
+# Checksums are verified if no other arguments are specified
+command_ok(['pg_checksums', '-D', $pgdata],
+		   "verifies checksums as default action");
+
+# Specific relation files cannot be requested when action is --disable
+# or --enable.
+command_fails(['pg_checksums', '--disable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --disable");
+command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --enable");
+
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_checksums',  '-D', $pgdata],
+command_fails(['pg_checksums', '--check', '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -133,7 +179,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b301bce4b1..195b146974 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1710,6 +1710,7 @@ PgBenchExprType
 PgBenchFunction
 PgBenchValue
 PgBenchValueType
+PgChecksumMode
 PgFdwAnalyzeState
 PgFdwDirectModifyState
 PgFdwModifyState
-- 
2.20.1

v8-0002-Add-option-N-no-sync-to-pg_checksums.patchtext/x-diff; charset=us-asciiDownload
From e8eb4fca2f66e0cd29e7bef4a3a1fbcc50d74c5b Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 20 Mar 2019 14:16:18 +0900
Subject: [PATCH v8 2/2] Add option -N/--no-sync to pg_checksums

This is an option consistent with what pg_dump, pg_rewind and
pg_basebackup provide which is useful for leveraging the I/O effort when
testing things, not to be used in a production environment.

Author: Michael Paquier
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    | 16 ++++++++++++++++
 src/bin/pg_checksums/pg_checksums.c   | 11 +++++++++--
 src/bin/pg_checksums/t/002_actions.pl | 10 +++++-----
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 32bdb0f5c2..a8f4dafc1e 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -100,6 +100,22 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-N</option></term>
+      <term><option>--no-sync</option></term>
+      <listitem>
+       <para>
+        By default, <command>pg_checksums</command> will wait for all files
+        to be written safely to disk.  This option causes
+        <command>pg_checksums</command> to return without waiting, which is
+        faster, but means that a subsequent operating system crash can leave
+        the updated data folder corrupt.  Generally, this option is useful
+        for testing but should not be used on a production installation.
+        This option has no effect when using <literal>--check</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 339f6ad7f5..7bf82de958 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -35,6 +35,7 @@ static int64 badblocks = 0;
 static ControlFileData *ControlFile;
 
 static char *only_relfilenode = NULL;
+static bool do_sync = true;
 static bool verbose = false;
 
 typedef enum
@@ -69,6 +70,7 @@ usage(void)
 	printf(_("  -c, --check            check data checksums (default)\n"));
 	printf(_("  -d, --disable          disable data checksums\n"));
 	printf(_("  -e, --enable           enable data checksums\n"));
+	printf(_("  -N, --no-sync          do not wait for changes to be written safely to disk\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -297,6 +299,7 @@ main(int argc, char *argv[])
 		{"pgdata", required_argument, NULL, 'D'},
 		{"disable", no_argument, NULL, 'd'},
 		{"enable", no_argument, NULL, 'e'},
+		{"no-sync", no_argument, NULL, 'N'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -324,7 +327,7 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:deNr:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -337,6 +340,9 @@ main(int argc, char *argv[])
 			case 'e':
 				mode = PG_MODE_ENABLE;
 				break;
+			case 'N':
+				do_sync = false;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -474,7 +480,8 @@ main(int argc, char *argv[])
 
 		printf(_("Syncing data directory\n"));
 		update_controlfile(DataDir, progname, ControlFile, false);
-		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (do_sync)
+			fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
 
 		if (verbose)
 			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 3ab18a6b89..41575c5245 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -101,11 +101,11 @@ mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
 # Enable checksums.
-command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+command_ok(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 	   "checksums successfully enabled in cluster");
 
 # Successive attempt to enable checksums fails.
-command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+command_fails(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 	      "enabling checksums fails if already enabled");
 
 # Control file should know that checksums are enabled.
@@ -113,12 +113,12 @@ command_like(['pg_controldata', $pgdata],
 	     qr/Data page checksum version:.*1/,
 	     'checksums enabled in control file');
 
-# Disable checksums again.
+# Disable checksums again.  Flush result here as that should be cheap.
 command_ok(['pg_checksums', '--disable', '-D', $pgdata],
 	   "checksums successfully disabled in cluster");
 
 # Successive attempt to disable checksums fails.
-command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+command_fails(['pg_checksums', '--disable', '--no-sync', '-D', $pgdata],
 	      "disabling checksums fails if already disabled");
 
 # Control file should know that checksums are disabled.
@@ -127,7 +127,7 @@ command_like(['pg_controldata', $pgdata],
 		 'checksums disabled in control file');
 
 # Enable checksums again for follow-up tests.
-command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+command_ok(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 		   "checksums successfully enabled in cluster");
 
 # Control file should know that checksums are enabled.
-- 
2.20.1

#123Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Andres Freund (#118)
Re: Offline enabling/disabling of data checksums

Hallo Andres,

And you're basically adding it because Fabien doesn't like
postmaster.pid and wants to invent another lockout mechanism in this
thread.

I did not suggest to rename the control file, but as it is already done by
another command it did not look like a bad idea in itself, or at least an
already used bad idea:-)

I'd be okay with anything that works consistently across all commands
that may touch a cluster and are mutually exclusive (postmaster, pg_rewind,
pg_resetwal, pg_upgrade, pg_checksums…), without underlying race
conditions. It could be locking, a control file state, a special file
(which one? what is the procedure to create/remove it safely and avoid
potential race conditions?), possibly "postmaster.pid", whatever really.

I'll admit that I'm moderately enthusiastic about "postmaster.pid" because
it no longer does what the file name says, but if it really works
and is used consistently by all commands, why not. In case of unexpected
problems, the file will probably have to be removed/fixed by hand. I also
think that the implemented mechanism should be made available in
"control_utils.c", not duplicated in every command.
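
To make the "special file" option concrete, here is a minimal sketch of one way such a marker file could be created and removed without a race between two commands. It relies only on the POSIX guarantee that open() with O_CREAT | O_EXCL fails if the file already exists. The function names and the "cluster_tool.lock" file name are hypothetical and exist neither in PostgreSQL nor in the patch; this is an illustration of the idea, not a proposal for the actual mechanism.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): take an exclusive
 * marker on a data directory by creating a file atomically.
 * Returns 0 on success, -1 if the marker already exists or on error.
 * On success, lockpath holds the path to pass to release_cluster_lock().
 */
int
acquire_cluster_lock(const char *datadir, char *lockpath, size_t len)
{
	int			fd;
	int			n;
	char		pidbuf[32];

	/* "cluster_tool.lock" is an assumed name, chosen for illustration */
	n = snprintf(lockpath, len, "%s/cluster_tool.lock", datadir);
	if (n < 0 || (size_t) n >= len)
		return -1;

	/* O_CREAT | O_EXCL: creation fails with EEXIST if the file exists */
	fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;

	/* Record our PID so that a stale marker can be diagnosed by hand */
	n = snprintf(pidbuf, sizeof(pidbuf), "%ld\n", (long) getpid());
	if (write(fd, pidbuf, (size_t) n) != n || fsync(fd) != 0)
	{
		close(fd);
		unlink(lockpath);
		return -1;
	}
	close(fd);
	return 0;
}

/* Remove the marker created by acquire_cluster_lock(). */
void
release_cluster_lock(const char *lockpath)
{
	unlink(lockpath);
}
```

A marker left behind by a crashed command would still have to be removed by hand, which is the manual intervention mentioned above.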

--
Fabien.

#124Andres Freund
andres@anarazel.de
In reply to: Fabien COELHO (#123)
Re: Offline enabling/disabling of data checksums

Hi,

On 2019-03-20 07:55:39 +0100, Fabien COELHO wrote:

And you're basically adding it because Fabien doesn't like
postmaster.pid and wants to invent another lockout mechanism in this
thread.

I did not suggest to rename the control file, but as it is already done by
another command it did not look like a bad idea in itself, or at least an
already used bad idea:-)

pg_upgrade in link mode intentionally wants to *permanently* disable a
cluster. And it explicitly writes a log message about it. That's not a
case to draw inferences from for this case.

I'd be okay with anything that works consistently across all commands that
may touch a cluster and are mutually exclusive (postmaster, pg_rewind,
pg_resetwal, pg_upgrade, pg_checksums…), without underlying race conditions.
It could be locking, a control file state, a special file (which one? what
is the procedure to create/remove it safely and avoid potential race
conditions?), possibly "postmaster.pid", whatever really.

I'll admit that I'm moderately enthusiastic about "postmaster.pid" because
it no longer does what the file name says, but if it really works and
is used consistently by all commands, why not. In case of unexpected
problems, the file will probably have to be removed/fixed by hand. I also
think that the implemented mechanism should be made available in
"control_utils.c", not duplicated in every command.

That's just a separate feature.

Greetings,

Andres Freund

#125Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Andres Freund (#124)
Re: Offline enabling/disabling of data checksums

Hallo Andres,

[...]

pg_upgrade in link mode intentionally wants to *permanently* disable a
cluster. And it explicitly writes a log message about it. That's not a
case to draw inferences from for this case.

Ok. My limited knowledge of pg_upgrade's inner workings does not extend to
this level of detail.

I'd be okay with anything that works consistently across all commands
[...]

I'll admit that I'm moderately enthusiastic about "postmaster.pid" because
it no longer does what the file name says, but if it really works and
is used consistently by all commands, why not. In case of unexpected
problems, the file will probably have to be removed/fixed by hand. I also
think that the implemented mechanism should be made available in
"control_utils.c", not duplicated in every command.

That's just a separate feature.

Possibly, although I'm not sure what in the above is a "separate feature",
separate, I assume, from the "pg_checksums --enable" implementation.

Is it the fact that there could (should, IMO) be some mechanisms to ensure
that mutually exclusive direct cluster-modification commands are not run
concurrently?

As "pg_checksums -e" is a potentially long-running command, the likelihood
of self-inflicted wounds is raised significantly: I could do absurd things
on an enable-checksum-in-progress cluster with a previous version of the
patch. Thus, as a reviewer, I'm suggesting to fix the issue.

Or is it the fact that fixing some critical errors would possibly
involve some manual intervention at some point?

Or is it something else?

--
Fabien.

#126Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#122)
Re: Offline enabling/disabling of data checksums

Michaël-san,

In short, you keep the main feature with:
- No tweaks with postmaster.pid.
- Rely just on the control file indicating an instance shutdown
cleanly.
- No tweaks with the system ID.
- No renaming of the control file.

Hmmm… so nothing:-)

I think that this feature is useful, in complement to a possible
online-enabling server-managed version.

About patch v8 part 1: applies cleanly, compiles, global & local make
check ok, doc build ok.

I think that a clear warning not to run any cluster command in parallel,
under pain of possible cluster corruption, and possibly other caveats
about replication, should appear in the documentation.

I also think that some mechanism should be used to prevent such an
occurrence, whether in this patch or another.

About "If enabling or disabling checksums, the exit status is nonzero if
the operation failed." I must admit that enabling/disabling an already
enabled/disabled cluster is still not really a failure for me, because the
end status is that the cluster is in the state required by the user.
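
As an illustration of the exit-status behaviour argued for here, the decision could be sketched as below. This is not what patch v8 does (it exits with status 1 when the cluster is already in the requested state); the enum and function names are hypothetical and only serve to show the idea of treating the "already there" case as a successful no-op.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only */
typedef enum
{
	ACTION_NOOP,				/* already in requested state: exit 0 */
	ACTION_REWRITE_AND_ENABLE,	/* rewrite all pages, then set the flag */
	ACTION_DISABLE				/* only clear the flag */
} ChecksumAction;

/*
 * Decide what to do from the checksum version found in the control file
 * (0 means disabled) and the state requested by the user.
 */
ChecksumAction
plan_checksum_action(unsigned int current_version, bool want_enabled)
{
	bool		is_enabled = (current_version > 0);

	if (is_enabled == want_enabled)
		return ACTION_NOOP;

	return want_enabled ? ACTION_REWRITE_AND_ENABLE : ACTION_DISABLE;
}
```

With this shape, running the same command twice in a script would succeed both times, and a nonzero exit status would be reserved for real failures.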

I still think that the control file update should be made *after* the data
directory has been synced, otherwise the control file could be updated
while some data files are not. It just means exchanging two lines.

About patch v8 part 2: applies cleanly, compiles, global & local make
check ok.

The added -N option is not documented.

If the conditional sync is moved before the file update, the file update
should pass do_sync to update_controlfile.

--
Fabien.

#127Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#126)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 20, 2019 at 10:38:36AM +0100, Fabien COELHO wrote:

Hmmm… so nothing:-)

The core of the feature is still here, fortunately.

I think that a clear warning not to run any cluster command in parallel,
under pain of possible cluster corruption, and possibly other caveats about
replication, should appear in the documentation.

I still have the following extra documentation in my notes:
+ <refsect1>
+  <title>Notes</title>
+  <para>
+   When disabling or enabling checksums in a cluster of multiple instances,
+   it is recommended to stop all the instances of the cluster before doing
+   the switch to all the instances consistently.  When using a cluster with
+   tools which perform direct copies of relation file blocks (for example
+   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
+   lead to page corruptions in the shape of incorrect checksums if the
+   operation is not done consistently across all nodes.  Destroying all
+   the standbys in a cluster first, enabling or disabling checksums on
+   the primary and finally recreate the cluster nodes from scratch is
+   also safe.
+  </para>
+ </refsect1>
This sounds kind of enough for me but..

I also think that some mechanism should be used to prevent such an
occurrence, whether in this patch or another.

I won't counter-argue on that.

I still think that the control file update should be made *after* the data
directory has been synced, otherwise the control file could be updated while
some data files are not. It just means exchanging two lines.

The parent path of the control file also needs to be flushed to make
the change durable; just doing things the same way pg_rewind does keeps
the code simple, in my opinion.

If the conditional sync is moved before the file update, the file update
should pass do_sync to update_controlfile.

For the durability of the operation, the current order is
intentional.
--
Michael

#128Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#127)
Re: Offline enabling/disabling of data checksums

Michaël-san,

I think that a clear warning not to run any cluster command in parallel,
under pain of possible cluster corruption, and possibly other caveats about
replication, should appear in the documentation.

I still have the following extra documentation in my notes:

Ok, it should have been included in the patch.

+ <refsect1>
+  <title>Notes</title>
+  <para>
+   When disabling or enabling checksums in a cluster of multiple instances,

ISTM that a "postgres cluster" was like an Oracle instance:

See https://www.postgresql.org/docs/devel/creating-cluster.html

So the vocabulary used above seems misleading. I'm not sure how to name an
Oracle cluster in postgres lingo, though.

+   it is recommended to stop all the instances of the cluster before doing
+   the switch to all the instances consistently.

I think that the motivation/risks should appear before the solution. "As
xyz ..., ...", or at least the logical link should be outlined.

It is not clear to me whether the following sentences, which seem
specific to "pg_rewind", are linked to the previous advice, which seems
rather to refer to streaming replication?

+   When using a cluster with
+   tools which perform direct copies of relation file blocks (for example
+   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
+   lead to page corruptions in the shape of incorrect checksums if the
+   operation is not done consistently across all nodes.  Destroying all
+   the standbys in a cluster first, enabling or disabling checksums on
+   the primary and finally recreate the cluster nodes from scratch is
+   also safe.
+  </para>
+ </refsect1>

Shouldn't disabling in reverse order be safe, since the checksums are not
checked afterwards?

This sounds kind of enough for me but..

ISTM that the risks could be stated more clearly.

I also think that some mechanism should be used to prevent such an
occurrence, whether in this patch or another.

I won't counter-argue on that.

This answer is ambiguous.

I still think that the control file update should be made *after* the data
directory has been synced, otherwise the control file could be updated while
some data files are not. It just means exchanging two lines.

The parent path of the control file also needs to be flushed to make
the change durable; just doing things the same way pg_rewind does keeps
the code simple, in my opinion.

I do not understand. The issue I'm referring to is, when enabling:

- data files are updated in fs cache
- control file is updated in fs cache
* fsync is called
- updated control file gets written
- data files are being written...
but reboot occurs while fsyncing is still in progress

After the reboot, some data files are not fully updated with their
checksums, although the control file says that they are. It should then
fail after a restart when a no-checksum page is loaded?

What am I missing?

Also, I do not see how exchanging two lines makes the code less simple.

If the conditional sync is moved before the file update, the file update
should pass do_sync to update_controlfile.

For the durability of the operation, the current order is intentional.

See my point above: I think that this order can lead to cluster
corruption. If not, please be kind enough to try to explain in more
detail why I'm wrong.
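
The ordering I have in mind can be sketched as follows, with plain POSIX calls standing in for the real functions. This is a minimal illustration under stated assumptions: write_file() and enable_in_safe_order() are hypothetical names, a single data file stands in for the whole data directory, and a one-byte file stands in for the control file. The point is only that the data is made durable first and the "checksums enabled" flag is written and synced last.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a buffer to a file and, when do_sync is
 * nonzero, fsync it before returning. Returns 0 on success, -1 on error.
 */
int
write_file(const char *path, const void *buf, size_t len, int do_sync)
{
	int			fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t) len)
	{
		close(fd);
		return -1;
	}
	if (do_sync && fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}

/*
 * Enable sequence in the proposed order: the data file is written (and
 * synced when do_sync is set) before the control flag is touched at all.
 */
int
enable_in_safe_order(const char *datafile, const void *page, size_t pagelen,
					 const char *controlfile, int do_sync)
{
	const char	flag = '1';		/* stands in for data_checksum_version */

	/* 1. Data pages carrying checksums reach disk first */
	if (write_file(datafile, page, pagelen, do_sync) != 0)
		return -1;

	/* 2. Only then is the flag written, with the same sync setting */
	return write_file(controlfile, &flag, 1, do_sync);
}
```

If a crash happens during step 1, the flag still says "disabled" and the operation can simply be rerun. The sketch leaves out flushing the parent directory, which is also needed for durability.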

--
Fabien.

#129Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#128)
Re: Offline enabling/disabling of data checksums

On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote:

I think that the motivation/risks should appear before the solution. "As xyz
..., ...", or at least the logical link should be outlined.

It is not clear to me whether the following sentences, which seem specific
to "pg_rewind", are linked to the previous advice, which seems rather to
refer to streaming replication?

Do you have a better idea for the wording? If you have a failover
which makes use of pg_rewind, or use some backup tool which does
incremental copies of physical blocks like pg_rman, then you risk
messing up instances in a cluster, which is the risk I am trying to
outline here. It is actually fine to do the following if
everything is WAL-based, as checksums are only computed once a shared
buffer is flushed on a single instance. Imagine for example a
primary-standby with checksums disabled:
- Shut down the standby cleanly, enable checksums.
- Plug the standby back into the cluster, let it replay up to the latest point.
- Shut down the primary cleanly.
- Promote the standby.
- Enable checksums on the previous primary.
- Add recovery parameters and reconnect the previous primary to the
cluster.
- All nodes have checksums enabled, without rebuilding any of them.
Per what I could see on this thread, folks tend to point out that
we should *not* include such recommendations in the docs, as it is too
easy to make a pilot error.

+   When using a cluster with
+   tools which perform direct copies of relation file blocks (for example
+   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
+   lead to page corruptions in the shape of incorrect checksums if the
+   operation is not done consistently across all nodes.  Destroying all
+   the standbys in a cluster first, enabling or disabling checksums on
+   the primary and finally recreate the cluster nodes from scratch is
+   also safe.
+  </para>
+ </refsect1>

Shouldn't disabling in reverse order be safe, since the checksums are not
checked afterwards?

I don't quite understand your comment about the ordering. If all
the standbys are destroyed first, then enabling/disabling checksums
happens at a single place.

After the reboot, some data files are not fully updated with their
checksums, although the control file says that they are. It should then
fail after a restart when a no-checksum page is loaded?

What am I missing?

Please note that we do that in other tools as well and we live fine
with that, pg_basebackup and pg_rewind just to name two. I am not
saying that it is not a problem in some cases, but I am saying that
this is not a problem that this patch should solve. If we were to do
something about that, it could make sense to make fsync_pgdata()
smarter so that the control file is flushed last there, or define flush
strategies there.
--
Michael

#130Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#129)
Re: Offline enabling/disabling of data checksums

On Thu, Mar 21, 2019 at 07:59:24AM +0900, Michael Paquier wrote:

Please note that we do that in other tools as well and we live fine
with that, pg_basebackup and pg_rewind just to name two. I am not
saying that it is not a problem in some cases, but I am saying that
this is not a problem that this patch should solve. If we were to do
something about that, it could make sense to make fsync_pgdata()
smarter so that the control file is flushed last there, or define flush
strategies there.

Anyway, as this stuff is very useful for upgrade scenarios
a-la-pg_upgrade, for backup validation and as it does not produce
false positives, I would really like to get something committed for
v12 in its simplest form... Are there any recommendations that people
would like to add to the documentation?
--
Michael

#131Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#129)
Re: Offline enabling/disabling of data checksums

Hello Michaël,

On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote:

I think that the motivation/risks should appear before the solution. "As xyz
..., ...", or at least the logical link should be outlined.

It is not clear to me whether the following sentences, which seem specific
to "pg_rewind", are linked to the previous advice, which seems rather to
refer to streaming replication?

Do you have a better idea for the wording?

I can try, but I must admit that I'm fuzzy about the actual issue. Is
there a problem with streaming replication when checksum settings are
inconsistent, or not?

You seem to suggest that the issue is more about how some commands or
backup tools operate on a cluster.

I'll reread the thread carefully and will make a proposal.

Imagine for example a primary-standby with checksums disabled: [...]

Yep, that's cool.

Should not disabling in reverse order be safe? The checksums are not
checked afterwards?

I don't quite understand your comment about the ordering. If all the
standbys are destroyed first, then enabling/disabling checksums happens
at a single place.

Sure. I was suggesting that disabling on replicated clusters is possibly
safer, but I do not know the details of replication and checksumming
well enough to be sure about it.

After the reboot, some data files are not fully updated with their
checksums, although the control file says that they are. It should then
fail after a restart when a no-checksum page is loaded?

What am I missing?

Please note that we do that in other tools as well and we live fine
with that, pg_basebackup and pg_rewind just to name two.

The fact that other commands are exposed to the same potential risk is not
a very good argument not to fix it.

I am not saying that it is not a problem in some cases, but I am saying
that this is not a problem that this patch should solve.

As solving the issue involves exchanging two lines and turning one boolean
parameter to true, I do not see why it should not be done. Fixing the
issue takes much less time than writing about it...

And if other commands can be improved, fine with me.

If we were to do something about that, it could make sense to make
fsync_pgdata() smarter so that the control file is flushed last there, or
define flush strategies there.

ISTM that this would not work: the control file update can only be done
*after* the fsync to describe the cluster's actual status, otherwise it is
just a question of luck whether the cluster is corrupt on a crash while
fsyncing. The enforced order of operations, with a barrier in between, is
the important thing here.
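
The ordering argued for above can be sketched in C. This is a minimal illustration assuming plain POSIX calls, not the pg_checksums code: write_and_sync() and enable_flag_after_data() are hypothetical names, and the "flag" file merely stands in for the control file. A real tool would also have to fsync the containing directories, which is omitted here.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write buf to path and force it to stable storage.
 * Returns 0 on success, -1 on any failure.
 */
int
write_and_sync(const char *path, const void *buf, size_t len)
{
	int			fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}

/*
 * Hypothetical two-step switch: the data is written and fsync'd first
 * (the barrier), and only then is the flag that describes the new state
 * written.  If the first step fails, the flag is left untouched, so it
 * never claims more than what is known to be on disk.
 */
int
enable_flag_after_data(const char *data_path, const char *flag_path,
					   const void *page, size_t len)
{
	if (write_and_sync(data_path, page, len) != 0)
		return -1;				/* flag not written */
	return write_and_sync(flag_path, "1", 1);
}
```

With the two steps swapped, a crash between them could leave a flag that says "enabled" next to data that was never made durable, which is the case described above.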

--
Fabien.

#132Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#130)
Re: Offline enabling/disabling of data checksums

Anyway, as this stuff is very useful for upgrade scenarios
a-la-pg_upgrade, for backup validation and as it does not produce false
positives, I would really like to get something committed for v12 in its
simplest form...

Fine with me, the detailed doc is not a showstopper and can be improved
later on.

Are there any recommendations that people would like to add to the
documentation?

For me, I just want at least a clear warning on potential risks.

--
Fabien.

#133Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#131)
2 attachment(s)
Re: Offline enabling/disabling of data checksums

On Thu, Mar 21, 2019 at 08:17:32AM +0100, Fabien COELHO wrote:

I can try, but I must admit that I'm fuzzy about the actual issue. Is there
a problem with streaming replication when checksum settings are
inconsistent, or not?

You seem to suggest that the issue is more about how some commands or backup
tools operate on a cluster.

Yes. That's what I am writing about. Imagine for example this case
with pg_rewind:
- primary has checksums enabled.
- standby has checksums disabled.
- a hard crash of the primary happens, there is a failover to the
standby which gets promoted.
- The primary's host is restarted; the primary is started and stopped
once cleanly so that there is a clear point in its past timeline where
WAL forked, thanks to the shutdown checkpoint generated by the clean
stop.
- pg_rewind is run, copying some pages from the promoted standby,
which don't have checksums, to the primary with checksums enabled, and
causing some pages to have an incorrect checksum.
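
The failure mode in the last step can be illustrated with a toy model in C. Everything here is an assumption made for illustration: ToyPage, toy_checksum(), toy_write_page() and toy_page_verifies() are hypothetical, the checksum is not PostgreSQL's algorithm, and the toy writes 0 into the checksum field when checksums are disabled purely to keep the example small.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

typedef struct ToyPage
{
	uint16_t	checksum;		/* stands in for pd_checksum */
	unsigned char data[TOY_PAGE_SIZE];
} ToyPage;

/* Toy 16-bit checksum over the payload; the result is never 0. */
uint16_t
toy_checksum(const ToyPage *page)
{
	uint32_t	sum = 0;

	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		sum = sum * 31 + page->data[i];
	return (uint16_t) ((sum % 65535) + 1);
}

/* Write a page as a node would: stamp a checksum only if enabled. */
void
toy_write_page(ToyPage *page, const char *payload, bool checksums_enabled)
{
	memset(page->data, 0, TOY_PAGE_SIZE);
	strncpy((char *) page->data, payload, TOY_PAGE_SIZE - 1);
	page->checksum = checksums_enabled ? toy_checksum(page) : 0;
}

/* Read-side check as done by a node that has checksums enabled. */
bool
toy_page_verifies(const ToyPage *page)
{
	return page->checksum == toy_checksum(page);
}
```

A page produced with checksums_enabled set to false and then copied block-for-block into a node that verifies checksums fails toy_page_verifies(), even though its payload is identical to a correctly stamped page.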

There is another tool I know of which is called pg_rman, which is a
backup tool able to take incremental backups in the shape of a delta
of relation blocks. Then imagine the following:
- One instance of Postgres runs, has checksums disabled.
- pg_rman takes a full backup of it.
- Checksums are enabled on this instance.
- An incremental backup from the previous full backup point is taken.
If I recall correctly pg_rman takes a copy of the new control file as
well, which tracks checksums as being enabled.
- A crash happens, the data folder is dead.
- Rollback to the previous backup is done, and we restore up to a
point after the incremental backup.
- And you end up with a cluster which has checksums enabled, but as
the initial full backup had checksums disabled, not all the pages may
be in a correct state.

So I think that it makes sense for the documentation to tell users to be
careful, but being too careful in the tool also rules out
many possibilities (see the example of the clean failover where it is
possible to enable checksums with no actual downtime). And this part
has a lot of value.

ISTM that this would not work: the control file update can only be done
*after* the fsync to describe the cluster's actual status, otherwise it is
just a question of luck whether the cluster is corrupt on a crash while
fsyncing. The enforced order of operations, with a barrier in between, is the
important thing here.

I have done the switch for this case. For pg_rewind, I actually think
that this is an area where its logic could be improved a bit. So now
the data folder is synced first, and then the control file is updated.
Confirmed: it took less time to change the code than to write this
paragraph, including the code compilation and one run of the TAP
tests.

I have added to the docs a warning about a host crash while doing the
operation, with a recommendation to check the state of the checksums
on the data folder should it happen, and kept the previous portion of
the docs about clusters. Your suggestion sounds appropriate. I would be
tempted to add a bigger warning in pg_rewind or pg_basebackup about
that, but that's a different story for another time.

Does that look fine to you?
--
Michael

Attachments:

v9-0001-Add-options-to-enable-and-disable-checksums-in-pg.patch (text/x-diff; charset=us-ascii)
From 7ef4a14421b9999d148fa24f3107e8ef8be8b348 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 22 Mar 2019 09:21:18 +0900
Subject: [PATCH v9 1/2] Add options to enable and disable checksums in
 pg_checksums

An offline cluster can now work with more modes in pg_checksums:
- --enable can enable checksums in a cluster, updating all blocks with a
correct checksum, and update the control file at the end.
- --disable can disable checksums in a cluster, updating the control
file.
- --check is an extra option able to verify checksums for a cluster.

When running --enable or --disable, the data folder gets fsync'd for
durability.  If no mode is specified in the options, then --check is
used for compatibility with older versions of pg_verify_checksums (now
renamed to pg_checksums in v12).

Author: Michael Banck
Reviewed-by: Fabien Coelho, Michael Paquier
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    |  72 ++++++++++-
 src/bin/pg_checksums/pg_checksums.c   | 175 ++++++++++++++++++++++----
 src/bin/pg_checksums/t/002_actions.pl |  76 ++++++++---
 src/tools/pgindent/typedefs.list      |   1 +
 4 files changed, 278 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 6a47dda683..fda85e7ea0 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -16,7 +16,7 @@ PostgreSQL documentation
 
  <refnamediv>
   <refname>pg_checksums</refname>
-  <refpurpose>verify data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
+  <refpurpose>enable, disable or check data checksums in a <productname>PostgreSQL</productname> database cluster</refpurpose>
  </refnamediv>
 
  <refsynopsisdiv>
@@ -36,10 +36,19 @@ PostgreSQL documentation
  <refsect1 id="r1-app-pg_checksums-1">
   <title>Description</title>
   <para>
-   <application>pg_checksums</application> verifies data checksums in a
-   <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_checksums</application>.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   <application>pg_checksums</application> checks, enables or disables data
+   checksums in a <productname>PostgreSQL</productname> cluster.  The server
+   must be shut down cleanly before running
+   <application>pg_checksums</application>. The exit status is zero if there
+   are no checksum errors when checking them, and nonzero if at least one
+   checksum failure is detected. If enabling or disabling checksums, the
+   exit status is nonzero if the operation failed.
+  </para>
+
+  <para>
+   While checking or enabling checksums needs to scan or write every file in
+   the cluster, disabling checksums will only update the file
+   <filename>pg_control</filename>.
   </para>
  </refsect1>
 
@@ -60,6 +69,37 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-c</option></term>
+      <term><option>--check</option></term>
+      <listitem>
+       <para>
+        Checks checksums. This is the default mode if nothing else is
+        specified.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <term><option>--disable</option></term>
+      <listitem>
+       <para>
+        Disables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-e</option></term>
+      <term><option>--enable</option></term>
+      <listitem>
+       <para>
+        Enables checksums.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
@@ -119,4 +159,26 @@ PostgreSQL documentation
    </varlistentry>
   </variablelist>
  </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+   When disabling or enabling checksums in a cluster of multiple instances,
+   it is recommended to stop all the instances of the cluster before doing
+   the switch to all the instances consistently. When using a cluster with
+   tools which perform direct copies of relation file blocks (for example
+   <xref linkend="app-pgrewind"/>), enabling or disabling checksums can
+   lead to page corruptions in the shape of incorrect checksums if the
+   operation is not done consistently across all nodes. Destroying all
+   the standbys in a cluster first, enabling or disabling checksums on
+   the primary and finally recreating the cluster nodes from scratch is
+   also safe.
+  </para>
+  <para>
+   In the event of a crash of the operating system while enabling or
+   disabling checksums, the data folder may have checksums in an inconsistent
+   state, in which case it is recommended to check the state of checksums
+   in the data folder.
+  </para>
+  </refsect1>
 </refentry>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index b7ebc11017..f640fda14b 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -1,7 +1,8 @@
 /*-------------------------------------------------------------------------
  *
  * pg_checksums.c
- *	  Verifies page level checksums in an offline cluster.
+ *	  Checks, enables or disables page level checksums for an offline
+ *	  cluster
  *
  * Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -17,14 +18,15 @@
 #include <sys/stat.h>
 #include <unistd.h>
 
-#include "catalog/pg_control.h"
+#include "access/xlog_internal.h"
 #include "common/controldata_utils.h"
+#include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/checksum_impl.h"
-#include "storage/fd.h"
 
 
 static int64 files = 0;
@@ -35,16 +37,38 @@ static ControlFileData *ControlFile;
 static char *only_relfilenode = NULL;
 static bool verbose = false;
 
+typedef enum
+{
+	PG_MODE_CHECK,
+	PG_MODE_DISABLE,
+	PG_MODE_ENABLE
+} PgChecksumMode;
+
+/*
+ * Filename components.
+ *
+ * XXX: fd.h is not declared here as frontend side code is not able to
+ * interact with the backend-side definitions for the various fsync
+ * wrappers.
+ */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
+static PgChecksumMode mode = PG_MODE_CHECK;
+
 static const char *progname;
 
 static void
 usage(void)
 {
-	printf(_("%s verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
+	printf(_("%s enables, disables or verifies data checksums in a PostgreSQL database cluster.\n\n"), progname);
 	printf(_("Usage:\n"));
 	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
 	printf(_("\nOptions:\n"));
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
+	printf(_("  -c, --check            check data checksums (default)\n"));
+	printf(_("  -d, --disable          disable data checksums\n"));
+	printf(_("  -e, --enable           enable data checksums\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -90,8 +114,14 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	int			flags;
+
+	Assert(mode == PG_MODE_ENABLE ||
+		   mode == PG_MODE_CHECK);
+
+	flags = (mode == PG_MODE_ENABLE) ? O_RDWR : O_RDONLY;
+	f = open(fn, PG_BINARY | flags, 0);
 
-	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
@@ -121,18 +151,47 @@ scan_file(const char *fn, BlockNumber segmentno)
 			continue;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
-		if (csum != header->pd_checksum)
+		if (mode == PG_MODE_CHECK)
 		{
-			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
-						progname, fn, blockno, csum, header->pd_checksum);
-			badblocks++;
+			if (csum != header->pd_checksum)
+			{
+				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+					fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
+							progname, fn, blockno, csum, header->pd_checksum);
+				badblocks++;
+			}
+		}
+		else if (mode == PG_MODE_ENABLE)
+		{
+			/* Set checksum in page header */
+			header->pd_checksum = csum;
+
+			/* Seek back to beginning of block */
+			if (lseek(f, -BLCKSZ, SEEK_CUR) < 0)
+			{
+				fprintf(stderr, _("%s: seek failed for block %d in file \"%s\": %s\n"), progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Write block with checksum */
+			if (write(f, buf.data, BLCKSZ) != BLCKSZ)
+			{
+				fprintf(stderr, "%s: could not update checksum of block %d in file \"%s\": %s\n",
+						progname, blockno, fn, strerror(errno));
+				exit(1);
+			}
 		}
 	}
 
 	if (verbose)
-		fprintf(stderr,
-				_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+	{
+		if (mode == PG_MODE_CHECK)
+			fprintf(stderr,
+					_("%s: checksums verified in file \"%s\"\n"), progname, fn);
+		if (mode == PG_MODE_ENABLE)
+			fprintf(stderr,
+					_("%s: checksums enabled in file \"%s\"\n"), progname, fn);
+	}
 
 	close(f);
 }
@@ -234,7 +293,10 @@ int
 main(int argc, char *argv[])
 {
 	static struct option long_options[] = {
+		{"check", no_argument, NULL, 'c'},
 		{"pgdata", required_argument, NULL, 'D'},
+		{"disable", no_argument, NULL, 'd'},
+		{"enable", no_argument, NULL, 'e'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -262,10 +324,19 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "D:r:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
+			case 'c':
+				mode = PG_MODE_CHECK;
+				break;
+			case 'd':
+				mode = PG_MODE_DISABLE;
+				break;
+			case 'e':
+				mode = PG_MODE_ENABLE;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -312,6 +383,15 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Relfilenode checking only works in --check mode */
+	if (mode != PG_MODE_CHECK && only_relfilenode)
+	{
+		fprintf(stderr, _("%s: relfilenode option only possible with --check\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
 	/* Check if cluster is running */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
@@ -339,29 +419,72 @@ main(int argc, char *argv[])
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
+		fprintf(stderr, _("%s: cluster must be shut down\n"), progname);
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version == 0)
+	if (ControlFile->data_checksum_version == 0 &&
+		mode == PG_MODE_CHECK)
 	{
 		fprintf(stderr, _("%s: data checksums are not enabled in cluster\n"), progname);
 		exit(1);
 	}
+	if (ControlFile->data_checksum_version == 0 &&
+		mode == PG_MODE_DISABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already disabled in cluster.\n"), progname);
+		exit(1);
+	}
+	if (ControlFile->data_checksum_version > 0 &&
+		mode == PG_MODE_ENABLE)
+	{
+		fprintf(stderr, _("%s: data checksums are already enabled in cluster.\n"), progname);
+		exit(1);
+	}
 
-	/* Scan all files */
-	scan_directory(DataDir, "global");
-	scan_directory(DataDir, "base");
-	scan_directory(DataDir, "pg_tblspc");
+	/* Operate on all files if checking or enabling checksums */
+	if (mode == PG_MODE_CHECK || mode == PG_MODE_ENABLE)
+	{
+		scan_directory(DataDir, "global");
+		scan_directory(DataDir, "base");
+		scan_directory(DataDir, "pg_tblspc");
 
-	printf(_("Checksum scan completed\n"));
-	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
-	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
-	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
-	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+		printf(_("Checksum operation completed\n"));
+		printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
+		printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
+		if (mode == PG_MODE_CHECK)
+		{
+			printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 
-	if (badblocks > 0)
-		return 1;
+			if (badblocks > 0)
+				exit(1);
+		}
+	}
+
+	/*
+	 * Finally make the data durable on disk if enabling or disabling
+	 * checksums.  Flush first the data directory for safety, and then
+	 * update the control file to keep the switch consistency.
+	 */
+	if (mode == PG_MODE_ENABLE || mode == PG_MODE_DISABLE)
+	{
+		ControlFile->data_checksum_version =
+			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
+
+		printf(_("Syncing data directory\n"));
+		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+
+		printf(_("Updating control file\n"));
+		update_controlfile(DataDir, progname, ControlFile, true);
+
+		if (verbose)
+			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+		if (mode == PG_MODE_ENABLE)
+			printf(_("Checksums enabled in cluster\n"));
+		else
+			printf(_("Checksums disabled in cluster\n"));
+	}
 
 	return 0;
 }
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 97284e8930..3ab18a6b89 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 45;
+use Test::More tests => 62;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -38,8 +38,8 @@ sub check_relation_corruption
 
 	# Checksums are correct for single relfilenode as the table is not
 	# corrupted yet.
-	command_ok(['pg_checksums',  '-D', $pgdata,
-		'-r', $relfilenode_corrupted],
+	command_ok(['pg_checksums',  '--check', '-D', $pgdata, '-r',
+			   $relfilenode_corrupted],
 		"succeeds for single relfilenode on tablespace $tablespace with offline cluster");
 
 	# Time to create some corruption
@@ -49,15 +49,15 @@ sub check_relation_corruption
 	close $file;
 
 	# Checksum checks on single relfilenode fail
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata, '-r',
-								$relfilenode_corrupted],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata,
+							  '-r', $relfilenode_corrupted],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
 							  "fails with corrupted data for single relfilenode on tablespace $tablespace");
 
 	# Global checksum checks fail as well
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 							  1,
 							  [qr/Bad checksums:.*1/],
 							  [qr/checksum verification failed/],
@@ -67,22 +67,22 @@ sub check_relation_corruption
 	$node->start;
 	$node->safe_psql('postgres', "DROP TABLE $table;");
 	$node->stop;
-	$node->command_ok(['pg_checksums', '-D', $pgdata],
+	$node->command_ok(['pg_checksums', '--check', '-D', $pgdata],
 	        "succeeds again after table drop on tablespace $tablespace");
 
 	$node->start;
 	return;
 }
 
-# Initialize node with checksums enabled.
+# Initialize node with checksums disabled.
 my $node = get_new_node('node_checksum');
-$node->init(extra => ['--data-checksums']);
+$node->init();
 my $pgdata = $node->data_dir;
 
-# Control file should know that checksums are enabled.
+# Control file should know that checksums are disabled.
 command_like(['pg_controldata', $pgdata],
-	     qr/Data page checksum version:.*1/,
-		 'checksums enabled in control file');
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
 
 # These are correct but empty files, so they should pass through.
 append_to_file "$pgdata/global/99999", "";
@@ -100,13 +100,59 @@ append_to_file "$pgdata/global/pgsql_tmp_123", "foo";
 mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
+# Enable checksums.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+	   "checksums successfully enabled in cluster");
+
+# Successive attempt to enable checksums fails.
+command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+	      "enabling checksums fails if already enabled");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+	     'checksums enabled in control file');
+
+# Disable checksums again.
+command_ok(['pg_checksums', '--disable', '-D', $pgdata],
+	   "checksums successfully disabled in cluster");
+
+# Successive attempt to disable checksums fails.
+command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+	      "disabling checksums fails if already disabled");
+
+# Control file should know that checksums are disabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*0/,
+		 'checksums disabled in control file');
+
+# Enable checksums again for follow-up tests.
+command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+		   "checksums successfully enabled in cluster");
+
+# Control file should know that checksums are enabled.
+command_like(['pg_controldata', $pgdata],
+	     qr/Data page checksum version:.*1/,
+		 'checksums enabled in control file');
+
 # Checksums pass on a newly-created cluster
-command_ok(['pg_checksums',  '-D', $pgdata],
+command_ok(['pg_checksums', '--check', '-D', $pgdata],
 		   "succeeds with offline cluster");
 
+# Checksums are verified if no other arguments are specified
+command_ok(['pg_checksums', '-D', $pgdata],
+		   "verifies checksums as default action");
+
+# Specific relation files cannot be requested when action is --disable
+# or --enable.
+command_fails(['pg_checksums', '--disable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --disable");
+command_fails(['pg_checksums', '--enable', '-r', '1234', '-D', $pgdata],
+	      "fails when relfilenodes are requested and action is --enable");
+
 # Checks cannot happen with an online cluster
 $node->start;
-command_fails(['pg_checksums',  '-D', $pgdata],
+command_fails(['pg_checksums', '--check', '-D', $pgdata],
 			  "fails with online cluster");
 
 # Check corruption of table on default tablespace.
@@ -133,7 +179,7 @@ sub fail_corrupt
 	my $file_name = "$pgdata/global/$file";
 	append_to_file $file_name, "foo";
 
-	$node->command_checks_all([ 'pg_checksums', '-D', $pgdata],
+	$node->command_checks_all([ 'pg_checksums', '--check', '-D', $pgdata],
 						  1,
 						  [qr/^$/],
 						  [qr/could not read block 0 in file.*$file\":/],
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b301bce4b1..195b146974 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1710,6 +1710,7 @@ PgBenchExprType
 PgBenchFunction
 PgBenchValue
 PgBenchValueType
+PgChecksumMode
 PgFdwAnalyzeState
 PgFdwDirectModifyState
 PgFdwModifyState
-- 
2.20.1

v9-0002-Add-option-N-no-sync-to-pg_checksums.patch (text/x-diff; charset=us-ascii)
From cee5e9f3ed9e190569a99548545801564b087fec Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Fri, 22 Mar 2019 09:24:14 +0900
Subject: [PATCH v9 2/2] Add option -N/--no-sync to pg_checksums

This is an option consistent with what pg_dump, pg_rewind and
pg_basebackup provide, which is useful for reducing the I/O effort when
testing things, and is not to be used in a production environment.

Author: Michael Paquier
Reviewed-by: Fabien Coelho
Discussion: https://postgr.es/m/20181221201616.GD4974@nighthawk.caipicrew.dd-dns.de
---
 doc/src/sgml/ref/pg_checksums.sgml    | 16 ++++++++++++++++
 src/bin/pg_checksums/pg_checksums.c   | 17 +++++++++++++----
 src/bin/pg_checksums/t/002_actions.pl | 10 +++++-----
 3 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index fda85e7ea0..ce761cc662 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -100,6 +100,22 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-N</option></term>
+      <term><option>--no-sync</option></term>
+      <listitem>
+       <para>
+        By default, <command>pg_checksums</command> will wait for all files
+        to be written safely to disk.  This option causes
+        <command>pg_checksums</command> to return without waiting, which is
+        faster, but means that a subsequent operating system crash can leave
+        the updated data folder corrupt.  Generally, this option is useful
+        for testing but should not be used on a production installation.
+        This option has no effect when using <literal>--check</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index f640fda14b..5265a30d97 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -35,6 +35,7 @@ static int64 badblocks = 0;
 static ControlFileData *ControlFile;
 
 static char *only_relfilenode = NULL;
+static bool do_sync = true;
 static bool verbose = false;
 
 typedef enum
@@ -69,6 +70,7 @@ usage(void)
 	printf(_("  -c, --check            check data checksums (default)\n"));
 	printf(_("  -d, --disable          disable data checksums\n"));
 	printf(_("  -e, --enable           enable data checksums\n"));
+	printf(_("  -N, --no-sync          do not wait for changes to be written safely to disk\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -r RELFILENODE         check only relation with specified relfilenode\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
@@ -297,6 +299,7 @@ main(int argc, char *argv[])
 		{"pgdata", required_argument, NULL, 'D'},
 		{"disable", no_argument, NULL, 'd'},
 		{"enable", no_argument, NULL, 'e'},
+		{"no-sync", no_argument, NULL, 'N'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -324,7 +327,7 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "cD:der:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:deNr:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -337,6 +340,9 @@ main(int argc, char *argv[])
 			case 'e':
 				mode = PG_MODE_ENABLE;
 				break;
+			case 'N':
+				do_sync = false;
+				break;
 			case 'v':
 				verbose = true;
 				break;
@@ -472,11 +478,14 @@ main(int argc, char *argv[])
 		ControlFile->data_checksum_version =
 			(mode == PG_MODE_ENABLE) ? PG_DATA_CHECKSUM_VERSION : 0;
 
-		printf(_("Syncing data directory\n"));
-		fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		if (do_sync)
+		{
+			printf(_("Syncing data directory\n"));
+			fsync_pgdata(DataDir, progname, PG_VERSION_NUM);
+		}
 
 		printf(_("Updating control file\n"));
-		update_controlfile(DataDir, progname, ControlFile, true);
+		update_controlfile(DataDir, progname, ControlFile, do_sync);
 
 		if (verbose)
 			printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 3ab18a6b89..41575c5245 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -101,11 +101,11 @@ mkdir "$pgdata/global/pgsql_tmp";
 append_to_file "$pgdata/global/pgsql_tmp/1.1", "foo";
 
 # Enable checksums.
-command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+command_ok(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 	   "checksums successfully enabled in cluster");
 
 # Successive attempt to enable checksums fails.
-command_fails(['pg_checksums', '--enable', '-D', $pgdata],
+command_fails(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 	      "enabling checksums fails if already enabled");
 
 # Control file should know that checksums are enabled.
@@ -113,12 +113,12 @@ command_like(['pg_controldata', $pgdata],
 	     qr/Data page checksum version:.*1/,
 	     'checksums enabled in control file');
 
-# Disable checksums again.
+# Disable checksums again.  Flush result here as that should be cheap.
 command_ok(['pg_checksums', '--disable', '-D', $pgdata],
 	   "checksums successfully disabled in cluster");
 
 # Successive attempt to disable checksums fails.
-command_fails(['pg_checksums', '--disable', '-D', $pgdata],
+command_fails(['pg_checksums', '--disable', '--no-sync', '-D', $pgdata],
 	      "disabling checksums fails if already disabled");
 
 # Control file should know that checksums are disabled.
@@ -127,7 +127,7 @@ command_like(['pg_controldata', $pgdata],
 		 'checksums disabled in control file');
 
 # Enable checksums again for follow-up tests.
-command_ok(['pg_checksums', '--enable', '-D', $pgdata],
+command_ok(['pg_checksums', '--enable', '--no-sync', '-D', $pgdata],
 		   "checksums successfully enabled in cluster");
 
 # Control file should know that checksums are enabled.
-- 
2.20.1

#134Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#133)
Re: Offline enabling/disabling of data checksums

Hi,

On Friday, 22.03.2019 at 09:27 +0900, Michael Paquier wrote:

I have added in the docs a warning about a host crash while doing the
operation, with a recommendation to check the state of the checksums
on the data folder should it happen, and the previous portion of the
docs about clusters. Your suggestion sounds appropriate. I would be
tempted to add a bigger warning in pg_rewind or pg_basebackup about
that, but that's a different story for another time.

Does that look fine to you?

Don't we need a big warning that the cluster must not be started during
operation of pg_checksums as well, now that we don't disallow it?

Michael

#135Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#134)
Re: Offline enabling/disabling of data checksums

On Fri, Mar 22, 2019 at 09:13:43AM +0100, Michael Banck wrote:

Don't we need a big warning that the cluster must not be started during
operation of pg_checksums as well, now that we don't disallow it?

The same applies to pg_rewind and pg_basebackup, so I would classify
that as a pilot error. How would you formulate that in the docs if
you add it?
--
Michael

#136Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#135)
Re: Offline enabling/disabling of data checksums

Hi,

On Friday, 22.03.2019 at 17:37 +0900, Michael Paquier wrote:

On Fri, Mar 22, 2019 at 09:13:43AM +0100, Michael Banck wrote:

Don't we need a big warning that the cluster must not be started during
operation of pg_checksums as well, now that we don't disallow it?

The same applies to pg_rewind and pg_basebackup, so I would classify
that as a pilot error.

How would it apply to pg_basebackup? The cluster is running while the
base backup is taken and I believe the control file is written at the
end so you can't start another instance off the backup directory until
the base backup has finished.

It would apply to pg_rewind, but pg_rewind's runtime does not scale with
cluster size, does it? pg_checksums will run for hours on large clusters,
so the window for errors is much larger and I don't think you can easily
compare the two.

How would you formulate that in the docs if you add it?

(I would try to make sure you can't start the cluster but that seems off
the table for now)

How about this:

+ <refsect1>
+  <title>Notes</title>
+  <para>
+   When enabling checksums in a cluster, the operation can potentially take a
+   long time if the data directory is large.  During this operation, the 
+   cluster or other programs that write to the data directory must not be 
+   started or else data-loss will occur.
+  </para>
+
+  <para>
+   When disabling or enabling checksums in a cluster of multiple instances,
[...]

Also, the following is not very clear to me:

+   If the event of a crash of the operating system while enabling or

s/If/In/

+   disabling checksums, the data folder may have checksums in an inconsistent
+   state, in which case it is recommended to check the state of checksums
+   in the data folder.

How is the user supposed to check the state of checksums? Do you mean
that if the user intended to enable checksums and the box dies in
between, they should check whether checksums are actually enabled and
re-run if not? Because it could also mean running pg_checksums --check
on the cluster, which wouldn't work in that case as the control file has
not been updated yet.

Maybe it could be formulated like "If pg_checksums is aborted or killed
in its operation while enabling or disabling checksums, the cluster
will have the same state with respect of checksums as before the
operation and pg_checksums needs to be restarted."?

Michael

#137Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#136)
Re: Offline enabling/disabling of data checksums

On Fri, Mar 22, 2019 at 10:04:02AM +0100, Michael Banck wrote:

How about this:

+ <refsect1>
+  <title>Notes</title>
+  <para>
+   When enabling checksums in a cluster, the operation can potentially take a
+   long time if the data directory is large.  During this operation, the 
+   cluster or other programs that write to the data directory must not be 
+   started or else data-loss will occur.
+  </para>

Sounds fine to me. Will add an extra paragraph on that.

Maybe it could be formulated like "If pg_checksums is aborted or killed
in its operation while enabling or disabling checksums, the cluster
will have the same state with respect of checksums as before the
operation and pg_checksums needs to be restarted."?

We could use that as well. With the current patch, and per the
suggestions from Fabien, this holds true as the control file is
updated and flushed last.
--
Michael
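
The "control file is updated and flushed last" ordering discussed above can be illustrated with a small, self-contained C sketch. This is not the pg_checksums implementation: write_file() and switch_state() are hypothetical helpers, the "flag file" merely stands in for the control file, and syncing of the containing directory is deliberately left out. It only shows why an interruption before the last step leaves the previously advertised state in place.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrite "path" with "buf" and, if do_sync is
 * true, fsync the file before closing it.  Returns false on any error.
 * (Syncing the parent directory is omitted in this sketch.)
 */
static bool
write_file(const char *path, const char *buf, bool do_sync)
{
    size_t len;
    int    fd;
    bool   ok;

    assert(path != NULL && buf != NULL);

    len = strlen(buf);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return false;

    ok = (write(fd, buf, len) == (ssize_t) len);
    if (ok && do_sync)
        ok = (fsync(fd) == 0);
    if (close(fd) != 0)
        ok = false;
    return ok;
}

/*
 * Hypothetical sketch of the ordering: rewrite and flush the data first,
 * and only then update the flag file that advertises the new state.  If
 * the data step fails, the flag file is never touched, so the old state
 * is still what a reader of the flag file sees.
 */
static bool
switch_state(const char *data_path, const char *flag_path,
             const char *new_data, const char *new_flag)
{
    if (!write_file(data_path, new_data, true))
        return false;           /* flag file left as it was */

    return write_file(flag_path, new_flag, true);
}
```

A caller would invoke switch_state() once per operation; when it returns false before reaching the flag file, simply re-running the whole operation is enough, which mirrors the "needs to be restarted" wording proposed for the docs.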

#138Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#133)
Re: Offline enabling/disabling of data checksums

Bonjour Michaël,

Does that look fine to you?

Mostly.

Patch v9 part 1 applies cleanly, compiles, global and local check ok, doc
build ok.

On write(), the error message is not translatable whereas it is for all
others.

I agree that a BIG STRONG warning is needed about not starting the cluster
under pain of possible data corruption. I still think that preventing this
is desirable, preferably before v12.

Otherwise, my remaining non-showstopper (committer's opinion matters more)
issues:

Doc: A postgres cluster is like an Oracle instance. I'd use "replicated
setup" instead of "cluster", and "cluster" instead of "instance". I'll try
to improve the text, possibly over the week-end.

Enabling/disabling an already enabled/disabled cluster should not be a
failure for me, because the cluster is left under the prescribed state.

Patch v9 part 2 applies cleanly, compiles, global and local check ok, doc
build ok.

Ok for me.

--
Fabien.

#139Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#133)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

Done the switch for this case. For pg_rewind actually I think that this
is an area where its logic could be improved a bit. So first the data
folder is synced, and then the control file is updated.

Attached is a quick patch about "pg_rewind", so that the control file is
updated after everything else is committed to disk.

--
Fabien.

Attachments:

rewind-fsync-1.patch (text/x-diff)
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 3dcadb9b40..c1e6d7cd07 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -351,9 +351,12 @@ main(int argc, char **argv)
 
 	progress_report(true);
 
-	pg_log(PG_PROGRESS, "\ncreating backup label and updating control file\n");
+	pg_log(PG_PROGRESS, "\ncreating backup label\n");
 	createBackupLabel(chkptredo, chkpttli, chkptrec);
 
+	pg_log(PG_PROGRESS, "syncing target data directory\n");
+	syncTargetDirectory();
+
 	/*
 	 * Update control file of target. Make it ready to perform archive
 	 * recovery when restarting.
@@ -362,6 +365,7 @@ main(int argc, char **argv)
 	 * source server. Like in an online backup, it's important that we recover
 	 * all the WAL that was generated while we copied the files over.
 	 */
+	pg_log(PG_PROGRESS, "updating control file\n");
 	memcpy(&ControlFile_new, &ControlFile_source, sizeof(ControlFileData));
 
 	if (connstr_source)
@@ -377,11 +381,9 @@ main(int argc, char **argv)
 	ControlFile_new.minRecoveryPoint = endrec;
 	ControlFile_new.minRecoveryPointTLI = endtli;
 	ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
+
 	update_controlfile(datadir_target, progname, &ControlFile_new, do_sync);
 
-	pg_log(PG_PROGRESS, "syncing target data directory\n");
-	syncTargetDirectory();
-
 	printf(_("Done!\n"));
 
 	return 0;
#140Christoph Berg
myon@debian.org
In reply to: Fabien COELHO (#139)
Re: Offline enabling/disabling of data checksums

Re: Fabien COELHO 2019-03-22 <alpine.DEB.2.21.1903221514390.2198@lancre>

Attached is a quick patch about "pg_rewind", so that the control file is
updated after everything else is committed to disk.

update_controlfile(datadir_target, progname, &ControlFile_new, do_sync);

- pg_log(PG_PROGRESS, "syncing target data directory\n");
- syncTargetDirectory();

Doesn't the control file still need syncing?

Christoph

#141Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Christoph Berg (#140)
Re: Offline enabling/disabling of data checksums

Hello Christoph,

- pg_log(PG_PROGRESS, "syncing target data directory\n");
- syncTargetDirectory();

Doesn't the control file still need syncing?

Indeed it does, and it is done in update_controlfile if the last argument
is true. Basically, the latest version of update_controlfile always fsyncs
the control file unless explicitly told not to do so. The options to do
that are really there only to speed up regression tests.

--
Fabien.

#142Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#138)
Re: Offline enabling/disabling of data checksums

On Fri, Mar 22, 2019 at 02:59:31PM +0100, Fabien COELHO wrote:

On write(), the error message is not translatable whereas it is for all
others.

Fixed.

I agree that a BIG STRONG warning is needed about not starting the cluster
under pain of possible data corruption. I still think that preventing this
is desirable, preferably before v12.

For now the docs mention that in a paragraph as Michael Banck has
suggested. Not sure that this deserves a warning portion.

Otherwise, my remaining non-showstopper (committer's opinion matters more)
issues:

Doc: A postgres cluster is like an Oracle instance. I'd use "replicated
setup" instead of "cluster", and "cluster" instead of "instance". I'll try to
improve the text, possibly over the week-end.

Right. I have reworded that using your suggestions.

And committed the main part. I'll look after the --no-sync part in a
bit.
--
Michael

#143Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#141)
Re: Offline enabling/disabling of data checksums

On Fri, Mar 22, 2019 at 07:02:36PM +0100, Fabien COELHO wrote:

Indeed it does, and it is done in update_controlfile if the last argument is
true. Basically, the latest version of update_controlfile always fsyncs the
control file unless explicitly told not to do so. The options to do that are
really there only to speed up regression tests.

For the control file alone it would not matter much, as the cost really
comes from syncing the data directory. Still, for correctness it is
better to have a full all-or-nothing switch. Small buildfarm machines
also like the --no-sync flavors a lot.
--
Michael
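
As an illustration of the "all-or-nothing switch" idea, here is a minimal C sketch of parsing a --no-sync / -N option with getopt_long() into a single flag that a caller could then pass to every sync step. It is not the committed pg_checksums code: parse_sync_option() is a hypothetical helper, it handles only this one option, and resetting optind to 1 to allow repeated calls is an assumption that holds for common getopt implementations such as glibc but not necessarily everywhere.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: scan argv for -N / --no-sync and report the result
 * in *do_sync.  Syncing is on by default and is switched off as a whole.
 * Returns false if an unknown option is encountered.
 */
static bool
parse_sync_option(int argc, char *argv[], bool *do_sync)
{
    static const struct option long_options[] = {
        {"no-sync", no_argument, NULL, 'N'},
        {NULL, 0, NULL, 0}
    };
    int c;

    assert(do_sync != NULL);

    *do_sync = true;            /* default: flush everything */
    optind = 1;                 /* restart scanning (assumption, see lead-in) */

    while ((c = getopt_long(argc, argv, "N", long_options, NULL)) != -1)
    {
        switch (c)
        {
            case 'N':
                *do_sync = false;
                break;
            default:
                return false;   /* unknown option */
        }
    }
    return true;
}
```

The resulting flag would then gate both the data directory sync and the control file sync, so that a single option decides whether anything is flushed at all.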

#144Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#142)
Re: Offline enabling/disabling of data checksums

On Sat, Mar 23, 2019 at 08:16:07AM +0900, Michael Paquier wrote:

And committed the main part. I'll look after the --no-sync part in a
bit.

--no-sync is committed as well now.
--
Michael

#145Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#139)
Re: Offline enabling/disabling of data checksums

On Fri, Mar 22, 2019 at 03:18:26PM +0100, Fabien COELHO wrote:

Attached is a quick patch about "pg_rewind", so that the control file is
updated after everything else is committed to disk.

Could you start a new thread about that please? This one has already
been used for too many things.
--
Michael

#146Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Michael Paquier (#142)
1 attachment(s)
Re: Offline enabling/disabling of data checksums

Bonjour Michaël,

Here is an attempt at improving the Notes.

Mostly it is a reordering from more important (cluster corruption) to less
important (if interrupted a restart is needed), some reordering from
problem to solutions instead of solution/problem/solution, some sentence
simplification.

--
Fabien.

Attachments:

checksums-doc-1.patch (text/x-diff)
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 1f4d4ab8b4..869d742aae 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -179,29 +179,27 @@ PostgreSQL documentation
  <refsect1>
   <title>Notes</title>
   <para>
-   When disabling or enabling checksums in a replication setup of multiple
-   clusters, it is recommended to stop all the clusters before doing
-   the switch to all the clusters consistently. When using a replication
-   setup with tools which perform direct copies of relation file blocks
-   (for example <xref linkend="app-pgrewind"/>), enabling or disabling
-   checksums can lead to page corruptions in the shape of incorrect
-   checksums if the operation is not done consistently across all nodes.
-   Destroying all the standbys in the setup first, enabling or disabling
-   checksums on the primary and finally recreating the standbys from
-   scratch is also safe.
+   Enabling checksums in a large cluster can potentially take a long time.
+   During this operation, the cluster or other programs that write to the
+   data directory must not be started or else data loss may occur.
   </para>
   <para>
-   If <application>pg_checksums</application> is aborted or killed in
-   its operation while enabling or disabling checksums, the cluster
-   will have the same state with respect of checksums as before the
-   operation and <application>pg_checksums</application> needs to be
-   restarted.
+   When using a replication setup with tools which perform direct copies
+   of relation file blocks (for example <xref linkend="app-pgrewind"/>),
+   enabling or disabling checksums can lead to page corruptions in the
+   shape of incorrect checksums if the operation is not done consistently
+   across all nodes.
+   For enabling or disabling checksums in a replication setup,
+   it is thus recommended to stop all the clusters before switching
+   them all consistently.
+   Destroying all standbys, performing the operation on the primary and
+   finally recreating the standbys from scratch is also safe.
   </para>
   <para>
-   When enabling checksums in a cluster, the operation can potentially
-   take a long time if the data directory is large. During this operation,
-   the cluster or other programs that write to the data directory must not
-   be started or else data loss may occur.
-   </para>
+   If <application>pg_checksums</application> is aborted or killed
+   while enabling or disabling checksums, the cluster will have the
+   same checksum status as before the operation and
+   <application>pg_checksums</application> needs to be restarted.
+  </para>
  </refsect1>
 </refentry>
#147Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#146)
Re: Offline enabling/disabling of data checksums

On Sat, Mar 23, 2019 at 02:14:02PM +0100, Fabien COELHO wrote:

Here is an attempt at improving the Notes.

Mostly it is a reordering from more important (cluster corruption) to less
important (if interrupted a restart is needed), some reordering from problem
to solutions instead of solution/problem/solution, some sentence
simplification.

So, the ordering of the notes for each paragraph is as follows:
1) Replication issues when mixing different checksum setups across
nodes.
2) Consistency of the operations if killed.
3) Don't start Postgres while the operation runs.

Your proposal is to switch the order of the paragraphs to 3), 1) and
then 2). Do others have any opinion? I am fine with the current
order of things, still it may make sense to tweak the docs.

In the paragraph related to replication, the second statement is
switched to be first so that the docs warn first, and then give
recommendations. This part makes sense.

I am not sure that "checksum status" is a correct term. It seems to
me that "same configuration for data checksums as before the tool ran"
or something like that would be more correct.
--
Michael

#148Fabien COELHO
fabien.coelho@mines-paristech.fr
In reply to: Michael Paquier (#147)
Re: Offline enabling/disabling of data checksums

Bonjour Michaël,

Here is an attempt at improving the Notes. [...]

So, the ordering of the notes for each paragraph is as follows: 1)
Replication issues when mixing different checksum setups across nodes.
2) Consistency of the operations if killed. 3) Don't start Postgres
while the operation runs.

Your proposal is to switch the order of the paragraphs to 3), 1) and
then 2).

Yes. I suggest emphasizing cluster corruption risks by putting them
first.

Do others have any opinion? I am fine with the current
order of things, still it may make sense to tweak the docs.

In the paragraph related to replication, the second statement is
switched to be first so that the docs warn first, and then give
recommendations.

Yep.

This part makes sense.

Yep!

I am not sure that "checksum status" is a correct term. It seems to
me that "same configuration for data checksums as before the tool ran"
or something like that would be more correct.

Possibly, I cannot say.

--
Fabien.

#149Michael Paquier
michael@paquier.xyz
In reply to: Fabien COELHO (#148)
Re: Offline enabling/disabling of data checksums

On Tue, Mar 26, 2019 at 01:41:38PM +0100, Fabien COELHO wrote:

I am not sure that "checksum status" is a correct term. It seems to
me that "same configuration for data checksums as before the tool ran"
or something like that would be more correct.

Possibly, I cannot say.

I have put more thought into this part, and committed the
reorganization as you mainly suggested.
--
Michael