Online enabling of checksums

Started by Magnus Hagander almost 8 years ago, 198 messages

magnus@hagander.net

almost 8 years ago

1 attachment(s)

Once more, here is an attempt to solve the problem of on-line enabling of
checksums that Daniel and I have been hacking on for a bit. See for example
/messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
and
/messages/by-id/FF393672-5608-46D6-9224-6620EC532693@endpoint.com
for some previous discussions.

Base design:

Change the checksum flag from a boolean on/off to an enum: off/inprogress/on.
When checksums are off or on, they work like today. When checksums are in
progress, checksums are *written* but not verified. State can go from “off”
to “inprogress”, from “inprogress” to either “on” or “off”, or from “on” to
“off”.

Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes the
state to off. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker in
turn will start one sub-worker (“checksumhelper worker”) in each database
(currently all done sequentially). This worker will enumerate all
tables/indexes/etc in the database and validate their checksums. If there
is no checksum, or the checksum is incorrect, it will compute a new
checksum and write it out. When all databases have been processed, the
checksum state changes to “on” and the launcher shuts down. At this point,
the cluster has checksums enabled as if it was initdb’d with checksums
turned on.

If the cluster shuts down while “inprogress”, the DBA will have to manually
either restart the worker (by calling pg_enable_data_checksums()) or turn
checksums off again. Checksums “in progress” only carry a cost and no
benefit.

The change of the checksum state is WAL logged with a new xlog record. Full
page writes are forced for all buffers written by the background worker, to
make sure the checksum is fully updated on the standby even if no actual
contents of the buffer changed.

We’ve also included a small commandline tool, bin/pg_verify_checksums, that
can be run against an offline cluster to validate all checksums. Future
improvements include being able to use the background worker/launcher to
perform an online check as well. Being able to run more parallel workers in
the checksumhelper might also be of interest.

The patch includes two sets of tests. The first is an isolation test that
turns on checksums while one session is writing to the cluster and another
is continuously reading, to simulate turning on checksums in a production
database. The second is a TAP test which enables checksums with streaming
replication turned on, to test the new xlog record. The isolation test ran
into the 1024 character limit of the isolation test lexer, with a separate
patch and discussion at
/messages/by-id/8D628BE4-6606-4FF6-A3FF-8B2B0E9B43D0@yesql.se
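
The base design above amounts to a small state machine. The following is a minimal sketch of it in C, for illustration only. The enum and function names are hypothetical and do not appear in the patch, which stores the state in ControlFile->data_checksum_version instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, used only to illustrate the three states. */
typedef enum ChecksumState
{
	CHECKSUMS_OFF,
	CHECKSUMS_INPROGRESS,
	CHECKSUMS_ON
} ChecksumState;

/* Returns true if the design allows moving from 'from' to 'to'. */
static bool
checksum_transition_allowed(ChecksumState from, ChecksumState to)
{
	switch (from)
	{
		case CHECKSUMS_OFF:
			return to == CHECKSUMS_INPROGRESS;
		case CHECKSUMS_INPROGRESS:
			return to == CHECKSUMS_ON || to == CHECKSUMS_OFF;
		case CHECKSUMS_ON:
			return to == CHECKSUMS_OFF;
	}
	return false;
}

/* Checksums are written in both "inprogress" and "on". */
static bool
checksums_written(ChecksumState s)
{
	return s != CHECKSUMS_OFF;
}

/* Checksums are verified on read only once the state is "on". */
static bool
checksums_verified(ChecksumState s)
{
	return s == CHECKSUMS_ON;
}

/* Apply a transition, asserting that the design permits it. */
static ChecksumState
checksum_transition(ChecksumState from, ChecksumState to)
{
	assert(checksum_transition_allowed(from, to));
	return to;
}
```

In this sketch there is no direct step from “off” to “on”: a cluster has to pass through “inprogress”, where checksums are written but not yet verified, until the worker has processed every database.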

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Attachments:

online_checksums.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4c998fe51f..dc05ac3e55 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8537,7 +8537,8 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
         or hide corruption, or other serious problems</emphasis>.  However, it may allow
         you to get past the error and retrieve undamaged tuples that might still be
         present in the table if the block header is still sane. If the header is
-        corrupt an error will be reported even if this option is enabled. The
+        corrupt an error will be reported even if this option is enabled. This
+        option can only be enabled when data checksums are enabled. The
         default setting is <literal>off</literal>, and it can only be changed by a superuser.
        </para>
       </listitem>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1e535cf215..8000ce89df 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19412,6 +19412,64 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        Initiates data checksums for the cluster. This will switch the data checksums mode
+        to <literal>inprogress</literal> and start a background worker that will process
+        all data in the database and enable checksums for it. When all data pages have had
+        checksums enabled, the cluster will automatically switch to checksums
+        <literal>on</literal>.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 22e6893211..c81c87ef41 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -210,6 +210,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 585665f161..0864afb890 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..076de243a0
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-o <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d27fb414f7..db4f4167e3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -283,6 +283,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..89afecb341 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,73 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums,
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47a6c4d895..4c9c0ca631 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4653,10 +4654,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4728,12 +4725,85 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsEnabledOrInProgress(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+bool
+DataChecksumsDisabled(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == 0);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	if (DataChecksumsEnabledOrInProgress())
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	if (!DataChecksumsEnabledOrInProgress())
+		elog(ERROR, "Checksums not enabled or in progress");
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in in_progress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7769,6 +7839,16 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"in progress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9522,6 +9602,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9949,6 +10045,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..9b3ba3fb74 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,36 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (DataChecksumsDisabled())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_BOOL(DataChecksumsDisabled());
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher())
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_BOOL(DataChecksumsEnabledOrInProgress());
+}
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..44535f9976
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,622 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Backend worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a database at initdb time, no extra
+ * process is required as each page is checksummed, and verified, at
+ * accesses.  When enabling checksums on an already running cluster
+ * which was not initialized with checksums, this helper worker will
+ * ensure that all pages are checksummed before verification of the
+ * checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific
+ * database before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	bool		success;
+	bool		process_shared_catalogs;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksum helper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+	bool		success;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+	bool		success;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static bool ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Main entry point for checksum helper launcher process
+ */
+bool
+StartChecksumHelperLauncher(void)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/*
+		 * Failed to set means somebody else started
+		 */
+		ereport(ERROR,
+				(errmsg("could not start checksum helper: already running")));
+	}
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+
+		/*
+		 * Launcher not started, so nothing to shut down.
+		 */
+		return;
+
+	ereport(ERROR,
+			(errmsg("Checksum helper is currently running, cannot disable checksums"),
+			 errhint("Restart the cluster or wait for the worker to finish")));
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * XXX: must hold a lock on the relation preventing it from being truncated?
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+		Page		page;
+		PageHeader	pagehdr;
+		uint16		checksum;
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/* Do we already have a valid checksum? */
+		page = BufferGetPage(buf);
+		pagehdr = (PageHeader) page;
+		checksum = pg_checksum_page((char *) page, b);
+
+		/*
+		 * If checksum was not set or was invalid, mark the buffer as dirty
+		 * and force a full page write. If the checksum was already valid, we
+		 * can leave it since we know that any other process writing the
+		 * buffer will update the checksum.
+		 */
+		if (checksum != pagehdr->pd_checksum)
+		{
+			START_CRIT_SECTION();
+			MarkBufferDirty(buf);
+			log_newpage_buffer(buf, false);
+			END_CRIT_SECTION();
+		}
+
+		UnlockReleaseBuffer(buf);
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = relation_open(relationId, AccessShareLock);
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+			ProcessSingleRelationFork(rel, fnum, strategy);
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u", relationId);
+
+	CommitTransactionCommand();
+
+	return true;
+}
+
+/*
+ * Enable checksums in a single database.
+ * We do this by launching a dynamic background worker into this database,
+ * and waiting for it to finish.
+ * We have to do this in a separate worker, since each process can only be
+ * connected to one database during its lifetime.
+ */
+static bool
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	ChecksumHelperShmem->success = false;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in %s", db->dbname)));
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in %s completed", db->dbname)));
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every db
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any new created database will
+	 * be running with checksums turned on from the start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			if (ProcessDatabase(db))
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the latter case. Any
+		 * database that still exists but failed is retried a limited number
+		 * of times before we give up. Any database that remains in a failed
+		 * state after that will fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in %s, giving up", db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("database %s dropped, skipping", db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL || list_length(remaining) == 0)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+
+	/*
+	 * Force a checkpoint to get everything out to disk
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+	if (!found)
+		pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This is intended to create the worklist for the workers to go through, and
+ * as we are only concerned with already existing databases we never need to
+ * rebuild this list, which simplifies the coding.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("Database %s does not allow connections.", NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * If include_shared is true, both shared and local relations are returned;
+ * otherwise only non-shared relations are returned.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Enable vacuum cost delay, if any
+	 */
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			ereport(ERROR,
+					(errmsg("failed to process table with oid %u", rel->reloid)));
+		}
+	}
+	list_free_deep(RelationList);
+
+	ChecksumHelperShmem->success = true;
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..c158e67a28 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,15 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * If data checksums have been turned on in a running cluster which
+		 * was initdb'd without checksums, or a cluster which has had
+		 * checksums turned off, we hold off on verifying the checksum until
+		 * all pages again are checksummed.  The PageSetChecksum functions
+		 * must continue to write the checksums even though we don't validate
+		 * them yet.
+		 */
+		if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1176,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1203,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1db7845d5a..039b63bb05 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -67,6 +68,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -165,6 +167,7 @@ static void assign_syslog_ident(const char *newval, void *extra);
 static void assign_session_replication_role(int newval, void *extra);
 static bool check_temp_buffers(int *newval, void **extra, GucSource source);
 static bool check_bonjour(bool *newval, void **extra, GucSource source);
+static bool check_ignore_checksum_failure(bool *newval, void **extra, GucSource source);
 static bool check_ssl(bool *newval, void **extra, GucSource source);
 static bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 static bool check_log_stats(bool *newval, void **extra, GucSource source);
@@ -419,6 +422,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -513,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1022,7 +1036,7 @@ static struct config_bool ConfigureNamesBool[] =
 		},
 		&ignore_checksum_failure,
 		false,
-		NULL, NULL, NULL
+		check_ignore_checksum_failure, NULL, NULL
 	},
 	{
 		{"zero_damaged_pages", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -1665,17 +1679,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -3955,6 +3958,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
@@ -10203,6 +10217,37 @@ check_bonjour(bool *newval, void **extra, GucSource source)
 }
 
 static bool
+check_ignore_checksum_failure(bool *newval, void **extra, GucSource source)
+{
+	if (*newval)
+	{
+		/*
+		 * When data checksums are in progress, the verification of the
+		 * checksums is already ignored until all pages have had checksums
+		 * backfilled, making the effect of ignore_checksum_failure a no-op.
+		 * Allowing it during checksumming in progress can hide the fact that
+		 * checksums become enabled once done, so disallow.
+		 */
+		if (DataChecksumsInProgress())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" are in progress.");
+			return false;
+		}
+
+		/*
+		 * While safe, it's nonsensical to allow ignoring checksums when data
+		 * checksums aren't enabled in the first place.
+		 */
+		if (DataChecksumsDisabled())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" aren't enabled.");
+			return false;
+		}
+	}
+	return true;
+}
+
+static bool
 check_ssl(bool *newval, void **extra, GucSource source)
 {
 #ifndef USE_SSL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..9f5a5848ee
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,42 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..0df59df861
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,293 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_oid = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage(void)
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -o oid         check only relation with specified oid\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, O_RDONLY | PG_BINARY, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open %s: %s\n"), progname, fn, strerror(errno));
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in %s, got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+						progname, fn, blockno, header->pd_checksum, csum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: %s, block %d, correct checksum %X\n"), progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory %s: %s\n"), progname, path, strerror(errno));
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file %s: %s\n"), progname, fn, strerror(errno));
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			if (only_oid)
+			{
+				if (strcmp(only_oid, de->d_name) == 0 ||
+					(strncmp(only_oid, de->d_name, strlen(only_oid)) == 0 &&
+					 strlen(de->d_name) > strlen(only_oid) &&
+					 de->d_name[strlen(only_oid)] == '_')
+					)
+				{
+					/* Either it's the same oid, or it's a relation fork of it */
+					scan_file(fn);
+				}
+			}
+			else
+				scan_file(fn);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fo:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'o':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid oid: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_oid = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (DataDir == NULL)
+	{
+		fprintf(stderr, _("%s: no data directory specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..788f2d0b58 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsEnabledOrInProgress() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 2a5321315a..4ffc9aa07f 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5544,6 +5544,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..13b6eaf13e
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper daemon
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool StartChecksumHelperLauncher(void);
+
+/* Shutdown the background processes, if any */
+void ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void ChecksumHelperLauncherMain(Datum arg);
+void ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index 73abf163f1..f30f4eb3a6 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # We don't build or execute examples/, locale/, or thread/ by default,
 # but we do want "make clean" etc to recurse into them.  Likewise for
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..290a74fc7c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres', "INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data');
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..b5c5563f98
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+t              
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546..cdd44979a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,6 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..f16127fa3f
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.5);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Magnus Hagander (#1)

1 attachment(s)

Re: Online enabling of checksums

Re-sending this one with proper formatting. Apologies for the horrible
gmail-screws-up-the-text-part of the last one!

No change to patch or text, just the formatting.

//Magnus

Once more, here is an attempt to solve the problem of on-line enabling of
checksums that Daniel and I have been hacking on for a bit. See for
example
/messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
and
/messages/by-id/FF393672-5608-46D6-9224-6620EC532693@endpoint.com
for some previous discussions.

Base design:

Change the checksum flag from a boolean (on/off) to an enum with three
states: off, inprogress and on. When checksums are off or on, they work
like today. When checksums are in progress, checksums are *written* but not
verified. State can go from “off” to “inprogress”, from “inprogress” to
either “on” or “off”, or from “on” to “off”.
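
The transitions above can be sketched as a small validity check. This is
only an illustration of the rules as described here, not code from the
patch: the enum and function names are made up, and the patch itself stores
the state as a version number in the control file. Re-running the enable
function while already “inprogress” restarts the worker and is not treated
as a state change in this sketch.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only. */
typedef enum
{
	CHECKSUMS_OFF,
	CHECKSUMS_INPROGRESS,
	CHECKSUMS_ON
} checksum_state;

/* Returns true if the design described above allows moving from "from" to "to". */
static bool
checksum_transition_allowed(checksum_state from, checksum_state to)
{
	switch (from)
	{
		case CHECKSUMS_OFF:
			return to == CHECKSUMS_INPROGRESS;
		case CHECKSUMS_INPROGRESS:
			return to == CHECKSUMS_ON || to == CHECKSUMS_OFF;
		case CHECKSUMS_ON:
			return to == CHECKSUMS_OFF;
	}
	return false;
}

/* Checksums are written in every state except "off". */
static bool
checksum_written(checksum_state s)
{
	return s != CHECKSUMS_OFF;
}

/* Checksums are verified on read only once the state is "on". */
static bool
checksum_verified(checksum_state s)
{
	return s == CHECKSUMS_ON;
}
```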

Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes the
state to off. The enable one will change the state to inprogress, and then start
a background worker (the “checksumhelper launcher”). This worker in turn
will start one sub-worker (“checksumhelper worker”) in each database
(currently all done sequentially). This worker will enumerate all
tables/indexes/etc in the database and validate their checksums. If there
is no checksum, or the checksum is incorrect, it will compute a new
checksum and write it out. When all databases have been processed, the
checksum state changes to “on” and the launcher shuts down. At this point,
the cluster has checksums enabled as if it was initdb’d with checksums
turned on.
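
The per-page work of the worker (“if there is no checksum, or the checksum
is incorrect, compute a new one and write it out”) can be sketched roughly
as below. This is a toy model under stated assumptions: the page layout,
the checksum function and all names are hypothetical stand-ins, and the
real worker goes through the buffer manager and WAL rather than an
in-memory array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 64

/* Toy page: a stored checksum followed by payload bytes (hypothetical layout). */
typedef struct
{
	uint16_t	checksum;		/* 0 means "never checksummed" in this sketch */
	uint8_t		data[TOY_PAGE_SIZE];
} toy_page;

/* Stand-in checksum (not PostgreSQL's algorithm); never returns 0. */
static uint16_t
toy_checksum(const toy_page *page)
{
	uint32_t	sum = 0;

	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		sum = sum * 31 + page->data[i];
	return (uint16_t) (sum % 65535 + 1);
}

/*
 * Visit every page and rewrite the checksum where it is missing or wrong.
 * Returns the number of pages that had to be rewritten.
 */
static size_t
toy_checksum_pages(toy_page *pages, size_t npages)
{
	size_t		rewritten = 0;

	for (size_t i = 0; i < npages; i++)
	{
		uint16_t	expected = toy_checksum(&pages[i]);

		if (pages[i].checksum != expected)
		{
			pages[i].checksum = expected;
			rewritten++;
		}
	}
	return rewritten;
}
```

Running the loop a second time over the same pages rewrites nothing, which
is the property that makes restarting the worker after a shutdown safe in
this model.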

If the cluster shuts down while “inprogress”, the DBA will have to manually
either restart the worker (by calling pg_enable_data_checksums()) or turn
checksums off again. Checksums “in progress” only carry a cost and no
benefit.

The change of the checksum state is WAL logged with a new xlog record. Full
page writes are forced for all the buffers written by the background worker,
to make sure the checksum is fully updated on the standby even if no
actual contents of the buffer changed.
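
As a rough sketch of that decision: in the attached patch the condition is
evaluated in XLogInsertRecord() and applies whenever checksums are in
progress. The standalone function below is only a hypothetical restatement
of that boolean logic, with made-up parameter names.

```c
#include <stdbool.h>

/*
 * Returns true if a full page image should be considered for a modified
 * buffer. The third input is the new one: while checksums are being
 * enabled, full page writes are forced regardless of the other settings.
 */
static bool
needs_full_page_image(bool full_page_writes,
					  bool force_page_writes,
					  bool checksums_inprogress)
{
	return full_page_writes || force_page_writes || checksums_inprogress;
}
```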

We’ve also included a small command-line tool, bin/pg_verify_checksums, that
can be run against an offline cluster to validate all checksums. Future
improvements include being able to use the background worker/launcher to
perform an online check as well. Being able to run more parallel workers in
the checksumhelper might also be of interest.

The patch includes two sets of tests. The first is an isolation test that
turns on checksums while one session is writing to the cluster and another is
continuously reading, to simulate turning on checksums in a production
database. There is also a TAP test which enables checksums with streaming
replication turned on to test the new xlog record. The isolation test ran
into the 1024 character limit of the isolation test lexer, with a separate
patch and discussion at
/messages/by-id/8D628BE4-6606-4FF6-A3FF-8B2B0E9B43D0@yesql.se

Attachments:

online_checksums.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4c998fe51f..dc05ac3e55 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8537,7 +8537,8 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
         or hide corruption, or other serious problems</emphasis>.  However, it may allow
         you to get past the error and retrieve undamaged tuples that might still be
         present in the table if the block header is still sane. If the header is
-        corrupt an error will be reported even if this option is enabled. The
+        corrupt an error will be reported even if this option is enabled. This
+        option can only be enabled when data checksums are enabled. The
         default setting is <literal>off</literal>, and it can only be changed by a superuser.
        </para>
       </listitem>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1e535cf215..8000ce89df 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19412,6 +19412,64 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        Initiates data checksums for the cluster. This will switch the data checksums mode
+        to <literal>inprogress</literal> and start a background worker that will process
+        all data in the database and enable checksums for it. When all data pages have had
+        checksums enabled, the cluster will automatically switch to checksums
+        <literal>on</literal>.
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 22e6893211..c81c87ef41 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -210,6 +210,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 585665f161..0864afb890 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..076de243a0
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-o <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d27fb414f7..db4f4167e3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -283,6 +283,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..89afecb341 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,73 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums;
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47a6c4d895..4c9c0ca631 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4653,10 +4654,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4728,12 +4725,85 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsEnabledOrInProgress(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+bool
+DataChecksumsDisabled(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == 0);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	if (DataChecksumsEnabledOrInProgress())
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	if (!DataChecksumsEnabledOrInProgress())
+		elog(ERROR, "Checksums not enabled or in progress");
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in in_progress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7769,6 +7839,16 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"in progress\" with no worker."),
+				 errhint("either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9522,6 +9602,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9949,6 +10045,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..9b3ba3fb74 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,36 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (DataChecksumsDisabled())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_BOOL(DataChecksumsDisabled());
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher())
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_BOOL(DataChecksumsEnabledOrInProgress());
+}
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..44535f9976
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,622 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Backend worker to walk the database and write checksums to pages
+ *
+ * When data checksums are enabled for a cluster at initdb time, no extra
+ * process is required, as each page is checksummed when written and
+ * verified when read.  When enabling checksums on an already running
+ * cluster which was not initialized with checksums, this helper worker
+ * will ensure that all pages are checksummed before verification of the
+ * checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific
+ * database before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	bool		success;
+	bool		process_shared_catalogs;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksum helper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+	bool		success;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+	bool		success;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static bool ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Main entry point for checksum helper launcher process
+ */
+bool
+StartChecksumHelperLauncher(void)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/*
+		 * Failed to set means somebody else started
+		 */
+		ereport(ERROR,
+				(errmsg("could not start checksum helper: already running")));
+	}
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+
+		/*
+		 * Launcher not started, so nothing to shut down.
+		 */
+		return;
+
+	ereport(ERROR,
+			(errmsg("checksum helper is currently running, cannot disable checksums"),
+			 errhint("Restart the cluster or wait for the worker to finish.")));
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * XXX: must hold a lock on the relation preventing it from being truncated?
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+		Page		page;
+		PageHeader	pagehdr;
+		uint16		checksum;
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/* Do we already have a valid checksum? */
+		page = BufferGetPage(buf);
+		pagehdr = (PageHeader) page;
+		checksum = pg_checksum_page((char *) page, b);
+
+		/*
+		 * If checksum was not set or was invalid, mark the buffer as dirty
+		 * and force a full page write. If the checksum was already valid, we
+		 * can leave it since we know that any other process writing the
+		 * buffer will update the checksum.
+		 */
+		if (checksum != pagehdr->pd_checksum)
+		{
+			START_CRIT_SECTION();
+			MarkBufferDirty(buf);
+			log_newpage_buffer(buf, false);
+			END_CRIT_SECTION();
+		}
+
+		UnlockReleaseBuffer(buf);
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = relation_open(relationId, AccessShareLock);
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+			ProcessSingleRelationFork(rel, fnum, strategy);
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u", relationId);
+
+	CommitTransactionCommand();
+
+	return true;
+}
+
+/*
+ * Enable checksums in a single database.
+ * We do this by launching a dynamic background worker into this database,
+ * and waiting for it to finish.
+ * We have to do this in a separate worker, since each process can only be
+ * connected to one database during its lifetime.
+ */
+static bool
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	ChecksumHelperShmem->success = false;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in %s", db->dbname)));
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in %s completed", db->dbname)));
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every db
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any newly created database
+	 * will be running with checksums turned on from the start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			if (ProcessDatabase(db))
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the second case with.
+		 * Any database that still exists but failed we retry for a limited
+		 * number of times before giving up. Any database that remains in
+		 * failed state after that will fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in %s, giving up.", db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("Database %s dropped, skipping", db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL || list_length(remaining) == 0)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+
+	/*
+	 * Force a checkpoint to get everything out to disk
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("Checksums enabled, checksumhelper launcher shutting down")));
+}
+
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+	if (!found)
+		pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This is intended to create the worklist for the workers to go through, and
+ * as we are only concerned with already existing databases we never need to
+ * rebuild this list, which simplifies the coding.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("Database %s does not allow connections.", NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc0(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * If include_shared is true, both shared relations and local ones are
+ * returned, else only non-shared relations are returned.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc0(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Enable vacuum cost delay, if any
+	 */
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			ereport(ERROR,
+					(errmsg("failed to process table with oid %u", rel->reloid)));
+		}
+	}
+	list_free_deep(RelationList);
+
+	ChecksumHelperShmem->success = true;
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..c158e67a28 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,15 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * If data checksums have been turned on in a running cluster which
+		 * was initdb'd without checksums, or a cluster which has had
+		 * checksums turned off, we hold off on verifying the checksum until
+		 * all pages again are checksummed.  The PageSetChecksum functions
+		 * must continue to write the checksums even though we don't validate
+		 * them yet.
+		 */
+		if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1176,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1203,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1db7845d5a..039b63bb05 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -67,6 +68,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -165,6 +167,7 @@ static void assign_syslog_ident(const char *newval, void *extra);
 static void assign_session_replication_role(int newval, void *extra);
 static bool check_temp_buffers(int *newval, void **extra, GucSource source);
 static bool check_bonjour(bool *newval, void **extra, GucSource source);
+static bool check_ignore_checksum_failure(bool *newval, void **extra, GucSource source);
 static bool check_ssl(bool *newval, void **extra, GucSource source);
 static bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 static bool check_log_stats(bool *newval, void **extra, GucSource source);
@@ -419,6 +422,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -513,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1022,7 +1036,7 @@ static struct config_bool ConfigureNamesBool[] =
 		},
 		&ignore_checksum_failure,
 		false,
-		NULL, NULL, NULL
+		check_ignore_checksum_failure, NULL, NULL
 	},
 	{
 		{"zero_damaged_pages", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -1665,17 +1679,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -3955,6 +3958,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
@@ -10203,6 +10217,37 @@ check_bonjour(bool *newval, void **extra, GucSource source)
 }
 
 static bool
+check_ignore_checksum_failure(bool *newval, void **extra, GucSource source)
+{
+	if (*newval)
+	{
+		/*
+		 * When data checksums are in progress, the verification of the
+		 * checksums is already ignored until all pages have had checksums
+		 * backfilled, making the effect of ignore_checksum_failure a no-op.
+		 * Allowing it during checksumming in progress can hide the fact that
+		 * checksums become enabled once done, so disallow.
+		 */
+		if (DataChecksumsInProgress())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" are in progress.");
+			return false;
+		}
+
+		/*
+		 * While safe, it's nonsensical to allow ignoring checksums when data
+		 * checksums aren't enabled in the first place.
+		 */
+		if (DataChecksumsDisabled())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" aren't enabled.");
+			return false;
+		}
+	}
+	return true;
+}
+
+static bool
 check_ssl(bool *newval, void **extra, GucSource source)
 {
 #ifndef USE_SSL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..9f5a5848ee
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,42 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..0df59df861
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,293 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_oid = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage(void)
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -o oid         check only relation with specified oid\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, O_RDONLY | PG_BINARY, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open %s: %m\n"), progname, fn);
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in %s, got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+						progname, fn, blockno, header->pd_checksum, csum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: %s, block %d, correct checksum %X\n"), progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory %s: %m\n"), progname, path);
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file %s: %m\n"), progname, fn);
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			if (only_oid)
+			{
+				if (strcmp(only_oid, de->d_name) == 0 ||
+					(strncmp(only_oid, de->d_name, strlen(only_oid)) == 0 &&
+					 strlen(de->d_name) > strlen(only_oid) &&
+					 de->d_name[strlen(only_oid)] == '_')
+					)
+				{
+					/* Either it's the same oid, or it's a relation fork of it */
+					scan_file(fn);
+				}
+			}
+			else
+				scan_file(fn);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fo:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'o':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid oid: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_oid = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (DataDir == NULL)
+	{
+		fprintf(stderr, _("%s: no data directory specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..788f2d0b58 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsEnabledOrInProgress() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 2a5321315a..4ffc9aa07f 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5544,6 +5544,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..13b6eaf13e
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper daemon
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool StartChecksumHelperLauncher(void);
+
+/* Shutdown the background processes, if any */
+void ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void ChecksumHelperLauncherMain(Datum arg);
+void ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index 73abf163f1..f30f4eb3a6 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # We don't build or execute examples/, locale/, or thread/ by default,
 # but we do want "make clean" etc to recurse into them.  Likewise for
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..290a74fc7c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres', "INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data');
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..b5c5563f98
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+t              
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546..cdd44979a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,6 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..f16127fa3f
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.5);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#1)

Re: Online enabling of checksums

On 2/21/18 15:53, Magnus Hagander wrote:

*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers. I wonder whether we can do something to make this
easier and less repetitive. Not in this patch, of course.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
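
For reference, the state changes that the two functions quoted above drive can be sketched as a small transition check. This follows the base design from the first message (off to inprogress, inprogress to on or off, on to off) and reuses the ChecksumType enum from the patch. The helper checksum_transition_allowed() is hypothetical and is not part of the patch; it is only meant to make the allowed transitions explicit.

```c
#include <stdbool.h>

/* Mirrors the ChecksumType enum the patch adds to storage/checksum.h. */
typedef enum ChecksumType
{
	DATA_CHECKSUMS_OFF = 0,
	DATA_CHECKSUMS_ON,
	DATA_CHECKSUMS_INPROGRESS
} ChecksumType;

/*
 * Hypothetical helper (not in the patch): report whether the base design
 * allows moving from one checksum state to another.
 */
static bool
checksum_transition_allowed(ChecksumType from, ChecksumType to)
{
	switch (from)
	{
		case DATA_CHECKSUMS_OFF:
			/* "off" can only start the enable process */
			return to == DATA_CHECKSUMS_INPROGRESS;
		case DATA_CHECKSUMS_INPROGRESS:
			/* the enable process either completes or is abandoned */
			return to == DATA_CHECKSUMS_ON || to == DATA_CHECKSUMS_OFF;
		case DATA_CHECKSUMS_ON:
			/* "on" can only be switched off */
			return to == DATA_CHECKSUMS_OFF;
	}
	return false;
}
```

Under this reading, going straight from "off" to "on" is rejected, which is the reason the "inprogress" state exists.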

Andrey Borodin

x4mmm@yandex-team.ru

almost 8 years ago

In reply to: Peter Eisentraut (#3)

Re: Online enabling of checksums

Hello, Magnus, Peter!

I'm excited that this feature emerged, thanks for the patch. Hope it will help to fix some mistakes made during initdb a long time ago...

On 22 Feb 2018, at 18:22, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 2/21/18 15:53, Magnus Hagander wrote:

*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers. I wonder whether we can do something to make this
easier and less repetitive. Not in this patch, of course.

Peter, can I ask for some pointers in searching for previous versions?
I want to review this patch, and some code comparison could be handy.

So far I've found only this [0,1] (without code) and threads mentioned by Magnus [2,3]

Or do you mean extracting "worker+launcher" for reuse for other purposes?

Best regards, Andrey Borodin.

[0]: /messages/by-id/E2B195BF-7AA1-47AF-85BE-0E936D157902@endpoint.com
[1]: /messages/by-id/7A00D9D1-535A-4C37-94C7-02296AAF063F@endpoint.com
[2]: /messages/by-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp=-7OJWBbcg@mail.gmail.com
[3]: /messages/by-id/FF393672-5608-46D6-9224-6620EC532693@endpoint.com

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andrey Borodin (#4)

Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 4:47 PM, Andrey Borodin <x4mmm@yandex-team.ru>
wrote:

Hello, Magnus, Peter!

I'm excited that this feature emerged, thanks for the patch. Hope it will
help to fix some mistakes made during initdb a long time ago...

On 22 Feb 2018, at 18:22, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 2/21/18 15:53, Magnus Hagander wrote:

*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers. I wonder whether we can do something to make this
easier and less repetitive. Not in this patch, of course.

Peter, can I ask for some pointers in searching for previous versions?
I want to review this patch, and some code comparison could be
handy.

So far I've found only this [0,1] (without code) and threads mentioned by
Magnus [2,3]

Or do you mean extracting "worker+launcher" for reuse for other purposes?

I'm pretty sure Peter means the second. Which could be interesting, but as
he says, not the topic for this patch.

I'm not entirely sure which the other ones are. Auto-Vacuum obviously is
one, which doesn't use the worker infrastructure. But I'm not sure which
others you are referring to?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Peter Eisentraut (#3)

Re: Online enabling of checksums

On 2018-02-22 08:22:48 -0500, Peter Eisentraut wrote:

On 2/21/18 15:53, Magnus Hagander wrote:

*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers. I wonder whether we can do something to make this
easier and less repetitive. Not in this patch, of course.

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

Greetings,

Andres Freund

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#6)

Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-02-22 08:22:48 -0500, Peter Eisentraut wrote:

On 2/21/18 15:53, Magnus Hagander wrote:

*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers. I wonder whether we can do something to make this
easier and less repetitive. Not in this patch, of course.

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

Hey, I can't even see the goalposts anymore :P

Are you saying this should be done *in general*, or specifically for
background workers? I'm assuming you mean the general case? That would be
very useful, but is probably a fairly non-trivial task (TM).

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#7)

Re: Online enabling of checksums

Hi,

On 2018-02-22 20:30:52 +0100, Magnus Hagander wrote:

On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de> wrote:

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

Hey, I can't even see the goalposts anymore :P

Hah. I vote for making this a hard requirement :P

Are you saying this should be done *in general*, or specifically for
background workers? I'm assuming you mean the general case?

I'd say bgworkers first. It's a lot clearer how to exactly do it
there. Refactoring the mainloop handling in PostgresMain() would be a
bigger task.

That would be very useful, but is probably a fairly non-trivial task
(TM).

I'm not actually that sure it is. We have nearly all the code, I
think. Syscache inval, ProcKill(), and then you're nearly ready to do
the normal connection dance again.

Greetings,

Andres Freund

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#8)

Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-02-22 20:30:52 +0100, Magnus Hagander wrote:

On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de> wrote:

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

Hey, I can't even see the goalposts anymore :P

Hah. I vote for making this a hard requirement :P

Hah! Are you handing out binoculars? :)

Are you saying this should be done *in general*, or specifically for
background workers? I'm assuming you mean the general case?

I'd say bgworkers first. It's a lot clearer how to exactly do it
there. Refactoring the mainloop handling in PostgresMain() would be a
bigger task.

Yeah, it'd probably be easier. I don't know exactly what it'd involve but
clearly less.

In this particular case that would at least simplify it in phase 1, because
we'd only need one process instead of worker/launcher. However, if we'd
ever want to parallelize it -- or any other process of the style, like
autovacuum -- you'd still need a launcher+worker combo. So making that
particular scenario simpler might be worthwhile on its own.

That would be very useful, but is probably a fairly non-trivial task
(TM).

I'm not actually that sure it is. We have nearly all the code, I
think. Syscache inval, ProcKill(), and then you're nearly ready to do
the normal connection dance again.

I'll take your word for it :) I haven't dug into that part.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#10

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#9)

Re: Online enabling of checksums

On February 22, 2018 11:44:17 AM PST, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de>
wrote:
In this particular case that would at least simplify it in phase 1, because
we'd only need one process instead of worker/launcher. However, if we'd
ever want to parallelize it -- or any other process of the style, like
autovacuum -- you'd still need a launcher+worker combo. So making that
particular scenario simpler might be worthwhile on its own.

Why is that needed? You can just start two bgworkers and process a list of items stored in shared memory. Or even just check, I assume there'd be a catalog flag somewhere, whether a database / table / object of granularity has already been processed and use locking to prevent concurrent access.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
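
To make the "list of items stored in shared memory" idea a bit more concrete, here is a minimal sketch of workers claiming items from a shared list through an atomic counter. Everything in it is an assumption for illustration: ChecksumWorkQueue, work_queue_init() and work_queue_claim() are hypothetical names, the items are assumed to be relation OIDs, and it uses plain C11 atomics rather than PostgreSQL's own atomics and shared-memory infrastructure.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORK_ITEMS 1024

/*
 * Hypothetical work list. In a server this struct would live in shared
 * memory so that every background worker sees the same counter.
 */
typedef struct ChecksumWorkQueue
{
	atomic_size_t next;					/* index of the next unclaimed item */
	size_t		nitems;					/* number of valid entries in items[] */
	unsigned int items[MAX_WORK_ITEMS];	/* e.g. relation OIDs to process */
} ChecksumWorkQueue;

/* Fill the queue once, before any worker starts claiming items. */
static void
work_queue_init(ChecksumWorkQueue *q, const unsigned int *oids, size_t n)
{
	if (n > MAX_WORK_ITEMS)
		n = MAX_WORK_ITEMS;
	for (size_t i = 0; i < n; i++)
		q->items[i] = oids[i];
	q->nitems = n;
	atomic_init(&q->next, 0);
}

/*
 * Claim the next item. Returns true and stores the item in *oid, or
 * returns false once the list is drained. The atomic fetch-and-add hands
 * each index to exactly one caller, so no item is processed twice.
 */
static bool
work_queue_claim(ChecksumWorkQueue *q, unsigned int *oid)
{
	size_t		idx = atomic_fetch_add(&q->next, 1);

	if (idx >= q->nitems)
		return false;
	*oid = q->items[idx];
	return true;
}
```

Each worker would loop on work_queue_claim() until it returns false. This sketch does not cover how the list gets populated per database or what happens to a claimed item if its worker dies, which is part of the complexity discussed in the replies.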

#11

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#5)

Re: Online enabling of checksums

On 2/22/18 12:38, Magnus Hagander wrote:

I'm not entirely sure which the others ones are. Auto-Vacuum obviously
is one, which doesn't use the worker infrastructure. But I'm not sure
which the others are referring to?

autovacuum, subscription workers, auto prewarm

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#10)

Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 8:52 PM, Andres Freund <andres@anarazel.de> wrote:

On February 22, 2018 11:44:17 AM PST, Magnus Hagander <magnus@hagander.net>
wrote:

On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de>
wrote:
In this particular case that would at least simplify it in phase 1, because
we'd only need one process instead of worker/launcher. However, if we'd
ever want to parallelize it -- or any other process of the style, like
autovacuum -- you'd still need a launcher+worker combo. So making that
particular scenario simpler might be worthwhile on its own.

Why is that needed? You can just start two bgworkers and process a list of
items stored in shared memory. Or even just check, I assume there'd be a
catalog flag somewhere, whether a database / table / object of granularity
has already been processed and use locking to prevent concurrent access.

You could do that, but then you're moving the complexity to managing that
list in shared memory instead. I'm not sure that's any easier... And
certainly adding a catalog flag for a use case like this one is not making
it easier.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#13

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#12)

Re: Online enabling of checksums

Hi,

On 2018-02-22 21:16:02 +0100, Magnus Hagander wrote:

You could do that, but then you're moving the complexity to managing that
list in shared memory instead.

Maybe I'm missing something, but how are you going to get quick parallel
processing if you don't have a shmem piece? You can't assign one
database per worker because commonly there's only one database. You
don't want to start/stop a worker for each relation because that'd be
extremely slow for databases with a lot of tables. Without shmem you
can't pass more than an oid to a bgworker. To me the combination of
these things imply that you need some other synchronization mechanism
*anyway*.

I'm not sure that's any easier... And
certainly adding a catalog flag for a use case like this one is not making
it easier.

Hm, I imagined you'd need that anyway. Imagine a 10TB database that's
online converted to checksums. I assume you'd not want to reread 9TB if
you crash after processing most of the cluster already?

Regards,

Andres Freund

#14

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#13)

Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 9:23 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-02-22 21:16:02 +0100, Magnus Hagander wrote:

You could do that, but then you're moving the complexity to managing that
list in shared memory instead.

Maybe I'm missing something, but how are you going to get quick parallel
processing if you don't have a shmem piece? You can't assign one
database per worker because commonly there's only one database. You
don't want to start/stop a worker for each relation because that'd be
extremely slow for databases with a lot of tables. Without shmem you
can't pass more than an oid to a bgworker. To me the combination of
these things imply that you need some other synchronization mechanism
*anyway*.

Yes, you probably need something like that if you want to be able to
parallelize on things inside each database. If you are OK parallelizing
things on a per-database level, you don't need it.

I'm not sure that's any easier... And
certainly adding a catalog flag for a usecase like this one is not making
it easier.

Hm, I imagined you'd need that anyway. Imagine a 10TB database that's
online converted to checksums. I assume you'd not want to reread 9TB if
you crash after processing most of the cluster already?

I would prefer that yes. But having to re-read 9TB is still significantly
better than not being able to turn on checksums at all (state today). And
adding a catalog column for it will carry the cost of the migration
*forever*, both for clusters that never have checksums and those that had
it from the beginning.

Accepting that the process will start over (but only read, not re-write,
the blocks that have already been processed) in case of a crash does
significantly simplify the process, and reduce the long-term cost of it in
the form of entries in the catalogs. Since this is a one-time operation (or
for many people, a zero-time operation), paying that cost that one time is
probably better than paying a much smaller cost but constantly.
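
The "only read, not re-write" behaviour on a restarted run can be sketched roughly as follows. This is an illustration under loud assumptions: `Page`, `toy_checksum` and `checksum_pass` are hypothetical, and the checksum is a toy function, not the algorithm PostgreSQL uses.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 8192

/* Hypothetical page layout: a checksum field followed by payload. */
typedef struct Page
{
    uint16_t checksum;
    uint8_t  data[PAGE_SIZE - sizeof(uint16_t)];
} Page;

/* Toy checksum, for illustration only. It never returns 0, so a
 * stored value of 0 can be read as "no checksum yet". */
static uint16_t
toy_checksum(const Page *p)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < sizeof(p->data); i++)
        sum = (sum * 31 + p->data[i]) & 0xFFFF;
    return (uint16_t) (sum ? sum : 1);
}

/* One pass over all pages. Every page is read, but a page is only
 * written when its stored checksum is missing or wrong. Returns the
 * number of pages that had to be written. */
static size_t
checksum_pass(Page *pages, size_t npages)
{
    size_t written = 0;

    for (size_t i = 0; i < npages; i++)
    {
        uint16_t expected = toy_checksum(&pages[i]);

        if (pages[i].checksum != expected)
        {
            pages[i].checksum = expected;
            written++;
        }
    }
    return written;
}
```

Running `checksum_pass()` twice over the same pages writes everything on the first pass and nothing on the second, which is the cost profile being argued for here: a restart repeats the reads but not the writes.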

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#15

Michael Paquier

michael@paquier.xyz

almost 8 years ago

In reply to: Andres Freund (#6)

Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

I'd like to see that. Last time this has been discussed, and Robert
complained to me immediately when I suggested it, is that this is not
worth it with the many complications around syscache handling and
resource cleanup. It is in the long term more stable to use a model
where a parent process handles a set of children and decides to which
database each child should spawn, which is what autovacuum does.
--
Michael

#16

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Michael Paquier (#15)

Re: Online enabling of checksums

How does:

On 2018-02-23 11:48:16 +0900, Michael Paquier wrote:

On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

I'd like to see that. Last time this has been discussed, and Robert
complained to me immediately when I suggested it, is that this is not
worth it with the many complications around syscache handling and
resource cleanup.

relate to:

It is in the long term more stable to use a model
where a parent process handles a set of children and decides to which
database each child should spawn, which is what autovacuum does.

#17

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Michael Paquier (#15)

Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 9:48 PM, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

I'd like to see that. Last time this has been discussed, and Robert
complained to me immediately when I suggested it, is that this is not
worth it with the many complications around syscache handling and
resource cleanup. It is in the long term more stable to use a model
where a parent process handles a set of children and decides to which
database each child should spawn, which is what autovacuum does.

My position is that allowing processes to change databases is a good
idea but (1) it will probably take some work to get correct and (2) it
probably won't be super-fast due to the need to flush absolutely every
bit of state in sight that might've been influenced by the choice of
database.

I also agree with Andres that this email is not very easy to
understand, although my complaint is not so much that I don't see how
the parts relate as that you seem to be contradicting yourself.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Peter Eisentraut (#11)

Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 9:09 PM, Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:

On 2/22/18 12:38, Magnus Hagander wrote:

I'm not entirely sure which the other ones are. Auto-Vacuum obviously
is one, which doesn't use the worker infrastructure. But I'm not sure
which the others are referring to?

autovacuum, subscription workers, auto prewarm

Oh, for some reason I thought you were thinking of pending patches. Yeah,
for those it makes sense -- though autovacuum isn't (currently) using
background workers for what it does, the rest certainly makes sense to do
something with.

But as you say, that's a separate patch :)

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#19

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Magnus Hagander (#14)

Re: Online enabling of checksums

On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net> wrote:

I would prefer that yes. But having to re-read 9TB is still significantly
better than not being able to turn on checksums at all (state today). And
adding a catalog column for it will carry the cost of the migration
*forever*, both for clusters that never have checksums and those that had it
from the beginning.

Accepting that the process will start over (but only read, not re-write, the
blocks that have already been processed) in case of a crash does
significantly simplify the process, and reduce the long-term cost of it in
the form of entries in the catalogs. Since this is a one-time operation (or
for many people, a zero-time operation), paying that cost that one time is
probably better than paying a much smaller cost but constantly.

That's not totally illogical, but to be honest I'm kinda surprised
that you're approaching it that way. I would have thought that
relchecksums and datchecksums columns would have been a sort of
automatic design choice for this feature. The thing to keep in mind
is that nobody's going to notice the overhead of adding those columns
in practice, but someone will surely notice the pain that comes from
having to restart the whole operation. You're talking about trading
an effectively invisible overhead for a very noticeable operational
problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Robert Haas (#19)

Re: Online enabling of checksums

On 02/24/2018 01:34 AM, Robert Haas wrote:

On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net> wrote:

I would prefer that yes. But having to re-read 9TB is still significantly
better than not being able to turn on checksums at all (state today). And
adding a catalog column for it will carry the cost of the migration
*forever*, both for clusters that never have checksums and those that had it
from the beginning.

Accepting that the process will start over (but only read, not re-write, the
blocks that have already been processed) in case of a crash does
significantly simplify the process, and reduce the long-term cost of it in
the form of entries in the catalogs. Since this is a one-time operation (or
for many people, a zero-time operation), paying that cost that one time is
probably better than paying a much smaller cost but constantly.

That's not totally illogical, but to be honest I'm kinda surprised
that you're approaching it that way. I would have thought that
relchecksums and datchecksums columns would have been a sort of
automatic design choice for this feature. The thing to keep in mind
is that nobody's going to notice the overhead of adding those columns
in practice, but someone will surely notice the pain that comes from
having to restart the whole operation. You're talking about trading
an effectively invisible overhead for a very noticeable operational
problem.

I agree having to restart the whole operation after a crash is not
ideal, but I don't see how adding a flag actually solves it. The problem
is the large databases often store most of the data (>80%) in one or two
central tables (think fact tables in star schema, etc.). So if you
crash, it's likely half-way while processing this table, so the whole
table would still have relchecksums=false and would have to be processed
from scratch.

But perhaps you meant something like "position" instead of just a simple
true/false flag?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#21

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Tomas Vondra (#20)

Re: Online enabling of checksums

Hi,

On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:

I agree having to restart the whole operation after a crash is not
ideal, but I don't see how adding a flag actually solves it. The problem
is the large databases often store most of the data (>80%) in one or two
central tables (think fact tables in star schema, etc.). So if you
crash, it's likely half-way while processing this table, so the whole
table would still have relchecksums=false and would have to be processed
from scratch.

I don't think it's quite as large a problem as you make it out to
be. Even in those cases you'll usually have indexes, toast tables and so
forth.

But perhaps you meant something like "position" instead of just a simple
true/false flag?

I think that'd incur a much larger complexity cost.

Greetings,

Andres Freund

#22

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Andres Freund (#21)

Re: Online enabling of checksums

On 02/24/2018 03:11 AM, Andres Freund wrote:

Hi,

On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:

I agree having to restart the whole operation after a crash is not
ideal, but I don't see how adding a flag actually solves it. The problem
is the large databases often store most of the data (>80%) in one or two
central tables (think fact tables in star schema, etc.). So if you
crash, it's likely half-way while processing this table, so the whole
table would still have relchecksums=false and would have to be processed
from scratch.

I don't think it's quite as large a problem as you make it out to
be. Even in those cases you'll usually have indexes, toast tables and so
forth.

Hmmm, right. I've been focused on tables and kinda forgot that the other
objects need to be transformed too ... :-/

But perhaps you meant something like "position" instead of just a simple
true/false flag?

I think that'd incur a much larger complexity cost.

Yep, that was part of the point that I was getting to - that actually
addressing the issue would be more expensive than simple flags. But as
you pointed out, that was not quite ... well thought through.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#23

Stephen Frost

sfrost@snowman.net

almost 8 years ago

In reply to: Tomas Vondra (#22)

Re: Online enabling of checksums

Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

On 02/24/2018 03:11 AM, Andres Freund wrote:

On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:

I agree having to restart the whole operation after a crash is not
ideal, but I don't see how adding a flag actually solves it. The problem
is the large databases often store most of the data (>80%) in one or two
central tables (think fact tables in star schema, etc.). So if you
crash, it's likely half-way while processing this table, so the whole
table would still have relchecksums=false and would have to be processed
from scratch.

I don't think it's quite as large a problem as you make it out to
be. Even in those cases you'll usually have indexes, toast tables and so
forth.

Hmmm, right. I've been focused on tables and kinda forgot that the other
objects need to be transformed too ... :-/

There's also something of a difference between just scanning a table or
index, where you don't have to do much in the way of actual writes
because most of the table already has valid checksums, and having to
actually write out all the changes.

But perhaps you meant something like "position" instead of just a simple
true/false flag?

I think that'd incur a much larger complexity cost.

Yep, that was part of the point that I was getting to - that actually
addressing the issue would be more expensive than simple flags. But as
you pointed out, that was not quite ... well thought through.

No, but it's also not entirely wrong. Huge tables aren't uncommon.

That said, I'm not entirely convinced that these new flags would be as
unnoticed as is being suggested here, but rather than focus on either
side of that, I'm thinking about what we want to have *next*- we know
that enabling/disabling checksums is an issue that needs to be solved,
and this patch is making progress towards that, but the next question is
what does one do when a page has been detected as corrupted? Are there
flag fields which would be useful to have at a per-relation level to
support some kind of corrective action or setting that says "don't care
about checksums on this table, even though the entire database is
supposed to have valid checksums, but instead do X with failed pages" or
similar.

Beyond dealing with corruption-recovery cases, are there other use cases
for having a given table not have checksums?

Would it make sense to introduce a flag or field which indicates that an
entire table's pages have some set of attributes, of which 'checksums' is
just one attribute? Perhaps a page version, which potentially allows us
to have a way to change page layouts in the future?

I'm happy to be told that we simply don't have enough information at
this point to make anything larger than a relchecksums field-level
decision, but perhaps these thoughts will spark an idea about how we
could define something a bit broader with clear downstream usefulness
that happens to also cover the "does this relation have checksums?"
question.

Thanks!

Stephen

#24

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#1)

Re: Online enabling of checksums

Hi,

I see the patch also does throttling by calling vacuum_delay_point().
Being able to throttle the checksum workers not to affect user activity
definitely seems like a useful feature, no complaints here.

But perhaps binding it to vacuum_cost_limit/vacuum_cost_delay is not the
best idea? I mean, enabling checksums seems rather unrelated to vacuum,
so it seems a bit strange to configure it by vacuum_* options.

Also, how am I supposed to set the cost limit? Perhaps I'm missing
something, but the vacuum_delay_point call happens in the bgworker, so
setting the cost limit before running pg_enable_data_checksums() will
not get there, right? I really don't want to be setting it in the config
file, because then it will suddenly affect all user VACUUM commands.

And if this patch gets improved to use multiple parallel workers, we'll
need a global limit (something like we have for autovacuum workers).

In any case, I suggest mentioning this in the docs.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#25

Michael Banck

michael.banck@credativ.de

almost 8 years ago

In reply to: Magnus Hagander (#1)

Re: Online enabling of checksums

Hi,

On Wed, Feb 21, 2018 at 09:53:31PM +0100, Magnus Hagander wrote:

We’ve also included a small commandline tool, bin/pg_verify_checksums,
that can be run against an offline cluster to validate all checksums.

The way it is coded in the patch will make pg_verify_checksums fail for
heap files with multiple segments, i.e. tables over 1 GB, because the
block number is consecutive and you start over from 0:

$ pgbench -i -s 80 -h /tmp
[...]
$ pg_verify_checksums -D data1
pg_verify_checksums: data1/base/12364/16396.1, block 0, invalid checksum
in file 6D61, calculated 6D5F
pg_verify_checksums: data1/base/12364/16396.1, block 1, invalid checksum
in file 7BE5, calculated 7BE7
[...]
Checksum scan completed
Data checksum version: 1
Files scanned: 943
Blocks scanned: 155925
Bad checksums: 76
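
For illustration, the block number that goes into the checksum has to be the relation-wide one, not the offset within the current segment file. A minimal sketch, assuming the default build options (8 kB blocks, 1 GB segments, so 131072 blocks per segment); `absolute_block_number` is a hypothetical helper, not a function from the patch:

```c
#include <stdint.h>

#define BLCKSZ      8192    /* default block size in bytes */
#define RELSEG_SIZE 131072  /* blocks per segment file: 1 GB / 8 kB */

/* Translate "block N of segment file <relfilenode>.<segno>" into the
 * block number counted from the start of the relation. */
static uint32_t
absolute_block_number(uint32_t segno, uint32_t block_in_segment)
{
    return segno * (uint32_t) RELSEG_SIZE + block_in_segment;
}
```

With this, block 0 of `16396.1` is verified as block 131072 rather than as block 0, which is what the failing output above suggests was missing.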

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

#26

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Robert Haas (#19)

Re: Online enabling of checksums

On Sat, Feb 24, 2018 at 1:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net>
wrote:

I would prefer that yes. But having to re-read 9TB is still significantly
better than not being able to turn on checksums at all (state today). And
adding a catalog column for it will carry the cost of the migration
*forever*, both for clusters that never have checksums and those that had it
from the beginning.

Accepting that the process will start over (but only read, not re-write, the
blocks that have already been processed) in case of a crash does
significantly simplify the process, and reduce the long-term cost of it in
the form of entries in the catalogs. Since this is a one-time operation (or
for many people, a zero-time operation), paying that cost that one time is
probably better than paying a much smaller cost but constantly.

That's not totally illogical, but to be honest I'm kinda surprised
that you're approaching it that way. I would have thought that
relchecksums and datchecksums columns would have been a sort of
automatic design choice for this feature. The thing to keep in mind
is that nobody's going to notice the overhead of adding those columns
in practice, but someone will surely notice the pain that comes from
having to restart the whole operation. You're talking about trading
an effectively invisible overhead for a very noticeable operational
problem.

Is it really that invisible? Given how much we argue over adding single
counters to the stats system, I'm not sure it's quite that low.

We did consider doing it on a per-table basis as well. But this is also an
overhead that has to be paid forever, whereas the risk of having to read
the database files more than once (because it'd only have to read them on
the second pass, not write anything) is a one-off operation. And all
those that have initialized with checksums in the first place don't have to
pay any overhead at all in the current design.

I very strongly doubt it's a "very noticeable operational problem". People
don't restart their databases very often... Let's say it takes 2-3 weeks to
complete a run in a fairly large database. How many such large databases
actually restart that frequently? I'm not sure I know of any. And the only
effect of it is you have to start the process over (but read-only for the
part you have already done). It's certainly not ideal, but I don't agree
it's in any form a "very noticeable problem".

The other point to it is that this keeps the code a lot simpler. That is
both good for having a chance at all to finish it and get it into 11 (and
it can then be improved upon to add for example incremental support in 12,
or something like that). And of course, simple code means less overhead in
the form of maintenance and effects on other parts of the system down the
road.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#27

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Tomas Vondra (#24)

Re: Online enabling of checksums

On Sat, Feb 24, 2018 at 4:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

Hi,

I see the patch also does throttling by calling vacuum_delay_point().
Being able to throttle the checksum workers not to affect user activity
definitely seems like a useful feature, no complaints here.

But perhaps binding it to vacuum_cost_limit/vacuum_cost_delay is not the
best idea? I mean, enabling checksums seems rather unrelated to vacuum,
so it seems a bit strange to configure it by vacuum_* options.

Also, how am I supposed to set the cost limit? Perhaps I'm missing
something, but the vacuum_delay_point call happens in the bgworker, so
setting the cost limit before running pg_enable_data_checksums() will
not get there, right? I really don't want to be setting it in the config
file, because then it will suddenly affect all user VACUUM commands.

And if this patch gets improved to use multiple parallel workers, we'll
need a global limit (something like we have for autovacuum workers).

In any case, I suggest mentioning this in the docs.

Ah yes. I actually have it on my TODO to work on that, but I forgot to put
that in the email I sent out. Apologies for that, and thanks for pointing
it out!

Right now you have to set the limit in the configuration file. That's of
course not the way we want to have it long term (but as long as it is that
way it should at least be documented). My plan is to either pick it up from
the current session that calls pg_enable_data_checksums(), or to simply
pass it down as parameters to the function directly. I'm thinking the
second one (pass a cost_delay and a cost_limit as optional parameters to
the function) is the best one because as you say actually overloading it on
the user visible GUCs seems a bit ugly. Once there I think the easiest is
to just pass it down to the workers through the shared memory segment.
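
As a rough sketch of what passing `cost_delay` and `cost_limit` down to the worker could look like, the following keeps the two settings in a struct of its own instead of reading the vacuum_* GUCs. All names here (`Throttle`, `throttle_charge`) are hypothetical, and the accounting is a simplified version of cost-based delay, not the code in the patch.

```c
/* Hypothetical per-operation throttle settings, e.g. copied into the
 * shared memory segment by pg_enable_data_checksums(). */
typedef struct Throttle
{
    int cost_limit;     /* accumulated cost that triggers a pause */
    int cost_delay_ms;  /* length of the pause; 0 disables throttling */
    int balance;        /* cost accumulated since the last pause */
} Throttle;

/* Charge `cost` for one unit of work. Returns the number of
 * milliseconds the caller should sleep before continuing, or 0 if
 * no pause is due yet. */
static int
throttle_charge(Throttle *t, int cost)
{
    if (t->cost_delay_ms <= 0)
        return 0;               /* throttling disabled */

    t->balance += cost;
    if (t->balance >= t->cost_limit)
    {
        t->balance = 0;
        return t->cost_delay_ms;
    }
    return 0;
}
```

The worker would call `throttle_charge()` after each page and sleep for the returned duration, so the settings affect only the checksum run and not user VACUUM commands.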

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#28

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Michael Banck (#25)

Re: Online enabling of checksums

On Sat, Feb 24, 2018 at 6:14 PM, Michael Banck <michael.banck@credativ.de>
wrote:

Hi,

On Wed, Feb 21, 2018 at 09:53:31PM +0100, Magnus Hagander wrote:

We’ve also included a small commandline tool, bin/pg_verify_checksums,
that can be run against an offline cluster to validate all checksums.

The way it is coded in the patch will make pg_verify_checksums fail for
heap files with multiple segments, i.e. tables over 1 GB, because the
block number is consecutive and you start over from 0:

$ pgbench -i -s 80 -h /tmp
[...]
$ pg_verify_checksums -D data1
pg_verify_checksums: data1/base/12364/16396.1, block 0, invalid checksum
in file 6D61, calculated 6D5F
pg_verify_checksums: data1/base/12364/16396.1, block 1, invalid checksum
in file 7BE5, calculated 7BE7
[...]
Checksum scan completed
Data checksum version: 1
Files scanned: 943
Blocks scanned: 155925
Bad checksums: 76

Yikes. I could've sworn I tested that, but it's pretty obvious I didn't, at
least not in this version. Thanks for the note, will fix and post a new
version!

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#29

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#26)

Re: Online enabling of checksums

On 2018-02-24 22:45:09 +0100, Magnus Hagander wrote:

Is it really that invisible? Given how much we argue over adding single
counters to the stats system, I'm not sure it's quite that low.

That appears to be entirely unrelated. The stats stuff is expensive
because we currently have to essentially write out the stats for *all*
tables in a database, once a counter is updated. And those counters are
obviously constantly updated. Thus the overhead of adding one column is
essentially multiplied by the number of tables in the system. Whereas
here it's a single column that can be updated on a per-row basis, which
is barely ever going to be written to.

Am I missing something?

We did consider doing it on a per-table basis as well. But this is also an
overhead that has to be paid forever, whereas the risk of having to read
the database files more than once (because it'd only have to read them on
the second pass, not write anything) is a one-off operation. And all
those that have initialized with checksums in the first place don't have to
pay any overhead at all in the current design.

Why does it have to be paid forever?

I very strongly doubt it's a "very noticeable operational problem". People
don't restart their databases very often... Let's say it takes 2-3 weeks to
complete a run in a fairly large database. How many such large databases
actually restart that frequently? I'm not sure I know of any. And the only
effect of it is you have to start the process over (but read-only for the
part you have already done). It's certainly not ideal, but I don't agree
it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

Greetings,

Andres Freund

#30

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#29)

Re: Online enabling of checksums

On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-02-24 22:45:09 +0100, Magnus Hagander wrote:

Is it really that invisible? Given how much we argue over adding single
counters to the stats system, I'm not sure it's quite that low.

That appears to be entirely unrelated. The stats stuff is expensive
because we currently have to essentially write out the stats for *all*
tables in a database, once a counter is updated. And those counters are
obviously constantly updated. Thus the overhead of adding one column is
essentially multiplied by the number of tables in the system. Whereas
here it's a single column that can be updated on a per-row basis, which
is barely ever going to be written to.

Am I missing something?

It's probably at least partially unrelated, you are right. I may have
misread our reluctance to add more values there as a general reluctance to
add more values to central columns.

We did consider doing it on a per-table basis as well. But this is also an
overhead that has to be paid forever, whereas the risk of having to read
the database files more than once (because it'd only have to read them on
the second pass, not write anything) is a one-off operation. And all
those that have initialized with checksums in the first place don't have to
pay any overhead at all in the current design.

Why does it have to be paid forever?

The size of the pg_class row would be there forever. Granted, it's not that
big an overhead given that there are already plenty of columns there. But
the point being you can never remove that column, and it will be there for
users who never even considered running without checksums. It's certainly
not a large overhead, but it's also not zero.

I very strongly doubt it's a "very noticeable operational problem". People
don't restart their databases very often... Let's say it takes 2-3 weeks to
complete a run in a fairly large database. How many such large databases
actually restart that frequently? I'm not sure I know of any. And the only
effect of it is you have to start the process over (but read-only for the
part you have already done). It's certainly not ideal, but I don't agree
it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

I would argue that they have bigger issues than enabling checksums... By
far.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#31

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#30)

Re: Online enabling of checksums

Hi,

On 2018-02-24 22:56:57 +0100, Magnus Hagander wrote:

On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres@anarazel.de> wrote:

We did consider doing it on a per-table basis as well. But this is also an
overhead that has to be paid forever, whereas the risk of having to read
the database files more than once (because it'd only have to read them on
the second pass, not write anything) is a one-off operation. And all
those that have initialized with checksums in the first place don't have to
pay any overhead at all in the current design.

Why does it have to be paid forever?

The size of the pg_class row would be there forever. Granted, it's not that
big an overhead given that there are already plenty of columns there. But
the point being you can never remove that column, and it will be there for
users who never even considered running without checksums. It's certainly
not a large overhead, but it's also not zero.

But it can be removed in the next major version, if we decide it's a
good idea? We're not bound on compatibility for catalog layout.

FWIW, there's some padding space available where we currently could
store two booleans without any space overhead. Also, if we decide that
the boolean columns are a problem (they don't matter much in comparison to
the rest of the data, particularly relname), we can compress them into a
bitfield.
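
To make the bitfield idea concrete: several boolean columns can share one byte, each occupying a single bit. The sketch below is purely illustrative; the flag names and helper functions are hypothetical and do not correspond to actual pg_class columns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, one per boolean that would otherwise
 * need its own column. */
#define RELFLAG_HAS_CHECKSUMS  (1u << 0)
#define RELFLAG_OTHER_BOOLEAN  (1u << 1)

/* Return `flags` with the bits in `mask` switched on or off. */
static uint8_t
relflags_set(uint8_t flags, uint8_t mask, bool on)
{
    return on ? (uint8_t) (flags | mask)
              : (uint8_t) (flags & (uint8_t) ~mask);
}

/* Test whether any bit in `mask` is set. */
static bool
relflags_get(uint8_t flags, uint8_t mask)
{
    return (flags & mask) != 0;
}
```

Eight such booleans fit into the space of one, at the price of slightly less readable catalog contents.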

I don't think this is a valid reason for not supporting
interruptibility. You can make fair arguments about adding incremental
support incrementally and whatnot, but the catalog size argument doesn't
seem part of a valid argument.

I very strongly doubt it's a "very noticeable operational problem". People
don't restart their databases very often... Let's say it takes 2-3 weeks to
complete a run in a fairly large database. How many such large databases
actually restart that frequently? I'm not sure I know of any. And the only
effect of it is you have to start the process over (but read-only for the
part you have already done). It's certainly not ideal, but I don't agree
it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

I would argue that they have bigger issues than enabling checksums... By
far.

In one case it's intentional, to make sure the overall system copes. Not
that insane.

Greetings,

Andres Freund

#32

Greg Stark

stark@mit.edu

almost 8 years ago

In reply to: Magnus Hagander (#1)

Re: Online enabling of checksums

The change of the checksum state is WAL logged with a new xlog record. All the buffers written by the background worker are forcibly enabled full page writes to make sure the checksum is fully updated on the standby even if no actual contents of the buffer changed.

Hm. That doesn't sound necessary to me. If you generate a checkpoint
(or just wait until a new checkpoint has started) then go through and
do a normal xlog record for every page (any xlog record, a noop record
even) then the normal logic for full page writes ought to be
sufficient. If the noop record doesn't need a full page write it's
because someone else has already come in and done one and that one
will set the checksum. In fact if any page has an lsn > the checkpoint
start lsn for the checkpoint after the flag was flipped then you
wouldn't need to issue any record at all.
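
The page-LSN test suggested above can be sketched as follows. This is only an illustration of the comparison, not code from the patch: `lsn_t`, `page_needs_wal_record` and `count_pages_needing_record` are hypothetical names, and `redo_lsn` is assumed to be the start LSN of the first checkpoint begun after the checksum flag was flipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit WAL position. */
typedef uint64_t lsn_t;

/*
 * A page whose LSN is already past the checkpoint start has been WAL-logged
 * since the flag flip, so (per the argument above) it would need no extra
 * record. Anything at or before that point still needs one.
 */
static bool
page_needs_wal_record(lsn_t page_lsn, lsn_t redo_lsn)
{
    return page_lsn <= redo_lsn;
}

/* Count how many pages in an array of page LSNs would still need a record. */
static size_t
count_pages_needing_record(const lsn_t *page_lsns, size_t npages, lsn_t redo_lsn)
{
    size_t needed = 0;

    assert(page_lsns != NULL || npages == 0);
    for (size_t i = 0; i < npages; i++)
    {
        if (page_needs_wal_record(page_lsns[i], redo_lsn))
            needed++;
    }
    return needed;
}
```

With a checkpoint start of 200, pages stamped 100 and 200 would still need a record, while a page stamped 300 would not.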

#33

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Greg Stark (#32)

Re: Online enabling of checksums

On Sun, Feb 25, 2018 at 1:21 AM, Greg Stark <stark@mit.edu> wrote:

The change of the checksum state is WAL logged with a new xlog record.

All the buffers written by the background worker are forcibly enabled full
page writes to make sure the checksum is fully updated on the standby even
if no actual contents of the buffer changed.

Hm. That doesn't sound necessary to me. If you generate a checkpoint
(or just wait until a new checkpoint has started) then go through and
do a normal xlog record for every page (any xlog record, a noop record
even) then the normal logic for full page writes ought to be
sufficient. If the noop record doesn't need a full page write it's
because someone else has already come in and done one and that one
will set the checksum. In fact if any page has an lsn > the checkpoint
start lsn for the checkpoint after the flag was flipped then you
wouldn't need to issue any record at all.

What would be the actual benefit though? We'd have to invent a noop WAL
record, and just have some other part of the system do the full page write?
So why not just send the full page in the first place?

Also, if that wasn't clear -- we only do the full page write if the page
does not already carry a correct checksum.

(We do trigger a checkpoint at the end, and wait for it to complete)
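
A minimal sketch of that per-page rule (skip pages that already hold a correct checksum, rewrite the rest), under loud assumptions: `ToyPage`, `toy_checksum` and `toy_process_page` are hypothetical, the checksum is a toy weighted sum and not PostgreSQL's algorithm, and the "full page write" is represented only by the boolean return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 64    /* hypothetical; far smaller than a real page */

typedef struct ToyPage
{
    uint16_t checksum;              /* 0 means "no checksum stored" here */
    uint8_t  data[TOY_PAGE_SIZE];
} ToyPage;

/* Toy checksum (NOT PostgreSQL's): weighted byte sum, never returns 0. */
static uint16_t
toy_checksum(const ToyPage *page)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
        sum += (uint32_t) (i + 1) * page->data[i];
    return (uint16_t) ((sum % 65535) + 1);
}

/*
 * Returns true if the page had to be rewritten, in which case the caller
 * would mark it dirty and log a full page image. Returns false if the page
 * already carried a correct checksum and can be skipped.
 */
static bool
toy_process_page(ToyPage *page)
{
    uint16_t expected;

    assert(page != NULL);
    expected = toy_checksum(page);
    if (page->checksum == expected)
        return false;
    page->checksum = expected;
    return true;
}
```

Running `toy_process_page` twice on the same unchanged page rewrites it the first time and skips it the second, which is the read-only second pass mentioned elsewhere in this thread.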

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#34

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#31)

Re: Online enabling of checksums

On Sat, Feb 24, 2018 at 11:06 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-02-24 22:56:57 +0100, Magnus Hagander wrote:

On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres@anarazel.de> wrote:

We did consider doing it at a per-table basis as well. But this is also an
overhead that has to be paid forever, whereas the risk of having to read
the database files more than once (because it'd only have to read them on
the second pass, not write anything) is a one-off operation. And for all
those that have initialized with checksums in the first place don't have to
pay any overhead at all in the current design.

Why does it have to be paid forever?

The size of the pg_class row would be there forever. Granted, it's not that
big an overhead given that there are already plenty of columns there. But
the point being you can never remove that column, and it will be there for
users who never even considered running without checksums. It's certainly
not a large overhead, but it's also not zero.

But it can be removed in the next major version, if we decide it's a
good idea? We're not bound on compatibility for catalog layout.

Sure.

But we can also *add* it in the next major version, if we decide it's a
good idea?

FWIW, there's some padding space available where we currently could
store two booleans without any space overhead. Also, if we decide that
the boolean columns (which don't matter much in comparison to the rest
of the data, particularly relname) are a problem, we can compress them
into a bitfield.

I don't think this is a valid reason for not supporting
interruptibility. You can make fair arguments about adding incremental
support incrementally and whatnot, but the catalog size argument doesn't
seem part of a valid argument.

Fair enough.

I very strongly doubt it's a "very noticeable operational problem". People
don't restart their databases very often... Let's say it takes 2-3 weeks to
complete a run in a fairly large database. How many such large databases
actually restart that frequently? I'm not sure I know of any. And the only
effect of it is you have to start the process over (but read-only for the
part you have already done). It's certainly not ideal, but I don't agree
it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

I would argue that they have bigger issues than enabling checksums... By
far.

In one case it's intentional, to make sure the overall system copes. Not
that insane.

That I can understand. But in a scenario like that, you can also stop doing
that for the period of time when you're rebuilding checksums, if re-reading
the database over and over again is an issue.

Note, I'm not saying it wouldn't be nice to have the incremental
functionality. I'm just saying it's not needed in a first version.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#35

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Magnus Hagander (#27)

1 attachment(s)

Re: Online enabling of checksums

On Sat, Feb 24, 2018 at 10:48 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Sat, Feb 24, 2018 at 4:29 AM, Tomas Vondra <
tomas.vondra@2ndquadrant.com> wrote:

Hi,

I see the patch also does throttling by calling vacuum_delay_point().
Being able to throttle the checksum workers not to affect user activity
definitely seems like a useful feature, no complaints here.

But perhaps binding it to vacuum_cost_limit/vacuum_cost_delay is not the
best idea? I mean, enabling checksums seems rather unrelated to vacuum,
so it seems a bit strange to configure it by vacuum_* options.

Also, how am I supposed to set the cost limit? Perhaps I'm missing
something, but the vacuum_delay_point call happens in the bgworker, so
setting the cost limit before running pg_enable_data_checksums() will
not get there, right? I really don't want to be setting it in the config
file, because then it will suddenly affect all user VACUUM commands.

And if this patch gets improved to use multiple parallel workers, we'll
need a global limit (something like we have for autovacuum workers).

In any case, I suggest mentioning this in the docs.

Ah yes. I actually have it on my TODO to work on that, but I forgot to put
that in the email I sent out. Apologies for that, and thanks for pointing
it out!

Right now you have to set the limit in the configuration file. That's of
course not the way we want to have it long term (but as long as it is that
way it should at least be documented). My plan is to either pick it up from
the current session that calls pg_enable_data_checksums(), or to simply
pass it down as parameters to the function directly. I'm thinking the
second one (pass a cost_delay and a cost_limit as optional parameters to
the function) is the best one because as you say actually overloading it on
the user visible GUCs seems a bit ugly. Once there I think the easiest is
to just pass it down to the workers through the shared memory segment.

PFA an updated patch that adds this, and also fixes the problem in
pg_verify_checksums spotted by Michael Banck.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Attachments:

online_checksums2.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4c998fe51f..dc05ac3e55 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8537,7 +8537,8 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
         or hide corruption, or other serious problems</emphasis>.  However, it may allow
         you to get past the error and retrieve undamaged tuples that might still be
         present in the table if the block header is still sane. If the header is
-        corrupt an error will be reported even if this option is enabled. The
+        corrupt an error will be reported even if this option is enabled. This
+        option can only be enabled when data checksums are enabled. The
         default setting is <literal>off</literal>, and it can only be changed by a superuser.
        </para>
       </listitem>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2f59af25a6..a011ea1d8f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19481,6 +19481,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>in progress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 22e6893211..c81c87ef41 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -210,6 +210,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 585665f161..0864afb890 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..076de243a0
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-o <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d27fb414f7..db4f4167e3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -283,6 +283,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..89afecb341 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,73 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums,
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47a6c4d895..4c9c0ca631 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4653,10 +4654,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4728,12 +4725,85 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsEnabledOrInProgress(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+bool
+DataChecksumsDisabled(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == 0);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	if (DataChecksumsEnabledOrInProgress())
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	if (!DataChecksumsEnabledOrInProgress())
+		elog(ERROR, "Checksums not enabled or in progress");
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in in_progress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7769,6 +7839,16 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"in progress\" with no worker."),
+				 errhint("either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9522,6 +9602,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9949,6 +10045,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..0d10fd4c89 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,45 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (DataChecksumsDisabled())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_BOOL(DataChecksumsDisabled());
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int cost_delay = PG_GETARG_INT32(0);
+	int cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_BOOL(DataChecksumsEnabledOrInProgress());
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5652e9ee6d..90e57874e7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1023,6 +1023,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..618c2d9257
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,630 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Backend worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a database at initdb time, no extra
+ * process is required as each page is checksummed, and verified, at
+ * accesses.  When enabling checksums on an already running cluster
+ * which was not initialized with checksums, this helper worker will
+ * ensure that all pages are checksummed before verification of the
+ * checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific
+ * database before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	bool		success;
+	bool		process_shared_catalogs;
+	/* Parameter values  set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksum helper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+	bool		success;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+	bool		success;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static bool ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Main entry point for checksum helper launcher process
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/*
+		 * Failed to set means somebody else started
+		 */
+		ereport(ERROR,
+				(errmsg("could not start checksum helper: already running")));
+	}
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+
+		/*
+		 * Launcher not started, so nothing to shut down.
+		 */
+		return;
+
+	ereport(ERROR,
+			(errmsg("Checksum helper is currently running, cannot disable checksums"),
+			 errhint("Restart the cluster or wait for the worker to finish")));
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * XXX: must hold a lock on the relation preventing it from being truncated?
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+		Page		page;
+		PageHeader	pagehdr;
+		uint16		checksum;
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/* Do we already have a valid checksum? */
+		page = BufferGetPage(buf);
+		pagehdr = (PageHeader) page;
+		checksum = pg_checksum_page((char *) page, b);
+
+		/*
+		 * If checksum was not set or was invalid, mark the buffer as dirty
+		 * and force a full page write. If the checksum was already valid, we
+		 * can leave it since we know that any other process writing the
+		 * buffer will update the checksum.
+		 */
+		if (checksum != pagehdr->pd_checksum)
+		{
+			START_CRIT_SECTION();
+			MarkBufferDirty(buf);
+			log_newpage_buffer(buf, false);
+			END_CRIT_SECTION();
+		}
+
+		UnlockReleaseBuffer(buf);
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = relation_open(relationId, AccessShareLock);
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+			ProcessSingleRelationFork(rel, fnum, strategy);
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u", relationId);
+
+	CommitTransactionCommand();
+
+	return true;
+}
+
+/*
+ * Enable checksums in a single database.
+ * We do this by launching a dynamic background worker into this database,
+ * and waiting for it to finish.
+ * We have to do this in a separate worker, since each process can only be
+ * connected to one database during its lifetime.
+ */
+static bool
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	ChecksumHelperShmem->success = false;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in %s", db->dbname)));
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in %s completed", db->dbname)));
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so only the first database processes shared catalogs, not every db
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any newly created database will
+	 * be running with checksums turned on from the start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			if (ProcessDatabase(db))
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the second case with.
+		 * Any database that still exists but failed we retry for a limited
+		 * number of times before giving up. Any database that remains in
+		 * failed state after that will fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in %s, giving up.", db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("Database %s dropped, skipping", db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL || list_length(remaining) == 0)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+
+	/*
+	 * Force a checkpoint to get everything out to disk
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("Checksums enabled, checksumhelper launcher shutting down")));
+}
+
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This is intended to create the worklist for the workers to go through, and
+ * as we are only concerned with already existing databases we never need to
+ * rebuild this list, which simplifies the coding.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("Database %s does not allow connections.", NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc0(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * If include_shared is true, both shared relations and local ones are
+ * returned, else only non-shared relations are returned.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Enable vacuum cost delay, if any
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			ereport(ERROR,
+					(errmsg("failed to process table with oid %u", rel->reloid)));
+		}
+	}
+	list_free_deep(RelationList);
+
+	ChecksumHelperShmem->success = true;
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,11 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..c158e67a28 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,15 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * If data checksums have been turned on in a running cluster which
+		 * was initdb'd without checksums, or a cluster which has had
+		 * checksums turned off, we hold off on verifying the checksum until
+		 * all pages again are checksummed.  The PageSetChecksum functions
+		 * must continue to write the checksums even though we don't validate
+		 * them yet.
+		 */
+		if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1176,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1203,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1db7845d5a..039b63bb05 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -67,6 +68,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -165,6 +167,7 @@ static void assign_syslog_ident(const char *newval, void *extra);
 static void assign_session_replication_role(int newval, void *extra);
 static bool check_temp_buffers(int *newval, void **extra, GucSource source);
 static bool check_bonjour(bool *newval, void **extra, GucSource source);
+static bool check_ignore_checksum_failure(bool *newval, void **extra, GucSource source);
 static bool check_ssl(bool *newval, void **extra, GucSource source);
 static bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 static bool check_log_stats(bool *newval, void **extra, GucSource source);
@@ -419,6 +422,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -513,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1022,7 +1036,7 @@ static struct config_bool ConfigureNamesBool[] =
 		},
 		&ignore_checksum_failure,
 		false,
-		NULL, NULL, NULL
+		check_ignore_checksum_failure, NULL, NULL
 	},
 	{
 		{"zero_damaged_pages", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -1665,17 +1679,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -3955,6 +3958,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
@@ -10203,6 +10217,37 @@ check_bonjour(bool *newval, void **extra, GucSource source)
 }
 
 static bool
+check_ignore_checksum_failure(bool *newval, void **extra, GucSource source)
+{
+	if (*newval)
+	{
+		/*
+		 * When data checksums are in progress, the verification of the
+		 * checksums is already ignored until all pages have had checksums
+		 * backfilled, making the effect of ignore_checksum_failure a no-op.
+		 * Allowing it during checksumming in progress can hide the fact that
+		 * checksums become enabled once done, so disallow.
+		 */
+		if (DataChecksumsInProgress())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" are in progress.");
+			return false;
+		}
+
+		/*
+		 * While safe, it's nonsensical to allow ignoring checksums when data
+		 * checksums aren't enabled in the first place.
+		 */
+		if (DataChecksumsDisabled())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" aren't enabled.");
+			return false;
+		}
+	}
+	return true;
+}
+
+static bool
 check_ssl(bool *newval, void **extra, GucSource source)
 {
 #ifndef USE_SSL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..9f5a5848ee
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,42 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..c596e6fd95
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,308 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_oid = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage(void)
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -o oid         check only relation with specified oid\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open %s: %s\n"), progname, fn, strerror(errno));
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in %s, got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno*RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+						progname, fn, blockno, header->pd_checksum, csum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: %s, block %d, correct checksum %X\n"), progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory %s: %s\n"), progname, path, strerror(errno));
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file %s: %s\n"), progname, fn, strerror(errno));
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char *forkpath, *segmentpath;
+			int segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number in order to
+			 * mix it into the checksum. Then also cut off at the fork boundary, to get
+			 * the oid (relfilenode) the file belongs to for filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename %s\n"), progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_oid && strcmp(only_oid, de->d_name) != 0)
+				/* Oid not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fo:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'o':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid oid: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_oid = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (DataDir == NULL)
+	{
+		fprintf(stderr, _("%s: no data directory specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..788f2d0b58 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsEnabledOrInProgress() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index c00d055940..e3f05cf2a4 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5558,6 +5558,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 2 0 16 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..7f296264a9
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper daemon
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void ChecksumHelperLauncherMain(Datum arg);
+void ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index 73abf163f1..f30f4eb3a6 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # We don't build or execute examples/, locale/, or thread/ by default,
 # but we do want "make clean" etc to recurse into them.  Likewise for
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..290a74fc7c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres', "INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data');
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..b5c5563f98
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+t              
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546..cdd44979a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,6 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..f16127fa3f
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.5);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"
diff --git a/src/test/isolation/specscanner.l b/src/test/isolation/specscanner.l
index 481b32d1d7..7d371ebbca 100644
--- a/src/test/isolation/specscanner.l
+++ b/src/test/isolation/specscanner.l
@@ -12,7 +12,7 @@
 
 static int	yyline = 1;			/* line number for error reporting */
 
-static char litbuf[1024];
+static char litbuf[2048];
 static int litbufpos = 0;
 
 static void addlitchar(char c);
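
The state changes allowed by the design described at the top of the thread ("off" to "inprogress", "inprogress" to "on" or "off", "on" to "off") can be written down as a small table. The following is only a minimal sketch: the enum mirrors the ChecksumType added to storage/checksum.h in the patch above, but the helper functions checksum_transition_allowed(), checksum_is_written() and checksum_is_verified() are hypothetical names used for illustration and are not part of the patch.

```c
#include <stdbool.h>

/* Mirrors the ChecksumType enum the patch adds to storage/checksum.h. */
typedef enum ChecksumType
{
	DATA_CHECKSUMS_OFF = 0,
	DATA_CHECKSUMS_ON,
	DATA_CHECKSUMS_INPROGRESS
} ChecksumType;

/*
 * Hypothetical helper (not in the patch): is the state change "from" -> "to"
 * one of the transitions listed in the design description?
 */
static bool
checksum_transition_allowed(ChecksumType from, ChecksumType to)
{
	switch (from)
	{
		case DATA_CHECKSUMS_OFF:
			return to == DATA_CHECKSUMS_INPROGRESS;
		case DATA_CHECKSUMS_INPROGRESS:
			return to == DATA_CHECKSUMS_ON || to == DATA_CHECKSUMS_OFF;
		case DATA_CHECKSUMS_ON:
			return to == DATA_CHECKSUMS_OFF;
	}
	return false;
}

/* Hypothetical: checksums are written in both "inprogress" and "on". */
static bool
checksum_is_written(ChecksumType state)
{
	return state != DATA_CHECKSUMS_OFF;
}

/* Hypothetical: checksums are only verified once the state is "on". */
static bool
checksum_is_verified(ChecksumType state)
{
	return state == DATA_CHECKSUMS_ON;
}
```

Note that "off" to "on" is deliberately not allowed: the cluster has to pass through "inprogress" so that every page gets a checksum before verification starts.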

#36

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Stephen Frost (#23)

Re: Online enabling of checksums

On 02/24/2018 03:51 AM, Stephen Frost wrote:

Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

On 02/24/2018 03:11 AM, Andres Freund wrote:

On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:

I agree having to restart the whole operation after a crash is not
ideal, but I don't see how adding a flag actually solves it. The problem
is the large databases often store most of the data (>80%) in one or two
central tables (think fact tables in star schema, etc.). So if you
crash, it's likely half-way while processing this table, so the whole
table would still have relchecksums=false and would have to be processed
from scratch.

I don't think it's quite as large a problem as you make it out to
be. Even in those cases you'll usually have indexes, toast tables and so
forth.

Hmmm, right. I've been focused on tables and kinda forgot that the other
objects need to be transformed too ... :-/

There's also something of a difference between just scanning a table or
index, where you don't have to do much in the way of actual writes
because most of the table already has valid checksums, and having to
actually write out all the changes.
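
The difference between a pass that only reads and one that has to write can be illustrated with a small sketch. Everything here is an assumption made for illustration: ToyPage, toy_checksum() and toy_checksum_pass() are hypothetical, the checksum is a simple Fletcher-style sum and not PostgreSQL's pg_checksum_page(), and no buffer manager or WAL is involved.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 64

/* Hypothetical, simplified page: a stored checksum plus payload bytes. */
typedef struct ToyPage
{
	uint16_t	checksum;
	uint8_t		data[TOY_PAGE_SIZE];
} ToyPage;

/* Toy Fletcher-style checksum; NOT the algorithm PostgreSQL uses. */
static uint16_t
toy_checksum(const ToyPage *page)
{
	uint32_t	a = 1;
	uint32_t	b = 0;

	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
	{
		a = (a + page->data[i]) % 251;
		b = (b + a) % 251;
	}
	return (uint16_t) ((b << 8) | a);
}

/*
 * Walk all pages.  Pages whose stored checksum already matches are only
 * read; pages with a missing or wrong checksum are updated.  Returns the
 * number of pages that had to be rewritten.
 */
static size_t
toy_checksum_pass(ToyPage *pages, size_t npages)
{
	size_t		rewritten = 0;

	for (size_t i = 0; i < npages; i++)
	{
		uint16_t	expected = toy_checksum(&pages[i]);

		if (pages[i].checksum == expected)
			continue;			/* already valid: read-only for this page */

		pages[i].checksum = expected;	/* would be a page write in practice */
		rewritten++;
	}
	return rewritten;
}
```

A second pass over the same pages returns 0, which corresponds to the restart case discussed in this thread: the data is re-read, but only pages that are still missing a valid checksum are written.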

But perhaps you meant something like "position" instead of just a simple
true/false flag?

I think that'd incur a much larger complexity cost.

Yep, that was part of the point that I was getting to - that actually
addressing the issue would be more expensive than simple flags. But as
you pointed out, that was not quite ... well thought through.

No, but it's also not entirely wrong. Huge tables aren't uncommon.

That said, I'm not entirely convinced that these new flags would be
as unnoticed as is being suggested here, but rather than focus on
either side of that, I'm thinking about what we want to have *next*-
we know that enabling/disabling checksums is an issue that needs to
be solved, and this patch is making progress towards that, but the
next question is what does one do when a page has been detected as
corrupted? Are there flag fields which would be useful to have at a
per-relation level to support some kind of corrective action or
setting that says "don't care about checksums on this table, even
though the entire database is supposed to have valid checksums, but
instead do X with failed pages" or similar.

Those questions are definitely worth asking, and I agree our ability to
respond to data corruption (incorrect checksums) needs improvements. But
I don't really see how a single per-relation flag will make any of that any
easier?

Perhaps there are other flags/fields that might help, like for example
the maximum number of checksum errors per relation (although I don't
consider that very useful in practice), but that seems rather unrelated
to this patch.

Beyond dealing with corruption-recovery cases, are there other use
cases for having a given table not have checksums?

Well, I see checksums as a way to detect data corruption caused by
storage, so if you have tablespaces backed by different storage systems,
you could disable checksums for objects on the storage you 100% trust.
That would limit the overhead of computing checksums.

But then again, this seems entirely unrelated to the patch discussed
here. That would obviously require flags in catalogs, and if the patch
eventually adds flags those would need to be separate.

Would it make sense to introduce a flag or field which indicates that
an entire table's pages have some set of attributes, of which
'checksums' is just one attribute? Perhaps a page version, which
potentially allows us to have a way to change page layouts in the
future?

I'm happy to be told that we simply don't have enough information at
this point to make anything larger than a relchecksums field-level
decision, but perhaps these thoughts will spark an idea about how we
could define something a bit broader with clear downstream
usefulness that happens to also cover the "does this relation have
checksums?" question.

I don't know. But I think we need to stop moving the goalposts further
and further away, otherwise we won't get anything until PostgreSQL 73.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#37

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#34)

Re: Online enabling of checksums

On 02/25/2018 03:57 PM, Magnus Hagander wrote:

...

I very strongly doubt it's a "very noticeable operational problem". People
don't restart their databases very often... Let's say it takes 2-3 weeks to
complete a run in a fairly large database. How many such large databases
actually restart that frequently? I'm not sure I know of any. And the only
effect of it is you have to start the process over (but read-only for the
part you have already done). It's certainly not ideal, but I don't agree
it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

I would argue that they have bigger issues than enabling checksums... By
far.

In one case it's intentional, to make sure the overall system copes. Not
that insane.

That I can understand. But in a scenario like that, you can also stop
doing that for the period of time when you're rebuilding checksums, if
re-reading the database over and over again is an issue.

Note, I'm not saying it wouldn't be nice to have the incremental
functionality. I'm just saying it's not needed in a first version.

I agree with this sentiment. I don't think we can make each patch
perfect for everyone - certainly not in v1 :-/

Sure, it would be great to allow resume after a restart, but if that
means we won't get anything in PG 11 then I think it's not a good
service to our users. OTOH if the patch without a resume addresses the
issue for 99% of users, and we can improve it in PG 12, why not? That
seems exactly like the incremental thing we do for many other features.

So +1 to not make the "incremental resume" mandatory. If we can support
it, great! But I think the patch may seem less complex than it actually
is, and figuring out how the resume should work will take some time.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#38

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#26)

Re: Online enabling of checksums

On 02/24/2018 10:45 PM, Magnus Hagander wrote:

On Sat, Feb 24, 2018 at 1:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net> wrote:

I would prefer that yes. But having to re-read 9TB is still significantly
better than not being able to turn on checksums at all (state today). And
adding a catalog column for it will carry the cost of the migration
*forever*, both for clusters that never have checksums and those that had it
from the beginning.

Accepting that the process will start over (but only read, not re-write, the
blocks that have already been processed) in case of a crash does
significantly simplify the process, and reduce the long-term cost of it in
the form of entries in the catalogs. Since this is a one-time operation (or
for many people, a zero-time operation), paying that cost that one time is
probably better than paying a much smaller cost but constantly.

That's not totally illogical, but to be honest I'm kinda surprised
that you're approaching it that way. I would have thought that
relchecksums and datchecksums columns would have been a sort of
automatic design choice for this feature. The thing to keep in mind
is that nobody's going to notice the overhead of adding those columns
in practice, but someone will surely notice the pain that comes from
having to restart the whole operation. You're talking about trading
an effectively invisible overhead for a very noticeable operational
problem.

Is it really that invisible? Given how much we argue over adding
single counters to the stats system, I'm not sure it's quite that
low.

I'm a bit unsure where the flags would be stored - I initially assumed
pg_database/pg_class, but now I see mentions of the stats system.

But I wonder why this should be stored in a catalog at all? The info is
only needed by the bgworker(s), so they could easily flush the current
status to a file every now and then and fsync it. Then after restart, if
you find a valid file, use it to resume from the last OK position. If
not, start from scratch.

FWIW this is pretty much what the stats collector does.
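
To make the idea concrete, here is a minimal sketch of such a progress file, assuming a POSIX system. None of this is from the patch: the ChecksumProgress layout, the magic value, the integrity check and the function names save_progress() and load_progress() are all hypothetical. The file is written to a temporary name, fsync'ed and then renamed into place, so a reader sees either the old or the new state; a file that is missing or fails validation means "start from scratch".

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define PROGRESS_MAGIC 0x43484b53u	/* arbitrary marker, hypothetical */

/* Hypothetical on-disk record of how far the worker has got. */
typedef struct ChecksumProgress
{
	uint32_t	magic;
	uint32_t	database_oid;
	uint32_t	relation_oid;
	uint32_t	next_block;
	uint32_t	check;			/* simple integrity value over the fields above */
} ChecksumProgress;

/* Toy integrity value; a real implementation would likely use a CRC. */
static uint32_t
progress_check(const ChecksumProgress *p)
{
	return p->magic ^ (p->database_oid * 31u) ^
		(p->relation_oid * 131u) ^ (p->next_block * 8191u);
}

/* Write the state to "<path>.tmp", fsync it, then rename it over "path". */
static bool
save_progress(const char *path, ChecksumProgress p)
{
	char		tmppath[1024];
	int			n;
	int			fd;
	bool		ok;

	n = snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
	if (n < 0 || (size_t) n >= sizeof(tmppath))
		return false;

	p.magic = PROGRESS_MAGIC;
	p.check = progress_check(&p);

	fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return false;

	ok = write(fd, &p, sizeof(p)) == (ssize_t) sizeof(p) && fsync(fd) == 0;
	if (close(fd) != 0)
		ok = false;
	if (!ok)
	{
		unlink(tmppath);
		return false;
	}

	/* Full durability would also need an fsync of the directory; omitted. */
	return rename(tmppath, path) == 0;
}

/* Returns true only if a complete, valid progress file was found. */
static bool
load_progress(const char *path, ChecksumProgress *p)
{
	int			fd = open(path, O_RDONLY);
	bool		ok;

	if (fd < 0)
		return false;			/* no file: start from scratch */

	ok = read(fd, p, sizeof(*p)) == (ssize_t) sizeof(*p);
	close(fd);

	return ok && p->magic == PROGRESS_MAGIC && p->check == progress_check(p);
}
```

After a restart the worker would call load_progress() and, if it returns true, continue from next_block of the recorded relation. How often to call save_progress() is a trade-off between fsync overhead and the amount of work repeated after a crash.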

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#39

Daniel Gustafsson

daniel@yesql.se

almost 8 years ago

In reply to: Tomas Vondra (#38)

Re: Online enabling of checksums

On 26 Feb 2018, at 05:48, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 02/24/2018 10:45 PM, Magnus Hagander wrote:

Is it really that invisible? Given how much we argue over adding
single counters to the stats system, I'm not sure it's quite that
low.

I'm a bit unsure where the flags would be stored - I initially assumed
pg_database/pg_class, but now I see mentions of the stats system.

But I wonder why this should be stored in a catalog at all? The info is
only needed by the bgworker(s), so they could easily flush the current
status to a file every now and then and fsync it. Then after restart, if
you find a valid file, use it to resume from the last OK position. If
not, start from scratch.

Since this allows checksums to be turned off as well, storing a flag in the
catalog would mean issuing a fairly wide update in that case to switch it to
False, which might not be ideal. Not that I expect (hope) turning checksums
off on a cluster will be a common operation, but that's a fairly large side
effect of doing so.

cheers ./daniel

#40

Andrey Borodin

x4mmm@yandex-team.ru

almost 8 years ago

In reply to: Magnus Hagander (#35)

Re: Online enabling of checksums

Hi, Magnus!

On 25 Feb 2018, at 21:17, Magnus Hagander <magnus@hagander.net> wrote:

PFA an updated patch that adds this, and also fixes the problem in pg_verify_checksums spotted by Michael Banck.

I had to ALLOW_CONNECTIONS to template0 to make it work
postgres=# 2018-02-27 21:29:05.993 +05 [57259] ERROR: Database template0 does not allow connections.
Is it a problem with my installation or some regression in the patch?

2018-02-27 21:40:47.132 +05 [57402] HINT: either disable or enable checksums by calling the pg_data_checksums_enable()/disable() functions
Function names are wrong in this hint: pg_enable_data_checksums()

The code is nice and clear. One minor spot in the comment

This option can only _be_ enabled when data checksums are enabled.

Is there any way we could provide this functionality for previous versions (9.6, 10)? For example, by implementing a utility for offline checksum enabling, without WAL-logging of course.

Thanks for the patch!

Best regards, Andrey Borodin.

#41

Daniel Gustafsson

daniel@yesql.se

almost 8 years ago

In reply to: Andrey Borodin (#40)

1 attachment(s)

Re: Online enabling of checksums

On 28 Feb 2018, at 00:51, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 25 Feb 2018, at 21:17, Magnus Hagander <magnus@hagander.net> wrote:

PFA an updated patch that adds this, and also fixes the problem in pg_verify_checksums spotted by Michael Banck.

I had to ALLOW_CONNECTIONS to template0 to make it work
postgres=# 2018-02-27 21:29:05.993 +05 [57259] ERROR: Database template0 does not allow connections.
Is it a problem with my installation or some regression in the patch?

This is due to a limitation that applies to bgworkers: in order to add
checksums, the bgworker must connect to the database, and template0 does not
allow connections by default. There is a discussion, and a patch, for lifting
this restriction, but until that makes it in (if it does), the user will have
to allow connections manually. For reference, the thread about bypassing
allowconn in bgworkers is here:

/messages/by-id/CABUevEwWT9ZmonBMRFF0owneoN3DAPgOVzwHAN0bUkxaqY3eNQ@mail.gmail.com

2018-02-27 21:40:47.132 +05 [57402] HINT: either disable or enable checksums by calling the pg_data_checksums_enable()/disable() functions
Function names are wrong in this hint: pg_enable_data_checksums()

Fixed. The format of this hint (and errmsg) is actually also incorrect, which
I fixed while in there.

The code is nice and clear. One minor spot in the comment

This option can only _be_ enabled when data checksums are enabled.

Fixed.

Is there any way we could provide this functionality for previous versions (9.6, 10)? For example, by implementing a utility for offline checksum enabling, without WAL-logging of course.

While outside the scope of the patch in question (since it deals with enabling
checksums online), such a utility should be perfectly possible to write.

Thanks for reviewing!

cheers ./daniel

Attachments:

online_checksums3.diff (application/octet-stream)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 00fc364c0a..bf6f694640 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8541,7 +8541,8 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
         or hide corruption, or other serious problems</emphasis>.  However, it may allow
         you to get past the error and retrieve undamaged tuples that might still be
         present in the table if the block header is still sane. If the header is
-        corrupt an error will be reported even if this option is enabled. The
+        corrupt an error will be reported even if this option is enabled. This
+        option can only be enabled when data checksums are enabled. The
         default setting is <literal>off</literal>, and it can only be changed by a superuser.
        </para>
       </listitem>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2f59af25a6..a011ea1d8f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19481,6 +19481,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>in progress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 22e6893211..c81c87ef41 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -210,6 +210,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 585665f161..0864afb890 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..076de243a0
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-o <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with the specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on the cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    <command>pg_verify_checksums</command> can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d27fb414f7..db4f4167e3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -283,6 +283,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..89afecb341 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,73 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums;
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be checked by issuing the command
+   <command>SHOW data_checksums</command>, which displays the value of the read-only
+   configuration variable <xref linkend="guc-data-checksums" />.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47a6c4d895..56aaa88de1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4653,10 +4654,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4728,12 +4725,85 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsEnabledOrInProgress(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+bool
+DataChecksumsDisabled(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == 0);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	if (DataChecksumsEnabledOrInProgress())
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	if (!DataChecksumsEnabledOrInProgress())
+		elog(ERROR, "Checksums not enabled or in progress");
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7768,6 +7838,16 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"in progress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9521,6 +9601,22 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -9949,6 +10045,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..0d10fd4c89 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,45 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (DataChecksumsDisabled())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_BOOL(DataChecksumsDisabled());
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int cost_delay = PG_GETARG_INT32(0);
+	int cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_BOOL(DataChecksumsEnabledOrInProgress());
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5652e9ee6d..90e57874e7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1023,6 +1023,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..618c2d9257
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,630 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Backend worker to walk the database and write checksums to pages
+ *
+ * When data checksums are enabled for a cluster at initdb time, no extra
+ * process is required as each page is checksummed, and verified, on
+ * access.  When enabling checksums on an already running cluster
+ * which was not initialized with checksums, this helper worker will
+ * ensure that all pages are checksummed before verification of the
+ * checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific
+ * database before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	bool		success;
+	bool		process_shared_catalogs;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksum helper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+	bool		success;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+	bool		success;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static bool ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Main entry point for checksum helper launcher process
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/*
+		 * Failed to set means somebody else started
+		 */
+		ereport(ERROR,
+				(errmsg("could not start checksum helper: already running")));
+	}
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+
+		/*
+		 * Launcher not started, so nothing to shut down.
+		 */
+		return;
+
+	ereport(ERROR,
+			(errmsg("Checksum helper is currently running, cannot disable checksums"),
+			 errhint("Restart the cluster or wait for the worker to finish")));
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * XXX: must hold a lock on the relation preventing it from being truncated?
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+		Page		page;
+		PageHeader	pagehdr;
+		uint16		checksum;
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/* Do we already have a valid checksum? */
+		page = BufferGetPage(buf);
+		pagehdr = (PageHeader) page;
+		checksum = pg_checksum_page((char *) page, b);
+
+		/*
+		 * If checksum was not set or was invalid, mark the buffer as dirty
+		 * and force a full page write. If the checksum was already valid, we
+		 * can leave it since we know that any other process writing the
+		 * buffer will update the checksum.
+		 */
+		if (checksum != pagehdr->pd_checksum)
+		{
+			START_CRIT_SECTION();
+			MarkBufferDirty(buf);
+			log_newpage_buffer(buf, false);
+			END_CRIT_SECTION();
+		}
+
+		UnlockReleaseBuffer(buf);
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = relation_open(relationId, AccessShareLock);
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+			ProcessSingleRelationFork(rel, fnum, strategy);
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u", relationId);
+
+	CommitTransactionCommand();
+
+	return true;
+}
+
+/*
+ * Enable checksums in a single database.
+ * We do this by launching a dynamic background worker into this database,
+ * and waiting for it to finish.
+ * We have to do this in a separate worker, since each process can only be
+ * connected to one database during its lifetime.
+ */
+static bool
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	ChecksumHelperShmem->success = false;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in %s", db->dbname)));
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in %s completed", db->dbname)));
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every db
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any new created database will
+	 * be running with checksums turned on from the start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			if (ProcessDatabase(db))
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the second case with.
+		 * Any database that still exists but failed we retry for a limited
+		 * number of times before giving up. Any database that remains in
+		 * failed state after that will fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in %s, giving up.", db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("Database %s dropped, skipping", db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL || list_length(remaining) == 0)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+
+	/*
+	 * Force a checkpoint to get everything out to disk
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("Checksums enabled, checksumhelper launcher shutting down")));
+}
+
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This is intended to create the worklist for the workers to go through, and
+ * as we are only concerned with already existing databases we never need to
+ * rebuild this list, which simplifies the coding.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("Database %s does not allow connections.", NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * If include_shared is true, both shared and non-shared relations are
+ * returned; otherwise only non-shared relations are returned.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Enable vacuum cost delay, if any
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			ereport(ERROR,
+					(errmsg("failed to process table with oid %u", rel->reloid)));
+		}
+	}
+	list_free_deep(RelationList);
+
+	ChecksumHelperShmem->success = true;
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,11 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..c158e67a28 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,15 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * If data checksums have been turned on in a running cluster which
+		 * was initdb'd without checksums, or a cluster which has had
+		 * checksums turned off, we hold off on verifying the checksum until
+		 * all pages again are checksummed.  The PageSetChecksum functions
+		 * must continue to write the checksums even though we don't validate
+		 * them yet.
+		 */
+		if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1176,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1203,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
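
The bufpage.c changes above follow from the three-state design in the cover note: checksums are written in both "on" and "inprogress", but verified only in "on". Below is a minimal standalone sketch of that rule and of the allowed state transitions. The enum mirrors the ChecksumType added by the patch; the helper function names are hypothetical, chosen only for illustration, and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the ChecksumType enum the patch adds to storage/checksum.h. */
typedef enum ChecksumType
{
	DATA_CHECKSUMS_OFF = 0,
	DATA_CHECKSUMS_ON,
	DATA_CHECKSUMS_INPROGRESS
} ChecksumType;

/* Hypothetical helper: pages get a checksum on write in "on" and "inprogress". */
bool
checksums_are_written(ChecksumType state)
{
	return state == DATA_CHECKSUMS_ON || state == DATA_CHECKSUMS_INPROGRESS;
}

/* Hypothetical helper: checksums are verified on read only when fully "on". */
bool
checksums_are_verified(ChecksumType state)
{
	return state == DATA_CHECKSUMS_ON;
}

/*
 * Hypothetical helper: transitions described in the cover note, i.e.
 * off -> inprogress, inprogress -> on or off, on -> off.
 */
bool
checksum_transition_allowed(ChecksumType from, ChecksumType to)
{
	switch (from)
	{
		case DATA_CHECKSUMS_OFF:
			return to == DATA_CHECKSUMS_INPROGRESS;
		case DATA_CHECKSUMS_INPROGRESS:
			return to == DATA_CHECKSUMS_ON || to == DATA_CHECKSUMS_OFF;
		case DATA_CHECKSUMS_ON:
			return to == DATA_CHECKSUMS_OFF;
	}
	return false;
}

/* Hypothetical helper: apply a transition, asserting that it is a legal one. */
ChecksumType
checksum_change_state(ChecksumType from, ChecksumType to)
{
	assert(checksum_transition_allowed(from, to));
	return to;
}
```

In these terms, the new PageIsVerified() condition corresponds to checksums_are_verified(), and the PageSetChecksumCopy()/PageSetChecksumInplace() conditions correspond to checksums_are_written().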
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1db7845d5a..039b63bb05 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -67,6 +68,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -165,6 +167,7 @@ static void assign_syslog_ident(const char *newval, void *extra);
 static void assign_session_replication_role(int newval, void *extra);
 static bool check_temp_buffers(int *newval, void **extra, GucSource source);
 static bool check_bonjour(bool *newval, void **extra, GucSource source);
+static bool check_ignore_checksum_failure(bool *newval, void **extra, GucSource source);
 static bool check_ssl(bool *newval, void **extra, GucSource source);
 static bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 static bool check_log_stats(bool *newval, void **extra, GucSource source);
@@ -418,6 +421,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -513,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1022,7 +1036,7 @@ static struct config_bool ConfigureNamesBool[] =
 		},
 		&ignore_checksum_failure,
 		false,
-		NULL, NULL, NULL
+		check_ignore_checksum_failure, NULL, NULL
 	},
 	{
 		{"zero_damaged_pages", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -1664,17 +1678,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -3955,6 +3958,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
@@ -10202,6 +10216,37 @@ check_bonjour(bool *newval, void **extra, GucSource source)
 	return true;
 }
 
+static bool
+check_ignore_checksum_failure(bool *newval, void **extra, GucSource source)
+{
+	if (*newval)
+	{
+		/*
+		 * When data checksums are in progress, the verification of the
+		 * checksums is already ignored until all pages have had checksums
+		 * backfilled, making the effect of ignore_checksum_failure a no-op.
+		 * Allowing it during checksumming in progress can hide the fact that
+		 * checksums become enabled once done, so disallow.
+		 */
+		if (DataChecksumsInProgress())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" are in progress.");
+			return false;
+		}
+
+		/*
+		 * While safe, it's nonsensical to allow ignoring checksums when data
+		 * checksums aren't enabled in the first place.
+		 */
+		if (DataChecksumsDisabled())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" aren't enabled.");
+			return false;
+		}
+	}
+	return true;
+}
+
 static bool
 check_ssl(bool *newval, void **extra, GucSource source)
 {
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -590,6 +590,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..9f5a5848ee
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,42 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..c596e6fd95
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,308 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_oid = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage(void)
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -o oid         check only relation with specified oid\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open %s: %m\n"), progname, fn);
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in %s, got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno*RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+						progname, fn, blockno, header->pd_checksum, csum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: %s, block %d, correct checksum %X\n"), progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory %s: %m\n"), progname, path);
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file %s: %m\n"), progname, fn);
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char *forkpath, *segmentpath;
+			int segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number in order to
+			 * mix it into the checksum. Then also cut off at the fork boundary, to get
+			 * the oid (relfilenode) the file belongs to for filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename %s\n"), progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_oid && strcmp(only_oid, de->d_name) != 0)
+				/* Oid not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fo:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'o':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid oid: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_oid = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (DataDir == NULL)
+	{
+		fprintf(stderr, _("%s: no data directory specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
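
The file-name handling in scan_directory() and the block number passed to pg_checksum_page() in scan_file() can be illustrated in isolation. The sketch below is a standalone approximation, not code from the patch: both function names are hypothetical, and blocks_per_segment stands in for RELSEG_SIZE. Unlike the patch, it does not reject a ".0" or non-numeric segment suffix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split a relation file name of the form
 * "<relfilenode>[_<fork>][.<segment>]" in place.
 *
 * On return, name holds only the relfilenode part, *fork points at the fork
 * suffix (or is NULL if there is none), and the return value is the segment
 * number (0 when there is no ".<segment>" suffix).
 */
int
split_relation_filename(char *name, char **fork)
{
	char	   *segment;
	int			segmentno = 0;

	assert(name != NULL && fork != NULL);

	/* Cut at the segment boundary first, as the patch does. */
	segment = strchr(name, '.');
	if (segment != NULL)
	{
		*segment++ = '\0';
		segmentno = atoi(segment);
	}

	/* Then cut at the fork boundary, leaving only the relfilenode. */
	*fork = strchr(name, '_');
	if (*fork != NULL)
		*(*fork)++ = '\0';

	return segmentno;
}

/*
 * Hypothetical helper: block number within the whole relation fork, which is
 * the value mixed into the page checksum.
 */
unsigned int
relation_block_number(int segmentno, int blockno_in_file, int blocks_per_segment)
{
	return (unsigned int) segmentno * (unsigned int) blocks_per_segment +
		(unsigned int) blockno_in_file;
}
```

For example, "16384_fsm.2" splits into relfilenode "16384", fork "fsm" and segment 2. Assuming 131072 blocks per segment (1GB segments of 8kB blocks), block 5 of that file is block 262149 of the fork.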
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..788f2d0b58 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsEnabledOrInProgress() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index c00d055940..e3f05cf2a4 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5558,6 +5558,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 2 0 16 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..7f296264a9
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper daemon
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void ChecksumHelperLauncherMain(Datum arg);
+void ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index 73abf163f1..f30f4eb3a6 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # We don't build or execute examples/, locale/, or thread/ by default,
 # but we do want "make clean" etc to recurse into them.  Likewise for
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..290a74fc7c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres', "INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data');
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..b5c5563f98
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+t              
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546..cdd44979a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,6 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..f16127fa3f
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.5);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"
diff --git a/src/test/isolation/specscanner.l b/src/test/isolation/specscanner.l
index 481b32d1d7..7d371ebbca 100644
--- a/src/test/isolation/specscanner.l
+++ b/src/test/isolation/specscanner.l
@@ -12,7 +12,7 @@
 
 static int	yyline = 1;			/* line number for error reporting */
 
-static char litbuf[1024];
+static char litbuf[2048];
 static int litbufpos = 0;
 
 static void addlitchar(char c);

#42

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Magnus Hagander (#33)

Re: Online enabling of checksums

On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus@hagander.net> wrote:

Also if that wasn't clear -- we only do the full page write if there isn't
already a checksum on the page and that checksum is correct.

Hmm.

Suppose that on the master there is a checksum on the page and that
checksum is correct, but on the standby the page contents differ in
some way that we don't always WAL-log, like as to hint bits, and there
the checksum is incorrect. Then you'll enable checksums when the
standby still has some pages without valid checksums, and disaster
will ensue.

I think this could be hard to prevent if checksums are turned on and
off multiple times.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#43

Alvaro Herrera

alvherre@alvh.no-ip.org

almost 8 years ago

In reply to: Magnus Hagander (#35)

Re: Online enabling of checksums

I noticed that pg_verify_checksum takes an "-o oid" argument to only
check the relation with that OID; but that's wrong, because the number
is a relfilenode, not an OID (since it's compared to the on-disk file
name). I would suggest changing everything to clarify that it's a
pg_class.relfilenode value, otherwise it's going to be very confusing.
Maybe use "-f filenode" if -f is available?
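
To illustrate the distinction: relation data files are named after pg_class.relfilenode, optionally followed by a fork suffix (_fsm, _vm, _init) and a segment number (.1, .2, ...), and the relfilenode can diverge from the relation's OID after a rewrite such as VACUUM FULL. Below is a minimal sketch of matching on-disk names against a requested relfilenode. It is written in Python rather than the tool's C, and filenode_of and files_for_filenode are hypothetical helpers, not code from the patch.

```python
import re

# Data file names look like <relfilenode>[_<fork>][.<segment>],
# e.g. "16384", "16384_fsm", "16384.1".
# Temporary-relation files (t<backend>_<relfilenode>) are deliberately ignored here.
_RELFILE_RE = re.compile(r"^(\d+)(?:_(fsm|vm|init))?(?:\.(\d+))?$")


def filenode_of(filename):
    """Return the relfilenode encoded in a data file name, or None if it is not one."""
    m = _RELFILE_RE.match(filename)
    return int(m.group(1)) if m else None


def files_for_filenode(filenames, relfilenode):
    """Keep only the files (all forks and segments) belonging to one relfilenode."""
    return [f for f in filenames if filenode_of(f) == relfilenode]


print(files_for_filenode(["16384", "16384.1", "16385", "pg_filenode.map"], 16384))
# ['16384', '16384.1']
```

The comparison is purely against the file name, which is why the argument has to be a relfilenode and not the OID from pg_class.oid.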

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#44

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Alvaro Herrera (#43)

Re: Online enabling of checksums

On 02/28/2018 08:42 PM, Alvaro Herrera wrote:

I noticed that pg_verify_checksum takes an "-o oid" argument to only
check the relation with that OID; but that's wrong, because the number
is a relfilenode, not an OID (since it's compared to the on-disk file
name). I would suggest changing everything to clarify that it's a
pg_class.relfilenode value, otherwise it's going to be very confusing.
Maybe use "-f filenode" if -f is available?

I'd argue this is merely a mistake in the --help text. Firstly,
relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
the SGML docs actually say:

<varlistentry>
<term><option>-o <replaceable>relfilenode</replaceable></option></term>
<listitem>
<para>
Only validate checksums in the relation with specified relfilenode.
</para>
</listitem>
</varlistentry>

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#45

Daniel Gustafsson

daniel@yesql.se

almost 8 years ago

In reply to: Tomas Vondra (#44)

Re: Online enabling of checksums

On 01 Mar 2018, at 05:07, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 02/28/2018 08:42 PM, Alvaro Herrera wrote:

I noticed that pg_verify_checksum takes an "-o oid" argument to only
check the relation with that OID; but that's wrong, because the number
is a relfilenode, not an OID (since it's compared to the on-disk file
name). I would suggest changing everything to clarify that it's a
pg_class.relfilenode value, otherwise it's going to be very confusing.
Maybe use "-f filenode" if -f is available?

I'd argue this is merely a mistake in the --help text.

Agreed, the --help text isn’t really clear in this case and should be updated
to say something along the lines of:

printf(_(" -o relfilenode check only relation with specified relfilenode\n"));

cheers ./daniel

#46

Craig Ringer

craig@2ndquadrant.com

almost 8 years ago

In reply to: Alvaro Herrera (#43)

Re: Online enabling of checksums

On 1 March 2018 at 03:42, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I noticed that pg_verify_checksum takes an "-o oid" argument to only
check the relation with that OID; but that's wrong, because the number
is a relfilenode, not an OID (since it's compared to the on-disk file
name). I would suggest changing everything to clarify that it's a
pg_class.relfilenode value, otherwise it's going to be very confusing.
Maybe use "-f filenode" if -f is available?

I see this mistake/misunderstanding enough that I'd quite like to change
how we generate relfilenode IDs, making them totally independent of the oid
space.

Unsure how practical it is, but it'd be so nice to get rid of that trap.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#47

Andrey Borodin

x4mmm@yandex-team.ru

almost 8 years ago

In reply to: Robert Haas (#42)

Re: Online enabling of checksums

On 28 Feb 2018, at 22:06, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus@hagander.net> wrote:

Also if that wasn't clear -- we only do the full page write if there isn't
already a checksum on the page and that checksum is correct.

Hmm.

Suppose that on the master there is a checksum on the page and that
checksum is correct, but on the standby the page contents differ in
some way that we don't always WAL-log, like as to hint bits, and there
the checksum is incorrect. Then you'll enable checksums when the
standby still has some pages without valid checksums, and disaster
will ensue.

I think this could be hard to prevent if checksums are turned on and
off multiple times.

This seems like a 100% valid concern. If pages can be binary different (on master and standby), we have to log the act of checksumming a page.
And WAL replay would have to verify that its checksum is OK.
What is unclear to me is what the standby should do if it sees an incorrect checksum. Change its page? Request the page from the master? Shut down?

Or should we just WAL-log the page whenever it is checksummed by the worker? Even if the checksum was correct?

Best regards, Andrey Borodin.

#48

Andrey Borodin

x4mmm@yandex-team.ru

almost 8 years ago

In reply to: Daniel Gustafsson (#41)

Re: Online enabling of checksums

On 28 Feb 2018, at 6:22, Daniel Gustafsson <daniel@yesql.se> wrote:

Is there any way we could provide this functionality for previous versions (9.6,10)? Like implement utility for offline checksum enabling, without WAL-logging, surely.

While outside the scope of the patch in question (since it deals with enabling
checksums online), such a utility should be perfectly possible to write.

I've tried to rebase this patch to 10 and, despite minor rebase issues (oids, bgw_type, changes to specscanner), patch works fine.
Do we provide backporting for such features?

Best regards, Andrey Borodin.

#49

Michael Paquier

michael@paquier.xyz

almost 8 years ago

In reply to: Andrey Borodin (#48)

Re: Online enabling of checksums

On Thu, Mar 01, 2018 at 12:56:35PM +0500, Andrey Borodin wrote:

I've tried to rebase this patch to 10 and, despite minor rebase issues
(oids, bgw_type, changes to specscanner), patch works fine. Do we
provide backporting for such features?

New features are not backported in upstream. Project policy is to fix only
bugs on back branches, to keep already-released branches as stable as possible.
--
Michael

#50

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Andrey Borodin (#48)

Re: Online enabling of checksums

On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:

I've tried to rebase this patch to 10 and, despite minor rebase issues (oids, bgw_type, changes to specscanner), patch works fine.
Do we provide backporting for such features?

Definitely not. With very rare exceptions (OS compatibility and the
like), features aren't backported.

- Andres

#51

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#50)

Re: Online enabling of checksums

On Thu, Mar 1, 2018 at 9:04 AM, Andres Freund <andres@anarazel.de> wrote:

On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:

I've tried to rebase this patch to 10 and, despite minor rebase issues
(oids, bgw_type, changes to specscanner), patch works fine.
Do we provide backporting for such features?

Definitely not. With very rare exceptions (OS compatibility and the
like), features aren't backported.

Yeah. And definitely not something that both changes the format of
pg_control (by adding new possible values to the checksum field) *and* adds
a new WAL record type...

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#52

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#51)

Re: Online enabling of checksums

On 2018-03-01 16:18:48 +0100, Magnus Hagander wrote:

On Thu, Mar 1, 2018 at 9:04 AM, Andres Freund <andres@anarazel.de> wrote:

On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:

I've tried to rebase this patch to 10 and, despite minor rebase issues
(oids, bgw_type, changes to specscanner), patch works fine.
Do we provide backporting for such features?

Definitely not. With very rare exceptions (OS compatibility and the
like), features aren't backported.

Yeah. And definitely not something that both changes the format of
pg_control (by adding new possible values to the checksum field) *and* adds
a new WAL record type...

And even more so, I'm not even sure it makes sense to try to get this
into v11. This is a medium-large complicated feature, submitted to the
last CF for v11. That's pretty late. Now, Magnus is a committer, but
nevertheless...

Greetings,

Andres Freund

#53

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#52)

Re: Online enabling of checksums

On Fri, Mar 2, 2018 at 8:44 AM, Andres Freund <andres@anarazel.de> wrote:

On 2018-03-01 16:18:48 +0100, Magnus Hagander wrote:

On Thu, Mar 1, 2018 at 9:04 AM, Andres Freund <andres@anarazel.de> wrote:

On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:

I've tried to rebase this patch to 10 and, despite minor rebase issues
(oids, bgw_type, changes to specscanner), patch works fine.
Do we provide backporting for such features?

Definitely not. With very rare exceptions (OS compatibility and the
like), features aren't backported.

Yeah. And definitely not something that both changes the format of
pg_control (by adding new possible values to the checksum field) *and* adds
a new WAL record type...

And even more so, I'm not even sure it makes sense to try to get this
into v11. This is a medium-large complicated feature, submitted to the
last CF for v11. That's pretty late. Now, Magnus is a committer, but
nevertheless...

See, this is why I'm trying my hardest to avoid scope-creep in it at least
:P

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#54

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Tomas Vondra (#44)

Re: Online enabling of checksums

On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 02/28/2018 08:42 PM, Alvaro Herrera wrote:

I noticed that pg_verify_checksum takes an "-o oid" argument to only
check the relation with that OID; but that's wrong, because the number
is a relfilenode, not an OID (since it's compared to the on-disk file
name). I would suggest changing everything to clarify that it's a
pg_class.relfilenode value, otherwise it's going to be very confusing.
Maybe use "-f filenode" if -f is available?

I'd argue this is merely a mistake in the --help text. Firstly,
relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
the SGML docs actually say:

<varlistentry>
<term><option>-o <replaceable>relfilenode</replaceable></option></term>
<listitem>
<para>
Only validate checksums in the relation with specified relfilenode.
</para>
</listitem>
</varlistentry>

Yeah, that one is my fault. It used to say oid all over but I noticed and
fixed it. Except I clearly missed the --help.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#55

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Robert Haas (#42)

Re: Online enabling of checksums

On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus@hagander.net> wrote:

Also if that wasn't clear -- we only do the full page write if there isn't
already a checksum on the page and that checksum is correct.

Hmm.

Suppose that on the master there is a checksum on the page and that
checksum is correct, but on the standby the page contents differ in
some way that we don't always WAL-log, like as to hint bits, and there
the checksum is incorrect. Then you'll enable checksums when the
standby still has some pages without valid checksums, and disaster
will ensue.

I think this could be hard to prevent if checksums are turned on and
off multiple times.

Do we ever make hintbit changes on the standby for example? If so, it would
definitely cause problems. I didn't realize we did, actually...

I guess we could get there even if we don't by:
* All checksums are correct
* Checksums are disabled (which replicates)
* Non-WAL logged change on the master, which updates checksum but does
*not* replicate
* Checksums re-enabled
* Worker sees the checksum as correct, and thus does not force a full page
write.
* Worker completes and flips checksums on which replicates. At this point,
if the replica reads the page, boom.

I guess we have to remove that optimisation. It's definitely a bummer, but
I don't think it's an absolute dealbreaker.
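
A toy model of the two policies may make the trade-off concrete. It takes as given the premise from upthread (the master's copy of a page verifies while the standby's copy carries a stale checksum) and does not try to model how that divergence arises. This is a Python sketch, not code from the patch: Page, worker_visit and the CRC32 stand-in for the real page checksum are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass


def page_checksum(data: bytes) -> int:
    # Stand-in for the real page checksum algorithm (assumption: plain CRC32).
    return zlib.crc32(data)


@dataclass
class Page:
    data: bytes
    checksum: int

    def verifies(self) -> bool:
        return self.checksum == page_checksum(self.data)


def worker_visit(master: Page, standby: Page, skip_if_correct: bool) -> bool:
    """Process one page on the master; return True if a full page write was emitted."""
    if skip_if_correct and master.verifies():
        return False  # the optimisation: nothing is sent to the standby
    master.checksum = page_checksum(master.data)
    # Replaying a full page image overwrites the standby's copy wholesale.
    standby.data, standby.checksum = master.data, master.checksum
    return True


def standby_ok_after_enable(skip_if_correct: bool) -> bool:
    good = b"page contents"
    master = Page(good, page_checksum(good))       # valid on the master
    standby = Page(good, page_checksum(good) ^ 1)  # premise: stale checksum on the standby
    worker_visit(master, standby, skip_if_correct)
    return standby.verifies()


print(standby_ok_after_enable(skip_if_correct=True))   # False
print(standby_ok_after_enable(skip_if_correct=False))  # True
```

With the optimisation the worker decides from the master's state alone, so nothing ever repairs the standby's copy. Without it, every page is shipped once and the standby ends up with exactly what the master has.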

We could say that we keep the optimisation if wal_level=minimal for
example, because then we know there is no replica. But I doubt that's worth
it?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#56

Alvaro Herrera

alvherre@alvh.no-ip.org

almost 8 years ago

In reply to: Magnus Hagander (#54)

Re: Online enabling of checksums

Magnus Hagander wrote:

On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

I'd argue this is merely a mistake in the --help text. Firstly,
relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
the SGML docs actually say:

Yeah, that one is my fault. It used to say oid all over but I noticed and
fixed it. Except I clearly missed the --help.

Obviously option names are completely arbitrary -- you could say
"-P relfilenode" and it'd still be 'correct', since it works as
documented. But we try to make these options mnemotechnic when we can,
and I don't see any relation between "-o" and "relfilenode", so I
suggest picking some other letter. There's a whole alphabet out there.

Either "-r" or "-f" works better for me than "-o".

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#57

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Alvaro Herrera (#56)

Re: Online enabling of checksums

On Fri, Mar 2, 2018 at 3:17 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Magnus Hagander wrote:

On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

I'd argue this is merely a mistake in the --help text. Firstly,
relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
the SGML docs actually say:

Yeah, that one is my fault. It used to say oid all over but I noticed and
fixed it. Except I clearly missed the --help.

Obviously option names are completely arbitrary -- you could say
"-P relfilenode" and it'd still be 'correct', since it works as
documented. But we try to make these options mnemotechnic when we can,
and I don't see any relation between "-o" and "relfilenode", so I
suggest picking some other letter. There's a whole alphabet out there.

Either "-r" or "-f" works better for me than "-o".

I have no problem with changing it to -r. -f seems a bit wrong to me, as it
might read as a file. And in the future we might want to implement the
ability to take full filename (with path), in which case it would make
sense to use -f for that.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#58

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#57)

Re: Online enabling of checksums

On 03/02/2018 03:22 PM, Magnus Hagander wrote:

On Fri, Mar 2, 2018 at 3:17 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Magnus Hagander wrote:

On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

I'd argue this is merely a mistake in the --help text. Firstly,
relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
the SGML docs actually say:

Yeah, that one is my fault. It used to say oid all over but I noticed and
fixed it. Except I clearly missed the --help.

Obviously option names are completely arbitrary -- you could say
"-P relfilenode" and it'd still be 'correct', since it works as
documented. But we try to make these options mnemotechnic when we can,
and I don't see any relation between "-o" and "relfilenode", so I
suggest picking some other letter. There's a whole alphabet out there.

Either "-r" or "-f" works better for me than "-o".

I have no problem with changing it to -r. -f seems a bit wrong to me, as
it might read as a file. And in the future we might want to implement
the ability to take full filename (with path), in which case it would
make sense to use -f for that.

+1 to -r

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#59

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Magnus Hagander (#55)

Re: Online enabling of checksums

On Fri, Mar 2, 2018 at 8:35 AM, Magnus Hagander <magnus@hagander.net> wrote:

Do we ever make hintbit changes on the standby for example? If so, it would
definitely cause problems. I didn't realize we did, actually...

We do not.

I guess we could get there even if we don't by:
* All checksums are correct
* Checksums are disabled (which replicates)
* Non-WAL logged change on the master, which updates checksum but does *not*
replicate
* Checksums re-enabled
* Worker sees the checksum as correct, and thus does not force a full page
write.
* Worker completes and flips checksums on which replicates. At this point,
if the replica reads the page, boom.

Exactly.

I guess we have to remove that optimisation. It's definitely a bummer, but I
don't think it's an absolute dealbreaker.

I don't disagree.

We could say that we keep the optimisation if wal_level=minimal for example,
because then we know there is no replica. But I doubt that's worth it?

I don't have a strong feeling about this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#60

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Andres Freund (#52)

Re: Online enabling of checksums

On Fri, Mar 2, 2018 at 2:44 AM, Andres Freund <andres@anarazel.de> wrote:

And even more so, I'm not even sure it makes sense to try to get this
into v11. This is a medium-large complicated feature, submitted to the
last CF for v11. That's pretty late. Now, Magnus is a committer, but
nevertheless...

Yeah, I would also favor bumping this one out to a later release. I
think there is a significant risk either that the design is flawed --
and as evidence, I offer that I found a flaw in it which I noticed
only because of a passing remark in an email, not because I opened the
patch -- or that the design boxes us into a corner such that it will
be hard to improve this later. I think that there is a good
chance that there are other serious problems with this patch and to be
honest I don't really want to go try to find them right this minute; I
want to work on other patches that were submitted earlier and have
been waiting for a long time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#61

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#55)

Re: Online enabling of checksums

On 03/02/2018 02:35 PM, Magnus Hagander wrote:

On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus@hagander.net> wrote:

Also if that wasn't clear -- we only do the full page write if there isn't
already a checksum on the page and that checksum is correct.

Hmm.

Suppose that on the master there is a checksum on the page and that
checksum is correct, but on the standby the page contents differ in
some way that we don't always WAL-log, like as to hint bits, and there
the checksum is incorrect. Then you'll enable checksums when the
standby still has some pages without valid checksums, and disaster
will ensue.

I think this could be hard to prevent if checksums are turned on and
off multiple times.

Do we ever make hintbit changes on the standby for example? If so, it
would definitely cause problems. I didn't realize we did, actually...

I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
AFAIK that can't be true on a standby.

I guess we could get there even if we don't by:
* All checksums are correct
* Checksums are disabled (which replicates)
* Non-WAL logged change on the master, which updates checksum but does
*not* replicate
* Checksums re-enabled
* Worker sees the checksum as correct, and thus does not force a full
page write.
* Worker completes and flips checksums on which replicates. At this
point, if the replica reads the page, boom.

Maybe.

My understanding of Robert's example is that you can start with an
instance that has wal_log_hints=off, and so pages on master/standby may
not be 100% identical. Then we do the online checksum thing, and the
standby may get pages with incorrect checksums.

I guess we have to remove that optimisation. It's definitely a
bummer, but I don't think it's an absolute dealbreaker.

I agree it's not a deal-breaker. Or at least I don't see why it should
be - any other maintenance activity on the database (freezing etc.) will
also generate full-page writes.

The good thing is the throttling also limits the amount of WAL, so it's
possible to prevent generating too many checkpoints etc.

I suggest we simply:

1) set the checksums to in-progress
2) wait for a checkpoint
3) use the regular logic for full-pages (i.e. first change after
checkpoint does a FPW)
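
The "regular logic" in step 3 boils down to comparing the page LSN with the redo pointer of the latest checkpoint: a page whose LSN is at or before that pointer has not been WAL-logged since the checkpoint, so its next WAL-logged change carries a full page image. A minimal Python sketch of that rule, assuming full_page_writes=on and using plain integers for LSNs (the function names are illustrative, not the backend's):

```python
def needs_full_page_image(page_lsn: int, redo_ptr: int) -> bool:
    """True if the page was last WAL-logged at or before the checkpoint's redo pointer."""
    return page_lsn <= redo_ptr


def log_change(page_lsn: int, redo_ptr: int, record_lsn: int):
    """Apply one WAL-logged change; return (new_page_lsn, fpw_emitted)."""
    fpw = needs_full_page_image(page_lsn, redo_ptr)
    return record_lsn, fpw


redo_ptr = 1000   # checkpoint taken after the state was set to in-progress
page_lsn = 900    # page untouched since before that checkpoint

page_lsn, fpw1 = log_change(page_lsn, redo_ptr, record_lsn=1100)
page_lsn, fpw2 = log_change(page_lsn, redo_ptr, record_lsn=1200)
print(fpw1, fpw2)  # True False
```

Only the first touch after the checkpoint pays for the full page image; later changes to the same page within the same checkpoint cycle are logged normally.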

BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does

RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
CHECKPOINT_IMMEDIATE);

I'm rather unhappy about that - immediate checkpoints have massive
impact on production systems, so we try not doing them (That's one of
the reasons why CREATE DATABASE is somewhat painful). It usually
requires a bit of thinking about when to do such commands. But in this
case it's unpredictable when exactly the checksumming completes, so it
may easily be in the middle of peak activity.

Why not simply wait for a regular spread checkpoint, the way
pg_basebackup does it?

We could say that we keep the optimisation if wal_level=minimal for
example, because then we know there is no replica. But I doubt
that's worth it?

If it doesn't require a lot of code, why not? But I don't really see
much point in doing that.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#62

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Tomas Vondra (#61)

Re: Online enabling of checksums

On Fri, Mar 2, 2018 at 5:50 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 03/02/2018 02:35 PM, Magnus Hagander wrote:

On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus@hagander.net> wrote:

Also if that wasn't clear -- we only do the full page write if there isn't
already a checksum on the page and that checksum is correct.

Hmm.

Suppose that on the master there is a checksum on the page and that
checksum is correct, but on the standby the page contents differ in
some way that we don't always WAL-log, like as to hint bits, and there
the checksum is incorrect. Then you'll enable checksums when the
standby still has some pages without valid checksums, and disaster
will ensue.

I think this could be hard to prevent if checksums are turned on and
off multiple times.

Do we ever make hintbit changes on the standby for example? If so, it
would definitely cause problems. I didn't realize we did, actually...

I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
AFAIK that can't be true on a standby.

I guess we could get there even if we don't by:
* All checksums are correct
* Checksums are disabled (which replicates)
* Non-WAL logged change on the master, which updates checksum but does
*not* replicate
* Checksums re-enabled
* Worker sees the checksum as correct, and thus does not force a full
page write.
* Worker completes and flips checksums on which replicates. At this
point, if the replica reads the page, boom.

Maybe.

My understanding of Robert's example is that you can start with an
instance that has wal_log_hints=off, and so pages on master/standby may
not be 100% identical. Then we do the online checksum thing, and the
standby may get pages with incorrect checksums.

No, in that case the master will issue full page writes for *all* pages,
since they didn't have a checksum. The current patch only avoids doing that
if the checksum on the master is correct, which it isn't when you start
from checksums=off. So this particular problem only shows up if you
iterate between off/on/off multiple times.

I guess we have to remove that optimisation. It's definitely a
bummer, but I don't think it's an absolute dealbreaker.

I agree it's not a deal-breaker. Or at least I don't see why it should
be - any other maintenance activity on the database (freezing etc.) will
also generate full-page writes.

Yes.

The good thing is the throttling also limits the amount of WAL, so it's
possible to prevent generating too many checkpoints etc.

I suggest we simply:

1) set the checksums to in-progress
2) wait for a checkpoint
3) use the regular logic for full-pages (i.e. first change after
checkpoint does a FPW)

This is very close to what it does now, except it does not wait for a
checkpoint in #2. Why does it need that?

BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does

RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
CHECKPOINT_IMMEDIATE);

I'm rather unhappy about that - immediate checkpoints have massive
impact on production systems, so we try not doing them (That's one of
the reasons why CREATE DATABASE is somewhat painful). It usually
requires a bit of thinking about when to do such commands. But in this
case it's unpredictable when exactly the checksumming completes, so it
may easily be in the middle of peak activity.

Why not to simply wait for regular spread checkpoint, the way
pg_basebackup does it?

Actually, that was my original idea. I changed it for testing, and should go
change it back.

We could say that we keep the optimisation if wal_level=minimal for
example, because then we know there is no replica. But I doubt
that's worth it?

If it doesn't require a lot of code, why not? But I don't really see
much point in doing that.

Yeah, I doubt there are a lot of people using "minimal" these days, not
since we changed the default.

//Magnus

#63

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#62)

Re: Online enabling of checksums

On 03/02/2018 11:01 PM, Magnus Hagander wrote:

On Fri, Mar 2, 2018 at 5:50 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

On 03/02/2018 02:35 PM, Magnus Hagander wrote:

On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
<magnus@hagander.net> wrote:

> Also if that wasn't clear -- we only do the full page write if there isn't
> already a checksum on the page and that checksum is correct.

Hmm.

Suppose that on the master there is a checksum on the page and that
checksum is correct, but on the standby the page contents differ in
some way that we don't always WAL-log, like as to hint bits, and there
the checksum is incorrect. Then you'll enable checksums when the
standby still has some pages without valid checksums, and disaster
will ensue.

I think this could be hard to prevent if checksums are turned on and
off multiple times.

Do we ever make hintbit changes on the standby for example? If so, it
would definitely cause problems. I didn't realize we did, actually...

I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
AFAIK that can't be true on a standby.

I guess we could get there even if we don't by:
* All checksums are correct
* Checksums are disabled (which replicates)
* Non-WAL logged change on the master, which updates checksum but does
*not* replicate
* Checksums re-enabled
* Worker sees the checksum as correct, and thus does not force a full
page write.
* Worker completes and flips checksums on which replicates. At this
point, if the replica reads the page, boom.

Maybe.

My understanding of Robert's example is that you can start with an
instance that has wal_log_hints=off, and so pages on master/standby may
not be 100% identical. Then we do the online checksum thing, and the
standby may get pages with incorrect checksums.

No, in that case the master will issue full page writes for *all* pages,
since they didn't have a checksum. The current patch only avoids doing
that if the checksum on the master is correct, which it isn't when you
start from checksums=off. So this particular problem only shows up if
you iterate between off/on/off multiple times.

Hmmm, OK. So we need to have a valid checksum on a page, disable
checksums, set some hint bits on the page (which won't be WAL-logged),
enable checksums again and still get a valid checksum even with the new
hint bits? That's possible, albeit unlikely.

I guess we have to remove that optimisation. It's definitely a
bummer, but I don't think it's an absolute dealbreaker.

I agree it's not a deal-breaker. Or at least I don't see why it should
be - any other maintenance activity on the database (freezing etc.) will
also generate full-page writes.

Yes.

The good thing is the throttling also limits the amount of WAL, so it's
possible to prevent generating too many checkpoints etc.

I suggest we simply:

1) set the checksums to in-progress
2) wait for a checkpoint
3) use the regular logic for full-pages (i.e. first change after
checkpoint does a FPW)

This is very close to what it does now, except it does not wait for a
checkpoint in #2. Why does it need that?

To guarantee that the page has a FPW with all the hint bits, before we
start messing with the checksums (or that setting the checksum itself
triggers a FPW).
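
The "first change after checkpoint does a FPW" rule can be sketched as a toy model. This is only an illustration under stated assumptions: ToyLsn, ToyPageState and both functions are hypothetical names, not PostgreSQL APIs, and the comparison is modelled loosely on the page-LSN-versus-redo-pointer test that decides whether a WAL record carries a full-page image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLsn;        /* hypothetical stand-in for a WAL position */

typedef struct ToyPageState
{
    ToyLsn      lsn;            /* LSN of the last WAL record that touched the page */
} ToyPageState;

/*
 * A page needs a full-page image when it has not been WAL-logged since
 * the redo point of the most recent checkpoint.
 */
static bool
toy_needs_full_page_image(const ToyPageState *page, ToyLsn redo_ptr,
                          bool do_page_writes)
{
    return do_page_writes && page->lsn <= redo_ptr;
}

/*
 * Log one change to the page at record_lsn. Returns true if that record
 * carried a full-page image.
 */
static bool
toy_log_change(ToyPageState *page, ToyLsn redo_ptr, ToyLsn record_lsn,
               bool do_page_writes)
{
    bool        fpi;

    assert(record_lsn > redo_ptr);      /* new records land after the redo point */
    fpi = toy_needs_full_page_image(page, redo_ptr, do_page_writes);
    page->lsn = record_lsn;
    return fpi;
}
```

In this model, a page last touched at LSN 100 gets a full-page image on its first change after a checkpoint with redo point 200, and no image on the second change until the next checkpoint moves the redo point past it.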

BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does

RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
CHECKPOINT_IMMEDIATE);

I'm rather unhappy about that - immediate checkpoints have massive
impact on production systems, so we try not doing them (That's one of
the reasons why CREATE DATABASE is somewhat painful). It usually
requires a bit of thinking about when to do such commands. But in this
case it's unpredictable when exactly the checksumming completes, so it
may easily be in the middle of peak activity.

Why not to simply wait for regular spread checkpoint, the way
pg_basebackup does it?

Actually, that was my original idea. I changed it for testing, and should
go change it back.

We could say that we keep the optimisation if wal_level=minimal for
example, because then we know there is no replica. But I doubt
that's worth it?

If it doesn't require a lot of code, why not? But I don't really see
much point in doing that.

Yeah, I doubt there are a lot of people using "minimal" these days, not
since we changed the default.

Yeah. Although as I said, it depends on how much code would be needed to
enable that optimization (I guess not much). If someone is running with
wal_level=minimal intentionally, why not to help them.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#64

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Tomas Vondra (#63)

Re: Online enabling of checksums

On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Hmmm, OK. So we need to have a valid checksum on a page, disable
checksums, set some hint bits on the page (which won't be WAL-logged),
enable checksums again and still get a valid checksum even with the new
hint bits? That's possible, albeit unlikely.

No, the problem is if - as is much more likely - the checksum is not
still valid.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#65

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Robert Haas (#64)

Re: Online enabling of checksums

On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Hmmm, OK. So we need to have a valid checksum on a page, disable
checksums, set some hint bits on the page (which won't be WAL-logged),
enable checksums again and still get a valid checksum even with the new
hint bits? That's possible, albeit unlikely.

No, the problem is if - as is much more likely - the checksum is not
still valid.

Hmm, on second thought ... maybe I didn't think this through carefully
enough. If the checksum matches on the master by chance, and the page
is the same on the standby, then we're fine, right? It's a weird
accident, but nothing is actually broken. The failure scenario is
where the standby has a version of the page with a bad checksum, but
the master has a good checksum. So for example: checksums disabled,
master modifies the page (which is replicated), master sets some hint
bits (coincidentally making the checksum match), now we try to turn
checksums on and don't re-replicate the page because the checksum
already looks correct.
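
The failure scenario above can be replayed in a small self-contained model. Everything here is an assumption made for illustration: ToyPage is not the real page layout, toy_checksum is a deliberately weak additive sum chosen so the coincidental match is easy to construct (the real checksum algorithm is different), and a full-page image is modelled as copying the primary's page to the standby.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy page, NOT PostgreSQL's real page layout. */
typedef struct ToyPage
{
    uint16_t    checksum;       /* stored checksum (stands in for pd_checksum) */
    uint8_t     hint_bits;      /* changed without WAL in this model */
    uint8_t     data[4];        /* changes are WAL-logged and replayed on the standby */
} ToyPage;

/* Deliberately weak additive checksum so a collision is easy to construct. */
static uint16_t
toy_checksum(const ToyPage *page)
{
    uint16_t    sum = page->hint_bits;

    for (size_t i = 0; i < sizeof(page->data); i++)
        sum = (uint16_t) (sum + page->data[i]);
    return sum;
}

/*
 * Enable checksums on one page. Returns true if a full-page image was
 * "replicated" (modelled as copying the primary page to the standby).
 */
static bool
toy_enable_page(ToyPage *primary, ToyPage *standby, bool skip_valid_pages)
{
    uint16_t    computed = toy_checksum(primary);

    if (skip_valid_pages && computed == primary->checksum)
        return false;           /* the optimization: page looks fine, skip it */

    primary->checksum = computed;
    *standby = *primary;        /* full-page image overwrites the standby copy */
    return true;
}

/* Replays the scenario; returns whether the standby page verifies at the end. */
static bool
standby_page_verifies(bool skip_valid_pages)
{
    ToyPage     primary = {.checksum = 0, .hint_bits = 0, .data = {5, 0, 0, 0}};
    ToyPage     standby;

    primary.checksum = toy_checksum(&primary);  /* 5: valid while checksums were on */
    standby = primary;

    /* Checksums are now off: stored checksums are no longer maintained. */
    primary.data[0] = 4;        /* WAL-logged change ... */
    standby.data[0] = 4;        /* ... replayed on the standby */
    primary.hint_bits = 1;      /* hint bit set on the primary only, no WAL */

    /* Primary: 4 + 1 = 5 matches the stale stored checksum by coincidence. */
    assert(toy_checksum(&primary) == primary.checksum);

    (void) toy_enable_page(&primary, &standby, skip_valid_pages);
    return toy_checksum(&standby) == standby.checksum;
}
```

With skip_valid_pages set, the worker leaves the page alone and the standby keeps a copy whose stored checksum no longer matches its contents. With the skip disabled, the page is always re-sent and the standby copy verifies.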

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#66

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Robert Haas (#65)

Re: Online enabling of checksums

On 03/03/2018 01:38 PM, Robert Haas wrote:

On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Hmmm, OK. So we need to have a valid checksum on a page, disable
checksums, set some hint bits on the page (which won't be
WAL-logged), enable checksums again and still get a valid
checksum even with the new hint bits? That's possible, albeit
unlikely.

No, the problem is if - as is much more likely - the checksum is
not still valid.

Hmm, on second thought ... maybe I didn't think this through
carefully enough. If the checksum matches on the master by chance,
and the page is the same on the standby, then we're fine, right? It's
a weird accident, but nothing is actually broken. The failure
scenario is where the standby has a version of the page with a bad
checksum, but the master has a good checksum. So for example:
checksums disabled, master modifies the page (which is replicated),
master sets some hint bits (coincidentally making the checksum
match), now we try to turn checksums on and don't re-replicate the
page because the checksum already looks correct.

Yeah. Doesn't that pretty much mean we can't skip any pages that have
correct checksum, because we can't rely on standby having the same page
data? That is, this block in ProcessSingleRelationFork:

/*
 * If checksum was not set or was invalid, mark the buffer as dirty
 * and force a full page write. If the checksum was already valid, we
 * can leave it since we know that any other process writing the
 * buffer will update the checksum.
 */
if (checksum != pagehdr->pd_checksum)
{
    START_CRIT_SECTION();
    MarkBufferDirty(buf);
    log_newpage_buffer(buf, false);
    END_CRIT_SECTION();
}

That would mean this optimization - only doing the write when the
checksum does not match - is broken.

If that's the case, it probably makes restarts/resume more expensive,
because this optimization was why after restart the already processed
data was only read (and the checksums verified) but not written.
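
The extra restart cost can be put in rough numbers with a small counting sketch. It assumes a hypothetical per-page flag saying whether the page already carries a valid checksum; pages_rewritten is an illustrative helper, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count the pages a (re)started worker would dirty and WAL-log.
 * With skip_valid_pages, pages that already have a valid checksum are
 * only read; without it, every page is rewritten.
 */
static size_t
pages_rewritten(const bool *has_valid_checksum, size_t npages,
                bool skip_valid_pages)
{
    size_t      written = 0;

    assert(has_valid_checksum != NULL || npages == 0);

    for (size_t i = 0; i < npages; i++)
    {
        if (!skip_valid_pages || !has_valid_checksum[i])
            written++;
    }
    return written;
}
```

For a relation where three of five pages were processed before the interruption, the worker rewrites two pages on resume with the skip and all five without it.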

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#67

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Tomas Vondra (#66)

Re: Online enabling of checksums

On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 03/03/2018 01:38 PM, Robert Haas wrote:

On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Hmmm, OK. So we need to have a valid checksum on a page, disable
checksums, set some hint bits on the page (which won't be
WAL-logged), enable checksums again and still get a valid
checksum even with the new hint bits? That's possible, albeit
unlikely.

No, the problem is if - as is much more likely - the checksum is
not still valid.

Hmm, on second thought ... maybe I didn't think this through
carefully enough. If the checksum matches on the master by chance,
and the page is the same on the standby, then we're fine, right? It's
a weird accident, but nothing is actually broken. The failure
scenario is where the standby has a version of the page with a bad
checksum, but the master has a good checksum. So for example:
checksums disabled, master modifies the page (which is replicated),
master sets some hint bits (coincidentally making the checksum
match), now we try to turn checksums on and don't re-replicate the
page because the checksum already looks correct.

Yeah. Doesn't that pretty much mean we can't skip any pages that have
correct checksum, because we can't rely on standby having the same page
data? That is, this block in ProcessSingleRelationFork:

/*
 * If checksum was not set or was invalid, mark the buffer as dirty
 * and force a full page write. If the checksum was already valid, we
 * can leave it since we know that any other process writing the
 * buffer will update the checksum.
 */
if (checksum != pagehdr->pd_checksum)
{
    START_CRIT_SECTION();
    MarkBufferDirty(buf);
    log_newpage_buffer(buf, false);
    END_CRIT_SECTION();
}

That would mean this optimization - only doing the write when the
checksum does not match - is broken.

Yes. I think that was the conclusion of this, as posted in
/messages/by-id/CABUevExDZu__5KweT8fr3Ox45YcuvTDEEu=aDpGBT8Sk0RQE_g@mail.gmail.com
:)

If that's the case, it probably makes restarts/resume more expensive,
because this optimization was why after restart the already processed
data was only read (and the checksums verified) but not written.

Yes, it definitely does. It's not a dealbreaker, but it's certainly a bit
painful not to be able to resume as cheap.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#68

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#67)

Re: Online enabling of checksums

On 03/03/2018 05:08 PM, Magnus Hagander wrote:

On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

On 03/03/2018 01:38 PM, Robert Haas wrote:

On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Hmmm, OK. So we need to have a valid checksum on a page, disable
checksums, set some hint bits on the page (which won't be
WAL-logged), enable checksums again and still get a valid
checksum even with the new hint bits? That's possible, albeit
unlikely.

No, the problem is if - as is much more likely - the checksum is
not still valid.

Hmm, on second thought ... maybe I didn't think this through
carefully enough. If the checksum matches on the master by chance,
and the page is the same on the standby, then we're fine, right? It's
a weird accident, but nothing is actually broken. The failure
scenario is where the standby has a version of the page with a bad
checksum, but the master has a good checksum. So for example:
checksums disabled, master modifies the page (which is replicated),
master sets some hint bits (coincidentally making the checksum
match), now we try to turn checksums on and don't re-replicate the
page because the checksum already looks correct.

Yeah. Doesn't that pretty much mean we can't skip any pages that have
correct checksum, because we can't rely on standby having the same page
data? That is, this block in ProcessSingleRelationFork:

/*
 * If checksum was not set or was invalid, mark the buffer as dirty
 * and force a full page write. If the checksum was already valid, we
 * can leave it since we know that any other process writing the
 * buffer will update the checksum.
 */
if (checksum != pagehdr->pd_checksum)
{
    START_CRIT_SECTION();
    MarkBufferDirty(buf);
    log_newpage_buffer(buf, false);
    END_CRIT_SECTION();
}

That would mean this optimization - only doing the write when the
checksum does not match - is broken.

Yes. I think that was the conclusion of this, as posted
in /messages/by-id/CABUevExDZu__5KweT8fr3Ox45YcuvTDEEu=aDpGBT8Sk0RQE_g@mail.gmail.com
:)

Oh, right. I did have a "deja vu" feeling, when writing that. Good that
I came to the same conclusion, though.

If that's the case, it probably makes restarts/resume more expensive,
because this optimization was why after restart the already processed
data was only read (and the checksums verified) but not written.

Yes, it definitely does. It's not a dealbreaker, but it's certainly
a bit painful not to be able to resume as cheap.

Yeah. It probably makes the more elaborate resuming more valuable, but I
still think it's not a "must have" for PG11.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#69

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Tomas Vondra (#68)

1 attachment(s)

Re: Online enabling of checksums

On Sat, Mar 3, 2018 at 5:17 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 03/03/2018 05:08 PM, Magnus Hagander wrote:

On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

On 03/03/2018 01:38 PM, Robert Haas wrote:

On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Hmmm, OK. So we need to have a valid checksum on a page, disable
checksums, set some hint bits on the page (which won't be
WAL-logged), enable checksums again and still get a valid
checksum even with the new hint bits? That's possible, albeit
unlikely.

No, the problem is if - as is much more likely - the checksum is
not still valid.

Hmm, on second thought ... maybe I didn't think this through
carefully enough. If the checksum matches on the master by chance,
and the page is the same on the standby, then we're fine, right? It's
a weird accident, but nothing is actually broken. The failure
scenario is where the standby has a version of the page with a bad
checksum, but the master has a good checksum. So for example:
checksums disabled, master modifies the page (which is replicated),
master sets some hint bits (coincidentally making the checksum
match), now we try to turn checksums on and don't re-replicate the
page because the checksum already looks correct.

Yeah. Doesn't that pretty much mean we can't skip any pages that have
correct checksum, because we can't rely on standby having the same page
data? That is, this block in ProcessSingleRelationFork:

/*
 * If checksum was not set or was invalid, mark the buffer as dirty
 * and force a full page write. If the checksum was already valid, we
 * can leave it since we know that any other process writing the
 * buffer will update the checksum.
 */
if (checksum != pagehdr->pd_checksum)
{
    START_CRIT_SECTION();
    MarkBufferDirty(buf);
    log_newpage_buffer(buf, false);
    END_CRIT_SECTION();
}

That would mean this optimization - only doing the write when the
checksum does not match - is broken.

Yes. I think that was the conclusion of this, as posted
in /messages/by-id/CABUevExDZu__5KweT8fr3Ox45YcuvTDEEu=aDpGBT8Sk0RQE_g@mail.gmail.com
:)

Oh, right. I did have a "deja vu" feeling, when writing that. Good that
I came to the same conclusion, though.

If that's the case, it probably makes restarts/resume more expensive,
because this optimization was why after restart the already processed
data was only read (and the checksums verified) but not written.

Yes, it definitely does. It's not a dealbreaker, but it's certainly
a bit painful not to be able to resume as cheap.

Yeah. It probably makes the more elaborate resuming more valuable, but I
still think it's not a "must have" for PG11.

Attached is a rebased patch which removes this optimization, updates the
pg_proc entry for the new format, and changes pg_verify_checksums to use -r
instead of -o for relfilenode.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachments:

online_checksums3.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 00fc364c0a..bf6f694640 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8541,7 +8541,8 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
         or hide corruption, or other serious problems</emphasis>.  However, it may allow
         you to get past the error and retrieve undamaged tuples that might still be
         present in the table if the block header is still sane. If the header is
-        corrupt an error will be reported even if this option is enabled. The
+        corrupt an error will be reported even if this option is enabled. This
+        option can only be enabled when data checksums are enabled. The
         default setting is <literal>off</literal>, and it can only be changed by a superuser.
        </para>
       </listitem>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2f59af25a6..a011ea1d8f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19481,6 +19481,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>in progress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 22e6893211..c81c87ef41 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -210,6 +210,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 585665f161..0864afb890 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..463ecd5e1b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-r <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d27fb414f7..db4f4167e3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -283,6 +283,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..89afecb341 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,73 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums,
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47a6c4d895..56aaa88de1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4653,10 +4654,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4728,12 +4725,85 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsEnabledOrInProgress(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+bool
+DataChecksumsDisabled(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == 0);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	if (DataChecksumsEnabledOrInProgress())
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	if (!DataChecksumsEnabledOrInProgress())
+		elog(ERROR, "Checksums not enabled or in progress");
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7769,6 +7839,16 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"in progress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9522,6 +9602,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9949,6 +10045,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
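
The state handling added to xlog.c above can be summarized as a small standalone model. This is only a sketch under stated assumptions: every `Sketch*`/`sketch_*` name and the numeric enum values are hypothetical and not part of the patch, and where SetDataChecksumsOn() raises an error, the model simply returns false.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three checksum states; values are illustrative. */
typedef enum
{
	SKETCH_CHECKSUMS_OFF = 0,
	SKETCH_CHECKSUMS_ON = 1,
	SKETCH_CHECKSUMS_INPROGRESS = 2
} SketchChecksumState;

/* Checksums are written in both "on" and "inprogress". */
static bool
sketch_writes_checksums(SketchChecksumState s)
{
	return s != SKETCH_CHECKSUMS_OFF;
}

/* Checksums are verified only once the state is "on". */
static bool
sketch_verifies_checksums(SketchChecksumState s)
{
	return s == SKETCH_CHECKSUMS_ON;
}

/* off -> inprogress; a no-op if already "on" or "inprogress". */
static SketchChecksumState
sketch_set_inprogress(SketchChecksumState s)
{
	return (s == SKETCH_CHECKSUMS_OFF) ? SKETCH_CHECKSUMS_INPROGRESS : s;
}

/* inprogress -> on; returns false and leaves the state alone otherwise. */
static bool
sketch_set_on(SketchChecksumState *s)
{
	if (*s != SKETCH_CHECKSUMS_INPROGRESS)
		return false;
	*s = SKETCH_CHECKSUMS_ON;
	return true;
}

/* Any state -> off. */
static SketchChecksumState
sketch_set_off(SketchChecksumState s)
{
	(void) s;
	return SKETCH_CHECKSUMS_OFF;
}
```

Starting from `SKETCH_CHECKSUMS_OFF`, calling `sketch_set_inprogress()` followed by `sketch_set_on()` walks the same path as pg_enable_data_checksums() plus a completed launcher run; `sketch_set_on()` on an "off" cluster is rejected.
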
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..0d10fd4c89 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,45 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (DataChecksumsDisabled())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_BOOL(DataChecksumsDisabled());
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int cost_delay = PG_GETARG_INT32(0);
+	int cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_BOOL(DataChecksumsEnabledOrInProgress());
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5e6e8a64f6..a2965d35d4 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1025,6 +1025,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..6aa71bcf30
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,631 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Backend worker to walk the database and write checksums to pages
+ *
+ * When data checksums are enabled on a cluster at initdb time, no extra
+ * process is required, as each page is checksummed when written and
+ * verified when read.  When enabling checksums on an already running cluster
+ * which was not initialized with checksums, this helper worker will
+ * ensure that all pages are checksummed before verification of the
+ * checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific
+ * database before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	bool		success;
+	bool		process_shared_catalogs;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksum helper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+	bool		success;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+	bool		success;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static bool ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Register the checksum helper launcher as a dynamic background worker
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/*
+		 * Failed to set means somebody else started
+		 */
+		ereport(ERROR,
+				(errmsg("could not start checksum helper: already running")));
+	}
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+
+		/*
+		 * Launcher not started, so nothing to shut down.
+		 */
+		return;
+
+	ereport(ERROR,
+			(errmsg("Checksum helper is currently running, cannot disable checksums"),
+			 errhint("Restart the cluster or wait for the worker to finish")));
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * XXX: must hold a lock on the relation preventing it from being truncated?
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+		Page		page;
+		PageHeader	pagehdr;
+		uint16		checksum;
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/* Do we already have a valid checksum? */
+		page = BufferGetPage(buf);
+		pagehdr = (PageHeader) page;
+		checksum = pg_checksum_page((char *) page, b);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.
+		 * We have to re-write the page to WAL even if the checksum hasn't
+		 * changed, because if there is a replica it might have a slightly
+		 * different version of the page with an invalid checksum, caused
+		 * by unlogged changes (e.g. hint bits) on the master happening while
+		 * checksums were off. This can only happen if the page had a valid
+		 * checksum at some point in the past, that is, when checksums were
+		 * first on, then turned off, and are now being turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = relation_open(relationId, AccessShareLock);
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+			ProcessSingleRelationFork(rel, fnum, strategy);
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u", relationId);
+
+	CommitTransactionCommand();
+
+	return true;
+}
+
+/*
+ * Enable checksums in a single database.
+ * We do this by launching a dynamic background worker into this database,
+ * and waiting for it to finish.
+ * We have to do this in a separate worker, since each process can only be
+ * connected to one database during its lifetime.
+ */
+static bool
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	ChecksumHelperShmem->success = false;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in %s", db->dbname)));
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksum helper in %s", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in %s completed", db->dbname)));
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every db
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any newly created database will
+	 * be running with checksums turned on from the start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			if (ProcessDatabase(db))
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the second case with.
+		 * Any database that still exists but failed we retry for a limited
+		 * number of times before giving up. Any database that remains in
+		 * failed state after that will fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in %s, giving up.", db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("Database %s dropped, skipping", db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL || list_length(remaining) == 0)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+
+	/*
+	 * Force a checkpoint to get everything out to disk
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("Checksums enabled, checksumhelper launcher shutting down")));
+}
+
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This is intended to create the worklist for the workers to go through, and
+ * as we are only concerned with already existing databases we never need to
+ * rebuild this list, which simplifies the coding.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("Database %s does not allow connections.", NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc0(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * If include_shared is true, both shared and non-shared relations are
+ * returned, else only non-shared relations are returned.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Enable vacuum cost delay, if any
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			ereport(ERROR,
+					(errmsg("failed to process table with oid %u", rel->reloid)));
+		}
+	}
+	list_free_deep(RelationList);
+
+	ChecksumHelperShmem->success = true;
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker completed in database oid %u", dboid)));
+}
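
The retry behaviour of the launcher loop above (retry failed databases up to MAX_ATTEMPTS, skip databases that were dropped in the meantime, give up otherwise) may be easier to follow as a standalone simulation. This is a minimal sketch, not the patch's code: `SketchDb`, its fields, and the `sketch_*` functions are hypothetical, and the actual per-database work is replaced by a counter of simulated failures.

```c
#include <stdbool.h>

#define SKETCH_MAX_ATTEMPTS 4	/* mirrors MAX_ATTEMPTS in the patch */

/* Hypothetical bookkeeping for one database in the simulation. */
typedef struct
{
	int			failures_left;	/* simulated failures before success */
	bool		dropped;		/* database was dropped while pending */
	int			attempts;		/* failed passes counted so far */
	bool		pending;		/* still needs processing */
} SketchDb;

/* Simulated per-database work: fails for dropped or still-failing databases. */
static bool
sketch_process(SketchDb *db)
{
	if (db->dropped)
		return false;
	if (db->failures_left > 0)
	{
		db->failures_left--;
		return false;
	}
	return true;
}

/*
 * Returns true once every database is processed or found dropped, false as
 * soon as a still-existing database has failed more than SKETCH_MAX_ATTEMPTS
 * times.
 */
static bool
sketch_enable_all(SketchDb *dbs, int ndbs)
{
	for (;;)
	{
		int			remaining = 0;
		int			i;

		/* Phase 1: try every pending database once. */
		for (i = 0; i < ndbs; i++)
		{
			if (dbs[i].pending && sketch_process(&dbs[i]))
				dbs[i].pending = false;
		}

		/* Phase 2: whatever is still pending failed in this pass. */
		for (i = 0; i < ndbs; i++)
		{
			if (!dbs[i].pending)
				continue;
			if (dbs[i].dropped)
				dbs[i].pending = false;	/* dropped: skip it */
			else if (++dbs[i].attempts > SKETCH_MAX_ATTEMPTS)
				return false;	/* give up on the whole operation */
			else
				remaining++;
		}

		if (remaining == 0)
			return true;
	}
}
```

In this model a database that keeps failing is tried five times in total before the operation is abandoned, because the attempt counter is compared with `>` after being incremented.
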
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,11 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..c158e67a28 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,15 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * If data checksums have been turned on in a running cluster which
+		 * was initdb'd without checksums, or a cluster which has had
+		 * checksums turned off, we hold off on verifying the checksum until
+		 * all pages again are checksummed.  The PageSetChecksum functions
+		 * must continue to write the checksums even though we don't validate
+		 * them yet.
+		 */
+		if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1176,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1203,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1db7845d5a..039b63bb05 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -67,6 +68,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -165,6 +167,7 @@ static void assign_syslog_ident(const char *newval, void *extra);
 static void assign_session_replication_role(int newval, void *extra);
 static bool check_temp_buffers(int *newval, void **extra, GucSource source);
 static bool check_bonjour(bool *newval, void **extra, GucSource source);
+static bool check_ignore_checksum_failure(bool *newval, void **extra, GucSource source);
 static bool check_ssl(bool *newval, void **extra, GucSource source);
 static bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 static bool check_log_stats(bool *newval, void **extra, GucSource source);
@@ -419,6 +422,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -513,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1022,7 +1036,7 @@ static struct config_bool ConfigureNamesBool[] =
 		},
 		&ignore_checksum_failure,
 		false,
-		NULL, NULL, NULL
+		check_ignore_checksum_failure, NULL, NULL
 	},
 	{
 		{"zero_damaged_pages", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -1665,17 +1679,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -3955,6 +3958,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
@@ -10203,6 +10217,37 @@ check_bonjour(bool *newval, void **extra, GucSource source)
 }
 
 static bool
+check_ignore_checksum_failure(bool *newval, void **extra, GucSource source)
+{
+	if (*newval)
+	{
+		/*
+		 * When data checksums are in progress, the verification of the
+		 * checksums is already ignored until all pages have had checksums
+		 * backfilled, making the effect of ignore_checksum_failure a no-op.
+		 * Allowing it during checksumming in progress can hide the fact that
+		 * checksums become enabled once done, so disallow.
+		 */
+		if (DataChecksumsInProgress())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" are in progress.");
+			return false;
+		}
+
+		/*
+		 * While safe, it's nonsensical to allow ignoring checksums when data
+		 * checksums aren't enabled in the first place.
+		 */
+		if (DataChecksumsDisabled())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" aren't enabled.");
+			return false;
+		}
+	}
+	return true;
+}
+
+static bool
 check_ssl(bool *newval, void **extra, GucSource source)
 {
 #ifndef USE_SSL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..9f5a5848ee
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,42 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..a4bfe7284d
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,308 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_relfilenode = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage()
+{
+	printf(_("%s verifies page level checksums in offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f,            force check even if checksums are disabled\n"));
+	printf(_("  -r relfilenode check only relation with specified relfilenode\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open %s: %m\n"), progname, fn);
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in %s, got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno*RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+						progname, fn, blockno, header->pd_checksum, csum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: %s, block %d, correct checksum %X\n"), progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory %s: %m\n"), progname, path);
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file %s: %m\n"), progname, fn);
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char *forkpath, *segmentpath;
+			int segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number in order to
+			 * mix it into the checksum. Then also cut off at the fork boundary, to get
+			 * the relfilenode the file belongs to for filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename %s\n"), progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_relfilenode && strcmp(only_relfilenode, de->d_name) != 0)
+				/* Relfilenode not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fr:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'r':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid relfilenode: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_relfilenode = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (DataDir == NULL)
+	{
+		fprintf(stderr, _("%s: no data directory specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..788f2d0b58 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsEnabledOrInProgress() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 0fdb42f639..ca9904293d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5556,6 +5556,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 2 0 16 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f t f v r 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..7f296264a9
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper daemon
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void ChecksumHelperLauncherMain(Datum arg);
+void ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index 3de9428299..6bde2bc879 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..290a74fc7c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that both standbys have switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres', "INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data');
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..b5c5563f98
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+t              
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546..cdd44979a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,6 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..f16127fa3f
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.5);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"

#70

Michael Banck

michael.banck@credativ.de

almost 8 years ago

In reply to: Magnus Hagander (#69)

Re: Online enabling of checksums

Hi,

On Sat, Mar 03, 2018 at 07:23:31PM +0100, Magnus Hagander wrote:

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..a4bfe7284d
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,308 @@

[...]

+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group

Weird copyright statement for a new file; did you base it on another
one, or just copy-paste the boilerplate?

[...]

+		csum = pg_checksum_page(buf, blockno + segmentno*RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+						progname, fn, blockno, header->pd_checksum, csum);

The error message sounds a bit strange to me; I would expect the
filename after "in file [...]", but you print the expected checksum.
Also, 'invalid' sounds a bit like we found a malformed checksum
(not hex), so maybe "checksum mismatch in file, expected %X,
found %X" or something?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

#71

Daniel Gustafsson

daniel@yesql.se

almost 8 years ago

In reply to: Michael Banck (#70)

Re: Online enabling of checksums

On 04 Mar 2018, at 15:24, Michael Banck <michael.banck@credativ.de> wrote:

+		csum = pg_checksum_page(buf, blockno + segmentno*RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+						progname, fn, blockno, header->pd_checksum, csum);
The error message sounds a bit strange to me; I would expect the
filename after "in file [...]", but you print the expected checksum.
Also, 'invalid' sounds a bit like we found a malformed checksum
(not hex), so maybe "checksum mismatch in file, expected %X,
found %X" or something?

Agreed. Looking at our current error messages, “in file” is conventionally
followed by the filename. I do however think “calculated” is better than
“expected” since it conveys clearly that the compared checksum is calculated by
pg_verify_checksums and not read from somewhere.

How about something like this?

_("%s: checksum mismatch in file \"%s\", block %d: calculated %X, found %X"),
progname, fn, blockno, csum, header->pd_checksum);
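
To see how the proposed wording would render, here is a minimal sketch. It assumes a hypothetical helper, format_checksum_mismatch, which is not part of the patch; it formats into a caller-supplied buffer instead of printing to stderr, and it leaves out the _() translation wrapper.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): format the mismatch report
 * using the wording proposed above into a caller-supplied buffer.
 * Returns what snprintf returns: the length the full message needs.
 */
static int
format_checksum_mismatch(char *out, size_t outlen,
						 const char *progname, const char *fn,
						 int blockno, uint16_t calculated, uint16_t found)
{
	return snprintf(out, outlen,
					"%s: checksum mismatch in file \"%s\", block %d: calculated %X, found %X\n",
					progname, fn, blockno,
					(unsigned int) calculated, (unsigned int) found);
}
```

With made-up inputs (progname "pg_verify_checksums", file "base/1/1259", block 3, calculated 0xBEEF, found 0x1234) the buffer would hold: pg_verify_checksums: checksum mismatch in file "base/1/1259", block 3: calculated BEEF, found 1234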

cheers ./daniel

#72

Michael Banck

michael.banck@credativ.de

almost 8 years ago

In reply to: Daniel Gustafsson (#71)

Re: Online enabling of checksums

Hi,

On Sunday, 4 March 2018 at 23:30 +0100, Daniel Gustafsson wrote:

Agreed. Looking at our current error messages, “in file” is conventionally
followed by the filename. I do however think “calculated” is better than
“expected” since it conveys clearly that the compared checksum is calculated by
pg_verify_checksum and not read from somewhere.

How about something like this?

_("%s: checksum mismatch in file \"%s\", block %d: calculated %X, found %X"),
progname, fn, blockno, csum, header->pd_checksum);

I still find that confusing, but maybe it's just me. I thought the one
in the pageheader is the "expected" checksum, and we compare the "found"
or "computed/calculated" (in the page itself) against it.

I had the same conversation with an external tool author, by the way:

https://github.com/uptimejp/postgres-toolkit/issues/48

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

#73

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Michael Banck (#72)

Re: Online enabling of checksums

On Mon, Mar 5, 2018 at 10:43 AM, Michael Banck <michael.banck@credativ.de>
wrote:

Hi,

Am Sonntag, den 04.03.2018, 23:30 +0100 schrieb Daniel Gustafsson:

Agreed. Looking at our current error messages, “in file” is conventionally
followed by the filename. I do however think “calculated” is better than
“expected” since it conveys clearly that the compared checksum is calculated by
pg_verify_checksum and not read from somewhere.

How about something like this?

_("%s: checksum mismatch in file \"%s\", block %d: calculated %X, found %X"),
progname, fn, blockno, csum, header->pd_checksum);

I still find that confusing, but maybe it's just me. I thought the one
in the pageheader is the "expected" checksum, and we compare the "found"
or "computed/calculated" (in the page itself) against it.

I had the same conversation with an external tool author, by the way:

Maybe we should just say "on disk" for the one that's on disk, would that
break the confusion? So "calculated %X, found %X on disk"?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#74

Michael Banck

michael.banck@credativ.de

almost 8 years ago

In reply to: Magnus Hagander (#73)

Re: Online enabling of checksums

Hi,

On Mon, Mar 05, 2018 at 11:09:02AM +0100, Magnus Hagander wrote:

On Mon, Mar 5, 2018 at 10:43 AM, Michael Banck <michael.banck@credativ.de>
wrote:

I still find that confusing, but maybe it's just me. I thought the one
in the pageheader is the "expected" checksum, and we compare the "found"
or "computed/calculated" (in the page itself) against it.

I had the same conversation with an external tool author, by the way:

Maybe we should just say "on disk" for the one that's on disk, would that
break the confusion? So "calculated %X, found %X on disk"?

I found that there is a precedent in bufpage.c:

| ereport(WARNING,
| (ERRCODE_DATA_CORRUPTED,
| errmsg("page verification failed, calculated checksum %u but expected %u",
| checksum, p->pd_checksum)));

apart from the fact that it doesn't print out the hex value (which I
find strange), it sounds like a sensible message to me. But "found %X on
disk" would work as well I guess.
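
For illustration only, the "on disk" wording discussed above could be
sketched as a standalone helper. This is a sketch under assumptions, not
code from the patch: format_checksum_mismatch() is a hypothetical name, and
the real tool prints through fprintf() and _() rather than into a buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a checksum mismatch report into buf.
 * "calculated" is the value computed from the page contents, "on_disk" is
 * the value read from the page header (pd_checksum).
 * Returns the snprintf() result, i.e. the length the message needs.
 */
static int
format_checksum_mismatch(char *buf, size_t buflen, const char *progname,
                         const char *fn, unsigned int blockno,
                         uint16_t calculated, uint16_t on_disk)
{
    assert(buf != NULL && buflen > 0);

    return snprintf(buf, buflen,
                    "%s: checksum mismatch in file \"%s\", block %u: "
                    "calculated %X, found %X on disk\n",
                    progname, fn, blockno,
                    (unsigned int) calculated, (unsigned int) on_disk);
}
```

Naming the source of each value in the message itself sidesteps the question
of which of the two is the "expected" one.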

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

#75

Michael Banck

michael.banck@credativ.de

almost 8 years ago

In reply to: Magnus Hagander (#69)

Re: Online enabling of checksums

Hi,

I had a closer look at v3 of the patch now.

On Sat, Mar 03, 2018 at 07:23:31PM +0100, Magnus Hagander wrote:

Attached is a rebased patch which removes this optimization, updates the
pg_proc entry for the new format, and changes pg_verify_checksums to use -r
instead of -o for relfilenode.

The patch applies fine with minimal fuzz and compiles with no warnings;
make check and the added isolation tests, as well as the added checksum
tests pass.

If I blindly run "SELECT pg_enable_data_checksums();" on a new cluster, I
get:

|postgres=# SELECT pg_enable_data_checksums();
| pg_enable_data_checksums
|--------------------------
| t
|(1 row)
|
|postgres=# SHOW data_checksums ;
| data_checksums
|----------------
| inprogress
|(1 row)

However, inspecting the log one sees:

|2018-03-10 14:15:57.702 CET [3313] ERROR: Database template0 does not allow connections.
|2018-03-10 14:15:57.702 CET [3313] HINT: Allow connections using ALTER DATABASE and try again.
|2018-03-10 14:15:57.702 CET [3152] LOG: background worker "checksum helper launcher" (PID 3313) exited with exit code 1

and the background worker is no longer running without any obvious hint
to the client.

I am aware that this is discussed already, but as-is the user experience
is pretty bad, I think pg_enable_data_checksums() should either bail
earlier if it cannot connect to all databases, or it should be better
documented.
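
To illustrate the "bail earlier" option, here is a minimal sketch. It assumes
the caller already has a list of databases together with their
pg_database.datallowconn flag; DatabaseInfo and
first_unconnectable_database() are hypothetical names and not part of the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one pg_database row. */
typedef struct
{
    const char *dbname;
    bool        allow_connections;  /* mirrors pg_database.datallowconn */
} DatabaseInfo;

/*
 * Return the name of the first database that does not allow connections,
 * or NULL if every database can be connected to.
 */
static const char *
first_unconnectable_database(const DatabaseInfo *dbs, size_t ndbs)
{
    size_t      i;

    assert(dbs != NULL || ndbs == 0);

    for (i = 0; i < ndbs; i++)
    {
        if (!dbs[i].allow_connections)
            return dbs[i].dbname;
    }
    return NULL;
}
```

With a check of this kind, pg_enable_data_checksums() could report the
offending database to the client instead of leaving the error in the log
only.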

Otherwise, if I allow connections to template0, the patch works as
expected, I have not managed to break it so far.

Some further review comments:

diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..89afecb341 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,73 @@

[...]

+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>

Maybe document the above issue here, unless it is clear that the
template0-needs-to-allow-connections issue will go away before the patch
is pushed.

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47a6c4d895..56aaa88de1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c

[...]

+void
+SetDataChecksumsOn(void)
+{
+	if (!DataChecksumsEnabledOrInProgress())
+		elog(ERROR, "Checksums not enabled or in progress");
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in in_progress mode");

The string used in "SHOW data_checksums" is "inprogress", not
"in_progress".

[...]

@@ -7769,6 +7839,16 @@ StartupXLOG(void)
CompleteCommitTsInitialization();

/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"in progress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));

Again, string is "inprogress", not "in progress", not sure if that
matters.

diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..6aa71bcf30
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,631 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Backend worker to walk the database and write checksums to pages

Backend or Background?

[...]

+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	bool		success;
+	bool		process_shared_catalogs;
+	/* Parameter values  set on start */

double space.

[...]

+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+		Page		page;
+		PageHeader	pagehdr;
+		uint16		checksum;
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/* Do we already have a valid checksum? */
+		page = BufferGetPage(buf);
+		pagehdr = (PageHeader) page;
+		checksum = pg_checksum_page((char *) page, b);

This looks like it does not take the segment number into account;
however it is also unclear to me what the purpose of this is, as
checksum is never validated against the pagehdr, and nothing is done
with it. Indeed, I even get a compiler warning about pagehdr and
checksum:

git/postgresql/build/../src/backend/postmaster/checksumhelper.c:
In function ‘ProcessSingleRelationFork’:
git/postgresql/build/../src/backend/postmaster/checksumhelper.c:155:11:
warning: variable ‘checksum’ set but not used
[-Wunused-but-set-variable]
uint16 checksum;
^~~~~~~~
git/postgresql/build/../src/backend/postmaster/checksumhelper.c:154:14:
warning: variable ‘pagehdr’ set but not used [-Wunused-but-set-variable]
PageHeader pagehdr;
^~~~~~~

I guess the above block running pg_checksum_page() is a leftover from
previous versions of the patch and should be removed...

+		/*
+		 * Mark the buffer as dirty and force a full page write.
+		 * We have to re-write the page to wal even if the checksum hasn't
+		 * changed, because if there is a replica it might have a slightly
+		 * different version of the page with an invalid checksum, caused
+		 * by unlogged changes (e.g. hintbits) on the master happening while
+		 * checksums were off. This can happen if there was a valid checksum
+		 * on the page at one point in the past, so only when checksums
+		 * are first on, then off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();

... seeing how MarkBufferDirty(buf) is now run unconditionally.
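
For reference, the segment arithmetic mentioned above (as in the
"blockno + segmentno*RELSEG_SIZE" expression in pg_verify_checksums) can be
sketched as follows. The sketch assumes the default build settings, where a
segment holds 131072 blocks (1 GB of 8 kB pages); absolute_blockno() is a
hypothetical helper name. Whether the loop in ProcessSingleRelationFork
needs this depends on whether "b" is already a relation-wide block number,
which I have not verified.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default build, 1 GB segment files made of 8 kB blocks. */
#define RELSEG_SIZE 131072

typedef uint32_t BlockNumber;

/*
 * Hypothetical helper: translate a block number within one segment file
 * into the relation-wide block number that the checksum is computed with.
 */
static BlockNumber
absolute_blockno(uint32_t segmentno, BlockNumber blockno_in_segment)
{
    assert(blockno_in_segment < RELSEG_SIZE);

    return segmentno * (BlockNumber) RELSEG_SIZE + blockno_in_segment;
}
```

A tool reading segment files directly has to do this translation, because
the block number is part of the checksum input.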

[...]

+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every db
+	 */

Comment should have a full stop like the above and below ones?

+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any new created database will
+	 * be running with checksums turned on from the start.
+	 */

[...]

+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			if (ProcessDatabase(db))
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again .

Stray space.

+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * DROPed between us getting the database list and trying to process

DROPed looks wrong, and there's no other occurrence of it in the source
tree. DROPped looks even weirder, so maybe just "dropped"?

+ * it. Get a fresh list of databases to detect the second case with.

That sentence looks unfinished or at least is unclear to me.

+ * Any database that still exists but failed we retry for a limited

I'm not a native speaker, but this looks wrong to me as well, maybe "We
retry any database that still exists but failed for a limited [...]"?

diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
*/

/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper have yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
* We might eventually allow upgrades from checksum to no-checksum
* clusters.
*/

See below about src/bin/pg_upgrade/pg_upgrade.h having
data_checksum_version be a bool.

I checked pg_upgrade (from master to master though), and could find no
off-hand issues, i.e. it reported all issues correctly.

--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile

[...]

+check:
+       $(prove_check)
+
+installcheck:
+       $(prove_installcheck)

If I run "make check" in src/bin/pg_verify_checksums, I get a fat perl
error:

|src/bin/pg_verify_checksums$ LANG=C make check
|rm -rf '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install
|/bin/mkdir -p '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install/log
|make -C '../../..' DESTDIR='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install install >'/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install/log/install.log 2>&1
|rm -rf '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums'/tmp_check
|/bin/mkdir -p '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums'/tmp_check
|cd /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums && TESTDIR='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums' PATH="/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/tmp_install//bin:$PATH" LD_LIBRARY_PATH="/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/tmp_install//lib" PGPORT='65432' PG_REGRESS='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums/../../../src/test/regress/pg_regress' /usr/bin/prove -I /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/test/perl/ -I /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums t/*.pl
|Cannot detect source of 't/*.pl'! at /usr/share/perl/5.24/TAP/Parser/IteratorFactory.pm line 261.
| TAP::Parser::IteratorFactory::detect_source(TAP::Parser::IteratorFactory=HASH(0x55eed10df3e8), TAP::Parser::Source=HASH(0x55eed10bd358)) called at /usr/share/perl/5.24/TAP/Parser/IteratorFactory.pm line 211
| TAP::Parser::IteratorFactory::make_iterator(TAP::Parser::IteratorFactory=HASH(0x55eed10df3e8), TAP::Parser::Source=HASH(0x55eed10bd358)) called at /usr/share/perl/5.24/TAP/Parser.pm line 472
| TAP::Parser::_initialize(TAP::Parser=HASH(0x55eed10df328), HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Object.pm line 55
| TAP::Object::new("TAP::Parser", HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Object.pm line 130
| TAP::Object::_construct(TAP::Harness=HASH(0x55eed09176b0), "TAP::Parser", HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Harness.pm line 852
| TAP::Harness::make_parser(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Scheduler::Job=HASH(0x55eed0fdc708)) called at /usr/share/perl/5.24/TAP/Harness.pm line 651
| TAP::Harness::_aggregate_single(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Aggregator=HASH(0x55eed091e520), TAP::Parser::Scheduler=HASH(0x55eed0fdc6a8)) called at /usr/share/perl/5.24/TAP/Harness.pm line 743
| TAP::Harness::aggregate_tests(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Aggregator=HASH(0x55eed091e520), "t/*.pl") called at /usr/share/perl/5.24/TAP/Harness.pm line 558
| TAP::Harness::__ANON__() called at /usr/share/perl/5.24/TAP/Harness.pm line 571
| TAP::Harness::runtests(TAP::Harness=HASH(0x55eed09176b0), "t/*.pl") called at /usr/share/perl/5.24/App/Prove.pm line 546
| App::Prove::_runtests(App::Prove=HASH(0x55eed090b0c8), HASH(0x55eed0d79cf0), "t/*.pl") called at /usr/share/perl/5.24/App/Prove.pm line 504
| App::Prove::run(App::Prove=HASH(0x55eed090b0c8)) called at /usr/bin/prove line 13
|Makefile:39: recipe for target 'check' failed
|make: *** [check] Error 2

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..a4bfe7284d
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c

[...]

+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (DataDir == NULL)
+	{
+		fprintf(stderr, _("%s: no data directory specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}

Those two if (DataDir == NULL) checks could maybe be put together into
one block.

diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..7f296264a9
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper deamon

"deamon" is surely wrong (it'd be "daemon"), but maybe "(background)
worker" is better?

diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
*/
#define PG_PAGE_LAYOUT_VERSION		4
#define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2

I am not very sure about the semantics of PG_DATA_CHECKSUM_VERSION being
1, but I assumed it was a version, like, if we ever decide to use a
different checksumming algorithm, we'd bump it to 2.

Now PG_DATA_CHECKSUM_INPROGRESS_VERSION is defined to 2, which I agree
is convenient, but is there some strategy what to do about this in case
the PG_DATA_CHECKSUM_VERSION needs to be increased?

In any case, src/bin/pg_upgrade/pg_upgrade.h has

bool data_checksum_version;

in the ControlData struct, which might need updating?

That's all for now.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

#76

Daniel Gustafsson

daniel@yesql.se

almost 8 years ago

In reply to: Michael Banck (#75)

1 attachment(s)

Re: Online enabling of checksums

On 10 Mar 2018, at 16:09, Michael Banck <michael.banck@credativ.de> wrote:

I had a closer look at v3 of the patch now.

Thanks, much appreciated! Sorry for the late response, just came back from a
conference and have had little time for hacking.

All whitespace, punctuation and capitalization comments have been addressed
with your recommendations, so I took the liberty to trim them from the
response.

I am aware that this is discussed already, but as-is the user experience
is pretty bad, I think pg_enable_data_checksums() should either bail
earlier if it cannot connect to all databases, or it should be better
documented.

Personally I think we should first attempt to solve the "allow-connections in
background workers" issue which has been raised on another thread. For now I’m
documenting this better.

diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..89afecb341 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,73 @@

[...]

+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>

Maybe document the above issue here, unless it is clear that the
template0-needs-to-allow-connections issue will go away before the patch
is pushed.

I have added a paragraph on allowing connections here, as well as a note that
template0 will need to be handled.

+void
+SetDataChecksumsOn(void)
+{
+	if (!DataChecksumsEnabledOrInProgress())
+		elog(ERROR, "Checksums not enabled or in progress");
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in in_progress mode");

The string used in "SHOW data_checksums" is "inprogress", not
"in_progress”.

Fixed.

@@ -7769,6 +7839,16 @@ StartupXLOG(void)
CompleteCommitTsInitialization();

/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"in progress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));

Again, string is "inprogress", not "in progress", not sure if that
matters.

I think it does, we need to be consistent in userfacing naming. Updated.

diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..6aa71bcf30
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,631 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Backend worker to walk the database and write checksums to pages

Backend or Background?

“Background” is the right term here, fixed.

+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+		Page		page;
+		PageHeader	pagehdr;
+		uint16		checksum;
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/* Do we already have a valid checksum? */
+		page = BufferGetPage(buf);
+		pagehdr = (PageHeader) page;
+		checksum = pg_checksum_page((char *) page, b);

I guess the above block running pg_checksum_page() is a leftover from
previous versions of the patch and should be removed…

Correct, the checksum and pagehdr were used in the previous optimization to skip
forced writes where the checksum was valid. That optimization was however
based on a faulty assumption and was removed in the v3 patch. The leftover
variables are now removed.

+ * it. Get a fresh list of databases to detect the second case with.

That sentence looks unfinished or at least is unclear to me.

Updated to indicate what we mean by “second case”.

+ * Any database that still exists but failed we retry for a limited

I'm not a native speaker, but this looks wrong to me as well, maybe "We
retry any database that still exists but failed for a limited [...]”?

Updated and extended a bit for clarity.

Fixing this review comment also made me realize that checksumhelper.c was using
%s for outputting the database name in errmsg(), but \"%s\" is what we commonly
use. Updated all errmsg() invocations to quote the database name.

diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
*/

/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper have yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
* We might eventually allow upgrades from checksum to no-checksum
* clusters.
*/

See below about src/bin/pg_upgrade/pg_upgrade.h having
data_checksum_version be a bool.

I checked pg_upgrade (from master to master though), and could find no
off-hand issues, i.e. it reported all issues correctly.

data_checksum_version is indeed defined bool, but rather than being an actual
bool it’s a bool backed by a char via a typedef in c.h. This is why it works
to assign 0, 1 or 2 without issues.

That being said, I agree that it reads odd now that checksum version isn’t just
0 or 1 which mentally translate neatly to false or true. Since this is the
patch introducing version 2, I changed data_checksum_version to a uint32 as
that makes the intent much clearer.
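
A small sketch of why the type matters. It assumes a char-backed bool
typedef as described above (char_bool here is a stand-in name, not the
actual c.h definition) and contrasts it with a C99 bool and a uint32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef char char_bool;         /* stand-in for a bool backed by a char */

typedef struct
{
    char_bool   as_char_bool;   /* keeps 0, 1 or 2 as stored */
    bool        as_c99_bool;    /* any nonzero value becomes 1 */
    uint32_t    as_uint32;      /* keeps the value and states the intent */
} ChecksumVersionDemo;

/* Store the same checksum version in all three representations. */
static ChecksumVersionDemo
store_version(int version)
{
    ChecksumVersionDemo d;

    assert(version >= 0 && version <= 2);

    d.as_char_bool = (char_bool) version;
    d.as_c99_bool = version;    /* conversion to C99 bool yields 0 or 1 */
    d.as_uint32 = (uint32_t) version;
    return d;
}
```

With a real C99 bool the value 2 would silently collapse to 1, so the
inprogress state would be indistinguishable from "on"; an integer type avoids
relying on how bool happens to be defined.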

+check:
+       $(prove_check)
+
+installcheck:
+       $(prove_installcheck)
If I run "make check" in src/bin/pg_verify_checksums, I get a fat perl
error:

That's because the check and installcheck targets are copy-pasteos, there are no
tests for pg_verify_checksums currently so the targets should not be in the
Makefile. Fixed.

+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	if (DataDir == NULL)
+	{
+		fprintf(stderr, _("%s: no data directory specified\n"), progname);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+		exit(1);
+	}

Those two if (DataDir == NULL) checks could maybe be put together into
one block.

Moved the check into the first block, as it makes the code clearer and doesn’t
change the order in which the error messages for missing datadir and
too-many-arguments will be output.
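
Roughly, the merged logic has the following shape. This is a sketch and not
the code in the v4 patch: resolve_datadir() and DataDirStatus are
hypothetical names, and status codes stand in for the fprintf()/exit(1)
calls so that the control flow can be read in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum
{
    DATADIR_OK,
    DATADIR_MISSING,            /* "no data directory specified" */
    DATADIR_TOO_MANY_ARGS       /* "too many command-line arguments" */
} DataDirStatus;

/*
 * Hypothetical helper: *datadir is the value given with -D, or NULL.
 * optind is the index of the first non-option argument.
 */
static DataDirStatus
resolve_datadir(int argc, char **argv, int optind, const char **datadir)
{
    assert(datadir != NULL);

    if (*datadir == NULL)
    {
        if (optind < argc)
            *datadir = argv[optind++];
        else
            *datadir = getenv("PGDATA");

        /* Merged check: nothing from -D, the arguments or the environment */
        if (*datadir == NULL)
            return DATADIR_MISSING;
    }

    /* Complain if any arguments remain */
    if (optind < argc)
        return DATADIR_TOO_MANY_ARGS;

    return DATADIR_OK;
}
```

The missing-datadir case can only be reached when no positional argument is
left, so the two errors never compete and their relative order is unchanged.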

diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..7f296264a9
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper deamon

"deamon" is surely wrong (it'd be "daemon"), but maybe "(background)
worker" is better?

Yes, "background worker" is better, updated.

diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
*/
#define PG_PAGE_LAYOUT_VERSION		4
#define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
I am not very sure about the semantics of PG_DATA_CHECKSUM_VERSION being
1, but I assumed it was a version, like, if we ever decide to use a
different checksumming algorithm, we'd bump it to 2.

Now PG_DATA_CHECKSUM_INPROGRESS_VERSION is defined to 2, which I agree
is convenient, but is there some strategy what to do about this in case
the PG_DATA_CHECKSUM_VERSION needs to be increased?

I don’t think this has been discussed in any thread dealing with enabling
checksums in an online cluster, where using the version was mentioned (or my
archive search is failing me). Iff another algorithm was added I assume we’d
need something like the proposed checksumhelper to allow migrations from
version 1. Just as we now use version 2 to indicate that we are going from
version 0 to version 1, the same can be used for going from version 1 to 3 as
long as there is a source and destination version recorded.

I don’t know what such a process would look like, but I don’t see that we are
blocking a future new checksum algorithm by using version 2 for inprogress
(although I agree that using the version here is closer to convenient than
elegant).
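
To make the overloading concrete: the version field currently doubles as a
state, with the transitions described at the start of this thread (off to
inprogress, inprogress to on or off, on to off). A minimal sketch, where
ChecksumState and checksum_transition_allowed() are hypothetical names used
only for illustration:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    CHECKSUMS_OFF = 0,
    CHECKSUMS_ON = 1,           /* PG_DATA_CHECKSUM_VERSION */
    CHECKSUMS_INPROGRESS = 2    /* PG_DATA_CHECKSUM_INPROGRESS_VERSION */
} ChecksumState;

/* Return true if the state change is one of the allowed transitions. */
static bool
checksum_transition_allowed(ChecksumState from, ChecksumState to)
{
    switch (from)
    {
        case CHECKSUMS_OFF:
            return to == CHECKSUMS_INPROGRESS;
        case CHECKSUMS_INPROGRESS:
            return to == CHECKSUMS_ON || to == CHECKSUMS_OFF;
        case CHECKSUMS_ON:
            return to == CHECKSUMS_OFF;
    }

    /* Not reached for valid enum values */
    assert(false);
    return false;
}
```

A future algorithm change would need more than this single value, for
example a recorded source and destination version as mentioned above.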

Attached is v4 of this patch, which addresses this review. I am flipping the
status in the CF app back to Needs Review.

cheers ./daniel

Attachments:

online_checksums4.patch (application/octet-stream)

---
 doc/src/sgml/config.sgml                          |   3 +-
 doc/src/sgml/func.sgml                            |  65 +++
 doc/src/sgml/ref/allfiles.sgml                    |   1 +
 doc/src/sgml/ref/initdb.sgml                      |   6 +-
 doc/src/sgml/ref/pg_verify_checksums.sgml         | 112 ++++
 doc/src/sgml/reference.sgml                       |   1 +
 doc/src/sgml/wal.sgml                             |  82 +++
 src/backend/access/rmgrdesc/xlogdesc.c            |  16 +
 src/backend/access/transam/xlog.c                 | 119 +++-
 src/backend/access/transam/xlogfuncs.c            |  43 ++
 src/backend/catalog/system_views.sql              |   5 +
 src/backend/postmaster/Makefile                   |   5 +-
 src/backend/postmaster/bgworker.c                 |   7 +
 src/backend/postmaster/checksumhelper.c           | 625 ++++++++++++++++++++++
 src/backend/postmaster/pgstat.c                   |   5 +
 src/backend/replication/logical/decode.c          |   1 +
 src/backend/storage/ipc/ipci.c                    |   2 +
 src/backend/storage/page/bufpage.c                |  14 +-
 src/backend/utils/misc/guc.c                      |  71 ++-
 src/bin/pg_upgrade/controldata.c                  |   9 +
 src/bin/pg_upgrade/pg_upgrade.h                   |   2 +-
 src/bin/pg_verify_checksums/.gitignore            |   1 +
 src/bin/pg_verify_checksums/Makefile              |  36 ++
 src/bin/pg_verify_checksums/pg_verify_checksums.c | 309 +++++++++++
 src/include/access/xlog.h                         |  10 +-
 src/include/access/xlog_internal.h                |   7 +
 src/include/catalog/pg_control.h                  |   1 +
 src/include/catalog/pg_proc.h                     |   5 +
 src/include/pgstat.h                              |   4 +-
 src/include/postmaster/checksumhelper.h           |  31 ++
 src/include/storage/bufpage.h                     |   1 +
 src/include/storage/checksum.h                    |   7 +
 src/test/Makefile                                 |   3 +-
 src/test/checksum/.gitignore                      |   2 +
 src/test/checksum/Makefile                        |  24 +
 src/test/checksum/README                          |  22 +
 src/test/checksum/t/001_standby_checksum.pl       |  86 +++
 src/test/isolation/expected/checksum_enable.out   |  27 +
 src/test/isolation/isolation_schedule             |   3 +
 src/test/isolation/specs/checksum_enable.spec     |  71 +++
 40 files changed, 1811 insertions(+), 33 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_verify_checksums.sgml
 create mode 100644 src/backend/postmaster/checksumhelper.c
 create mode 100644 src/bin/pg_verify_checksums/.gitignore
 create mode 100644 src/bin/pg_verify_checksums/Makefile
 create mode 100644 src/bin/pg_verify_checksums/pg_verify_checksums.c
 create mode 100644 src/include/postmaster/checksumhelper.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_standby_checksum.pl
 create mode 100644 src/test/isolation/expected/checksum_enable.out
 create mode 100644 src/test/isolation/specs/checksum_enable.spec

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3a8fc7d803..60e06980de 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8546,7 +8546,8 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
         or hide corruption, or other serious problems</emphasis>.  However, it may allow
         you to get past the error and retrieve undamaged tuples that might still be
         present in the table if the block header is still sane. If the header is
-        corrupt an error will be reported even if this option is enabled. The
+        corrupt an error will be reported even if this option is enabled. This
+        option can only be enabled when data checksums are enabled. The
         default setting is <literal>off</literal>, and it can only be changed by a superuser.
        </para>
       </listitem>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2f59af25a6..a011ea1d8f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19481,6 +19481,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>inprogress</literal> and start a background worker that will process
+         all data in the cluster and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 22e6893211..c81c87ef41 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -210,6 +210,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 585665f161..0864afb890 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..463ecd5e1b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-r <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d27fb414f7..db4f4167e3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -283,6 +283,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..cb6783e415 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,88 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums;
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+   <para>
+    Since checksums are clusterwide, all databases will get checksums added.
+    It is thus important that the worker is allowed to connect to all databases
+    for the process to succeed; see the <xref linkend="sql-alterdatabase"/>
+    command for how to allow connections.
+   </para>
+
+   <note>
+    <para>
+     <literal>template0</literal> does not accept connections by default; to
+     enable checksums you'll need to temporarily make it accept connections.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47a6c4d895..51c052f69a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4653,10 +4654,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4728,12 +4725,85 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsEnabledOrInProgress(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+bool
+DataChecksumsDisabled(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == 0);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	if (DataChecksumsEnabledOrInProgress())
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	if (!DataChecksumsEnabledOrInProgress())
+		elog(ERROR, "Checksums not enabled or in progress");
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7768,6 +7838,16 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9521,6 +9601,22 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -9949,6 +10045,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..0d10fd4c89 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,45 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (DataChecksumsDisabled())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_BOOL(DataChecksumsDisabled());
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int cost_delay = PG_GETARG_INT32(0);
+	int cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_BOOL(DataChecksumsEnabledOrInProgress());
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5e6e8a64f6..a2965d35d4 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1025,6 +1025,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..9186dbea22
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,625 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a database at initdb time, no extra
+ * process is required as each page is checksummed, and verified, at
+ * accesses.  When enabling checksums on an already running cluster
+ * which was not initialized with checksums, this helper worker will
+ * ensure that all pages are checksummed before verification of the
+ * checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific
+ * database before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	bool		success;
+	bool		process_shared_catalogs;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksum helper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+	bool		success;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+	bool		success;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static bool ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Main entry point for checksum helper launcher process
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/*
+		 * Failed to set means somebody else started
+		 */
+		ereport(ERROR,
+				(errmsg("could not start checksum helper: already running")));
+	}
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+
+		/*
+		 * Launcher not started, so nothing to shut down.
+		 */
+		return;
+
+	ereport(ERROR,
+			(errmsg("Checksum helper is currently running, cannot disable checksums"),
+			 errhint("Restart the cluster or wait for the worker to finish")));
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * XXX: must hold a lock on the relation preventing it from being truncated?
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.
+		 * We have to re-write the page to WAL even if the checksum hasn't
+		 * changed, because if there is a replica it might have a slightly
+		 * different version of the page with an invalid checksum, caused
+		 * by unlogged changes (e.g. hintbits) on the master happening while
+		 * checksums were off. This can happen if there was a valid checksum
+		 * on the page at one point in the past, so only when checksums
+		 * are first on, then off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = relation_open(relationId, AccessShareLock);
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+			ProcessSingleRelationFork(rel, fnum, strategy);
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u", relationId);
+
+	CommitTransactionCommand();
+
+	return true;
+}
+
+/*
+ * Enable checksums in a single database.
+ * We do this by launching a dynamic background worker into this database,
+ * and waiting for it to finish.
+ * We have to do this in a separate worker, since each process can only be
+ * connected to one database during its lifetime.
+ */
+static bool
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	ChecksumHelperShmem->success = false;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksum helper in \"%s\"", db->dbname)));
+		return false;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksum helper in \"%s\"", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"", db->dbname)));
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksum helper in \"%s\"", db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed", db->dbname)));
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so that shared catalogs are processed once, not in every db.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any newly created database will
+	 * be running with checksums turned on from the start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			if (ProcessDatabase(db))
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the second case where
+		 * the database was dropped before we had started processing it.
+		 * Any database that still exists but where enabling checksums failed
+		 * is retried a limited number of times before giving up. Any
+		 * database that remains in failed state after the retries expire will
+		 * fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in \"%s\", giving up", db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("database \"%s\" dropped, skipping", db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+
+	/*
+	 * Force a checkpoint to get everything out to disk
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+	if (!found)
+		pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This is intended to create the worklist for the workers to go through, and
+ * as we are only concerned with already existing databases we never need to
+ * rebuild this list, which simplifies the coding.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections", NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * If include_shared is true, both shared and local relations are returned;
+ * otherwise only non-shared relations are returned.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables by definition have no local storage that can be
+		 * checksummed, so skip them.
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Enable vacuum cost delay, if any
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			ereport(ERROR,
+					(errmsg("failed to process table with oid %u", rel->reloid)));
+		}
+	}
+	list_free_deep(RelationList);
+
+	ChecksumHelperShmem->success = true;
+
+	ereport(DEBUG1,
+			(errmsg("Checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..c158e67a28 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,15 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * If data checksums have been turned on in a running cluster which
+		 * was initdb'd without checksums, or a cluster which has had
+		 * checksums turned off, we hold off on verifying the checksum until
+		 * all pages are checksummed again.  The PageSetChecksum functions
+		 * must continue to write the checksums even though we don't validate
+		 * them yet.
+		 */
+		if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1176,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1203,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index fc3e10c750..01c50d68f2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -67,6 +68,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -165,6 +167,7 @@ static void assign_syslog_ident(const char *newval, void *extra);
 static void assign_session_replication_role(int newval, void *extra);
 static bool check_temp_buffers(int *newval, void **extra, GucSource source);
 static bool check_bonjour(bool *newval, void **extra, GucSource source);
+static bool check_ignore_checksum_failure(bool *newval, void **extra, GucSource source);
 static bool check_ssl(bool *newval, void **extra, GucSource source);
 static bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 static bool check_log_stats(bool *newval, void **extra, GucSource source);
@@ -418,6 +421,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -513,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1022,7 +1036,7 @@ static struct config_bool ConfigureNamesBool[] =
 		},
 		&ignore_checksum_failure,
 		false,
-		NULL, NULL, NULL
+		check_ignore_checksum_failure, NULL, NULL
 	},
 	{
 		{"zero_damaged_pages", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -1664,17 +1678,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -3956,6 +3959,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
@@ -10203,6 +10217,37 @@ check_bonjour(bool *newval, void **extra, GucSource source)
 	return true;
 }
 
+static bool
+check_ignore_checksum_failure(bool *newval, void **extra, GucSource source)
+{
+	if (*newval)
+	{
+		/*
+		 * When data checksums are in progress, the verification of the
+		 * checksums is already ignored until all pages have had checksums
+		 * backfilled, making the effect of ignore_checksum_failure a no-op.
+		 * Allowing it during checksumming in progress can hide the fact that
+		 * checksums become enabled once done, so disallow.
+		 */
+		if (DataChecksumsInProgress())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" are in progress.");
+			return false;
+		}
+
+		/*
+		 * While safe, it's nonsensical to allow ignoring checksums when data
+		 * checksums aren't enabled in the first place.
+		 */
+		if (DataChecksumsDisabled())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" are not enabled.");
+			return false;
+		}
+	}
+	return true;
+}
+
 static bool
 check_ssl(bool *newval, void **extra, GucSource source)
 {
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -590,6 +590,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7e5e971294..449a703c47 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -226,7 +226,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..d16261571f
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,36 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in an offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..f3c93bb516
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,309 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_relfilenode = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage(void)
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_("  [-D] DATADIR   data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -r relfilenode check only relation with specified relfilenode\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open %s: %m\n"), progname, fn);
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in %s, got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno * RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+						progname, fn, blockno, header->pd_checksum, csum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: %s, block %d, correct checksum %X\n"), progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory %s: %m\n"), progname, path);
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file %s: %m\n"), progname, fn);
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char *forkpath, *segmentpath;
+			int segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number in order to
+			 * mix it into the checksum. Then also cut off at the fork boundary, to get
+			 * the relfilenode the file belongs to for filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename %s\n"), progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_relfilenode && strcmp(only_relfilenode, de->d_name) != 0)
+				/* Relfilenode not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fr:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'r':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid relfilenode: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_relfilenode = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+
+		/* If no DataDir was specified, and none could be found, error out */
+		if (DataDir == NULL)
+		{
+			fprintf(stderr, _("%s: no data directory specified\n"), progname);
+			fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+			exit(1);
+		}
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %u\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..788f2d0b58 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsEnabledOrInProgress() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+} xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 0fdb42f639..ca9904293d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5556,6 +5556,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 2 0 16 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f t f v r 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..87c0266672
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void ChecksumHelperLauncherMain(Datum arg);
+void ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..290a74fc7c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres', "INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data');
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..b5c5563f98
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+t              
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546..cdd44979a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,6 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..f16127fa3f
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.5);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"
-- 
2.14.1.145.gb3622a4ee
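
The tests in the patch above step the data_checksums setting through off, inprogress and on. As a rough standalone sketch of the transitions described in the base design at the top of the thread (off to inprogress, inprogress to on or off, on to off), something like the following could be used. The enum mirrors the one the patch adds to src/include/storage/checksum.h; the two helper functions are hypothetical names made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the enum added to src/include/storage/checksum.h by the patch. */
typedef enum ChecksumType
{
	DATA_CHECKSUMS_OFF = 0,
	DATA_CHECKSUMS_ON,
	DATA_CHECKSUMS_INPROGRESS
} ChecksumType;

/*
 * Hypothetical helper (not in the patch): is the state change allowed
 * by the base design?
 */
static bool
checksum_transition_allowed(ChecksumType from, ChecksumType to)
{
	switch (from)
	{
		case DATA_CHECKSUMS_OFF:
			return to == DATA_CHECKSUMS_INPROGRESS;
		case DATA_CHECKSUMS_INPROGRESS:
			return to == DATA_CHECKSUMS_ON || to == DATA_CHECKSUMS_OFF;
		case DATA_CHECKSUMS_ON:
			return to == DATA_CHECKSUMS_OFF;
	}
	return false;
}

/*
 * Hypothetical helper (not in the patch): checksums are verified on read
 * only in the "on" state; in "inprogress" they are written but not verified.
 */
static bool
checksum_verified_on_read(ChecksumType state)
{
	return state == DATA_CHECKSUMS_ON;
}
```

Under these assumptions, going straight from off to on is rejected, which is the reason the tests expect to see inprogress before on.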

#77

Michael Banck

michael.banck@credativ.de

almost 8 years ago

In reply to: Daniel Gustafsson (#76)

1 attachment(s)

Re: Online enabling of checksums

Hi,

On Thu, Mar 15, 2018 at 02:01:26PM +0100, Daniel Gustafsson wrote:

On 10 Mar 2018, at 16:09, Michael Banck <michael.banck@credativ.de> wrote:
I am aware that this is discussed already, but as-is the user experience
is pretty bad, I think pg_enable_data_checksums() should either bail
earlier if it cannot connect to all databases, or it should be better
documented.

Personally I think we should first attempt to solve the "allow-connections in
background workers" issue which has been raised on another thread. For now I’m
documenting this better.

I had a look at that thread and it seems stalled, I am a bit worried
that this will not be solved before the end of the CF.

So I think unless the above gets solved, pg_enable_data_checksums()
should error out with the hint. I've had a quick look and it seems one
can partly duplicate the check from BuildDatabaseList() (or optionally
move it) to the beginning of StartChecksumHelperLauncher(), see
attached.

That results in:

postgres=# SELECT pg_enable_data_checksums();
ERROR: Database "template0" does not allow connections.
HINT: Allow connections using ALTER DATABASE and try again.
postgres=#

Which I think is much nicer than what we have right now:

postgres=# SELECT pg_enable_data_checksums();
pg_enable_data_checksums
--------------------------
t
(1 row)

postgres=# \q
postgres@fock:~$ tail -3 pg1.log
2018-03-18 14:00:08.512 CET [25514] ERROR: Database "template0" does not allow connections.
2018-03-18 14:00:08.512 CET [25514] HINT: Allow connections using ALTER DATABASE and try again.
2018-03-18 14:00:08.513 CET [24930] LOG: background worker "checksum helper launcher" (PID 25514) exited with exit code 1

Attached is v4 of this patch, which addresses this review; I'm flipping the
status in the CF app back to Needs Review.

Thanks!

The following errmsg() calls capitalize the error message without the first
word being a specific term, which I believe is not project style:

+			(errmsg("Checksum helper is currently running, cannot disable checksums"),
+						(errmsg("Database \"%s\" dropped, skipping", db->dbname)));
+			(errmsg("Checksums enabled, checksumhelper launcher shutting down")));
+					(errmsg("Database \"%s\" does not allow connections.", NameStr(pgdb->datname)),
+			(errmsg("Checksum worker starting for database oid %d", dboid)));
+			(errmsg("Checksum worker completed in database oid %d", dboid)));

Also, in src/backend/postmaster/checksumhelper.c there are a few
multi-line comments (which are not function comments) that do not have a
full stop at the end, which I think is also project style:

+ * Failed to set means somebody else started

Could be changed to a one-line (/* ... */) comment?

+ * Force a checkpoint to get everything out to disk

Should have a full stop.

+ * checksummed, so skip

Should have a full stop.

+ * Enable vacuum cost delay, if any

Could be changed to a one-line comment?

+ * Create and set the vacuum strategy as our buffer strategy

Could be changed to a one-line comment?

Apart from that, I previously complained about the error in
pg_verify_checksums:

+                               fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+                                               progname, fn, blockno, header->pd_checksum, csum);

I still propose something like in backend/storage/page/bufpage.c's
PageIsVerified(), e.g.:

|fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
| progname, fn, blockno, csum, header->pd_checksum);
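
As a minimal standalone sketch of the proposed wording (the helper name format_checksum_failure and the sample file name are made up for illustration and are not in the patch; the block number is taken as unsigned here, so it is printed with %u rather than %d):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: format a checksum failure
 * using the wording proposed above. Returns the snprintf() result.
 */
static int
format_checksum_failure(char *buf, size_t buflen, const char *progname,
						const char *fn, unsigned int blockno,
						unsigned int calculated, unsigned int expected)
{
	assert(buf != NULL && buflen > 0);

	return snprintf(buf, buflen,
					"%s: checksum verification failed in file \"%s\", "
					"block %u: calculated checksum %X but expected %X\n",
					progname, fn, blockno, calculated, expected);
}
```

The point of the ordering is that the calculated value comes first and the stored value second, with both labelled, so the reader does not have to guess which of the two hex numbers came from the page header.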

Otherwise, I had a quick look over v4 and found no further issues.
Hopefully I will be able to test it on some bigger test databases next
week.

I'm switching the state back to 'Waiting on Author'; if you think the
above points are moot, maybe switch it back to 'Needs Review' as Andrey
Borodin also marked himself down as reviewer and might want to have
another look as well.

Cheers,

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachments:

online_checksums_check_datallowconn.patch (text/x-diff; charset=us-ascii)

diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
index 9186dbea22..62d5b7b9c8 100644
--- a/src/backend/postmaster/checksumhelper.c
+++ b/src/backend/postmaster/checksumhelper.c
@@ -88,6 +88,25 @@ StartChecksumHelperLauncher(int cost_delay, int cost_limit)
 {
 	BackgroundWorker bgw;
 	BackgroundWorkerHandle *bgw_handle;
+	HeapTuple	tup;
+	Relation	rel;
+	HeapScanDesc scan;
+
+	/*
+	 * Check that all databases allow connections, while we can still send
+	 * an error message to the client.
+	 */
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections.", NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+	}
 
 	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
 	{

#78

Andrey Borodin

x4mmm@yandex-team.ru

almost 8 years ago

In reply to: Michael Banck (#77)

Re: Online enabling of checksums

Hi!

On 18 March 2018, at 19:02, Michael Banck <michael.banck@credativ.de> wrote:

Otherwise, I had a quick look over v4 and found no further issues.
Hopefully I will be able to test it on some bigger test databases next
week.

I'm switching the state back to 'Waiting on Author'; if you think the
above points are moot, maybe switch it back to 'Needs Review' as Andrey
Borodin also marked himself down as reviewer and might want to have
another look as well.

Yep, I'm already doing another pass on the code. Hope to finish tomorrow.
My 2 cents, there's a typo in the word "connections"
+ <literal>template0</literal> is by default not accepting connetions, to

Best regards, Andrey Borodin.

#79

Daniel Gustafsson

daniel@yesql.se

almost 8 years ago

In reply to: Michael Banck (#77)

1 attachment(s)

Re: Online enabling of checksums

On 18 Mar 2018, at 15:02, Michael Banck <michael.banck@credativ.de> wrote:
On Thu, Mar 15, 2018 at 02:01:26PM +0100, Daniel Gustafsson wrote:

On 10 Mar 2018, at 16:09, Michael Banck <michael.banck@credativ.de> wrote:
I am aware that this is discussed already, but as-is the user experience
is pretty bad, I think pg_enable_data_checksums() should either bail
earlier if it cannot connect to all databases, or it should be better
documented.

Personally I think we should first attempt to solve the "allow-connections in
background workers" issue which has been raised on another thread. For now I’m
documenting this better.

I had a look at that thread and it seems stalled, I am a bit worried
that this will not be solved before the end of the CF.

So I think unless the above gets solved, pg_enable_data_checksums()
should error out with the hint. I've had a quick look and it seems one
can partly duplicate the check from BuildDatabaseList() (or optionally
move it) to the beginning of StartChecksumHelperLauncher(), see
attached.

I’ve incorporated a slightly massaged version of your patch. While it is a
little ugly to duplicate the logic, it’s hopefully a short-term fix, and I
agree that silently failing is even uglier. Thanks for the proposal!

It should be noted that a database may well be altered to not allow connections
between when the checksum helper starts (or gets the database list) and when it
tries to checksum it, so we might still fail on this very issue even with this
check in place. Still, it improves the UX to catch the low-hanging fruit, of course.
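
To illustrate the window described above, here is a small standalone sketch. It does not use any PostgreSQL APIs: DbEntry, first_blocked_db() and process_databases() are hypothetical names standing in for a pg_database row, the up-front check, and the worker loop respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pg_database row; not a PostgreSQL type. */
typedef struct DbEntry
{
	const char *name;
	bool		allowconn;
} DbEntry;

/* Up-front check: return the first database refusing connections, or NULL. */
static const DbEntry *
first_blocked_db(const DbEntry *dbs, size_t ndbs)
{
	for (size_t i = 0; i < ndbs; i++)
	{
		if (!dbs[i].allowconn)
			return &dbs[i];
	}
	return NULL;
}

/*
 * Worker side: re-check each database at the time it is processed.
 * Returns the number of databases processed; *nfailed counts those that
 * stopped allowing connections after the up-front check.
 */
static size_t
process_databases(const DbEntry *dbs, size_t ndbs, size_t *nfailed)
{
	size_t		done = 0;

	assert(nfailed != NULL);
	*nfailed = 0;

	for (size_t i = 0; i < ndbs; i++)
	{
		if (!dbs[i].allowconn)
		{
			(*nfailed)++;
			continue;
		}
		done++;
	}
	return done;
}
```

If a database flips to not allowing connections after first_blocked_db() has returned NULL, process_databases() still reports it in nfailed, which is the case where the failure only shows up in the server log.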

The following errmsg() calls capitalize the error message without the first
word being a specific term, which I believe is not project style:

Fixed.

Also, in src/backend/postmaster/checksumhelper.c there are a few
multi-line comments (which are not function comments) that do not have a
full stop at the end, which I think is also project style:

I’ve addressed all of these, but I did leave one as a multi-line comment, which
I think looks better that way. I also did some spellchecking and general tidying
up of the error messages and comments in the checksum helper.

Apart from that, I previously complained about the error in
pg_verify_checksums:

Updated to your suggestion. While in there I re-wrote a few other error
messages to be consistent in message and quoting.

Otherwise, I had a quick look over v4 and found no further issues.
Hopefully I will be able to test it on some bigger test databases next
week.

Thanks a lot for your reviews!

cheers ./daniel

Attachments:

online_checksums5.patch (application/octet-stream; name=online_checksums5.patch)

---
 doc/src/sgml/config.sgml                          |   3 +-
 doc/src/sgml/func.sgml                            |  65 +++
 doc/src/sgml/ref/allfiles.sgml                    |   1 +
 doc/src/sgml/ref/initdb.sgml                      |   6 +-
 doc/src/sgml/ref/pg_verify_checksums.sgml         | 112 ++++
 doc/src/sgml/reference.sgml                       |   1 +
 doc/src/sgml/wal.sgml                             |  82 +++
 src/backend/access/rmgrdesc/xlogdesc.c            |  16 +
 src/backend/access/transam/xlog.c                 | 119 +++-
 src/backend/access/transam/xlogfuncs.c            |  43 ++
 src/backend/catalog/system_views.sql              |   5 +
 src/backend/postmaster/Makefile                   |   5 +-
 src/backend/postmaster/bgworker.c                 |   7 +
 src/backend/postmaster/checksumhelper.c           | 653 ++++++++++++++++++++++
 src/backend/postmaster/pgstat.c                   |   5 +
 src/backend/replication/logical/decode.c          |   1 +
 src/backend/storage/ipc/ipci.c                    |   2 +
 src/backend/storage/page/bufpage.c                |  14 +-
 src/backend/utils/misc/guc.c                      |  71 ++-
 src/bin/pg_upgrade/controldata.c                  |   9 +
 src/bin/pg_upgrade/pg_upgrade.h                   |   2 +-
 src/bin/pg_verify_checksums/.gitignore            |   1 +
 src/bin/pg_verify_checksums/Makefile              |  36 ++
 src/bin/pg_verify_checksums/pg_verify_checksums.c | 314 +++++++++++
 src/include/access/xlog.h                         |  10 +-
 src/include/access/xlog_internal.h                |   7 +
 src/include/catalog/pg_control.h                  |   1 +
 src/include/catalog/pg_proc.h                     |   5 +
 src/include/pgstat.h                              |   4 +-
 src/include/postmaster/checksumhelper.h           |  31 +
 src/include/storage/bufpage.h                     |   1 +
 src/include/storage/checksum.h                    |   7 +
 src/test/Makefile                                 |   3 +-
 src/test/checksum/.gitignore                      |   2 +
 src/test/checksum/Makefile                        |  24 +
 src/test/checksum/README                          |  22 +
 src/test/checksum/t/001_standby_checksum.pl       |  86 +++
 src/test/isolation/expected/checksum_enable.out   |  27 +
 src/test/isolation/isolation_schedule             |   3 +
 src/test/isolation/specs/checksum_enable.spec     |  71 +++
 40 files changed, 1844 insertions(+), 33 deletions(-)
 create mode 100644 doc/src/sgml/ref/pg_verify_checksums.sgml
 create mode 100644 src/backend/postmaster/checksumhelper.c
 create mode 100644 src/bin/pg_verify_checksums/.gitignore
 create mode 100644 src/bin/pg_verify_checksums/Makefile
 create mode 100644 src/bin/pg_verify_checksums/pg_verify_checksums.c
 create mode 100644 src/include/postmaster/checksumhelper.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_standby_checksum.pl
 create mode 100644 src/test/isolation/expected/checksum_enable.out
 create mode 100644 src/test/isolation/specs/checksum_enable.spec

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f18d2b3353..7cb4032c4a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8606,7 +8606,8 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
         or hide corruption, or other serious problems</emphasis>.  However, it may allow
         you to get past the error and retrieve undamaged tuples that might still be
         present in the table if the block header is still sane. If the header is
-        corrupt an error will be reported even if this option is enabled. The
+        corrupt an error will be reported even if this option is enabled. This
+        option can only be enabled when data checksums are enabled. The
         default setting is <literal>off</literal>, and it can only be changed by a superuser.
        </para>
       </listitem>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2f59af25a6..a011ea1d8f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19481,6 +19481,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>inprogress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        bool
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 22e6893211..c81c87ef41 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -210,6 +210,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 585665f161..0864afb890 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..463ecd5e1b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-r <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d27fb414f7..db4f4167e3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -283,6 +283,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..eca75d86f7 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,88 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums;
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+   <para>
+    Since checksums are clusterwide, all databases will get checksums added.
+    It is thus important that the worker is allowed to connect to all databases
+    for the process to succeed, see the <xref linkend="sql-alterdatabase"/>
+    command for how to allow connections.
+   </para>
+
+   <note>
+    <para>
+     <literal>template0</literal> does not accept connections by default; to
+     enable checksums you'll need to temporarily make it accept connections.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
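As a reading aid for the documentation hunk above, the allowed state changes can be written down as a small table-like function. This is only a sketch under stated assumptions: the enum and function names are hypothetical and do not appear in the patch, which stores the state as a version number in the control file. Re-running pg_enable_data_checksums() while already in "inprogress" restarts the worker and is not modelled as a state change here.

```c
#include <stdbool.h>

/*
 * Hypothetical names for illustration only; the patch itself keeps the
 * state in ControlFile->data_checksum_version.
 */
typedef enum
{
	SKETCH_STATE_OFF,
	SKETCH_STATE_INPROGRESS,
	SKETCH_STATE_ON
} SketchChecksumState;

/*
 * Returns true if the described design allows moving from 'from' to 'to':
 * off -> inprogress, inprogress -> on, inprogress -> off, on -> off.
 */
bool
sketch_transition_allowed(SketchChecksumState from, SketchChecksumState to)
{
	switch (from)
	{
		case SKETCH_STATE_OFF:
			return to == SKETCH_STATE_INPROGRESS;
		case SKETCH_STATE_INPROGRESS:
			return to == SKETCH_STATE_ON || to == SKETCH_STATE_OFF;
		case SKETCH_STATE_ON:
			return to == SKETCH_STATE_OFF;
	}
	return false;
}
```

Note that there is no direct path from "off" to "on"; the cluster always passes through "inprogress" so that every page gets a checksum before verification starts.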
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47a6c4d895..51c052f69a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4653,10 +4654,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4728,12 +4725,85 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsEnabledOrInProgress(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+bool
+DataChecksumsDisabled(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == 0);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	if (DataChecksumsEnabledOrInProgress())
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	if (!DataChecksumsEnabledOrInProgress())
+		elog(ERROR, "Checksums not enabled or in progress");
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7768,6 +7838,16 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9521,6 +9601,22 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -9949,6 +10045,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
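The xlog.c changes above boil down to two questions per state: are checksums written, and are they verified? The standalone sketch below mirrors the predicates and the GUC show hook from the patch. The numeric values of the three constants are assumptions made for illustration (the header that defines them is not part of this excerpt), and all `sketch_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values, for illustration only.  The patch only relies on "off"
 * being 0 and the other two states being distinct positive values.
 */
#define SKETCH_CHECKSUM_OFF			0
#define SKETCH_CHECKSUM_ON			1
#define SKETCH_CHECKSUM_INPROGRESS	2

/* Checksums are computed on write in both "on" and "inprogress". */
bool
sketch_checksums_written(uint32_t version)
{
	return version > 0;
}

/* Checksums are verified on read only once the state has reached "on". */
bool
sketch_checksums_verified(uint32_t version)
{
	return version > 0 && version != SKETCH_CHECKSUM_INPROGRESS;
}

/* Same mapping as the show hook: anything unrecognized reads as "off". */
const char *
sketch_show_data_checksums(uint32_t version)
{
	if (version == SKETCH_CHECKSUM_ON)
		return "on";
	else if (version == SKETCH_CHECKSUM_INPROGRESS)
		return "inprogress";
	else
		return "off";
}
```

The "written but not verified" combination is what makes "inprogress" pure cost with no benefit, which is why a cluster left in that state should be either completed or switched back to "off".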
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..0d10fd4c89 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,45 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (DataChecksumsDisabled())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_BOOL(DataChecksumsDisabled());
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int cost_delay = PG_GETARG_INT32(0);
+	int cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_BOOL(DataChecksumsEnabledOrInProgress());
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5e6e8a64f6..a2965d35d4 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1025,6 +1025,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..86302726dc
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,653 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a cluster at initdb time, no extra process
+ * is required as each page is checksummed, and verified, on access.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific database
+ * before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	bool		success;
+	bool		process_shared_catalogs;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksum helper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+	bool		success;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+	bool		success;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static bool ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Register and start the checksum helper launcher from a client backend.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	HeapTuple	tup;
+	Relation	rel;
+	HeapScanDesc scan;
+
+	/*
+	 * Check that all databases allow connections.  This will be re-checked
+	 * when we build the list of databases to work on, the point of duplicating
+	 * this is to catch any databases we won't be able to open while we can
+	 * still send an error message to the client.
+	 */
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/* Failed to set means somebody else started */
+		ereport(ERROR,
+				(errmsg("could not start checksum helper: already running")));
+	}
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/* Launcher not started, so nothing to shut down */
+		return;
+	}
+
+	ereport(ERROR,
+			(errmsg("checksum helper is currently running, cannot disable checksums"),
+			 errhint("Restart the cluster or wait for the worker to finish.")));
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * XXX: must hold a lock on the relation preventing it from being truncated?
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to wal even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums were
+		 * off. This can happen if there was a valid checksum on the page at
+		 * one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = relation_open(relationId, AccessShareLock);
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+			ProcessSingleRelationFork(rel, fnum, strategy);
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u", relationId);
+
+	CommitTransactionCommand();
+
+	return true;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static bool
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+
+	ChecksumHelperShmem->success = false;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksum helper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksum helper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksum helper in \"%s\"",
+				 db->dbname)));
+		return false;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksum helper in \"%s\"",
+				 db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+			 db->dbname)));
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksum helper in \"%s\"",
+				 db->dbname)));
+		return false;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+			 db->dbname)));
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every db.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any new created database will
+	 * be running with checksums turned on from the start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			if (ProcessDatabase(db))
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the second case where
+		 * the database was dropped before we had started processing it.
+		 * Any database that still exists but where enabling checksums failed,
+		 * is retried for a limited number of times before giving up. Any
+		 * database that remains in failed state after the retries expire will
+		 * fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in \"%s\", giving up",
+										db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("database \"%s\" has been dropped, skipping",
+								db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL || list_length(remaining) == 0)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This is intended to create the worklist for the workers to go through, and
+ * as we are only concerned with already existing databases we never need to
+ * rebuild this list, which simplifies the coding.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc0(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared relations and local ones are
+ * returned, else only non-shared relations are returned.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip.
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			ereport(ERROR,
+					(errmsg("failed to process table with oid %u", rel->reloid)));
+		}
+	}
+	list_free_deep(RelationList);
+
+	ChecksumHelperShmem->success = true;
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker completed in database oid %u", dboid)));
+}
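The retry accounting in the launcher loop above is easy to misread, so here is a simplified sketch for a single database. It assumes the per-database attempt counter starts at zero and ignores the interleaving with other databases and the dropped-database check. All `sketch_` names are hypothetical; only the `++attempts > MAX_ATTEMPTS` test is taken from the patch.

```c
#include <stdbool.h>

/* Same limit as MAX_ATTEMPTS in the patch. */
#define SKETCH_MAX_ATTEMPTS 4

/* Stand-in for ProcessDatabase(): returns true on success. */
typedef bool (*sketch_process_fn) (void *arg);

/*
 * Keep calling 'process' until it succeeds or the failure count exceeds
 * SKETCH_MAX_ATTEMPTS.  Returns the number of calls made on success, or -1
 * if the caller should give up (the patch then turns checksums off and
 * raises an error).
 */
int
sketch_process_with_retries(sketch_process_fn process, void *arg)
{
	int			attempts = 0;	/* failed attempts so far */
	int			calls = 0;

	for (;;)
	{
		calls++;
		if (process(arg))
			return calls;

		/* Same test as the launcher: give up once failures exceed the limit */
		if (++attempts > SKETCH_MAX_ATTEMPTS)
			return -1;
	}
}

/* Example callback: fails while the counter pointed to by 'arg' is positive. */
bool
sketch_fail_n_times(void *arg)
{
	int		   *remaining_failures = (int *) arg;

	if (*remaining_failures > 0)
	{
		(*remaining_failures)--;
		return false;
	}
	return true;
}
```

With this accounting a database gets up to five tries in total: the initial one plus four retries. A fifth consecutive failure fails the whole operation.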
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,11 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..c158e67a28 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,15 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * If data checksums have been turned on in a running cluster which
+		 * was initdb'd without checksums, or a cluster which has had
+		 * checksums turned off, we hold off on verifying the checksum until
+		 * all pages again are checksummed.  The PageSetChecksum functions
+		 * must continue to write the checksums even though we don't validate
+		 * them yet.
+		 */
+		if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1176,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1203,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsEnabledOrInProgress())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7a7ac479c1..71e39ec39e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -67,6 +68,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -165,6 +167,7 @@ static void assign_syslog_ident(const char *newval, void *extra);
 static void assign_session_replication_role(int newval, void *extra);
 static bool check_temp_buffers(int *newval, void **extra, GucSource source);
 static bool check_bonjour(bool *newval, void **extra, GucSource source);
+static bool check_ignore_checksum_failure(bool *newval, void **extra, GucSource source);
 static bool check_ssl(bool *newval, void **extra, GucSource source);
 static bool check_stage_log_stats(bool *newval, void **extra, GucSource source);
 static bool check_log_stats(bool *newval, void **extra, GucSource source);
@@ -418,6 +421,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * data_checksum used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -513,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1031,7 +1045,7 @@ static struct config_bool ConfigureNamesBool[] =
 		},
 		&ignore_checksum_failure,
 		false,
-		NULL, NULL, NULL
+		check_ignore_checksum_failure, NULL, NULL
 	},
 	{
 		{"zero_damaged_pages", PGC_SUSET, DEVELOPER_OPTIONS,
@@ -1673,17 +1687,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -3975,6 +3978,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
@@ -10222,6 +10236,37 @@ check_bonjour(bool *newval, void **extra, GucSource source)
 	return true;
 }
 
+static bool
+check_ignore_checksum_failure(bool *newval, void **extra, GucSource source)
+{
+	if (*newval)
+	{
+		/*
+		 * When data checksums are in progress, the verification of the
+		 * checksums is already ignored until all pages have had checksums
+		 * backfilled, making the effect of ignore_checksum_failure a no-op.
+		 * Allowing it during checksumming in progress can hide the fact that
+		 * checksums become enabled once done, so disallow.
+		 */
+		if (DataChecksumsInProgress())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" are in progress.");
+			return false;
+		}
+
+		/*
+		 * While safe, it's nonsensical to allow ignoring checksums when data
+		 * checksums aren't enabled in the first place.
+		 */
+		if (DataChecksumsDisabled())
+		{
+			GUC_check_errdetail("\"ignore_checksum_failure\" cannot be turned on when \"data_checksums\" aren't enabled.");
+			return false;
+		}
+	}
+	return true;
+}
+
 static bool
 check_ssl(bool *newval, void **extra, GucSource source)
 {
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -590,6 +590,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7e5e971294..449a703c47 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -226,7 +226,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..d16261571f
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,36 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in an offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..7684cde0e6
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,314 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_relfilenode = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage(void)
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -r relfilenode check only relation with specified relfilenode\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open file \"%s\": %m\n"), progname, fn);
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in file \"%s\", got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno*RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
+						progname, fn, blockno, csum, header->pd_checksum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: checksum verified in file \"%s\", block %d: %X\n"),
+					progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory \"%s\": %m\n"),
+				progname, path);
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file \"%s\": %m\n"),
+					progname, fn);
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char *forkpath, *segmentpath;
+			int segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number
+			 * in order to mix it into the checksum. Then also cut off at the
+			 * fork boundary, to get the relfilenode the file belongs to for
+			 * filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename \"%s\"\n"),
+							progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_relfilenode && strcmp(only_relfilenode, de->d_name) != 0)
+				/* Relfilenode not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fr:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'r':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid relfilenode: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_relfilenode = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+
+		/* If no DataDir was specified, and none could be found, error out */
+		if (DataDir == NULL)
+		{
+			fprintf(stderr, _("%s: no data directory specified\n"), progname);
+			fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+			exit(1);
+		}
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..788f2d0b58 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsEnabledOrInProgress() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 0fdb42f639..ca9904293d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5556,6 +5556,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 2 0 16 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f t f v r 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..87c0266672
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void ChecksumHelperLauncherMain(Datum arg);
+void ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..290a74fc7c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for the standby to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres', "INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data');
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..b5c5563f98
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+t              
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546..cdd44979a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,6 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..f16127fa3f
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.5);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"
-- 
2.14.1.145.gb3622a4ee

#80

Daniel Gustafsson

daniel@yesql.se

almost 8 years ago

In reply to: Andrey Borodin (#78)

Re: Online enabling of checksums

On 18 Mar 2018, at 17:21, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 18 Mar 2018, at 19:02, Michael Banck <michael.banck@credativ.de> wrote:

Otherwise, I had a quick look over v4 and found no further issues.
Hopefully I will be able to test it on some bigger test databases next
week.

I'm switching the state back to 'Waiting on Author'; if you think the
above points are moot, maybe switch it back to 'Needs Review' as Andrey
Borodin also marked himself down as reviewer and might want to have
another look as well.

Yep, I'm already doing another pass on the code. Hope to finish tomorrow.
My 2 cents: there's a typo in the word "connections"
+ <literal>template0</literal> is by default not accepting connetions, to

Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
(version 5). Thanks!

cheers ./daniel

#81

Heikki Linnakangas

hlinnaka@iki.fi

almost 8 years ago

In reply to: Daniel Gustafsson (#79)

Re: Online enabling of checksums

Hi,

The patch looks good to me at a high level. Some notes below. I didn't
read through the whole thread, so sorry if some of these have been
discussed already.

+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/* Launcher not started, so nothing to shut down */
+		return;
+	}
+
+	ereport(ERROR,
+			(errmsg("checksum helper is currently running, cannot disable checksums"),
+			 errhint("Restart the cluster or wait for the worker to finish.")));
+}

Is there no way to stop the checksum helper once it's started? That
seems rather user-unfriendly. I can imagine it being a pretty common
mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to
realize that you forgot to set the cost limit, and that it's hurting
queries too much. At that point, you want to abort.
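
The cost limit mentioned above can be sketched roughly as follows. This is a simplified illustration in the spirit of cost-based vacuum delay, not code from the patch: the `Throttle` struct and `throttle_charge()` are hypothetical names, and treating a `cost_delay` of 0 as "no throttling" is an assumption.

```c
#include <assert.h>

/* Hypothetical throttle state; not a structure from the patch. */
typedef struct
{
	int			cost_limit;		/* accumulated cost that triggers a pause */
	int			cost_delay;		/* pause length in ms; 0 is assumed to disable throttling */
	int			balance;		/* cost accumulated since the last pause */
} Throttle;

/*
 * Charge the cost of one processed page. Returns the number of
 * milliseconds the caller should sleep before continuing (0 = no pause).
 */
int
throttle_charge(Throttle *t, int page_cost)
{
	assert(t->cost_limit > 0);

	if (t->cost_delay <= 0)
		return 0;				/* throttling disabled */

	t->balance += page_cost;
	if (t->balance < t->cost_limit)
		return 0;				/* budget not yet exhausted */

	t->balance = 0;				/* start a new budget after the pause */
	return t->cost_delay;
}
```

A worker loop would call `throttle_charge()` once per page and sleep for the returned time. With the delay left at 0 nothing ever pauses, which is the situation described above.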

+ * This is intended to create the worklist for the workers to go through, and
+ * as we are only concerned with already existing databases we need to ever
+ * rebuild this list, which simplifies the coding.

I can't parse this sentence.

+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);

I find the name of the DataChecksumsEnabledOrInProgress() function a bit
long. And doing this in PageIsVerified looks a bit weird:

if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())

I think I'd prefer functions like:

/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()
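
A minimal sketch of how such predicates could map onto the three states, assuming an enum whose values mirror PG_DATA_CHECKSUM_VERSION (1) and PG_DATA_CHECKSUM_INPROGRESS_VERSION (2) from the patch. The enum and the lower-case function names are hypothetical; the patch reads the state from the control file rather than taking it as an argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-state setting; values mirror the patch's constants. */
typedef enum
{
	CHECKSUMS_OFF = 0,
	CHECKSUMS_ON = 1,			/* PG_DATA_CHECKSUM_VERSION */
	CHECKSUMS_INPROGRESS = 2	/* PG_DATA_CHECKSUM_INPROGRESS_VERSION */
} ChecksumState;

/* Checksums are computed on write in both "inprogress" and "on". */
bool
checksum_need_write(ChecksumState state)
{
	assert(state == CHECKSUMS_OFF || state == CHECKSUMS_ON ||
		   state == CHECKSUMS_INPROGRESS);
	return state != CHECKSUMS_OFF;
}

/* Checksums are verified on read only once they are fully "on". */
bool
checksum_need_verify(ChecksumState state)
{
	assert(state == CHECKSUMS_OFF || state == CHECKSUMS_ON ||
		   state == CHECKSUMS_INPROGRESS);
	return state == CHECKSUMS_ON;
}
```

With this split, the read path needs only the verify predicate and the write path only the write predicate.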

+     <literal>template0</literal> is by default not accepting connections, to
+     enable checksums you'll need to temporarily make it accept connections.

This was already discussed, and I agree with the other comments that
it's bad.

+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;

pg_[enable|disable]_checksums() functions return a bool. It's not clear
to me when they would return what. I'd suggest marking them as 'void'
instead.

--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
*/
#define PG_PAGE_LAYOUT_VERSION		4
#define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2

This seems like a weird place for these PG_DATA_CHECKSUM_* constants.
They're not actually stored in the page header, as you might think. I
think the idea was to keep PG_DATA_CHECKSUM_VERSION close to
PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk
format. I think it was a bit weird even before this patch, but now it's
worse. At least a better comment would be in order, or maybe move these
elsewhere.

Feature request: it'd be nice to update the 'ps status' with
set_ps_display, to show a rudimentary progress indicator. Like the name
and block number of the relation being processed. It won't tell you how
much is left to go, but at least it will give a warm fuzzy feeling to
the DBA that something is happening.

I didn't review the offline tool, pg_verify_checksums.

- Heikki

#82

Andrey Borodin

x4mmm@yandex-team.ru

almost 8 years ago

In reply to: Daniel Gustafsson (#80)

Re: Online enabling of checksums

Hi, Daniel!

On 19 Mar 2018, at 4:01, Daniel Gustafsson <daniel@yesql.se> wrote:

Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
(version 5). Thanks!

I've been hacking a bit in a neighboring thread and came across one interesting thing. There was a patch in this CF on enabling checksums for SLRU. The thing is, CLOG is not protected with checksums right now. The bad part is that there's no reserved place for checksums in SLRU.
And this conversion from a page without a checksum to a page with a checksum is quite impossible online.

If we commit online checksums before SLRU checksums, we will need very neat hacks if we decide to protect SLRU eventually.

What do you think about this problem?

Best regards, Andrey Borodin.

#83

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andrey Borodin (#82)

Re: Online enabling of checksums

On Mon, Mar 19, 2018 at 11:40 AM, Andrey Borodin <x4mmm@yandex-team.ru>
wrote:

Hi, Daniel!

On 19 Mar 2018, at 4:01, Daniel Gustafsson <daniel@yesql.se> wrote:

Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
(version 5). Thanks!

I've been hacking a bit in a neighboring thread and came across one
interesting thing. There was a patch in this CF on enabling checksums for
SLRU. The thing is, CLOG is not protected with checksums right now. The bad
part is that there's no reserved place for checksums in SLRU.
And this conversion from a page without a checksum to a page with a checksum
is quite impossible online.

If we commit online checksums before SLRU checksums, we will need very
neat hacks if we decide to protect SLRU eventually.

What do you think about this problem?

One would be adjusted to work with the other, yes. It makes no sense to now
allow online enabling once SLRU protection is in there, and it doesn't make
sense for either of these patches to be blocking the other one for commit,
though it would of course be best to get both included.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#84

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Magnus Hagander (#83)

Re: Online enabling of checksums

On Mon, Mar 19, 2018 at 12:24 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Mon, Mar 19, 2018 at 11:40 AM, Andrey Borodin <x4mmm@yandex-team.ru>
wrote:

Hi, Daniel!

On 19 Mar 2018, at 4:01, Daniel Gustafsson <daniel@yesql.se> wrote:

Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
(version 5). Thanks!

I've been hacking a bit in a neighboring thread and came across one
interesting thing. There was a patch in this CF on enabling checksums for
SLRU. The thing is, CLOG is not protected with checksums right now. The bad
part is that there's no reserved place for checksums in SLRU.
And this conversion from a page without a checksum to a page with a checksum
is quite impossible online.

If we commit online checksums before SLRU checksums, we will need very
neat hacks if we decide to protect SLRU eventually.

What do you think about this problem?

One would be adjusted to work with the other, yes. It makes no sense to
now allow online enabling once SLRU protection is in there, and it doesn't
make sense for either of these patches to be blocking the other one for
commit, though it would of course be best to get both included.

Makes no sense to *not* allow it, of course. Meaning yes, that should be
handled.

We don't need to convert from "page format with no support for checksums"
(pre-11) to "page format with support for checksums" (11+) online.

We do need to convert from "page format with support for checksums but no
checksums enabled" (11+) to "checksums enabled" online.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#85

Andrey Borodin

x4mmm@yandex-team.ru

almost 8 years ago

In reply to: Magnus Hagander (#84)

Re: Online enabling of checksums

Hi Magnus!

On 19 Mar 2018, at 16:57, Magnus Hagander <magnus@hagander.net> wrote:

On Mon, Mar 19, 2018 at 12:24 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Mon, Mar 19, 2018 at 11:40 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi, Daniel!
If we commit online checksums before SLRU checksums, we will need very neat hacks if we decide to protect SLRU eventually.

What do you think about this problem?

One would be adjusted to work with the other, yes. It makes no sense to now allow online enabling once SLRU protection is in there, and it doesn't make sense for either of these patches to be blocking the other one for commit, though it would of course be best to get both included.

Makes no sense to *not* allow it, of course. Meaning yes, that should be handled.

We don't need to convert from "page format with no support for checksums" (pre-11) to "page format with support for checksums" (11+) online.

We do need to convert from "page format with support for checksums but no checksums enabled" (11+) to "checksums enabled" online.

Oh, yes, you are right. Everything is fine, sorry for the noise.

Best regards, Andrey Borodin.

#86

Andrey Borodin

x4mmm@yandex-team.ru

almost 8 years ago

In reply to: Heikki Linnakangas (#81)

Re: Online enabling of checksums

Hi!

On 19 Mar 2018, at 11:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Is there no way to stop the checksum helper once it's started? That seems rather user-unfriendly. I can imagine it being a pretty common mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to realize that you forgot to set the cost limit, and that it's hurting queries too much. At that point, you want to abort.

I tried pg_cancel_backend() and it worked, but only if I cancel the "checksum helper launcher" first and then the "checksum helper worker". If I cancel the worker first, a new one is spawned.

Magnus, Daniel, is it safe to cancel the worker or the launcher?

BTW, I have some questions on pg_verify_checksums.
It does not check the catalog version. Is it true that it will work for any version?
Also, pg_verify_checksums will stop on short reads. But we do not stop on a wrong checksum, so maybe we should not stop on short reads either?

I agree with all of Heikki's comments.

Besides these I have no other questions, patch looks good.

Best regards, Andrey Borodin.

#87

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andrey Borodin (#86)

Re: Online enabling of checksums

On Tue, Mar 20, 2018 at 10:29 AM, Andrey Borodin <x4mmm@yandex-team.ru>
wrote:

Hi!

On 19 Mar 2018, at 11:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Is there no way to stop the checksum helper once it's started? That
seems rather user-unfriendly. I can imagine it being a pretty common
mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to
realize that you forgot to set the cost limit, and that it's hurting
queries too much. At that point, you want to abort.

I tried pg_cancel_backend() and it worked, but only if I cancel the
"checksum helper launcher" first and then the "checksum helper worker". If I
cancel the worker first, a new one is spawned.

Magnus, Daniel, is it safe to cancel worker or launcher?

It should be perfectly safe yes. However, it's not very user friendly,
because you have to go around and cancel both (meaning you actually have to
cancel them in the right order, or a new one will be launched).

Daniel is working on a proper way to stop things.

BTW, I have some questions on pg_verify_checksums.
It does not check the catalog version. Is it true that it will work for any version?

Yes. The actual page format for checksums has not changed since checksums
were introduced. I have successfully used it to validate checksums on v10
clusters for example.

Also, pg_verify_checksums will stop on short reads. But we do not stop on a
wrong checksum, so maybe we should not stop on short reads either?

These are very different scenarios though -- it's explicitly intended to
validate checksums, and short reads are a different issue. I prefer the way
it does it now, but I am not strongly locked into that position and can be
convinced otherwise :)
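
The two failure modes can be contrasted with a small sketch. It is illustrative only: the 16-byte block size, the one-byte checksum slot and `toy_checksum()` are assumptions made for brevity and are not PostgreSQL's page layout or checksum algorithm. The point is only that a partial block ends the scan, while a mismatched checksum is counted and the scan continues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16			/* toy size, chosen for the example only */

/* Toy checksum over the payload bytes; NOT PostgreSQL's page checksum. */
uint8_t
toy_checksum(const uint8_t *block)
{
	uint8_t		sum = 0;

	for (size_t i = 1; i < BLOCK_SIZE; i++)
		sum = (uint8_t) (sum + block[i]);
	return sum;
}

/*
 * Scan 'len' bytes as consecutive blocks whose first byte stores the
 * checksum. Returns -1 if the data ends in a partial block (the "short
 * read" case, which stops the scan), otherwise the number of blocks whose
 * stored checksum does not match (counted while the scan continues).
 */
long
scan_blocks(const uint8_t *data, size_t len)
{
	long		bad = 0;

	assert(data != NULL || len == 0);

	for (size_t off = 0; off < len; off += BLOCK_SIZE)
	{
		if (len - off < BLOCK_SIZE)
			return -1;			/* short read: give up */
		if (data[off] != toy_checksum(data + off))
			bad++;				/* bad checksum: record and keep going */
	}
	return bad;
}
```

Treating short reads like checksum failures would amount to replacing the early return with another counter.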

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#88

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Heikki Linnakangas (#81)

1 attachment(s)

Re: Online enabling of checksums

On Mon, Mar 19, 2018 at 7:27 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Hi,

The patch looks good to me at a high level. Some notes below. I didn't
read through the whole thread, so sorry if some of these have been
discussed already.

+void
+ShutdownChecksumHelperIfRunning(void)
+{
+       if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+       {
+               /* Launcher not started, so nothing to shut down */
+               return;
+       }
+
+       ereport(ERROR,
+                       (errmsg("checksum helper is currently running, cannot disable checksums"),
+                        errhint("Restart the cluster or wait for the worker to finish.")));
+}

Is there no way to stop the checksum helper once it's started? That seems
rather user-unfriendly. I can imagine it being a pretty common mistake to
call pg_enable_data_checksums() on a 10 TB cluster, only to realize that
you forgot to set the cost limit, and that it's hurting queries too much.
At that point, you want to abort.

Agreed. You could do it with pg_terminate_backend() but you had to do it in
the right order, etc.

Attached patch fixes this by making it possible to abort the process by
executing pg_disable_data_checksums() during the process. In this case the
live workers will abort, and the checksums will be switched off again.
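
The general shape of such a cooperative abort can be sketched as below. This is not the patch's implementation: it uses C11 atomics instead of PostgreSQL's own atomics API, and `HelperShared`, `request_abort()` and `process_blocks()` are hypothetical names standing in for the shared-memory state, the disable path and the worker loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for state kept in shared memory. */
typedef struct
{
	atomic_bool abort_requested;
} HelperShared;

/* Called from the "disable" path: ask any running worker to stop. */
void
request_abort(HelperShared *shared)
{
	atomic_store(&shared->abort_requested, true);
}

/*
 * Worker loop: process up to nblocks blocks, checking the flag before each
 * one. Returns the number of blocks processed; a result smaller than
 * nblocks means the run was aborted.
 */
int
process_blocks(HelperShared *shared, int nblocks)
{
	int			done = 0;

	assert(nblocks >= 0);

	while (done < nblocks && !atomic_load(&shared->abort_requested))
	{
		/* ... compute and write the checksum for block 'done' here ... */
		done++;
	}
	return done;
}
```

Because the flag is polled between blocks, an abort takes effect after at most one more block of work, and no particular cancellation order is required.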

+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);

I find the name of the DataChecksumsEnabledOrInProgress() function a bit
long. And doing this in PageIsVerified looks a bit weird:

if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())

I think I'd prefer functions like:

/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()

Agreed. We also need DataChecksumsInProgress() to make it work properly,
but that makes the names a lot more clear. Adjusted as such.

+     <literal>template0</literal> is by default not accepting connections, to
+     enable checksums you'll need to temporarily make it accept connections.

This was already discussed, and I agree with the other comments that it's
bad.

Do you have any opinion on the thread about how to handle that one? As in
which way to go about solving it? (The second thread linked from the CF
entry - it wasn't linked before as intended, but it is now)

+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;

pg_[enable|disable]_checksums() functions return a bool. It's not clear
to me when they would return what. I'd suggest marking them as 'void'
instead.

Agreed, changed.

--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
*/
#define PG_PAGE_LAYOUT_VERSION         4
#define PG_DATA_CHECKSUM_VERSION       1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION            2

This seems like a weird place for these PG_DATA_CHECKSUM_* constants.
They're not actually stored in the page header, as you might think. I think
the idea was to keep PG_DATA_CHECKSUM_VERSION close to
PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk format.
I think it was a bit weird even before this patch, but now it's worse. At
least a better comment would be in order, or maybe move these elsewhere.

True, but they apply to the page level? I'm not sure about the original
reasoning to put them there, and figured it wasn't the responsibility of this
patch to move them. But we can certainly do that.

But what would be an appropriate place elsewhere? First thought would be
pg_control.h, but that would then not be consistent with e.g. wal_level
(which is declared in xlog.h and not pg_control.h).

Feature request: it'd be nice to update the 'ps status' with
set_ps_display, to show a rudimentary progress indicator. Like the name and
block number of the relation being processed. It won't tell you how much is
left to go, but at least it will give a warm fuzzy feeling to the DBA that
something is happening.

In the attached patch it sets this information in pg_stat_activity. I think
that makes more sense than the ps display, and I think is more consistent
with other ways we use them (e.g. autovacuum doesn't update its ps title
for every table, but it does update pg_stat_activity).

I didn't review the offline tool, pg_verify_checksums.

PFA a patch that takes into account these comments and we believe all other
pending ones as well.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachments:

online_checksums6.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2f59af25a6..e484267a89 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19481,6 +19481,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>in progress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 22e6893211..c81c87ef41 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -210,6 +210,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 585665f161..0864afb890 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..463ecd5e1b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-r <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d27fb414f7..db4f4167e3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -283,6 +283,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..eca75d86f7 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,88 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums,
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+   <para>
+    Since checksums are clusterwide, all databases will get checksums added.
+    It is thus important that the worker is allowed to connect to all databases
+    for the process to succeed, see the <xref linkend="sql-alterdatabase"/>
+    command for how to allow connections.
+   </para>
+
+   <note>
+    <para>
+     <literal>template0</literal> is by default not accepting connections, to
+     enable checksums you'll need to temporarily make it accept connections.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 47a6c4d895..2fc999e6fb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4653,10 +4654,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4728,12 +4725,90 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7769,6 +7844,16 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9522,6 +9607,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9949,6 +10050,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..67b9cd8127 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,50 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5e6e8a64f6..23057f9b13 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1025,6 +1025,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..71cf0818ca
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,718 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a cluster at initdb time, no extra process
+ * is required as each page is checksummed, and verified, on access.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific database
+ * before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	bool		abort;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+	bool		success;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+	bool		success;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Start the checksumhelper launcher from the backend enabling checksums.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	HeapTuple	tup;
+	Relation	rel;
+	HeapScanDesc scan;
+
+	if (ChecksumHelperShmem->abort)
+	{
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: has been cancelled")));
+	}
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/* Failed to set means somebody else started */
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: already running")));
+	}
+
+	/*
+	 * Check that all databases allow connections.  This will be re-checked
+	 * when we build the list of databases to work on, the point of
+	 * duplicating this is to catch any databases we won't be able to open
+	 * while we can still send an error message to the client.
+	 */
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+		return;
+
+	/*
+	 * We don't need an atomic variable for aborting, setting it multiple
+	 * times will not change the handling.
+	 */
+	ChecksumHelperShmem->abort = true;
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * XXX: must hold a lock on the relation preventing it from being truncated?
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %u/%u)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort will bubble up from here.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = relation_open(relationId, AccessShareLock);
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	ChecksumHelperShmem->abort = false;
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so that the first worker processes the shared catalogs, rather
+	 * than every worker processing them once per database.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any database created after
+	 * this process started will be running with checksums turned on from the
+	 * start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult processing;
+
+			processing = ProcessDatabase(db);
+
+			if (processing == SUCCESSFUL)
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else if (processing == FAILED)
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+			else
+				return;
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the second case where
+		 * the database was dropped before we had started processing it. Any
+		 * database that still exists but where enabling checksums failed, is
+		 * retried for a limited number of times before giving up. Any
+		 * database that remains in failed state after the retries expire will
+		 * fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in \"%s\", giving up",
+										db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("database \"%s\" has been dropped, skipping",
+								db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL || list_length(remaining) == 0)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc0(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared relations and local ones are returned, else
+ * all non-shared relations are returned.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip.
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			ChecksumHelperShmem->success = ABORTED;
+			break;
+		}
+		else
+			ChecksumHelperShmem->success = SUCCESSFUL;
+	}
+	list_free_deep(RelationList);
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..790e4b860a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1168,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1195,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7a7ac479c1..bfd7793f3b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -67,6 +68,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -419,6 +421,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -513,7 +526,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1674,17 +1687,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -3975,6 +3977,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7e5e971294..449a703c47 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -226,7 +226,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..d16261571f
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,36 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in an offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..e37f39bd2a
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,315 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_relfilenode = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage(void)
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -r relfilenode check only relation with specified relfilenode\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open file \"%s\": %m\n"), progname, fn);
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in file \"%s\", got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno * RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
+						progname, fn, blockno, csum, header->pd_checksum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: checksum verified in file \"%s\", block %d: %X\n"),
+					progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory \"%s\": %m\n"),
+				progname, path);
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file \"%s\": %m\n"),
+					progname, fn);
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char	   *forkpath,
+					   *segmentpath;
+			int			segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number
+			 * in order to mix it into the checksum. Then also cut off at the
+			 * fork boundary, to get the relfilenode the file belongs to for
+			 * filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename \"%s\"\n"),
+							progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_relfilenode && strcmp(only_relfilenode, de->d_name) != 0)
+				/* Relfilenode not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fr:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'r':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid relfilenode: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_relfilenode = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+
+		/* If no DataDir was specified, and none could be found, error out */
+		if (DataDir == NULL)
+		{
+			fprintf(stderr, _("%s: no data directory specified\n"), progname);
+			fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+			exit(1);
+		}
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..f21870c644 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 0fdb42f639..4d86cf365d 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5556,6 +5556,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2278 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 2 0 2278 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f t f v r 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..289bf2a935
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..4f8e0ab8f8
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
diff --git a/src/test/isolation/expected/checksum_cancel.out b/src/test/isolation/expected/checksum_cancel.out
new file mode 100644
index 0000000000..c449e7b6cc
--- /dev/null
+++ b/src/test/isolation/expected/checksum_cancel.out
@@ -0,0 +1,27 @@
+Parsed test spec with 2 sessions
+
+starting permutation: c_verify_checksums_off r_seqread c_enable_checksums c_verify_checksums_inprogress c_disable_checksums c_wait_checksums_off
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums(1000);
+pg_enable_data_checksums
+
+               
+step c_verify_checksums_inprogress: SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step c_disable_checksums: SELECT pg_disable_data_checksums();
+pg_disable_data_checksums
+
+               
+step c_wait_checksums_off: SELECT test_checksums_off();
+test_checksums_off
+
+t              
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..0a68f47023
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+               
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 74d7d59546..1e8be553cf 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -66,3 +66,7 @@ test: async-notify
 test: vacuum-reltuples
 test: timeouts
 test: vacuum-concurrent-drop
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_cancel
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_cancel.spec b/src/test/isolation/specs/checksum_cancel.spec
new file mode 100644
index 0000000000..a9da0d74c7
--- /dev/null
+++ b/src/test/isolation/specs/checksum_cancel.spec
@@ -0,0 +1,48 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		PERFORM pg_sleep(1);
+		SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+		enabled boolean;
+	BEGIN
+		FOR counter IN 1..100 LOOP
+			PERFORM count(a) FROM t1;
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_off();
+
+	DROP TABLE t1;
+}
+
+session "reader"
+step "r_seqread"						{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"			{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"				{ SELECT pg_enable_data_checksums(1000); }
+step "c_disable_checksums"				{ SELECT pg_disable_data_checksums(); }
+step "c_verify_checksums_inprogress"	{ SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_wait_checksums_off"				{ SELECT test_checksums_off(); }
+
+permutation "c_verify_checksums_off" "r_seqread" "c_enable_checksums" "c_verify_checksums_inprogress" "c_disable_checksums" "c_wait_checksums_off"
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..7bfbe8d448
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.1);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"

#89

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#88)

Re: Online enabling of checksums

Hi,

I see enable_data_checksums() does this:

if (cost_limit <= 0)
ereport(ERROR,
(errmsg("cost limit must be a positive value")));

Is there a reason not to allow -1 (no limit), just like for vacuum_cost?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#90

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Tomas Vondra (#89)

Re: Online enabling of checksums

On Mon, Mar 26, 2018 at 10:09 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

Hi,

I see enable_data_checksums() does this:

if (cost_limit <= 0)
ereport(ERROR,
(errmsg("cost limit must be a positive value")));

Is there a reason not to allow -1 (no limit), just like for vacuum_cost?

Eh. vacuum_cost_limit cannot be set to -1 (1 is the lowest). Neither can
vacuum_cost_delay -- it is set to *0* to disable it (which is how the
cost_delay parameter is handled here as well).

Are you thinking autovacuum_vacuum_cost_limit where -1 means "use
vacuum_cost_limit"?

The reason to disallow cost_limit=0 is to avoid divide-by-zero. We could
allow -1 and have it mean "use vacuum_cost_limit", but I'm not sure how
relevant that really would be in this context?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#91

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#90)

Re: Online enabling of checksums

On 03/27/2018 08:56 AM, Magnus Hagander wrote:

On Mon, Mar 26, 2018 at 10:09 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

Hi,

I see enable_data_checksums() does this:

if (cost_limit <= 0)
ereport(ERROR,
(errmsg("cost limit must be a positive value")));

Is there a reason not to allow -1 (no limit), just like for vacuum_cost?

Eh. vacuum_cost_limit cannot be set to -1 (1 is the lowest). Neither can
vacuum_cost_delay -- it is set to *0* to disable it (which is how the
cost_delay parameter is handled here as well).

Are you thinking autovacuum_vacuum_cost_limit where -1 means "use
vacuum_cost_limit"?

The reason to disallow cost_limit=0 is to avoid divide-by-zero. We could
allow -1 and have it mean "use vacuum_cost_limit", but I'm not sure how
relevant that really would be in this context?

D'oh! You're right, of course.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#92

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#88)

Re: Online enabling of checksums

Hi,

I've just noticed the patch does not update

src/backend/storage/page/README

which is in fact about checksums. Most of it remains valid, but it also
mentions that currently it's an initdb-time choice.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#93

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#88)

Re: Online enabling of checksums

Hi,

I've been looking at the patch a bit more, and I think there are a
couple of fairly serious issues in the error handling.

Firstly ChecksumHelperLauncherMain spends quite a bit of effort on
skipping dropped databases, but ChecksumHelperWorkerMain does not do the
same thing with tables. I'm not exactly sure why, but I'd say dropped
tables are more likely than dropped databases (e.g. because of temporary
tables) and it's strange to gracefully handle the rarer case.

Now, when a table gets dropped after BuildRelationList() does its work,
we end up calling ProcessSingleRelationByOid() on that OID. Which calls
relation_open(), which fails with elog(ERROR), terminating the whole
bgworker with an error like this:

ERROR: could not open relation with OID 16632
LOG: background worker "checksumhelper worker" (PID 27152) exited
with exit code 1

Which however means the error handling in ChecksumHelperWorkerMain() has
no chance to kick in, because the bgworker dies right away. The code
looks like this:

foreach(lc, RelationList)
{
ChecksumHelperRelation *rel
= (ChecksumHelperRelation *) lfirst(lc);

if (!ProcessSingleRelationByOid(rel->reloid, strategy))
{
ChecksumHelperShmem->success = ABORTED;
break;
}
else
ChecksumHelperShmem->success = SUCCESSFUL;
}
list_free_deep(RelationList);

Now, assume the first relation in the list still exists and gets
processed correctly, so "success" ends up being SUCCESSFUL. Then the
second OID is the dropped relation, which kills the bgworker ...

The launcher however does not realize anything went wrong, because the
flag still says SUCCESSFUL. And so it merrily switches checksums to
"on", leading to this on the rest of the relations:

WARNING: page verification failed, calculated checksum 58644 but
expected 0
ERROR: invalid page in block 0 of relation base/16631/16653

Yikes!

IMHO this error handling is broken by design - two things need to
happen, I think: (a) graceful handling of dropped relations and (b)
proper error reporting from the bgworker.

(a) Should not be difficult to do, I think. We don't have relation_open
with a missing_ok flag, but implementing something like that should not
be difficult. Even a simple "does OID exist" should be enough.

(b) But just handling dropped relations is not enough, because I could
simply kill the bgworker directly, and it would have exactly the same
consequences. What needs to happen is something like this:

ChecksumHelperResult local_success = SUCCESSFUL;

foreach(lc, RelationList)
{
ChecksumHelperRelation *rel
= (ChecksumHelperRelation *) lfirst(lc);

if (!ProcessSingleRelationByOid(rel->reloid, strategy))
{
local_success = ABORTED;
break;
}
}
list_free_deep(RelationList);

ChecksumHelperShmem->success = local_success;

That is, leave the flag in shared memory set to FAILED until the very
last moment, and only when everything went fine set it to SUCCESSFUL.
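
The pattern described above can be tried outside PostgreSQL with a minimal, self-contained C sketch. Everything in it is a hypothetical stand-in rather than code from the patch: `WorkerResult`, `ProcessFn`, `process_all` and the two toy callbacks are invented for illustration, and a plain pointer plays the role of the shared-memory flag. It assumes the launcher has set the flag to FAILED before starting the worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the result codes used in the patch. */
typedef enum
{
	FAILED,
	ABORTED,
	SUCCESSFUL
} WorkerResult;

/* Hypothetical per-relation callback; returns false when processing is aborted. */
typedef bool (*ProcessFn) (unsigned int reloid);

/* Toy callback: every relation is processed successfully. */
static bool
always_ok(unsigned int reloid)
{
	(void) reloid;
	return true;
}

/* Toy callback: pretends that processing is aborted on odd OIDs. */
static bool
abort_on_odd(unsigned int reloid)
{
	return (reloid % 2) == 0;
}

/*
 * Process all relations, tracking the outcome in a local variable only.
 * The shared flag is written exactly once, after the loop, so a worker
 * that dies halfway through leaves it at whatever the launcher set
 * beforehand (assumed here to be FAILED).
 */
static void
process_all(const unsigned int *reloids, size_t n, ProcessFn process,
			WorkerResult *shared_result)
{
	WorkerResult local_result = SUCCESSFUL;

	assert(shared_result != NULL);

	for (size_t i = 0; i < n; i++)
	{
		if (!process(reloids[i]))
		{
			local_result = ABORTED;
			break;
		}
	}

	/* Publish the outcome only now that the whole list has been handled. */
	*shared_result = local_result;
}
```

Calling `process_all` with `always_ok` leaves the flag at SUCCESSFUL, while `abort_on_odd` on a list containing an odd OID leaves it at ABORTED. The point of the design is the case the sketch cannot show directly: if the process exits inside the loop, the final assignment never runs and the launcher still sees FAILED.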

BTW I don't think handling dropped relations by letting the bgworker
crash and restart is an acceptable approach. That would pretty much mean
any DDL changes are prohibited on the system while the checksum process
is running, which is not quite possible (e.g. for systems doing stuff
with temporary tables).

Which however reminds me I've also run into a bug in the automated retry
system, because you may get messages like this:

ERROR: failed to enable checksums in "test", giving up (attempts
639968292).

This happens because BuildDatabaseList() does just palloc() and does not
initialize the 'attempts' field. It may get initialized to 0 by chance,
but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
high value.
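
As a rough illustration of this class of bug and its usual fix, here is a standalone C sketch that uses calloc() in place of PostgreSQL's allocator (inside the backend, palloc0() would be the zeroing counterpart of palloc()). The `DatabaseEntry` struct and both helper functions are hypothetical and only mimic the shape of the list entry described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list entry; only the 'attempts' field matters here. */
typedef struct DatabaseEntry
{
	unsigned int dboid;
	int			attempts;
} DatabaseEntry;

/*
 * Allocate an entry with every field zeroed, so that 'attempts' starts
 * at 0 instead of holding whatever bytes the allocator returned.
 */
static DatabaseEntry *
make_database_entry(unsigned int dboid)
{
	DatabaseEntry *db = calloc(1, sizeof(DatabaseEntry));

	if (db == NULL)
		return NULL;

	db->dboid = dboid;
	return db;
}

/* Count one more attempt and return the new total. */
static int
record_attempt(DatabaseEntry *db)
{
	assert(db != NULL);
	return ++db->attempts;
}
```

With a zeroing allocation the retry counter is well defined from the start, regardless of build options such as -DRANDOMIZE_ALLOCATED_MEMORY.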

BTW both ChecksumHelperRelation and ChecksumHelperDatabase have a
'success' field which is actually unused (and uninitialized).

But wait - there is more ;-) BuildRelationList is using heap_beginscan
with the regular snapshot, so it does not see uncommitted transactions.
So if you do this:

BEGIN;
CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
-- run pg_enable_data_checksums() from another session
SELECT COUNT(*) FROM t;

then the table will be invisible to the checksum worker, it won't have
checksums updated and the cluster will get checksums enabled. Which
means this:

test=# SELECT COUNT(*) FROM t;
WARNING: page verification failed, calculated checksum 27170 but
expected 0
ERROR: invalid page in block 0 of relation base/16677/16683

Not sure what's the best way to fix this - maybe we could wait for all
running transactions to end, before starting the work.

And if you try this with a temporary table (not hidden in a transaction,
so the bgworker can see it), the worker will fail with this:

ERROR: cannot access temporary tables of other sessions

But of course, this is just another way to crash without updating
the result for the launcher, so checksums may end up being enabled anyway.

Not great, I guess :-(

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#94

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Tomas Vondra (#93)

1 attachment(s)

Re: Online enabling of checksums

On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

Hi,

I've been looking at the patch a bit more, and I think there are a
couple of fairly serious issues in the error handling.

Thanks!

Firstly ChecksumHelperLauncherMain spends quite a bit of effort on
skipping dropped databases, but ChecksumHelperWorkerMain does not do the
same thing with tables. I'm not exactly sure why, but I'd say dropped
tables are more likely than dropped databases (e.g. because of temporary
tables) and it's strange to gracefully handle the rarer case.

Uh, yes. I could've sworn we had code for that, but I fully agree with your
assessment that it's not there :)

Now, when a table gets dropped after BuildRelationList() does its work,
we end up calling ProcessSingleRelationByOid() on that OID. Which calls
relation_open(), which fails with elog(ERROR), terminating the whole
bgworker with an error like this:

ERROR: could not open relation with OID 16632
LOG: background worker "checksumhelper worker" (PID 27152) exited
with exit code 1

Yeah. We need to trap that error and not crash and burn.

Which however means the error handling in ChecksumHelperWorkerMain() has
no chance to kick in, because the bgworker dies right away. The code
looks like this:

foreach(lc, RelationList)
{
ChecksumHelperRelation *rel
= (ChecksumHelperRelation *) lfirst(lc);

if (!ProcessSingleRelationByOid(rel->reloid, strategy))
{
ChecksumHelperShmem->success = ABORTED;
break;
}
else
ChecksumHelperShmem->success = SUCCESSFUL;
}
list_free_deep(RelationList);

Now, assume the first relation in the list still exists and gets
processed correctly, so "success" ends up being SUCCESSFUL. Then the
second OID is the dropped relation, which kills the bgworker ...

Indeed, that's just a very simple bug. It shouldn't be set to SUCCESSFUL
until *all* tables have been processed. I believe the code needs to be this:

IMHO this error handling is broken by design - two things need to
happen, I think: (a) graceful handling of dropped relations and (b)
proper error reporting from the bgworker.

(a) Should not be difficult to do, I think. We don't have relation_open
with a missing_ok flag, but implementing something like that should not
be difficult. Even a simple "does OID exist" should be enough.

Not entirely sure what you mean by "even a simple does oid exist"?
I mean, if we check for the file, that won't help us -- there will still be
a tiny race between the check and us opening it, won't there?

However, we have try_relation_open(). Which is documented as:
* Same as relation_open, except return NULL instead of failing
* if the relation does not exist.

So I'm pretty sure it's just a matter of using try_relation_open() instead,
and checking for NULL?
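
The open-or-skip idea can be sketched in standalone C. This is only a toy model: `ToyRelation`, `toy_catalog`, `toy_try_open` and `process_existing` are hypothetical names made up for illustration, with `toy_try_open` imitating the documented behaviour of try_relation_open() (return NULL when the relation does not exist) against an in-memory array instead of the real catalogs. Locking is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an open relation. */
typedef struct ToyRelation
{
	unsigned int oid;
} ToyRelation;

/* Relations that still exist; OID 16632 is deliberately missing. */
static ToyRelation toy_catalog[] = {{16631}, {16653}};

/* Imitates try_relation_open(): NULL instead of an error when missing. */
static ToyRelation *
toy_try_open(unsigned int oid)
{
	size_t		n = sizeof(toy_catalog) / sizeof(toy_catalog[0]);

	for (size_t i = 0; i < n; i++)
	{
		if (toy_catalog[i].oid == oid)
			return &toy_catalog[i];
	}
	return NULL;
}

/*
 * Walk a previously built OID list and skip entries that were dropped
 * in the meantime. Returns the number of relations actually processed.
 */
static size_t
process_existing(const unsigned int *oids, size_t n)
{
	size_t		processed = 0;

	assert(oids != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		ToyRelation *rel = toy_try_open(oids[i]);

		if (rel == NULL)
			continue;			/* dropped since the list was built: skip */

		processed++;			/* the real per-relation work would go here */
	}
	return processed;
}
```

Because the existence check and the open are a single step, there is no window between "check" and "open" for the relation to disappear in, which is the race raised above.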

(b) But just handling dropped relations is not enough, because I could
simply kill the bgworker directly, and it would have exactly the same
consequences. What needs to happen is something like this:

<snip>
And now I see your code, which was below the fold when I first read this, after
having written a very similar fix myself. I'm glad this code looks mostly
identical to what I suggested above, so I think that's a good solution.

BTW I don't think handling dropped relations by letting the bgworker
crash and restart is an acceptable approach. That would pretty much mean
any DDL changes are prohibited on the system while the checksum process
is running, which is not quite possible (e.g. for systems doing stuff
with temporary tables).

No, I don't like that at all. We need to handle them gracefully, by
skipping them - crash and restart is not acceptable for something that
common.

Which however reminds me I've also run into a bug in the automated retry
system, because you may get messages like this:

ERROR: failed to enable checksums in "test", giving up (attempts
639968292).

This happens because BuildDatabaseList() does just palloc() and does not
initialize the 'attempts' field. It may get initialized to 0 by chance,
but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
high value.

Eh. I don't have that "(attempts" part in my code at all. Is that either
from some earlier version of the patch, or did you add that yourself for
testing?

BTW both ChecksumHelperRelation and ChecksumHelperDatabase have a
'success' field which is actually unused (and uninitialized).

Correct. These are old leftovers from the "partial restart" logic, and
should be removed.

But wait - there is more ;-) BuildRelationList is using heap_beginscan
with the regular snapshot, so it does not see uncommitted transactions.
So if you do this:

BEGIN;
CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
-- run pg_enable_data_checksums() from another session
SELECT COUNT(*) FROM t;

then the table will be invisible to the checksum worker, it won't have
checksums updated and the cluster will get checksums enabled. Which
means this:

Ugh. Interestingly enough I just put that on my TODO list *yesterday* that
I forgot to check that specific case :/

test=# SELECT COUNT(*) FROM t;
WARNING: page verification failed, calculated checksum 27170 but
expected 0
ERROR: invalid page in block 0 of relation base/16677/16683

Not sure what's the best way to fix this - maybe we could wait for all
running transactions to end, before starting the work.

I was thinking of that as one way to deal with it, yes.

I guess a reasonable way to do that would be to do it as part of
BuildRelationList() -- basically have that one wait until there is no other
running transaction in that specific database before we take the snapshot?

A first thought I had was to try to just take an access exclusive lock on
pg_class for a very short time, but a transaction that does create table
doesn't actually keep its lock on that table so there is no conflict.

And if you try this with a temporary table (not hidden in a transaction,
so the bgworker can see it), the worker will fail with this:

ERROR: cannot access temporary tables of other sessions

But of course, this is just another way to crash without updating
the result for the launcher, so checksums may end up being enabled anyway.

Yeah, there will be plenty of side-effect issues from that
crash-with-wrong-status case. Fixing that will at least make things safer
-- in that checksums won't be enabled when not put on all pages.

I have attached a patch that fixes the "easy" ones per your first comments.
No solution for the open-transaction yet, but I wanted to put the rest out
there -- especially if you have automated tests you can send it through.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Attachments:

online_checksums7.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5abb1c46fb..dcdd17ec0c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19507,6 +19507,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>in progress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 22e6893211..c81c87ef41 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -210,6 +210,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 949b5a220f..826dd91f72 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..463ecd5e1b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-r <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d27fb414f7..db4f4167e3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -283,6 +283,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..eca75d86f7 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,88 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums,
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+   <para>
+    Since checksums are clusterwide, all databases will get checksums added.
+    It is thus important that the worker is allowed to connect to all databases
+    for the process to succeed, see the <xref linkend="sql-alterdatabase"/>
+    command for how to allow connections.
+   </para>
+
+   <note>
+    <para>
+     <literal>template0</literal> does not accept connections by default; to
+     enable checksums you'll need to temporarily make it accept connections.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b4fd8395b7..813b2afaac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4673,10 +4674,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4748,12 +4745,90 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7789,6 +7864,16 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9542,6 +9627,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9969,6 +10070,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..67b9cd8127 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,50 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e9e188682f..5d567d0cf9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1027,6 +1027,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..76879a74d6
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,738 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a cluster at initdb time, no extra process
+ * is required as each page is checksummed on write and verified on read.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific database
+ * before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	bool		abort;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	HeapTuple	tup;
+	Relation	rel;
+	HeapScanDesc scan;
+
+	if (ChecksumHelperShmem->abort)
+	{
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: has been cancelled")));
+	}
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/* Failed to set means somebody else started */
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: already running")));
+	}
+
+	/*
+	 * Check that all databases allow connections.  This will be re-checked
+	 * when we build the list of databases to work on, the point of
+	 * duplicating this is to catch any databases we won't be able to open
+	 * while we can still send an error message to the client.
+	 */
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+		return;
+
+	/*
+	 * We don't need an atomic variable for aborting, setting it multiple
+	 * times will not change the handling.
+	 */
+	ChecksumHelperShmem->abort = true;
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %u/%u)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to wal even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort request will bubble up from here.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * Process a single relation based on oid.
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there are no
+		 * pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %u as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, (int) pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	ChecksumHelperShmem->abort = false;
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any database created after
+	 * this process started will be running with checksums turned on from the
+	 * start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult processing;
+
+			processing = ProcessDatabase(db);
+
+			if (processing == SUCCESSFUL)
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else if (processing == FAILED)
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+			else
+				/* aborted */
+				return;
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the second case where
+		 * the database was dropped before we had started processing it. Any
+		 * database that still exists but where enabling checksums failed, is
+		 * retried for a limited number of times before giving up. Any
+		 * database that remains in failed state after the retries expire will
+		 * fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in \"%s\", giving up",
+										db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("database \"%s\" has been dropped, skipping",
+								db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL || list_length(remaining) == 0)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc0(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared relations and local ones are returned, else
+ * all non-shared relations are returned.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip.
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+		ChecksumHelperShmem->success = ABORTED;
+	else
+		ChecksumHelperShmem->success = SUCCESSFUL;
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,11 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..f408d56270 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,8 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, but can also be turned off and on
+using pg_enable_data_checksums()/pg_disable_data_checksums() at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..790e4b860a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1168,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1195,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4ffc8451ca..14aa575733 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -68,6 +69,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -420,6 +422,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksum used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -514,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1684,17 +1697,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -4101,6 +4103,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7e5e971294..449a703c47 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -226,7 +226,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..d16261571f
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,36 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in an offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..e37f39bd2a
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,315 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_relfilenode = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage()
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -r relfilenode check only relation with specified relfilenode\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open file \"%s\": %m\n"), progname, fn);
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in file \"%s\", got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno * RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
+						progname, fn, blockno, csum, header->pd_checksum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: checksum verified in file \"%s\", block %d: %X\n"),
+					progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory \"%s\": %m\n"),
+				progname, path);
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file \"%s\": %m\n"),
+					progname, fn);
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char	   *forkpath,
+					   *segmentpath;
+			int			segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number
+			 * in order to mix it into the checksum. Then also cut off at the
+			 * fork boundary, to get the relfilenode the file belongs to for
+			 * filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename \"%s\"\n"),
+							progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_relfilenode && strcmp(only_relfilenode, de->d_name) != 0)
+				/* Relfilenode not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fr:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'r':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid relfilenode: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_relfilenode = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+
+		/* If no DataDir was specified, and none could be found, error out */
+		if (DataDir == NULL)
+		{
+			fprintf(stderr, _("%s: no data directory specified\n"), progname);
+			fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+			exit(1);
+		}
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..f21870c644 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 90d994c71a..f601af71be 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5574,6 +5574,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2278 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 2 0 2278 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f t f v u 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..289bf2a935
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..4f8e0ab8f8
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
diff --git a/src/test/isolation/expected/checksum_cancel.out b/src/test/isolation/expected/checksum_cancel.out
new file mode 100644
index 0000000000..c449e7b6cc
--- /dev/null
+++ b/src/test/isolation/expected/checksum_cancel.out
@@ -0,0 +1,27 @@
+Parsed test spec with 2 sessions
+
+starting permutation: c_verify_checksums_off r_seqread c_enable_checksums c_verify_checksums_inprogress c_disable_checksums c_wait_checksums_off
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums(1000);
+pg_enable_data_checksums
+
+               
+step c_verify_checksums_inprogress: SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step c_disable_checksums: SELECT pg_disable_data_checksums();
+pg_disable_data_checksums
+
+               
+step c_wait_checksums_off: SELECT test_checksums_off();
+test_checksums_off
+
+t              
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..0a68f47023
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+               
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index d3965fe73f..1ac9ade0a8 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -68,3 +68,7 @@ test: timeouts
 test: vacuum-concurrent-drop
 test: predicate-gist
 test: predicate-gin
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_cancel
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_cancel.spec b/src/test/isolation/specs/checksum_cancel.spec
new file mode 100644
index 0000000000..a9da0d74c7
--- /dev/null
+++ b/src/test/isolation/specs/checksum_cancel.spec
@@ -0,0 +1,48 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		PERFORM pg_sleep(1);
+		SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+		enabled boolean;
+	BEGIN
+		FOR counter IN 1..100 LOOP
+			PERFORM count(a) FROM t1;
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_off();
+
+	DROP TABLE t1;
+}
+
+session "reader"
+step "r_seqread"						{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"			{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"				{ SELECT pg_enable_data_checksums(1000); }
+step "c_disable_checksums"				{ SELECT pg_disable_data_checksums(); }
+step "c_verify_checksums_inprogress"	{ SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_wait_checksums_off"				{ SELECT test_checksums_off(); }
+
+permutation "c_verify_checksums_off" "r_seqread" "c_enable_checksums" "c_verify_checksums_inprogress" "c_disable_checksums" "c_wait_checksums_off"
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..7bfbe8d448
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.1);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"

#95

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#94)

Re: Online enabling of checksums

Hi,

On 03/31/2018 02:02 PM, Magnus Hagander wrote:

On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

...

(a) Should not be difficult to do, I think. We don't have relation_open
with a missing_ok flag, but implementing something like that should not
be difficult. Even a simple "does OID exist" should be enough.

Not entirely sure what you mean by "even a simple does OID exist"?
I mean, if we check for the file, that won't help us -- there
will still be a tiny race between the check and us opening it, won't there?

I meant to say "even a simple check if the OID still exists" but it was
a bit too late / not enough caffeine issue. You're right that there would
still be a tiny race window, although it'd be much shorter, possibly short
enough to make the error+restart approach acceptable.

However, we have try_relation_open(). Which is documented as:
*Same as relation_open, except return NULL instead of failing
*if the relation does not exist.

So I'm pretty sure it's just a matter of using try_relation_open()
instead, and checking for NULL?

Oh, right. I thought we had a relation_open variant that handles this,
but have been looking for one with missing_ok flag and so I missed this.
try_relation_open should do the trick when it comes to dropped tables.
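
To make the skip-on-NULL idea concrete, here is a minimal, self-contained sketch of the control flow. It does not use PostgreSQL headers: DemoRelation, demo_try_open() and demo_process_relations() are hypothetical stand-ins, with demo_try_open() only imitating the documented behaviour of try_relation_open() (return NULL instead of failing when the relation is gone).

```c
#include <stddef.h>

/* Hypothetical stand-in for a relation handle; not PostgreSQL's Relation. */
typedef struct DemoRelation
{
	unsigned int oid;
} DemoRelation;

/* Hypothetical "catalog": the relations that still exist. */
static DemoRelation demo_catalog[] = {{16384}, {16390}};

/*
 * Imitates the documented contract of try_relation_open(): return NULL
 * instead of raising an error when the relation does not exist.
 */
static DemoRelation *
demo_try_open(unsigned int oid)
{
	size_t		i;

	for (i = 0; i < sizeof(demo_catalog) / sizeof(demo_catalog[0]); i++)
		if (demo_catalog[i].oid == oid)
			return &demo_catalog[i];
	return NULL;
}

/*
 * Walk a previously built list of OIDs and skip entries that were dropped
 * after the list was built. Returns the number of relations processed.
 */
static int
demo_process_relations(const unsigned int *oids, size_t noids)
{
	int			processed = 0;
	size_t		i;

	for (i = 0; i < noids; i++)
	{
		DemoRelation *rel = demo_try_open(oids[i]);

		if (rel == NULL)
			continue;			/* dropped concurrently: skip, don't fail */

		/* the real worker would rewrite the relation's pages here */
		processed++;
	}
	return processed;
}
```

With a list of three OIDs of which one has been dropped, the loop processes two relations and never errors out, which is the behaviour wanted for concurrent DROP TABLE.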

(b) But just handling dropped relations is not enough, because I could
simply kill the bgworker directly, and it would have exactly the same
consequences. What needs to happen is something like this:

<snip>
And now I see your code, which was below the fold when I first read this,
after having written a very similar fix myself. I'm glad this code looks
mostly identical to what I suggested above, so I think that's a good
solution.

Good ;-)

BTW I don't think handling dropped relations by letting the bgworker
crash and restart is an acceptable approach. That would pretty much mean
any DDL changes are prohibited on the system while the checksum process
is running, which is not quite possible (e.g. for systems doing stuff
with temporary tables).

No, I don't like that at all. We need to handle them gracefully, by
skipping them - crash and restart is not acceptable for something that
common.

Yeah, I was just thinking aloud.

Which however reminds me I've also run into a bug in the automated retry
system, because you may get messages like this:

ERROR: failed to enable checksums in "test", giving up (attempts
639968292).

This happens because BuildDatabaseList() does just palloc() and does not
initialize the 'attempts' field. It may get initialized to 0 by chance,
but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
high value.
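
A minimal standalone sketch of the fix for this class of bug: allocate the list entry zero-initialized so the counter starts at 0 regardless of what the allocator returns. Everything here is a model under stated assumptions: `ModelDatabase`, `model_build_database()` and the retry rule in `model_should_retry()` are hypothetical, and `calloc()` stands in for a zeroing allocation (in the backend, palloc0() would be the usual equivalent of palloc() plus zeroing).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_ATTEMPTS 4    /* mirrors MAX_ATTEMPTS in the patch */

/* Hypothetical list entry; the real struct in the patch may differ. */
typedef struct ModelDatabase
{
    unsigned int dboid;
    char         name[64];
    int          attempts;      /* must start at 0 */
} ModelDatabase;

/*
 * Build one entry. calloc() zeroes the whole struct, so 'attempts' is 0
 * even when the allocator would otherwise hand back randomized memory.
 */
static ModelDatabase *
model_build_database(unsigned int dboid, const char *name)
{
    ModelDatabase *db;

    assert(name != NULL);
    db = calloc(1, sizeof(ModelDatabase));
    if (db == NULL)
        return NULL;

    db->dboid = dboid;
    /* Buffer is already zeroed, so the copy stays NUL-terminated. */
    strncpy(db->name, name, sizeof(db->name) - 1);
    return db;
}

/*
 * Assumed retry rule, for illustration only: count the attempt and
 * report whether another one is still allowed.
 */
static int
model_should_retry(ModelDatabase *db)
{
    db->attempts++;
    return db->attempts < MODEL_MAX_ATTEMPTS;
}
```

Setting `attempts = 0` explicitly after a plain allocation would work equally well; zeroing the whole struct also protects any field added later.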

Eh. I don't have that "(attempts" part in my code at all. Is that either
from some earlier version of the patch, or did you add that yourself for
testing?

Apologies, you're right I tweaked the message a bit (just adding the
number of attempts to it). The logic however remains the same, and the
bug is real.

But wait - there is more ;-) BuildRelationList is using heap_beginscan
with the regular snapshot, so it does not see uncommitted transactions.
So if you do this:

BEGIN;
CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
-- run pg_enable_data_checksums() from another session
SELECT COUNT(*) FROM t;

then the table will be invisible to the checksum worker, it won't have
checksums updated and the cluster will get checksums enabled. Which
means this:

Ugh. Interestingly enough I just put that on my TODO list *yesterday*
that I forgot to check that specific case :/

But I was faster in reporting it ;-)

test=# SELECT COUNT(*) FROM t;
WARNING: page verification failed, calculated checksum 27170 but
expected 0
ERROR: invalid page in block 0 of relation base/16677/16683

Not sure what's the best way to fix this - maybe we could wait for all
running transactions to end, before starting the work.

I was thinking of that as one way to deal with it, yes.

I guess a reasonable way to do that would be to do it as part of
BuildRelationList() -- basically have that one wait until there is no
other running transaction in that specific database before we take the
snapshot?

A first thought I had was to try to just take an access exclusive lock
on pg_class for a very short time, but a transaction that does create
table doesn't actually keep its lock on that table so there is no conflict.

Yeah, I don't think that's going to work, because you don't even know
you need to lock/wait for something.

I do think just waiting for all running transactions to complete is
fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
too, to some extent). So we have a precedent / working code we can copy.

And if you try this with a temporary table (not hidden in transaction,
so the bgworker can see it), the worker will fail with this:

ERROR: cannot access temporary tables of other sessions

But of course, this is just another way to crash without updating
the result for the launcher, so checksums may end up being enabled
anyway.

Yeah, there will be plenty of side-effect issues from that
crash-with-wrong-status case. Fixing that will at least make things
safer -- in that checksums won't be enabled when not put on all pages.

Sure, the outcome with checksums enabled incorrectly is a consequence of
bogus status, and fixing that will prevent that. But that wasn't my main
point here - not articulated very clearly, though.

The bigger question is how to handle temporary tables gracefully, so
that it does not terminate the bgworker like this at all. This might be
an even bigger issue than dropped relations, considering that temporary
tables are a pretty common part of applications (and it also includes
CREATE/DROP).

For some clusters it might mean the online checksum enabling would
crash+restart infinitely (well, until reaching MAX_ATTEMPTS).

Unfortunately, try_relation_open() won't fix this, as the error comes
from ReadBufferExtended. And it's not a matter of simply creating a
ReadBuffer variant without that error check, because temporary tables
use local buffers.

I wonder if we could just go and set the checksums anyway, ignoring the
local buffers. If the other session does some changes, it'll overwrite
our changes, this time with the correct checksums. But it seems pretty
dangerous (I mean, what if they're writing stuff while we're updating
the checksums? Considering the various short-cuts for temporary tables,
I suspect that would be a boon for race conditions.)

Another option would be to do something similar to running transactions,
i.e. wait until all temporary tables (that we've seen at the beginning)
disappear. But we're starting to wait on more and more stuff.

If we do this, we should clearly log which backends we're waiting for,
so that the admins can go and interrupt them manually.

I have attached a patch that fixes the "easy" ones per your first
comments. No solution for the open-transaction yet, but I wanted to put
the rest out there -- especially if you have automated tests you can
send it through.

I don't have automated tests, but I'll take a look.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#96

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Tomas Vondra (#95)

1 attachment(s)

Re: Online enabling of checksums

On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 03/31/2018 02:02 PM, Magnus Hagander wrote:

On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

But wait - there is more ;-) BuildRelationList is using heap_beginscan
with the regular snapshot, so it does not see uncommitted transactions.
So if you do this:

BEGIN;
CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
-- run pg_enable_data_checksums() from another session
SELECT COUNT(*) FROM t;

then the table will be invisible to the checksum worker, it won't have
checksums updated and the cluster will get checksums enabled. Which
means this:

Ugh. Interestingly enough I just put that on my TODO list *yesterday*
that I forgot to check that specific case :/

But I was faster in reporting it ;-)

Indeed you were :)

test=# SELECT COUNT(*) FROM t;
WARNING: page verification failed, calculated checksum 27170 but
expected 0
ERROR: invalid page in block 0 of relation base/16677/16683

Not sure what's the best way to fix this - maybe we could wait for all
running transactions to end, before starting the work.

I was thinking of that as one way to deal with it, yes.

I guess a reasonable way to do that would be to do it as part of
BuildRelationList() -- basically have that one wait until there is no
other running transaction in that specific database before we take the
snapshot?

A first thought I had was to try to just take an access exclusive lock
on pg_class for a very short time, but a transaction that does create
table doesn't actually keep its lock on that table so there is no conflict.

Yeah, I don't think that's going to work, because you don't even know
you need to lock/wait for something.

I do think just waiting for all running transactions to complete is
fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
too, to some extent). So we have a precedent / working code we can copy.

Thinking again, I don't think it should be done as part of
BuildRelationList(). We should just do it once in the launcher before
starting, that'll be both easier and cleaner. Anything started after that
will have checksums on it, so we should be fine.

PFA one that does this.
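
A minimal standalone sketch of the "wait once in the launcher" idea, as a model rather than the code in the attached patch. The assumption is that the launcher remembers the transactions running when it starts and waits only for those; transactions that begin later already write checksums and are not waited for. `ModelProcArray` and all `model_*` functions are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_XACTS 8

typedef unsigned int ModelXid;

/* Toy stand-in for the set of currently running transactions. */
typedef struct ModelProcArray
{
    ModelXid running[MODEL_MAX_XACTS];
    size_t   nrunning;
} ModelProcArray;

static bool
model_xid_is_running(const ModelProcArray *procs, ModelXid xid)
{
    size_t i;

    for (i = 0; i < procs->nrunning; i++)
        if (procs->running[i] == xid)
            return true;
    return false;
}

static void
model_start_xact(ModelProcArray *procs, ModelXid xid)
{
    assert(procs->nrunning < MODEL_MAX_XACTS);
    procs->running[procs->nrunning++] = xid;
}

static void
model_end_xact(ModelProcArray *procs, ModelXid xid)
{
    size_t i;

    for (i = 0; i < procs->nrunning; i++)
    {
        if (procs->running[i] == xid)
        {
            /* Fill the hole with the last entry; order does not matter. */
            procs->running[i] = procs->running[--procs->nrunning];
            return;
        }
    }
}

/* Remember which transactions are running right now; returns the count. */
static size_t
model_snapshot_running(const ModelProcArray *procs, ModelXid *snap)
{
    size_t i;

    for (i = 0; i < procs->nrunning; i++)
        snap[i] = procs->running[i];
    return procs->nrunning;
}

/*
 * How many of the remembered transactions are still running? The launcher
 * would sleep and re-check until this reaches zero, then start the workers.
 */
static size_t
model_count_still_running(const ModelProcArray *procs,
                          const ModelXid *snap, size_t nsnap)
{
    size_t i;
    size_t remaining = 0;

    for (i = 0; i < nsnap; i++)
        if (model_xid_is_running(procs, snap[i]))
            remaining++;
    return remaining;
}
```

Waiting on a fixed snapshot rather than on "no transactions at all" keeps the wait bounded on a busy system, since new transactions never extend it.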

And if you try this with a temporary table (not hidden in transaction,
so the bgworker can see it), the worker will fail with this:

ERROR: cannot access temporary tables of other sessions

But of course, this is just another way to crash without updating
the result for the launcher, so checksums may end up being enabled
anyway.

Yeah, there will be plenty of side-effect issues from that
crash-with-wrong-status case. Fixing that will at least make things
safer -- in that checksums won't be enabled when not put on all pages.

Sure, the outcome with checksums enabled incorrectly is a consequence of
bogus status, and fixing that will prevent that. But that wasn't my main
point here - not articulated very clearly, though.

The bigger question is how to handle temporary tables gracefully, so
that it does not terminate the bgworker like this at all. This might be
an even bigger issue than dropped relations, considering that temporary
tables are a pretty common part of applications (and it also includes
CREATE/DROP).

For some clusters it might mean the online checksum enabling would
crash+restart infinitely (well, until reaching MAX_ATTEMPTS).

Unfortunately, try_relation_open() won't fix this, as the error comes
from ReadBufferExtended. And it's not a matter of simply creating a
ReadBuffer variant without that error check, because temporary tables
use local buffers.

I wonder if we could just go and set the checksums anyway, ignoring the
local buffers. If the other session does some changes, it'll overwrite
our changes, this time with the correct checksums. But it seems pretty
dangerous (I mean, what if they're writing stuff while we're updating
the checksums? Considering the various short-cuts for temporary tables,
I suspect that would be a boon for race conditions.)

Another option would be to do something similar to running transactions,
i.e. wait until all temporary tables (that we've seen at the beginning)
disappear. But we're starting to wait on more and more stuff.

If we do this, we should clearly log which backends we're waiting for,
so that the admins can go and interrupt them manually.

Yeah, waiting for all transactions at the beginning is pretty simple.

Making the worker simply ignore temporary tables would also be easy.

One of the bigger issues here is temporary tables are *session* scope and
not transaction, so we'd actually need the other session to finish, not
just the transaction.

I guess what we could do is something like this:

1. Don't process temporary tables in the checksumworker, period. Instead,
build a list of any temporary tables that existed when the worker started
in this particular database (basically anything that we got in our scan).
Once we have processed the complete database, keep re-scanning pg_class
until those particular tables are gone (search by oid).

That means that any temporary tables that are created *while* we are
processing a database are ignored, but they should already be receiving
checksums.

It definitely leads to a potential issue with long running temp tables. But
as long as we look at the *actual tables* (by oid), we should be able to
handle long-running sessions once they have dropped their temp tables.
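
The proposal above can be sketched as a small standalone model. This is not code from the patch: `ModelClassRow` is a hypothetical stand-in for a pg_class row, and the sketch assumes an OID is not reused for a different table while the worker is waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the PostgreSQL definitions. */
typedef unsigned int Oid;

typedef struct ModelClassRow
{
    Oid  oid;
    bool is_temp;
} ModelClassRow;

/*
 * First scan: temporary tables are not processed, only remembered by OID.
 * Returns the number of OIDs stored in 'out'.
 */
static size_t
model_collect_temp_oids(const ModelClassRow *rows, size_t nrows,
                        Oid *out, size_t outcap)
{
    size_t i;
    size_t n = 0;

    for (i = 0; i < nrows; i++)
    {
        if (rows[i].is_temp)
        {
            assert(n < outcap);
            out[n++] = rows[i].oid;
        }
    }
    return n;
}

/*
 * Re-scan: count how many of the remembered OIDs still exist. Temporary
 * tables created after the first scan are deliberately ignored, because
 * they already receive checksums when written.
 */
static size_t
model_temp_oids_remaining(const ModelClassRow *rows, size_t nrows,
                          const Oid *remembered, size_t nremembered)
{
    size_t i;
    size_t j;
    size_t remaining = 0;

    for (i = 0; i < nremembered; i++)
    {
        for (j = 0; j < nrows; j++)
        {
            if (rows[j].oid == remembered[i])
            {
                remaining++;
                break;
            }
        }
    }
    return remaining;
}
```

The worker would repeat the re-scan, sleeping between rounds, until the remaining count reaches zero. Matching on OID rather than on the owning session is what lets a long-lived session stop blocking as soon as it drops its temp tables.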

Does that sound workable to you?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachments:

online_checksums8.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5abb1c46fb..dcdd17ec0c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19507,6 +19507,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>in progress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 22e6893211..c81c87ef41 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -210,6 +210,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 949b5a220f..826dd91f72 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..463ecd5e1b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-r <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d27fb414f7..db4f4167e3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -283,6 +283,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..eca75d86f7 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,88 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums,
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+   <para>
+    Since checksums are clusterwide, all databases will get checksums added.
+    It is thus important that the worker is allowed to connect to all databases
+    for the process to succeed, see the <xref linkend="sql-alterdatabase"/>
+    command for how to allow connections.
+   </para>
+
+   <note>
+    <para>
+     <literal>template0</literal> is by default not accepting connections, to
+     enable checksums you'll need to temporarily make it accept connections.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b4fd8395b7..813b2afaac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4673,10 +4674,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4748,12 +4745,90 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7789,6 +7864,16 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9542,6 +9627,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9969,6 +10070,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..67b9cd8127 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,50 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e9e188682f..5d567d0cf9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1027,6 +1027,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..b3e50a1fbd
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,784 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a database at initdb time, no extra process
+ * is required as each page is checksummed, and verified, at accesses.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific database
+ * before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	bool		abort;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Register the checksumhelper launcher as a dynamic background worker.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	HeapTuple	tup;
+	Relation	rel;
+	HeapScanDesc scan;
+
+	if (ChecksumHelperShmem->abort)
+	{
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: has been cancelled")));
+	}
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/* Failed to set means somebody else started */
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: already running")));
+	}
+
+	/*
+	 * Check that all databases allow connections.  This will be re-checked
+	 * when we build the list of databases to work on, the point of
+	 * duplicating this is to catch any databases we won't be able to open
+	 * while we can still send an error message to the client.
+	 */
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+		return;
+
+	/*
+	 * We don't need an atomic variable for aborting, setting it multiple
+	 * times will not change the handling.
+	 */
+	ChecksumHelperShmem->abort = true;
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %u/%u)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort request will propagate up from here.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * Process a single relation based on oid.
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there are no
+		 * pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %u as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	ChecksumHelperShmem->abort = false;
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = ShmemVariableCache->nextXid;
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		elog(DEBUG1, "Checking old transactions");
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char activity[128];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (oldest is %u)", oldestxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_PG_SLEEP);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Wait for all existing transactions to finish. This will make sure that
+	 * we can see all tables in all databases, so we don't miss any.
+	 * Anything created after this point is known to have checksums on
+	 * all pages already, so we don't have to care about those.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any database created after
+	 * this process started will be running with checksums turned on from the
+	 * start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult processing;
+
+			processing = ProcessDatabase(db);
+
+			if (processing == SUCCESSFUL)
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else if (processing == FAILED)
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+			else
+				/* aborted */
+				return;
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the second case where
+		 * the database was dropped before we had started processing it. Any
+		 * database that still exists but where enabling checksums failed, is
+		 * retried for a limited number of times before giving up. Any
+		 * database that remains in failed state after the retries expire will
+		 * fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in \"%s\", giving up",
+										db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("database \"%s\" has been dropped, skipping",
+								db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL || list_length(remaining) == 0)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc0(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared relations and local ones are returned, else
+ * all non-shared relations are returned.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip.
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+		ChecksumHelperShmem->success = ABORTED;
+	else
+		ChecksumHelperShmem->success = SUCCESSFUL;
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..f408d56270 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,8 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, but can also be turned off and on
+using pg_enable_data_checksums()/pg_disable_data_checksums() at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..790e4b860a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1168,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1195,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4ffc8451ca..14aa575733 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -68,6 +69,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -420,6 +422,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -514,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1684,17 +1697,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -4101,6 +4103,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7e5e971294..449a703c47 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -226,7 +226,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..d16261571f
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,36 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in an offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..e37f39bd2a
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,315 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_relfilenode = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage()
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -r relfilenode check only relation with specified relfilenode\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open file \"%s\": %m\n"), progname, fn);
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in file \"%s\", got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno * RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
+						progname, fn, blockno, csum, header->pd_checksum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: checksum verified in file \"%s\", block %d: %X\n"),
+					progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory \"%s\": %m\n"),
+				progname, path);
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file \"%s\": %m\n"),
+					progname, fn);
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char	   *forkpath,
+					   *segmentpath;
+			int			segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number
+			 * in order to mix it into the checksum. Then also cut off at the
+			 * fork boundary, to get the relfilenode the file belongs to for
+			 * filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename \"%s\"\n"),
+							progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_relfilenode && strcmp(only_relfilenode, de->d_name) != 0)
+				/* Relfilenode not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fr:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'r':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid relfilenode: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_relfilenode = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+
+		/* If no DataDir was specified, and none could be found, error out */
+		if (DataDir == NULL)
+		{
+			fprintf(stderr, _("%s: no data directory specified\n"), progname);
+			fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+			exit(1);
+		}
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..f21870c644 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 90d994c71a..f601af71be 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5574,6 +5574,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2278 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 2 0 2278 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f t f v u 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..289bf2a935
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..4f8e0ab8f8
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
diff --git a/src/test/isolation/expected/checksum_cancel.out b/src/test/isolation/expected/checksum_cancel.out
new file mode 100644
index 0000000000..c449e7b6cc
--- /dev/null
+++ b/src/test/isolation/expected/checksum_cancel.out
@@ -0,0 +1,27 @@
+Parsed test spec with 2 sessions
+
+starting permutation: c_verify_checksums_off r_seqread c_enable_checksums c_verify_checksums_inprogress c_disable_checksums c_wait_checksums_off
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums(1000);
+pg_enable_data_checksums
+
+               
+step c_verify_checksums_inprogress: SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step c_disable_checksums: SELECT pg_disable_data_checksums();
+pg_disable_data_checksums
+
+               
+step c_wait_checksums_off: SELECT test_checksums_off();
+test_checksums_off
+
+t              
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..0a68f47023
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+               
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index d3965fe73f..1ac9ade0a8 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -68,3 +68,7 @@ test: timeouts
 test: vacuum-concurrent-drop
 test: predicate-gist
 test: predicate-gin
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_cancel
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_cancel.spec b/src/test/isolation/specs/checksum_cancel.spec
new file mode 100644
index 0000000000..a9da0d74c7
--- /dev/null
+++ b/src/test/isolation/specs/checksum_cancel.spec
@@ -0,0 +1,48 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		PERFORM pg_sleep(1);
+		SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+		enabled boolean;
+	BEGIN
+		FOR counter IN 1..100 LOOP
+			PERFORM count(a) FROM t1;
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_off();
+
+	DROP TABLE t1;
+}
+
+session "reader"
+step "r_seqread"						{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"			{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"				{ SELECT pg_enable_data_checksums(1000); }
+step "c_disable_checksums"				{ SELECT pg_disable_data_checksums(); }
+step "c_verify_checksums_inprogress"	{ SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_wait_checksums_off"				{ SELECT test_checksums_off(); }
+
+permutation "c_verify_checksums_off" "r_seqread" "c_enable_checksums" "c_verify_checksums_inprogress" "c_disable_checksums" "c_wait_checksums_off"
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..7bfbe8d448
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.1);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"
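For reference, the three-state flag introduced by the patch above boils down to two questions: must a checksum be written, and must it be verified? A minimal sketch follows, assuming the semantics from the base design (checksums are written but not verified while in progress). The enum mirrors ChecksumType from storage/checksum.h; need_write() and need_verify() are hypothetical stand-ins for DataChecksumsNeedWrite() and DataChecksumsNeedVerify(), not the patch's actual implementations.

```c
#include <stdbool.h>

/* Mirrors the ChecksumType enum the patch adds to storage/checksum.h. */
typedef enum ChecksumType
{
	DATA_CHECKSUMS_OFF = 0,
	DATA_CHECKSUMS_ON,
	DATA_CHECKSUMS_INPROGRESS
} ChecksumType;

/*
 * Hypothetical helper: pages get a checksum on write both when checksums
 * are fully on and while they are still being enabled.
 */
static bool
need_write(ChecksumType state)
{
	return state == DATA_CHECKSUMS_ON || state == DATA_CHECKSUMS_INPROGRESS;
}

/*
 * Hypothetical helper: checksums are only verified on read once every
 * page is known to carry one, i.e. in the "on" state.
 */
static bool
need_verify(ChecksumType state)
{
	return state == DATA_CHECKSUMS_ON;
}
```

With this mapping, need_write(DATA_CHECKSUMS_INPROGRESS) is true while need_verify(DATA_CHECKSUMS_INPROGRESS) is false, which matches XLogHintBitIsNeeded() switching to the write-side predicate in the xlog.h hunk.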

#97

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#96)

Re: Online enabling of checksums

On 03/31/2018 05:05 PM, Magnus Hagander wrote:

On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

...

I do think just waiting for all running transactions to complete is
fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
too, to some extent). So we have a precedent / working code we can copy.

Thinking again, I don't think it should be done as part of
BuildRelationList(). We should just do it once in the launcher before
starting, that'll be both easier and cleaner. Anything started after
that will have checksums on it, so we should be fine.

PFA one that does this.

Seems fine to me. However, I'd log waitforxid, not the oldest one. If
you're a DBA and you want the checksumming to proceed, knowing
the oldest running XID is useless for that. If we log waitforxid, it can
be used to query pg_stat_activity and interrupt the sessions somehow.
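
To illustrate why waitforxid is the more useful value to log, here is a minimal sketch of picking out the sessions that still block the launcher. It assumes the launcher waits for every transaction whose XID is older than waitforxid. SessionInfo, xid_precedes() and find_blockers() are hypothetical names for illustration, and the comparison imitates the modulo-2^32 ordering PostgreSQL uses for normal XIDs rather than calling any real backend function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified session: a pid plus its assigned XID (0 = none). */
typedef struct SessionInfo
{
	int			pid;
	uint32_t	xid;
} SessionInfo;

/* Wraparound-aware "a is older than b" for normal XIDs (modulo-2^32). */
static bool
xid_precedes(uint32_t a, uint32_t b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Store in blockers[] the pids of sessions whose XID precedes waitforxid,
 * i.e. the sessions a DBA would have to wait for or interrupt.
 * blockers[] must have room for n entries.  Returns the number stored.
 */
static size_t
find_blockers(const SessionInfo *sessions, size_t n,
			  uint32_t waitforxid, int *blockers)
{
	size_t		count = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (sessions[i].xid != 0 &&
			xid_precedes(sessions[i].xid, waitforxid))
			blockers[count++] = sessions[i].pid;
	}
	return count;
}
```

Knowing only the oldest running XID would identify a single session at best; with waitforxid the same filter yields every session that still needs to finish.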

And if you try this with a temporary table (not hidden in transaction,
so the bgworker can see it), the worker will fail with this:

ERROR: cannot access temporary tables of other sessions

But of course, this is just another way to crash without updating
the result for the launcher, so checksums may end up being enabled
anyway.

Yeah, there will be plenty of side-effect issues from that
crash-with-wrong-status case. Fixing that will at least make things
safer -- in that checksums won't be enabled when not put on all pages.

Sure, the outcome with checksums enabled incorrectly is a consequence of
bogus status, and fixing that will prevent that. But that wasn't my main
point here - not articulated very clearly, though.

The bigger question is how to handle temporary tables gracefully, so
that it does not terminate the bgworker like this at all. This might be
an even bigger issue than dropped relations, considering that temporary
tables are a pretty common part of applications (and it also includes
CREATE/DROP).

For some clusters it might mean the online checksum enabling would
crash+restart infinitely (well, until reaching MAX_ATTEMPTS).

Unfortunately, try_relation_open() won't fix this, as the error comes
from ReadBufferExtended. And it's not a matter of simply creating a
ReadBuffer variant without that error check, because temporary tables
use local buffers.

I wonder if we could just go and set the checksums anyway, ignoring the
local buffers. If the other session does some changes, it'll overwrite
our changes, this time with the correct checksums. But it seems pretty
dangerous (I mean, what if they're writing stuff while we're updating
the checksums? Considering the various short-cuts for temporary tables,
I suspect that would be a boon for race conditions.)

Another option would be to do something similar to running transactions,
i.e. wait until all temporary tables (that we've seen at the beginning)
disappear. But we're starting to wait on more and more stuff.

If we do this, we should clearly log which backends we're waiting for,
so that the admins can go and interrupt them manually.

Yeah, waiting for all transactions at the beginning is pretty simple.

Making the worker simply ignore temporary tables would also be easy.

One of the bigger issues here is that temporary tables are *session*-scoped,
not transaction-scoped, so we'd actually need the other session to finish,
not just the transaction.

I guess what we could do is something like this:

1. Don't process temporary tables in the checksumworker, period.
Instead, build a list of any temporary tables that existed when the
worker started in this particular database (basically anything that we
got in our scan). Once we have processed the complete database, keep
re-scanning pg_class until those particular tables are gone (search by oid).

That means that any temporary tables that are created *while* we are
processing a database are ignored, but they should already be receiving
checksums.

It definitely leads to a potential issue with long running temp tables.
But as long as we look at the *actual tables* (by oid), we should be
able to handle long-running sessions once they have dropped their temp
tables.

Does that sound workable to you?

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs
we are waiting for and which sessions may need the DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.
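
The "remember the temporary tables seen at the start, then wait until they are gone" idea discussed above can be sketched as follows. This is only an illustration under stated assumptions, not code from the patch: OIDs are modelled as plain uint32_t values, the arrays stand in for successive scans of pg_class, and oid_in_list() and remaining_temp_tables() are hypothetical helper names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if oid appears in list[0..n). */
static bool
oid_in_list(uint32_t oid, const uint32_t *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		if (list[i] == oid)
			return true;
	}
	return false;
}

/*
 * Compact initial_temp[] in place so that it keeps only the temporary
 * tables that were seen when the worker started AND still show up in the
 * latest scan.  Temporary tables created later are ignored on purpose,
 * since they are already written with checksums.  Returns how many OIDs
 * are still being waited for; zero means the database is done.
 */
static size_t
remaining_temp_tables(uint32_t *initial_temp, size_t n_initial,
					  const uint32_t *current, size_t n_current)
{
	size_t		kept = 0;

	for (size_t i = 0; i < n_initial; i++)
	{
		if (oid_in_list(initial_temp[i], current, n_current))
			initial_temp[kept++] = initial_temp[i];
	}
	return kept;
}
```

A worker would call remaining_temp_tables() after each re-scan and sleep between attempts until it returns zero. Because the check is by OID, a long-running session stops blocking as soon as it drops its temporary tables.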

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#98

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Tomas Vondra (#97)

1 attachment(s)

Re: Online enabling of checksums

On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 03/31/2018 05:05 PM, Magnus Hagander wrote:

On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

...

I do think just waiting for all running transactions to complete is
fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
too, to some extent). So we have a precedent / working code we can copy.

Thinking again, I don't think it should be done as part of
BuildRelationList(). We should just do it once in the launcher before
starting, that'll be both easier and cleaner. Anything started after
that will have checksums on it, so we should be fine.

PFA one that does this.

Seems fine to me. I'd however log waitforxid, not the oldest one. If
you're a DBA and you want to make the checksumming to proceed, knowing
the oldest running XID is useless for that. If we log waitforxid, it can
be used to query pg_stat_activity and interrupt the sessions somehow.

Yeah, makes sense. Updated.
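
For readers following along, here is a minimal sketch of the "wait for older transactions" check, written as standalone C rather than backend code. It assumes waitforxid is an XID boundary recorded once when the launcher starts. The names xid_t, xid_precedes and count_blockers are hypothetical and not taken from the patch. The comparison uses modulo-2^32 arithmetic, so it stays correct across XID wraparound, but it ignores the special permanent XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t xid_t;		/* hypothetical stand-in for TransactionId */

/*
 * True if a is logically older than b, using modulo-2^32 arithmetic.
 * Sketch only: special (permanent) XIDs are not handled.
 */
static bool
xid_precedes(xid_t a, xid_t b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * Count open transactions that started before waitforxid.
 * running[] models the XIDs of current sessions; 0 means "no transaction".
 * The launcher would keep waiting while this returns a non-zero count.
 */
static size_t
count_blockers(const xid_t *running, size_t n, xid_t waitforxid)
{
	size_t		blockers = 0;

	assert(running != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		if (running[i] != 0 && xid_precedes(running[i], waitforxid))
			blockers++;
	}
	return blockers;
}
```

Logging waitforxid together with such a count gives the DBA one value to compare against the transaction IDs shown for active sessions.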

And if you try this with a temporary table (not hidden in transaction,
so the bgworker can see it), the worker will fail with this:

ERROR: cannot access temporary tables of other sessions

But of course, this is just another way how to crash without updating
the result for the launcher, so checksums may end up being enabled
anyway.

Yeah, there will be plenty of side-effect issues from that
crash-with-wrong-status case. Fixing that will at least make things
safer -- in that checksums won't be enabled when not put on all pages.

Sure, the outcome with checksums enabled incorrectly is a consequence of
bogus status, and fixing that will prevent that. But that wasn't my main
point here - not articulated very clearly, though.

The bigger question is how to handle temporary tables gracefully, so
that it does not terminate the bgworker like this at all. This might be
even bigger issue than dropped relations, considering that temporary
tables are pretty common part of applications (and it also includes
CREATE/DROP).

For some clusters it might mean the online checksum enabling would
crash+restart infinitely (well, until reaching MAX_ATTEMPTS).

Unfortunately, try_relation_open() won't fix this, as the error comes
from ReadBufferExtended. And it's not a matter of simply creating a
ReadBuffer variant without that error check, because temporary tables
use local buffers.

I wonder if we could just go and set the checksums anyway, ignoring the
local buffers. If the other session does some changes, it'll overwrite
our changes, this time with the correct checksums. But it seems pretty
dangerous (I mean, what if they're writing stuff while we're updating
the checksums? Considering the various short-cuts for temporary tables,
I suspect that would be a boon for race conditions.)

Another option would be to do something similar to running transactions,
i.e. wait until all temporary tables (that we've seen at the beginning)
disappear. But we're starting to wait on more and more stuff.

If we do this, we should clearly log which backends we're waiting for,
so that the admins can go and interrupt them manually.

Yeah, waiting for all transactions at the beginning is pretty simple.

Making the worker simply ignore temporary tables would also be easy.

One of the bigger issues here is temporary tables are *session* scope
and not transaction, so we'd actually need the other session to finish,
not just the transaction.

I guess what we could do is something like this:

1. Don't process temporary tables in the checksumworker, period.
Instead, build a list of any temporary tables that existed when the
worker started in this particular database (basically anything that we
got in our scan). Once we have processed the complete database, keep
re-scanning pg_class until those particular tables are gone (search by oid).

That means that any temporary tables that are created *while* we are
processing a database are ignored, but they should already be receiving
checksums.

It definitely leads to a potential issue with long running temp tables.
But as long as we look at the *actual tables* (by oid), we should be
able to handle long-running sessions once they have dropped their temp
tables.

Does that sound workable to you?

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs we
are waiting for, and which sessions may need the DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.

Yeah, agreed. I think it makes sense to show the *number* of temp tables.
That's also a predictable amount of information -- logging all temp tables
may as you say lead to an insane amount of data.
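
The temp table handling agreed on above can be sketched roughly as follows. This is a standalone C model, not the code in the attached patch: oid_t, oid_in_list and temp_tables_remaining are hypothetical names, and the arrays stand in for what a pg_class scan would return. The worker remembers the temp table OIDs it saw at the start, and after each re-scan it only needs the number of those OIDs that still exist, which is also the value that would be logged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t oid_t;		/* hypothetical stand-in for Oid */

/* Linear search; fine for a sketch, a real list could be sorted or hashed. */
static bool
oid_in_list(oid_t oid, const oid_t *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		if (list[i] == oid)
			return true;
	}
	return false;
}

/*
 * initial[]: temp table OIDs seen when the worker started on the database.
 * current[]: temp table OIDs found by the latest re-scan.
 * Returns how many of the initial tables still exist. Temp tables created
 * after the start are ignored, since they already get checksums on write.
 */
static size_t
temp_tables_remaining(const oid_t *initial, size_t n_initial,
					  const oid_t *current, size_t n_current)
{
	size_t		remaining = 0;

	assert(initial != NULL || n_initial == 0);

	for (size_t i = 0; i < n_initial; i++)
	{
		if (oid_in_list(initial[i], current, n_current))
			remaining++;
	}
	return remaining;
}
```

The database counts as done once this returns zero. Until then only the count is reported, which keeps the log output bounded however many temp tables exist.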

PFA a patch that does this. I've also added some docs for it.

And I also noticed pg_verify_checksums wasn't installed, so fixed that too.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachments:

online_checksums9.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5abb1c46fb..dcdd17ec0c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19507,6 +19507,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>in progress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 22e6893211..c81c87ef41 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -210,6 +210,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 949b5a220f..826dd91f72 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..463ecd5e1b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-r <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index d27fb414f7..db4f4167e3 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -283,6 +283,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..123638bc3f 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,99 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums,
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the checksummer process
+    to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+   <para>
+    Since checksums are clusterwide, all databases will get checksums added.
+    It is thus important that the worker is allowed to connect to all databases
+    for the process to succeed, see the <xref linkend="sql-alterdatabase"/>
+    command for how to allow connections.
+   </para>
+
+   <note>
+    <para>
+     <literal>template0</literal> is by default not accepting connections, to
+     enable checksums you'll need to temporarily make it accept connections.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b4fd8395b7..813b2afaac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4673,10 +4674,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4748,12 +4745,90 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7789,6 +7864,16 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9542,6 +9627,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9969,6 +10070,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..67b9cd8127 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,50 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e9e188682f..5d567d0cf9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1027,6 +1027,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..84a8cc865b
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,881 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a database at initdb time, no extra process
+ * is required as each page is checksummed, and verified, at accesses.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific database
+ * before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	bool		abort;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	HeapTuple	tup;
+	Relation	rel;
+	HeapScanDesc scan;
+
+	if (ChecksumHelperShmem->abort)
+	{
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: has been cancelled")));
+	}
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/* Failed to set means somebody else started */
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: already running")));
+	}
+
+	/*
+	 * Check that all databases allow connections.  This will be re-checked
+	 * when we build the list of databases to work on, the point of
+	 * duplicating this is to catch any databases we won't be able to open
+	 * while we can still send an error message to the client.
+	 */
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+		return;
+
+	/*
+	 * We don't need an atomic variable for aborting, setting it multiple
+	 * times will not change the handling.
+	 */
+	ChecksumHelperShmem->abort = true;
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity), "processing: %s.%s (%s block %u/%u)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hint bits) on the master happening while checksums
+		 * were off. This can only happen if the page had a valid checksum at
+		 * some point in the past, that is, when checksums were first on,
+		 * then off, and are now being turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort request will bubble up from here.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * Process a single relation based on oid.
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there are no
+		 * pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %u as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity),
+			 "Waiting for worker in database %s (pid %d)", db->dbname, (int) pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	ChecksumHelperShmem->abort = false;
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = ShmemVariableCache->nextXid;
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		elog(DEBUG1, "Checking old transactions");
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char activity[128];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (waiting for %u)", waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_PG_SLEEP);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so that the first database to be processed also handles the
+	 * shared catalogs, rather than processing them once in every database.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Wait for all existing transactions to finish. This will make sure that
+	 * we can see all tables in all databases, so we don't miss any.
+	 * Anything created after this point is known to have checksums on
+	 * all pages already, so we don't have to care about those.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any database created after
+	 * this process started will be running with checksums turned on from the
+	 * start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult processing;
+
+			processing = ProcessDatabase(db);
+
+			if (processing == SUCCESSFUL)
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else if (processing == FAILED)
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+			else
+				/* aborted */
+				return;
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the second case where
+		 * the database was dropped before we had started processing it. Any
+		 * database that still exists but where enabling checksums failed, is
+		 * retried for a limited number of times before giving up. Any
+		 * database that remains in failed state after the retries expire will
+		 * fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in \"%s\", giving up",
+										db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("database \"%s\" has been dropped, skipping",
+								db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL || list_length(remaining) == 0)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+	if (!found)
+		pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared relations and local ones are
+ * returned, else only non-shared relations are returned.
+ * Temp tables are not included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip.
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in database
+ *
+ * Returns a List of oids.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != RELPERSISTENCE_TEMP)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, HeapTupleGetOid(tup));
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Get a list of all temp tables present as we start in this database.
+	 * We need to wait until they are all gone before we are done, since
+	 * we cannot access those files and modify them.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		ChecksumHelperShmem->success = ABORTED;
+		ereport(DEBUG1,
+				(errmsg("checksum worker aborted in database oid %u", dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums.
+	 * Any temp tables created after we started will already have checksums
+	 * in them (due to the inprogress state), so those are safe.
+	 */
+	while (true)
+	{
+		List *CurrentTempTables;
+		ListCell *lc;
+		int numleft;
+		char activity[64];
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table left to wait for */
+		snprintf(activity, sizeof(activity), "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT,
+						 5000,
+						 WAIT_EVENT_PG_SLEEP);
+	}
+
+	list_free(InitialTempTableList);
+
+	ChecksumHelperShmem->success = SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..f408d56270 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,8 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, and can also be turned on and off
+at runtime using pg_enable_data_checksums()/pg_disable_data_checksums().
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..790e4b860a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1168,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1195,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4ffc8451ca..14aa575733 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -68,6 +69,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -420,6 +422,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -514,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1684,17 +1697,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -4101,6 +4103,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/Makefile b/src/bin/Makefile
index 3b35835abe..8c11060a2f 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -26,6 +26,7 @@ SUBDIRS = \
 	pg_test_fsync \
 	pg_test_timing \
 	pg_upgrade \
+	pg_verify_checksums \
 	pg_waldump \
 	pgbench \
 	psql \
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7e5e971294..449a703c47 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -226,7 +226,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..d16261571f
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,36 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in an offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..e37f39bd2a
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,315 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_relfilenode = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage(void)
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION]... [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -r relfilenode check only relation with specified relfilenode\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"), progname, fn, strerror(errno));
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in file \"%s\", got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno * RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
+						progname, fn, blockno, csum, header->pd_checksum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: checksum verified in file \"%s\", block %d: %X\n"),
+					progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
+				progname, path, strerror(errno));
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
+					progname, fn, strerror(errno));
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char	   *forkpath,
+					   *segmentpath;
+			int			segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number
+			 * in order to mix it into the checksum. Then also cut off at the
+			 * fork boundary, to get the relfilenode the file belongs to for
+			 * filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename \"%s\"\n"),
+							progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_relfilenode && strcmp(only_relfilenode, de->d_name) != 0)
+				/* Relfilenode not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fr:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'r':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid relfilenode: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_relfilenode = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+
+		/* If no DataDir was specified, and none could be found, error out */
+		if (DataDir == NULL)
+		{
+			fprintf(stderr, _("%s: no data directory specified\n"), progname);
+			fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+			exit(1);
+		}
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..f21870c644 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 90d994c71a..f601af71be 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5574,6 +5574,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2278 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 2 0 2278 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f t f v u 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..289bf2a935
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..4f8e0ab8f8
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
diff --git a/src/test/isolation/expected/checksum_cancel.out b/src/test/isolation/expected/checksum_cancel.out
new file mode 100644
index 0000000000..c449e7b6cc
--- /dev/null
+++ b/src/test/isolation/expected/checksum_cancel.out
@@ -0,0 +1,27 @@
+Parsed test spec with 2 sessions
+
+starting permutation: c_verify_checksums_off r_seqread c_enable_checksums c_verify_checksums_inprogress c_disable_checksums c_wait_checksums_off
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums(1000);
+pg_enable_data_checksums
+
+               
+step c_verify_checksums_inprogress: SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step c_disable_checksums: SELECT pg_disable_data_checksums();
+pg_disable_data_checksums
+
+               
+step c_wait_checksums_off: SELECT test_checksums_off();
+test_checksums_off
+
+t              
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..0a68f47023
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+               
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index d3965fe73f..1ac9ade0a8 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -68,3 +68,7 @@ test: timeouts
 test: vacuum-concurrent-drop
 test: predicate-gist
 test: predicate-gin
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_cancel
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_cancel.spec b/src/test/isolation/specs/checksum_cancel.spec
new file mode 100644
index 0000000000..a9da0d74c7
--- /dev/null
+++ b/src/test/isolation/specs/checksum_cancel.spec
@@ -0,0 +1,48 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		PERFORM pg_sleep(1);
+		SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+		enabled boolean;
+	BEGIN
+		FOR counter IN 1..100 LOOP
+			PERFORM count(a) FROM t1;
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_off();
+
+	DROP TABLE t1;
+}
+
+session "reader"
+step "r_seqread"						{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"			{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"				{ SELECT pg_enable_data_checksums(1000); }
+step "c_disable_checksums"				{ SELECT pg_disable_data_checksums(); }
+step "c_verify_checksums_inprogress"	{ SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_wait_checksums_off"				{ SELECT test_checksums_off(); }
+
+permutation "c_verify_checksums_off" "r_seqread" "c_enable_checksums" "c_verify_checksums_inprogress" "c_disable_checksums" "c_wait_checksums_off"
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..7bfbe8d448
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.1);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"

#99

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Magnus Hagander (#98)

1 attachment(s)

Re: Online enabling of checksums

On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 03/31/2018 05:05 PM, Magnus Hagander wrote:

On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

...

I do think just waiting for all running transactions to complete is
fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
too, to some extent). So we have a precedent / working code we can copy.

Thinking again, I don't think it should be done as part of
BuildRelationList(). We should just do it once in the launcher before
starting, that'll be both easier and cleaner. Anything started after
that will have checksums on it, so we should be fine.

PFA one that does this.

Seems fine to me. I'd however log waitforxid, not the oldest one. If
you're a DBA and you want to make the checksumming to proceed, knowing
the oldest running XID is useless for that. If we log waitforxid, it can
be used to query pg_stat_activity and interrupt the sessions somehow.

Yeah, makes sense. Updated.
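
The wait condition discussed above can be illustrated with a small
standalone sketch. It is not taken from the patch: xid_t,
FIRST_NORMAL_XID, xid_precedes() and must_keep_waiting() are
hypothetical names, and the comparison only mimics the usual
modulo-2^32 ordering of normal transaction IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's 32-bit TransactionId. */
typedef uint32_t xid_t;

/* Normal XIDs start at 3; smaller values are reserved special XIDs. */
#define FIRST_NORMAL_XID ((xid_t) 3)

/*
 * Wraparound-aware "a is older than b" using modulo-2^32 arithmetic.
 * Only valid for normal XIDs, which is all this sketch handles.
 */
static bool
xid_precedes(xid_t a, xid_t b)
{
	return (int32_t) (a - b) < 0;
}

/*
 * The launcher must keep waiting while the oldest transaction still
 * running started before the cutoff ("waitforxid").  The cutoff is the
 * value worth logging, since it does not change between retries.
 */
static bool
must_keep_waiting(xid_t oldest_running, xid_t waitforxid)
{
	assert(oldest_running >= FIRST_NORMAL_XID);
	assert(waitforxid >= FIRST_NORMAL_XID);

	return xid_precedes(oldest_running, waitforxid);
}
```

With the cutoff in the log, a DBA could look for sessions whose
transaction ID is older than that value and decide whether to
interrupt them.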

And if you try this with a temporary table (not hidden in transaction,
so the bgworker can see it), the worker will fail with this:

ERROR: cannot access temporary tables of other sessions

But of course, this is just another way how to crash without updating
the result for the launcher, so checksums may end up being enabled
anyway.

Yeah, there will be plenty of side-effect issues from that
crash-with-wrong-status case. Fixing that will at least make things
safer -- in that checksums won't be enabled when not put on all pages.

Sure, the outcome with checksums enabled incorrectly is a consequence of
bogus status, and fixing that will prevent that. But that wasn't my main
point here - not articulated very clearly, though.

The bigger question is how to handle temporary tables gracefully, so
that it does not terminate the bgworker like this at all. This might be
even bigger issue than dropped relations, considering that temporary
tables are pretty common part of applications (and it also includes
CREATE/DROP).

For some clusters it might mean the online checksum enabling would
crash+restart infinitely (well, until reaching MAX_ATTEMPTS).

Unfortunately, try_relation_open() won't fix this, as the error comes
from ReadBufferExtended. And it's not a matter of simply creating a
ReadBuffer variant without that error check, because temporary tables
use local buffers.

I wonder if we could just go and set the checksums anyway, ignoring the
local buffers. If the other session does some changes, it'll overwrite
our changes, this time with the correct checksums. But it seems pretty
dangerous (I mean, what if they're writing stuff while we're updating
the checksums? Considering the various short-cuts for temporary tables,
I suspect that would be a boon for race conditions.)

Another option would be to do something similar to running transactions,
i.e. wait until all temporary tables (that we've seen at the beginning)
disappear. But we're starting to wait on more and more stuff.

If we do this, we should clearly log which backends we're waiting for,
so that the admins can go and interrupt them manually.

Yeah, waiting for all transactions at the beginning is pretty simple.

Making the worker simply ignore temporary tables would also be easy.

One of the bigger issues here is temporary tables are *session* scope
and not transaction, so we'd actually need the other session to finish,
not just the transaction.

I guess what we could do is something like this:

1. Don't process temporary tables in the checksumworker, period.
Instead, build a list of any temporary tables that existed when the
worker started in this particular database (basically anything that we
got in our scan). Once we have processed the complete database, keep
re-scanning pg_class until those particular tables are gone (search by
oid).

That means that any temporary tables that are created *while* we are
processing a database are ignored, but they should already be receiving
checksums.

It definitely leads to a potential issue with long running temp tables.
But as long as we look at the *actual tables* (by oid), we should be
able to handle long-running sessions once they have dropped their temp
tables.

Does that sound workable to you?

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs are
we waiting for, which sessions may need DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.

Yeah, agreed. I think it makes sense to show the *number* of temp tables.
That's also a predictable amount of information -- logging all temp tables
may as you say lead to an insane amount of data.
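
The approach agreed on above (remember the temporary tables seen at
scan time, re-check the catalog until they are gone, and report only
how many are left) can be sketched roughly as follows. This is an
illustration under stated assumptions, not the code in the attached
patch: oid_t, oid_in_list() and count_remaining_temp_tables() are
hypothetical names, and the plain arrays stand in for the results of
scanning pg_class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's Oid type. */
typedef uint32_t oid_t;

/* Linear membership test; good enough for a sketch. */
static bool
oid_in_list(oid_t oid, const oid_t *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		if (list[i] == oid)
			return true;
	}
	return false;
}

/*
 * Count how many of the temporary tables remembered at the start of
 * the scan still exist in a fresh catalog snapshot.  Temp tables
 * created later are not in "initial", so they are never waited for.
 */
static size_t
count_remaining_temp_tables(const oid_t *initial, size_t ninitial,
							const oid_t *current, size_t ncurrent)
{
	size_t		remaining = 0;

	assert(initial != NULL || ninitial == 0);
	assert(current != NULL || ncurrent == 0);

	for (size_t i = 0; i < ninitial; i++)
	{
		if (oid_in_list(initial[i], current, ncurrent))
			remaining++;
	}
	return remaining;
}
```

A worker following this idea would repeat the count after each
re-scan, log the number while it is non-zero, and move on to the next
database once it reaches zero.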

PFA a patch that does this. I've also added some docs for it.

And I also noticed pg_verify_checksums wasn't installed, so fixed that too.

PFA a rebase on top of the just committed verify-checksums patch.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachments:

online_checksums10.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5abb1c46fb..dcdd17ec0c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19507,6 +19507,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>inprogress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4e01e5641c..7cd6ee85dc 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -211,6 +211,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 949b5a220f..826dd91f72 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..463ecd5e1b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a
+   <productname>PostgreSQL</productname> cluster. It must be run against a cluster that is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-r <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with the specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on the cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    <application>pg_verify_checksums</application> can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ef2270c467..78c214f1b0 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -284,6 +284,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..123638bc3f 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,99 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums;
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to be removed before it finishes.
+    If long-lived temporary tables are used in the application, it may be necessary
+    to terminate these application connections to allow the checksummer process
+    to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+   <para>
+    Since checksums are clusterwide, all databases will get checksums added.
+    It is thus important that the worker is allowed to connect to all databases
+    for the process to succeed. See the <xref linkend="sql-alterdatabase"/>
+    command for how to allow connections.
+   </para>
+
+   <note>
+    <para>
+     <literal>template0</literal> does not accept connections by default. To
+     enable checksums, you will need to temporarily make it accept connections.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b4fd8395b7..813b2afaac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4673,10 +4674,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4748,12 +4745,90 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7789,6 +7864,16 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9542,6 +9627,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9969,6 +10070,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..67b9cd8127 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,50 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e9e188682f..5d567d0cf9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1027,6 +1027,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..84a8cc865b
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,881 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a cluster at initdb time, no extra process
+ * is required as each page is checksummed, and verified, on access.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+/*
+ * Maximum number of times to try enabling checksums in a specific database
+ * before giving up.
+ */
+#define MAX_ATTEMPTS 4
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	bool		abort;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+	int			attempts;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Register and start the checksumhelper launcher background worker.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	HeapTuple	tup;
+	Relation	rel;
+	HeapScanDesc scan;
+
+	if (ChecksumHelperShmem->abort)
+	{
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: has been cancelled")));
+	}
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/* Failed to set means somebody else started */
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: already running")));
+	}
+
+	/*
+	 * Check that all databases allow connections.  This will be re-checked
+	 * when we build the list of databases to work on, the point of
+	 * duplicating this is to catch any databases we won't be able to open
+	 * while we can still send an error message to the client.
+	 */
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+		return;
+
+	/*
+	 * We don't need an atomic variable for aborting, setting it multiple
+	 * times will not change the handling.
+	 */
+	ChecksumHelperShmem->abort = true;
+}
+
+/*
+ * Enable checksums in a single relation/fork.
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %u/%u)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort will propagate up from here.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * Process a single relation based on oid.
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there are no
+		 * pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %u as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, (int) pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	ChecksumHelperShmem->abort = false;
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = ShmemVariableCache->nextXid;
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		elog(DEBUG1, "Checking old transactions");
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[128];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (waiting for %u)", waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_PG_SLEEP);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Wait for all existing transactions to finish. This will make sure that
+	 * we can see all tables in all databases, so we don't miss any.
+	 * Anything created after this point is known to have checksums on
+	 * all pages already, so we don't have to care about those.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any database created after
+	 * this process started will be running with checksums turned on from the
+	 * start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	while (true)
+	{
+		List	   *remaining = NIL;
+		ListCell   *lc,
+				   *lc2;
+		List	   *CurrentDatabases = NIL;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult processing;
+
+			processing = ProcessDatabase(db);
+
+			if (processing == SUCCESSFUL)
+			{
+				pfree(db->dbname);
+				pfree(db);
+
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else if (processing == FAILED)
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				remaining = lappend(remaining, db);
+			}
+			else
+				/* aborted */
+				return;
+		}
+		list_free(DatabaseList);
+
+		DatabaseList = remaining;
+		remaining = NIL;
+
+		/*
+		 * DatabaseList now has all databases not yet processed. This can be
+		 * because they failed for some reason, or because the database was
+		 * dropped between us getting the database list and trying to process
+		 * it. Get a fresh list of databases to detect the second case where
+		 * the database was dropped before we had started processing it. Any
+		 * database that still exists but where enabling checksums failed, is
+		 * retried for a limited number of times before giving up. Any
+		 * database that remains in failed state after the retries expire will
+		 * fail the entire operation.
+		 */
+		CurrentDatabases = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			bool		found = false;
+
+			foreach(lc2, CurrentDatabases)
+			{
+				ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+				if (db->dboid == db2->dboid)
+				{
+					/* Database still exists, time to give up? */
+					if (++db->attempts > MAX_ATTEMPTS)
+					{
+						/* Disable checksums on cluster, because we failed */
+						SetDataChecksumsOff();
+
+						ereport(ERROR,
+								(errmsg("failed to enable checksums in \"%s\", giving up",
+										db->dbname)));
+					}
+					else
+						/* Try again with this db */
+						remaining = lappend(remaining, db);
+					found = true;
+					break;
+				}
+			}
+			if (!found)
+			{
+				ereport(LOG,
+						(errmsg("database \"%s\" has been dropped, skipping",
+								db->dbname)));
+				pfree(db->dbname);
+				pfree(db);
+			}
+		}
+
+		/* Free the extra list of databases */
+		foreach(lc, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+		list_free(CurrentDatabases);
+
+		/* All databases processed yet? */
+		if (remaining == NIL || list_length(remaining) == 0)
+			break;
+
+		DatabaseList = remaining;
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared and local relations are returned,
+ * else only non-shared relations are returned.
+ * Temp tables are not included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relpersistence == 't')
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip.
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in database
+ *
+ * Returns a List of oids.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != 't')
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, HeapTupleGetOid(tup));
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Get a list of all temp tables present as we start in this database.
+	 * We need to wait until they are all gone before we are done, since
+	 * we cannot access those files and modify them.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		ChecksumHelperShmem->success = ABORTED;
+		ereport(DEBUG1,
+				(errmsg("checksum worker aborted in database oid %u", dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums.
+	 * Any temp tables created after we started will already have checksums
+	 * in them (due to the inprogress state), so those are safe.
+	 */
+	while (true)
+	{
+		List *CurrentTempTables;
+		ListCell *lc;
+		int numleft;
+		char activity[64];
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table left to wait for */
+		snprintf(activity, sizeof(activity), "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT,
+						 5000,
+						 WAIT_EVENT_PG_SLEEP);
+	}
+
+	list_free(InitialTempTableList);
+
+	ChecksumHelperShmem->success = SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,11 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c5b83232fd..5f26a03769 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1377,7 +1377,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..f408d56270 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,8 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, but can also be turned on and off
+using pg_enable_data_checksums()/pg_disable_data_checksums() at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..790e4b860a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1168,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1195,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4ffc8451ca..14aa575733 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -68,6 +69,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -420,6 +422,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -514,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1684,17 +1697,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -4101,6 +4103,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/Makefile b/src/bin/Makefile
index 3b35835abe..8c11060a2f 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -26,6 +26,7 @@ SUBDIRS = \
 	pg_test_fsync \
 	pg_test_timing \
 	pg_upgrade \
+	pg_verify_checksums \
 	pg_waldump \
 	pgbench \
 	psql \
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7e5e971294..449a703c47 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -226,7 +226,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..d16261571f
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,36 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in an offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..e37f39bd2a
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,315 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_relfilenode = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage(void)
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -r relfilenode check only relation with specified relfilenode\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open file \"%s\": %m\n"), progname, fn);
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in file \"%s\", got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno * RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
+						progname, fn, blockno, csum, header->pd_checksum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: checksum verified in file \"%s\", block %d: %X\n"),
+					progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory \"%s\": %m\n"),
+				progname, path);
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file \"%s\": %m\n"),
+					progname, fn);
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char	   *forkpath,
+					   *segmentpath;
+			int			segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number
+			 * in order to mix it into the checksum. Then also cut off at the
+			 * fork boundary, to get the relfilenode the file belongs to for
+			 * filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename \"%s\"\n"),
+							progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_relfilenode && strcmp(only_relfilenode, de->d_name) != 0)
+				/* Relfilenode not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fr:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'r':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid relfilenode: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_relfilenode = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+
+		/* If no DataDir was specified, and none could be found, error out */
+		if (DataDir == NULL)
+		{
+			fprintf(stderr, _("%s: no data directory specified\n"), progname);
+			fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+			exit(1);
+		}
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..f21870c644 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 90d994c71a..f601af71be 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5574,6 +5574,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2278 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 2 0 2278 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f t f v u 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..289bf2a935
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..4f8e0ab8f8
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
diff --git a/src/test/isolation/expected/checksum_cancel.out b/src/test/isolation/expected/checksum_cancel.out
new file mode 100644
index 0000000000..c449e7b6cc
--- /dev/null
+++ b/src/test/isolation/expected/checksum_cancel.out
@@ -0,0 +1,27 @@
+Parsed test spec with 2 sessions
+
+starting permutation: c_verify_checksums_off r_seqread c_enable_checksums c_verify_checksums_inprogress c_disable_checksums c_wait_checksums_off
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums(1000);
+pg_enable_data_checksums
+
+               
+step c_verify_checksums_inprogress: SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step c_disable_checksums: SELECT pg_disable_data_checksums();
+pg_disable_data_checksums
+
+               
+step c_wait_checksums_off: SELECT test_checksums_off();
+test_checksums_off
+
+t              
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..0a68f47023
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+               
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 99dd7c6bdb..31900cb920 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -72,3 +72,7 @@ test: timeouts
 test: vacuum-concurrent-drop
 test: predicate-gist
 test: predicate-gin
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_cancel
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_cancel.spec b/src/test/isolation/specs/checksum_cancel.spec
new file mode 100644
index 0000000000..a9da0d74c7
--- /dev/null
+++ b/src/test/isolation/specs/checksum_cancel.spec
@@ -0,0 +1,48 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		PERFORM pg_sleep(1);
+		SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+		enabled boolean;
+	BEGIN
+		FOR counter IN 1..100 LOOP
+			PERFORM count(a) FROM t1;
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_off();
+
+	DROP TABLE t1;
+}
+
+session "reader"
+step "r_seqread"						{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"			{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"				{ SELECT pg_enable_data_checksums(1000); }
+step "c_disable_checksums"				{ SELECT pg_disable_data_checksums(); }
+step "c_verify_checksums_inprogress"	{ SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_wait_checksums_off"				{ SELECT test_checksums_off(); }
+
+permutation "c_verify_checksums_off" "r_seqread" "c_enable_checksums" "c_verify_checksums_inprogress" "c_disable_checksums" "c_wait_checksums_off"
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..7bfbe8d448
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.1);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"

#100

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#99)

1 attachment(s)

Re: Online enabling of checksums

On 04/03/2018 02:05 PM, Magnus Hagander wrote:

On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

On 03/31/2018 05:05 PM, Magnus Hagander wrote:

On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

...

I do think just waiting for all running transactions to complete is
fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
too, to some extent). So we have a precedent / working code we can copy.

Thinking again, I don't think it should be done as part of
BuildRelationList(). We should just do it once in the launcher before
starting, that'll be both easier and cleaner. Anything started after
that will have checksums on it, so we should be fine.

PFA one that does this.

Seems fine to me. I'd however log waitforxid, not the oldest one. If
you're a DBA and you want to make the checksumming to proceed, knowing
the oldest running XID is useless for that. If we log waitforxid, it can
be used to query pg_stat_activity and interrupt the sessions somehow.

Yeah, makes sense. Updated.
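
To make the waitforxid point concrete, here is a minimal standalone sketch of the wait condition. It is not the patch's code: `Xid`, `must_keep_waiting()` and `format_wait_message()` are hypothetical names, the message wording is invented for illustration, and XID wraparound is deliberately ignored. The idea shown is that the launcher waits while any running transaction is older than the threshold, and reports the threshold itself rather than the oldest running XID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for TransactionId; wraparound is ignored in this sketch. */
typedef uint32_t Xid;

/*
 * Return true while at least one running transaction started before
 * wait_for_xid, i.e. before checksums were switched to "inprogress".
 */
bool
must_keep_waiting(const Xid *running, size_t nrunning, Xid wait_for_xid)
{
	for (size_t i = 0; i < nrunning; i++)
	{
		if (running[i] < wait_for_xid)
			return true;
	}
	return false;
}

/*
 * Build the log line.  It reports the threshold (wait_for_xid), which a
 * DBA can compare against backend XIDs in pg_stat_activity.
 */
int
format_wait_message(char *buf, size_t buflen, Xid wait_for_xid)
{
	return snprintf(buf, buflen,
					"waiting for transactions older than %u to end",
					(unsigned) wait_for_xid);
}
```

A caller would poll `must_keep_waiting()` with a fresh list of running XIDs and emit the message on each iteration until it returns false.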

And if you try this with a temporary table (not hidden in transaction,
so the bgworker can see it), the worker will fail with this:

ERROR: cannot access temporary tables of other sessions

But of course, this is just another way how to crash without updating
the result for the launcher, so checksums may end up being enabled
anyway.

Yeah, there will be plenty of side-effect issues from that
crash-with-wrong-status case. Fixing that will at least make things
safer -- in that checksums won't be enabled when not put on all pages.

Sure, the outcome with checksums enabled incorrectly is a consequence of
bogus status, and fixing that will prevent that. But that wasn't my main
point here - not articulated very clearly, though.

The bigger question is how to handle temporary tables gracefully, so
that it does not terminate the bgworker like this at all. This might be
even bigger issue than dropped relations, considering that temporary
tables are pretty common part of applications (and it also includes
CREATE/DROP).

For some clusters it might mean the online checksum enabling would
crash+restart infinitely (well, until reaching MAX_ATTEMPTS).

Unfortunately, try_relation_open() won't fix this, as the error comes
from ReadBufferExtended. And it's not a matter of simply creating a
ReadBuffer variant without that error check, because temporary tables
use local buffers.

I wonder if we could just go and set the checksums anyway, ignoring the
local buffers. If the other session does some changes, it'll overwrite
our changes, this time with the correct checksums. But it seems pretty
dangerous (I mean, what if they're writing stuff while we're updating
the checksums? Considering the various short-cuts for temporary tables,
I suspect that would be a boon for race conditions.)

Another option would be to do something similar to running transactions,
i.e. wait until all temporary tables (that we've seen at the beginning)
disappear. But we're starting to wait on more and more stuff.

If we do this, we should clearly log which backends we're waiting for,
so that the admins can go and interrupt them manually.

Yeah, waiting for all transactions at the beginning is pretty simple.

Making the worker simply ignore temporary tables would also be easy.

One of the bigger issues here is temporary tables are *session* scope
and not transaction, so we'd actually need the other session to finish,
not just the transaction.

I guess what we could do is something like this:

1. Don't process temporary tables in the checksumworker, period.
Instead, build a list of any temporary tables that existed when the
worker started in this particular database (basically anything that we
got in our scan). Once we have processed the complete database, keep
re-scanning pg_class until those particular tables are gone (search by
oid).

That means that any temporary tables that are created *while* we are
processing a database are ignored, but they should already be receiving
checksums.

It definitely leads to a potential issue with long running temp tables.
But as long as we look at the *actual tables* (by oid), we should be
able to handle long-running sessions once they have dropped their temp
tables.

Does that sound workable to you?

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs are
we waiting for, which sessions may need DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.

Yeah, agreed. I think it makes sense to show the *number* of temp
tables. That's also a predictable amount of information -- logging
all temp tables may as you say lead to an insane amount of data.
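
As a rough illustration of the temp-table approach discussed above, the following self-contained sketch counts how many of the temporary tables seen at worker start are still present in a later pg_class scan. This is an assumption-laden model, not code from the patch: `Oid` is a plain typedef here and `count_remaining_temp_tables()` is a hypothetical helper. It returns only a count, matching the decision to log the number of temp tables rather than every OID.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's Oid type. */
typedef uint32_t Oid;

/*
 * Count how many of the temp table OIDs remembered at worker start
 * ("initial") still appear in the latest scan ("current").  Temp tables
 * created after the worker started are not in "initial" and are ignored;
 * they already receive checksums on write.
 */
size_t
count_remaining_temp_tables(const Oid *initial, size_t ninitial,
							const Oid *current, size_t ncurrent)
{
	size_t		remaining = 0;

	for (size_t i = 0; i < ninitial; i++)
	{
		for (size_t j = 0; j < ncurrent; j++)
		{
			if (initial[i] == current[j])
			{
				remaining++;
				break;
			}
		}
	}
	return remaining;
}
```

The worker would repeat the scan, log the returned count, and sleep until the count reaches zero. The quadratic comparison is chosen for brevity; a sorted list or hash table would be the obvious refinement for large numbers of temp tables.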

PFA a patch that does this. I've also added some docs for it.

And I also noticed pg_verify_checksums wasn't installed, so fixed
that too.

PFA a rebase on top of the just committed verify-checksums patch.

This seems OK in terms of handling errors in the worker and passing it
to the launcher. I haven't managed to do any crash testing today, but
code-wise it seems sane.

It however still fails to initialize the attempts field after allocating
the db entry in BuildDatabaseList, so if you try running with
-DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:

WARNING: attempts = -1684366952
WARNING: attempts = 1010514489
WARNING: attempts = -1145390664
WARNING: attempts = 1162101570

I guess those are not the droids we're looking for?
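
For readers unfamiliar with this class of bug, here is a small standalone model of it in plain C. It does not use PostgreSQL's allocator: `DatabaseEntry`, `make_database_entry()` and `free_database_entry()` are hypothetical, and the `memset` with `0x7f` merely imitates what a build with RANDOMIZE_ALLOCATED_MEMORY does, namely hand back memory full of garbage. The point is that every field must be assigned explicitly after allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-database list entry. */
typedef struct DatabaseEntry
{
	unsigned int dboid;
	char	   *dbname;
	int			attempts;
} DatabaseEntry;

/*
 * Allocate and fully initialize an entry.  Returns NULL on allocation
 * failure.
 */
DatabaseEntry *
make_database_entry(unsigned int dboid, const char *dbname)
{
	DatabaseEntry *db = malloc(sizeof(DatabaseEntry));

	if (db == NULL)
		return NULL;

	/* Imitate an allocator that returns garbage-filled memory. */
	memset(db, 0x7f, sizeof(DatabaseEntry));

	db->dboid = dboid;
	db->dbname = malloc(strlen(dbname) + 1);
	if (db->dbname == NULL)
	{
		free(db);
		return NULL;
	}
	strcpy(db->dbname, dbname);

	/* Without this assignment, attempts would keep the garbage value. */
	db->attempts = 0;

	return db;
}

/* Release an entry created by make_database_entry(). */
void
free_database_entry(DatabaseEntry *db)
{
	if (db != NULL)
	{
		free(db->dbname);
		free(db);
	}
}
```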

Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:

test=# select pg_enable_data_checksums(10, 10);
ERROR: database "template0" does not allow connections
HINT: Allow connections using ALTER DATABASE and try again.
test=# alter database template0 allow_connections = true;
ALTER DATABASE
test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: already running
test=# select pg_disable_data_checksums();
pg_disable_data_checksums
---------------------------

(1 row)

test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: has been cancelled

At which point the only thing you can do is restarting the cluster,
which seems somewhat unnecessary. But perhaps it's intentional?

Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

checksums-tweaks.diff (text/x-patch)

diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 123638b..b0d082a 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -257,7 +257,7 @@
   </para>
 
   <para>
-   When attempting to recover from corrupt data it may be necessarily to bypass the checksum
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
    protection in order to recover data. To do this, temporarily set the configuration parameter
    <xref linkend="guc-ignore-checksum-failure" />.
   </para>
@@ -287,15 +287,17 @@
     be visible to the process enabling checksums. It will also, for each database,
     wait for all pre-existing temporary tables to get removed before it finishes.
     If long-lived temporary tables are used in the application it may be necessary
-    to terminate these application connections to allow the checksummer process
-    to complete.
+    to terminate these application connections to allow the process to complete.
+    Information about open transactions and connections with temporary tables is
+    written to log.
    </para>
 
    <para>
     If the cluster is stopped while in <literal>inprogress</literal> mode, for
     any reason, then this process must be restarted manually. To do this,
     re-execute the function <function>pg_enable_data_checksums()</function>
-    once the cluster has been restarted.
+    once the cluster has been restarted. It is not possible to resume the work,
+    the process has to start from scratch.
    </para>
 
    <note>
@@ -317,6 +319,7 @@
     <para>
      <literal>template0</literal> is by default not accepting connections, to
      enable checksums you'll need to temporarily make it accept connections.
+     See <xref linkend="sql-alterdatabase" /> for details.
     </para>
    </note>
 
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 67b9cd8..b76b268 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -700,6 +700,11 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(xtime);
 }
 
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
 Datum
 disable_data_checksums(PG_FUNCTION_ARGS)
 {
@@ -718,6 +723,12 @@ disable_data_checksums(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
 Datum
 enable_data_checksums(PG_FUNCTION_ARGS)
 {
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
index 84a8cc8..e1c34e8 100644
--- a/src/backend/postmaster/checksumhelper.c
+++ b/src/backend/postmaster/checksumhelper.c
@@ -653,6 +653,8 @@ BuildDatabaseList(void)
 
 		db->dboid = HeapTupleGetOid(tup);
 		db->dbname = pstrdup(NameStr(pgdb->datname));
+		elog(WARNING, "attempts = %d", db->attempts);
+		db->attempts = 0;
 
 		DatabaseList = lappend(DatabaseList, db);

#101

Michael Banck

michael.banck@credativ.de

almost 8 years ago

In reply to: Magnus Hagander (#99)

Re: Online enabling of checksums

Hi,

On Tue, Apr 03, 2018 at 02:05:04PM +0200, Magnus Hagander wrote:

PFA a rebase on top of the just committed verify-checksums patch.

For the record, I am on vacation this week and won't be able to do
further in-depth review or testing of this patch before the end of the
commitfest, sorry.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

#102

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Tomas Vondra (#100)

1 attachment(s)

Re: Online enabling of checksums

On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On 04/03/2018 02:05 PM, Magnus Hagander wrote:

On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

And if you try this with a temporary table (not hidden in transaction,
so the bgworker can see it), the worker will fail with this:

ERROR: cannot access temporary tables of other sessions

But of course, this is just another way how to crash without updating
the result for the launcher, so checksums may end up being enabled
anyway.

Yeah, there will be plenty of side-effect issues from that
crash-with-wrong-status case. Fixing that will at least make things
safer -- in that checksums won't be enabled when not put on all pages.

Sure, the outcome with checksums enabled incorrectly is a consequence of
bogus status, and fixing that will prevent that. But that wasn't my main
point here - not articulated very clearly, though.

The bigger question is how to handle temporary tables gracefully, so
that it does not terminate the bgworker like this at all. This might be
even bigger issue than dropped relations, considering that temporary
tables are pretty common part of applications (and it also includes
CREATE/DROP).

For some clusters it might mean the online checksum enabling would
crash+restart infinitely (well, until reaching MAX_ATTEMPTS).

Unfortunately, try_relation_open() won't fix this, as the error comes
from ReadBufferExtended. And it's not a matter of simply creating a
ReadBuffer variant without that error check, because temporary tables
use local buffers.

I wonder if we could just go and set the checksums anyway, ignoring the
local buffers. If the other session does some changes, it'll overwrite
our changes, this time with the correct checksums. But it seems pretty
dangerous (I mean, what if they're writing stuff while we're updating
the checksums? Considering the various short-cuts for temporary tables,
I suspect that would be a boon for race conditions.)

Another option would be to do something similar to running transactions,
i.e. wait until all temporary tables (that we've seen at the beginning)
disappear. But we're starting to wait on more and more stuff.

If we do this, we should clearly log which backends we're waiting for,
so that the admins can go and interrupt them manually.

Yeah, waiting for all transactions at the beginning is pretty simple.

Making the worker simply ignore temporary tables would also be easy.

One of the bigger issues here is temporary tables are *session* scope
and not transaction, so we'd actually need the other session to finish,
not just the transaction.

I guess what we could do is something like this:

1. Don't process temporary tables in the checksumworker, period.
Instead, build a list of any temporary tables that existed when the
worker started in this particular database (basically anything that we
got in our scan). Once we have processed the complete database, keep
re-scanning pg_class until those particular tables are gone (search by
oid).

That means that any temporary tables that are created *while* we are
processing a database are ignored, but they should already be receiving
checksums.

It definitely leads to a potential issue with long running temp tables.
But as long as we look at the *actual tables* (by oid), we should be
able to handle long-running sessions once they have dropped their temp
tables.

Does that sound workable to you?

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs are
we waiting for, which sessions may need DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.

Yeah, agreed. I think it makes sense to show the *number* of temp
tables. That's also a predictable amount of information -- logging
all temp tables may as you say lead to an insane amount of data.

PFA a patch that does this. I've also added some docs for it.

And I also noticed pg_verify_checksums wasn't installed, so fixed
that too.

PFA a rebase on top of the just committed verify-checksums patch.

This seems OK in terms of handling errors in the worker and passing it
to the launcher. I haven't managed to do any crash testing today, but
code-wise it seems sane.

It however still fails to initialize the attempts field after allocating
the db entry in BuildDatabaseList, so if you try running with
-DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:

WARNING: attempts = -1684366952
WARNING: attempts = 1010514489
WARNING: attempts = -1145390664
WARNING: attempts = 1162101570

I guess those are not the droids we're looking for?
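
For illustration, the class of bug described here (a counter field left uninitialized after allocation) can be sketched outside the backend as follows. This is a simplified stand-in, not the patch's BuildDatabaseList: the DbEntry struct and helper names are hypothetical, and calloc() plays the role that a zeroing allocator such as palloc0() would play inside PostgreSQL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Oid;

/* Hypothetical, simplified per-database bookkeeping entry. */
typedef struct DbEntry
{
	Oid			dboid;
	char	   *dbname;
	int			attempts;
} DbEntry;

/*
 * Allocate an entry with every field in a defined state.  calloc() zeroes
 * the memory, and attempts is also set explicitly so the intent survives a
 * later switch to a non-zeroing allocator.
 */
DbEntry *
make_db_entry(Oid dboid, const char *name)
{
	DbEntry    *db = calloc(1, sizeof(DbEntry));

	if (db == NULL)
		return NULL;

	db->dbname = malloc(strlen(name) + 1);
	if (db->dbname == NULL)
	{
		free(db);
		return NULL;
	}
	strcpy(db->dbname, name);

	db->dboid = dboid;
	db->attempts = 0;
	return db;
}

/* Release an entry created by make_db_entry(). */
void
free_db_entry(DbEntry *db)
{
	if (db != NULL)
	{
		free(db->dbname);
		free(db);
	}
}
```

Building with -DRANDOMIZE_ALLOCATED_MEMORY, as done above, is a good way to catch the non-zeroed variant of this, since the garbage values become obvious in the log.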

When looking at that and after a quick discussion, we just decided it's
better to completely remove the retry logic. As you mentioned in some
earlier mail, we had all this logic to retry databases (unlikely) but not
relations (likely). Attached patch simplifies it to only detect the
"database was dropped" case (which is fine), and consider every other kind
of failure a permanent one and just not turn on checksums in those cases.

Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:

It's supposed to get initialized in ChecksumHelperShmemInit() -- fixed.
(The whole memset-to-zero)

test=# select pg_enable_data_checksums(10, 10);

ERROR: database "template0" does not allow connections
HINT: Allow connections using ALTER DATABASE and try again.
test=# alter database template0 allow_connections = true;
ALTER DATABASE
test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: already running
test=# select pg_disable_data_checksums();
pg_disable_data_checksums
---------------------------

(1 row)

test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: has been cancelled

Turns out that wasn't the problem. The problem was that we *set* it before
erroring out with the "does not allow connections", but never cleared it.
In that case, it would be listed as launcher-is-running even though the
launcher was never started.

Basically the check for datallowconn was put in the wrong place. That check
should go away completely once we merge (because we should also merge the
part that allows us to bypass it), but for now I have moved the check to
the correct place.
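
The ordering problem described above can be sketched in isolation like this. It is only an illustration under stated assumptions, not the patch code: it uses C11 atomic_flag in place of pg_atomic_flag, and the two boolean parameters are hypothetical stand-ins for the datallowconn scan and for background worker registration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef enum
{
	START_OK,
	START_NOT_ALLOWED,
	START_ALREADY_RUNNING,
	START_REGISTER_FAILED
} StartResult;

/* Stand-in for the launcher_started flag kept in shared memory. */
static atomic_flag launcher_started = ATOMIC_FLAG_INIT;

StartResult
try_start_launcher(bool all_databases_allow_connections, bool register_ok)
{
	/*
	 * Do every check that can fail *before* claiming the flag, so that an
	 * early error cannot leave the launcher marked as running.
	 */
	if (!all_databases_allow_connections)
		return START_NOT_ALLOWED;

	/* atomic_flag_test_and_set() returns the previous value of the flag. */
	if (atomic_flag_test_and_set(&launcher_started))
		return START_ALREADY_RUNNING;

	/* Any failure after claiming the flag has to release it again. */
	if (!register_ok)
	{
		atomic_flag_clear(&launcher_started);
		return START_REGISTER_FAILED;
	}

	return START_OK;
}

/* Called when the launcher finishes, so a later start can succeed. */
void
launcher_exit(void)
{
	atomic_flag_clear(&launcher_started);
}
```

With this ordering, a failed "does not allow connections" check leaves the flag untouched, so a retry after fixing the database goes through instead of reporting "already running".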

At which point the only thing you can do is restart the cluster,
which seems somewhat unnecessary. But perhaps it's intentional?

Not at all.

Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.

Thanks. This is included in the attached update, along with the above fixes
and some other small touches from Daniel.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachments:

online_checksums10.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9a1efc14cf..5d463566aa 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19528,6 +19528,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>in progress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4e01e5641c..7cd6ee85dc 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -211,6 +211,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 949b5a220f..826dd91f72 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..463ecd5e1b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-r <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ef2270c467..78c214f1b0 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -284,6 +284,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..b0d082af8b 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,102 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums,
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+    Information about open transactions and connections with temporary tables is
+    written to log.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. It is not possible to resume the work,
+    the process has to start from scratch.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+   <para>
+    Since checksums are clusterwide, all databases will get checksums added.
+    It is thus important that the worker is allowed to connect to all databases
+    for the process to succeed, see the <xref linkend="sql-alterdatabase"/>
+    command for how to allow connections.
+   </para>
+
+   <note>
+    <para>
+     <literal>template0</literal> is by default not accepting connections, to
+     enable checksums you'll need to temporarily make it accept connections.
+     See <xref linkend="sql-alterdatabase" /> for details.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b4fd8395b7..813b2afaac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4673,10 +4674,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4748,12 +4745,90 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7789,6 +7864,16 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9542,6 +9627,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9969,6 +10070,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..b76b268891 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,61 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e9e188682f..5d567d0cf9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1027,6 +1027,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..bb3ddefd74
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,879 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a database at initdb time, no extra process
+ * is required as each page is checksummed, and verified, at accesses.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	bool		abort;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+
+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	HeapTuple	tup;
+	Relation	rel;
+	HeapScanDesc scan;
+
+	if (ChecksumHelperShmem->abort)
+	{
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: has been cancelled")));
+	}
+
+	/*
+	 * Check that all databases allow connections.  This will be re-checked
+	 * when we build the list of databases to work on, the point of
+	 * duplicating this is to catch any databases we won't be able to open
+	 * while we can still send an error message to the client.
+	 */
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/* Failed to set means somebody else started */
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: already running")));
+	}
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * ShutdownChecksumHelperIfRunning
+ *		Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+		return;
+
+	/*
+	 * We don't need an atomic variable for aborting, setting it multiple
+	 * times will not change the handling.
+	 */
+	ChecksumHelperShmem->abort = true;
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to wal even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort, the
+		 * abort will bubble up from here.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there are no
+		 * pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %u as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	ChecksumHelperShmem->abort = false;
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = ShmemVariableCache->nextXid;
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		elog(DEBUG1, "Checking old transactions");
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char activity[128];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (waiting for %u)", waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_PG_SLEEP);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+	List	   *remaining = NIL;
+	ListCell   *lc,
+			   *lc2;
+	List	   *CurrentDatabases = NIL;
+	bool		found_failed = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Wait for all existing transactions to finish. This will make sure that
+	 * we can see all tables in all databases, so we don't miss any.
+	 * Anything created after this point is known to have checksums on
+	 * all pages already, so we don't have to care about those.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any database created after
+	 * this process started will be running with checksums turned on from the
+	 * start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	foreach(lc, DatabaseList)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		ChecksumHelperResult processing;
+
+		processing = ProcessDatabase(db);
+
+		if (processing == SUCCESSFUL)
+		{
+			pfree(db->dbname);
+			pfree(db);
+
+			if (ChecksumHelperShmem->process_shared_catalogs)
+
+				/*
+				 * Now that one database has completed shared catalogs, we
+				 * don't have to process them again.
+				 */
+				ChecksumHelperShmem->process_shared_catalogs = false;
+		}
+		else if (processing == FAILED)
+		{
+			/*
+			 * Put failed databases on the remaining list.
+			 */
+			remaining = lappend(remaining, db);
+		}
+		else
+			/* aborted */
+			return;
+	}
+	list_free(DatabaseList);
+
+	/*
+	 * remaining now has all databases not yet processed. This can be
+	 * because they failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process
+	 * it. Get a fresh list of databases to detect the second case where
+	 * the database was dropped before we had started processing it. If a
+	 * database still exists, but enabling checksums failed then we fail
+	 * the entire checksumming process and exit with an error.
+	 */
+	CurrentDatabases = BuildDatabaseList();
+
+	foreach(lc, remaining)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		bool found = false;
+
+		foreach(lc2, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+			if (db->dboid == db2->dboid)
+			{
+				found = true;
+				ereport(WARNING,
+						(errmsg("failed to enable checksums in \"%s\"",
+								db->dbname)));
+				break;
+			}
+		}
+
+		if (found)
+			found_failed = true;
+		else
+		{
+			ereport(LOG,
+					(errmsg("database \"%s\" has been dropped, skipping",
+							db->dbname)));
+		}
+
+		pfree(db->dbname);
+		pfree(db);
+	}
+	list_free(remaining);
+
+	/* Free the extra list of databases */
+	foreach(lc, CurrentDatabases)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+		pfree(db->dbname);
+		pfree(db);
+	}
+	list_free(CurrentDatabases);
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksumhelper failed to enable checksums in all databases, aborting")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(ChecksumHelperShmem, 0, ChecksumHelperShmemSize());
+		pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		if (!pgdb->datallowconn)
+			ereport(ERROR,
+					(errmsg("database \"%s\" does not allow connections",
+							NameStr(pgdb->datname)),
+					 errhint("Allow connections using ALTER DATABASE and try again.")));
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared relations and local ones are
+ * returned, else only non-shared relations are returned.
+ * Temp tables are not included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relpersistence == 't')
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip.
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in database
+ *
+ * Returns a List of oids.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != 't')
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, HeapTupleGetOid(tup));
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid);
+
+	/*
+	 * Get a list of all temp tables present as we start in this database.
+	 * We cannot finish until they are all gone, since we cannot access
+	 * those files and modify them.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		ChecksumHelperShmem->success = ABORTED;
+		ereport(DEBUG1,
+				(errmsg("checksum worker aborted in database oid %u", dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums.
+	 * Any temp tables created after we started will already have checksums
+	 * in them (due to the inprogress state), so those are safe.
+	 */
+	while (true)
+	{
+		List *CurrentTempTables;
+		ListCell *lc;
+		int numleft;
+		char activity[64];
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table left to wait for */
+		snprintf(activity, sizeof(activity), "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT,
+						 5000,
+						 WAIT_EVENT_PG_SLEEP);
+	}
+
+	list_free(InitialTempTableList);
+
+	ChecksumHelperShmem->success = SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 1a0bae4c15..8ba29453b9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1383,7 +1383,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..f408d56270 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,8 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, but can also be turned off and on
+using pg_enable_data_checksums()/pg_disable_data_checksums() at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..790e4b860a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1168,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1195,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 260ae264d8..71c2b4eff1 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -68,6 +69,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -420,6 +422,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksum used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -514,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1684,17 +1697,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -4111,6 +4113,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/Makefile b/src/bin/Makefile
index 3b35835abe..8c11060a2f 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -26,6 +26,7 @@ SUBDIRS = \
 	pg_test_fsync \
 	pg_test_timing \
 	pg_upgrade \
+	pg_verify_checksums \
 	pg_waldump \
 	pgbench \
 	psql \
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7e5e971294..449a703c47 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -226,7 +226,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..d16261571f
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,36 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in an offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..e37f39bd2a
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,315 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_relfilenode = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage(void)
+{
+	printf(_("%s verifies page level checksums in offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -r relfilenode check only relation with specified relfilenode\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"), progname, fn, strerror(errno));
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in file \"%s\", got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno * RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
+						progname, fn, blockno, csum, header->pd_checksum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: checksum verified in file \"%s\", block %d: %X\n"),
+					progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
+				progname, path, strerror(errno));
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
+					progname, fn, strerror(errno));
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char	   *forkpath,
+					   *segmentpath;
+			int			segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number
+			 * in order to mix it into the checksum. Then also cut off at the
+			 * fork boundary, to get the relfilenode the file belongs to for
+			 * filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename \"%s\"\n"),
+							progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_relfilenode && strcmp(only_relfilenode, de->d_name) != 0)
+				/* Relfilenode not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fr:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'r':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid relfilenode: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_relfilenode = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+
+		/* If no DataDir was specified, and none could be found, error out */
+		if (DataDir == NULL)
+		{
+			fprintf(stderr, _("%s: no data directory specified\n"), progname);
+			fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+			exit(1);
+		}
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..f21870c644 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 9bf20c059b..d1f563ca12 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5579,6 +5579,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2278 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 2 0 2278 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f t f v u 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..289bf2a935
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..4f8e0ab8f8
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Prep cluster for enabling checksums
+$node_master->safe_psql('postgres',
+	"ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;");
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
diff --git a/src/test/isolation/expected/checksum_cancel.out b/src/test/isolation/expected/checksum_cancel.out
new file mode 100644
index 0000000000..c449e7b6cc
--- /dev/null
+++ b/src/test/isolation/expected/checksum_cancel.out
@@ -0,0 +1,27 @@
+Parsed test spec with 2 sessions
+
+starting permutation: c_verify_checksums_off r_seqread c_enable_checksums c_verify_checksums_inprogress c_disable_checksums c_wait_checksums_off
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums(1000);
+pg_enable_data_checksums
+
+               
+step c_verify_checksums_inprogress: SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step c_disable_checksums: SELECT pg_disable_data_checksums();
+pg_disable_data_checksums
+
+               
+step c_wait_checksums_off: SELECT test_checksums_off();
+test_checksums_off
+
+t              
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..0a68f47023
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+               
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 99dd7c6bdb..31900cb920 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -72,3 +72,7 @@ test: timeouts
 test: vacuum-concurrent-drop
 test: predicate-gist
 test: predicate-gin
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_cancel
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_cancel.spec b/src/test/isolation/specs/checksum_cancel.spec
new file mode 100644
index 0000000000..a9da0d74c7
--- /dev/null
+++ b/src/test/isolation/specs/checksum_cancel.spec
@@ -0,0 +1,48 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		PERFORM pg_sleep(1);
+		SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+		enabled boolean;
+	BEGIN
+		FOR counter IN 1..100 LOOP
+			PERFORM count(a) FROM t1;
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_off();
+
+	DROP TABLE t1;
+}
+
+session "reader"
+step "r_seqread"						{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"			{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"				{ SELECT pg_enable_data_checksums(1000); }
+step "c_disable_checksums"				{ SELECT pg_disable_data_checksums(); }
+step "c_verify_checksums_inprogress"	{ SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_wait_checksums_off"				{ SELECT test_checksums_off(); }
+
+permutation "c_verify_checksums_off" "r_seqread" "c_enable_checksums" "c_verify_checksums_inprogress" "c_disable_checksums" "c_wait_checksums_off"
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..7bfbe8d448
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,71 @@
+setup
+{
+	ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.1);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"

#103

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#102)

Re: Online enabling of checksums

On 4/5/18 11:07 AM, Magnus Hagander wrote:

On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:

...

It however still fails to initialize the attempts field after allocating
the db entry in BuildDatabaseList, so if you try running with
-DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:

WARNING: attempts = -1684366952
WARNING: attempts = 1010514489
WARNING: attempts = -1145390664
WARNING: attempts = 1162101570

I guess those are not the droids we're looking for?
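As an illustration of this class of bug only (not the patch's code), here is a minimal sketch of an allocation helper that sets every field explicitly. The `DbEntry` struct, `make_db_entry()` and `free_db_entry()` are hypothetical names invented for this example; the real entry is built in BuildDatabaseList with PostgreSQL's own allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-database list entry; not the patch's struct. */
typedef struct DbEntry
{
	unsigned int dboid;
	char	   *dbname;
	int			attempts;
} DbEntry;

/*
 * Allocate an entry and initialize every field. malloc() does not zero
 * memory, so leaving out the "attempts" assignment would expose whatever
 * bytes happened to be there (which is what a randomizing allocator
 * makes visible).
 */
DbEntry *
make_db_entry(unsigned int dboid, const char *dbname)
{
	DbEntry    *db;
	size_t		len;

	assert(dbname != NULL);

	db = malloc(sizeof(DbEntry));
	if (db == NULL)
		return NULL;

	len = strlen(dbname) + 1;
	db->dbname = malloc(len);
	if (db->dbname == NULL)
	{
		free(db);
		return NULL;
	}
	memcpy(db->dbname, dbname, len);

	db->dboid = dboid;
	db->attempts = 0;			/* the initialization that was missing */
	return db;
}

/* Release an entry created by make_db_entry(). */
void
free_db_entry(DbEntry *db)
{
	if (db == NULL)
		return;
	free(db->dbname);
	free(db);
}
```

A caller would create an entry with `make_db_entry(oid, name)` and can rely on `attempts` starting at zero regardless of what the allocator returned.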

When looking at that and after a quick discussion, we just decided it's
better to completely remove the retry logic. As you mentioned in some
earlier mail, we had all this logic to retry databases (unlikely) but
not relations (likely). Attached patch simplifies it to only detect the
"database was dropped" case (which is fine), and consider every other
kind of failure a permanent one and just not turn on checksums in those
cases.

OK, works for me.

Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:

It's supposed to get initialized in ChecksumHelperShmemInit() -- fixed.
(The whole memset-to-zero)

OK, seems fine now.

test=# select pg_enable_data_checksums(10, 10);
ERROR: database "template0" does not allow connections
HINT: Allow connections using ALTER DATABASE and try again.
test=# alter database template0 allow_connections = true;
ALTER DATABASE
test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: already running
test=# select pg_disable_data_checksums();
pg_disable_data_checksums
---------------------------

(1 row)

test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: has been cancelled

Turns out that wasn't the problem. The problem was that we *set* it
before erroring out with the "does not allow connections", but never
cleared it. In that case, it would be listed as launcher-is-running even
though the launcher was never started.

Basically the check for datallowconn was put in the wrong place. That
check should go away completely once we merge (because we should also
merge the part that allows us to bypass it), but for now I have moved
the check to the correct place.

Ah, OK. I was just guessing.
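The ordering problem described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the patch's code: `HelperState`, `StartResult` and `start_launcher()` are hypothetical names, and the real flag lives in shared memory. The point is only that the precondition is validated before the "launcher is running" flag is set, so a failed precondition leaves no stale state behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared-memory state; not the patch's struct. */
typedef struct HelperState
{
	bool		launcher_started;
} HelperState;

typedef enum StartResult
{
	START_OK,
	START_ALREADY_RUNNING,
	START_PRECONDITION_FAILED
} StartResult;

/*
 * Try to mark the launcher as started.
 *
 * The precondition (here: every database allows connections) is checked
 * BEFORE the flag is set. Checking it afterwards and erroring out would
 * leave launcher_started set even though nothing was launched, and every
 * later call would report "already running".
 */
StartResult
start_launcher(HelperState *state, bool all_dbs_allow_conn)
{
	assert(state != NULL);

	if (state->launcher_started)
		return START_ALREADY_RUNNING;

	if (!all_dbs_allow_conn)
		return START_PRECONDITION_FAILED;	/* shared state untouched */

	state->launcher_started = true;
	return START_OK;
}
```

With this ordering, a call that fails the precondition can simply be retried once the precondition has been fixed.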

Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.

Thanks. This is included in the attached update, along with the above
fixes and some other small touches from Daniel.

This patch version seems fine to me. I'm inclined to mark it RFC.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#104

Andrey Borodin

x4mmm@yandex-team.ru

almost 8 years ago

In reply to: Tomas Vondra (#103)

Re: Online enabling of checksums

On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

This patch version seems fine to me. I'm inclined to mark it RFC.

+1
The patch works fine for me. I've tried different combinations of backend cancellation, and the only suspicious thing I found is that you can start multiple workers by cancelling the launcher but not the worker. Is that problematic behavior? pg_enable_data_checksums() checks for an existing launcher for a reason; maybe it should check for a worker too?

I think it's worth capitalizing WAL in "re-write the page to wal".

Best regards, Andrey Borodin.

#105

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andrey Borodin (#104)

Re: Online enabling of checksums

On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

This patch version seems fine to me. I'm inclined to mark it RFC.

+1
The patch works fine for me. I've tried different combinations of backend
cancellation, and the only suspicious thing I found is that you can start
multiple workers by cancelling the launcher but not the worker. Is that
problematic behavior? pg_enable_data_checksums() checks for an existing
launcher for a reason; maybe it should check for a worker too?

I don't think it's a problem in itself -- it will cause pointless work, but
not actually cause any problems, I think (whereas duplicate launchers could
cause interesting things to happen).

How did you actually cancel the launcher to end up in this situation?

I think it's worth capitalizing WAL in "re-write the page to wal".

In the comment, right? Yeah, easy fix.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#106

Andrey Borodin

x4mmm@yandex-team.ru

almost 8 years ago

In reply to: Magnus Hagander (#105)

Re: Online enabling of checksums

On 5 Apr 2018, at 19:58, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

This patch version seems fine to me. I'm inclined to mark it RFC.

+1
The patch works fine for me. I've tried different combinations of backend cancellation, and the only suspicious thing I found is that you can start multiple workers by cancelling the launcher but not the worker. Is that problematic behavior? pg_enable_data_checksums() checks for an existing launcher for a reason; maybe it should check for a worker too?

I don't think it's a problem in itself -- it will cause pointless work, but not actually cause any problems, I think (whereas duplicate launchers could cause interesting things to happen).

How did you actually cancel the launcher to end up in this situation?

select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;

select pid,backend_type from pg_stat_activity where backend_type ~'checks';
pid | backend_type
-------+-----------------------
98587 | checksumhelper worker
98589 | checksumhelper worker
98591 | checksumhelper worker
(3 rows)

There is then a way to shoot yourself in the foot by calling pg_disable_data_checksums(), but that would be an extremely stupid thing for a user to do.

Best regards, Andrey Borodin.

#107

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andrey Borodin (#106)

1 attachment(s)

Re: Online enabling of checksums

On Thu, Apr 5, 2018 at 5:08 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Apr 2018, at 19:58, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

This patch version seems fine to me. I'm inclined to mark it RFC.

+1
The patch works fine for me. I've tried different combinations of backend
cancellation, and the only suspicious thing I found is that you can start
multiple workers by cancelling the launcher but not the worker. Is that
problematic behavior? pg_enable_data_checksums() checks for an existing
launcher for a reason; maybe it should check for a worker too?

I don't think it's a problem in itself -- it will cause pointless work, but
not actually cause any problems, I think (whereas duplicate launchers could
cause interesting things to happen).

How did you actually cancel the launcher to end up in this situation?

select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where
backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where
backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where
backend_type ~ 'checksumhelper launcher' ;

select pid,backend_type from pg_stat_activity where backend_type ~'checks';
pid | backend_type
-------+-----------------------
98587 | checksumhelper worker
98589 | checksumhelper worker
98591 | checksumhelper worker
(3 rows)

There is then a way to shoot yourself in the foot by calling
pg_disable_data_checksums(), but that would be an extremely stupid thing
for a user to do.

Ah, didn't consider query cancel. I'm not sure how much we should actually
care about it, but it's easy enough to trap that signal and just do a clean
shutdown on it, so I've done that.
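A minimal sketch of the "trap the signal and shut down cleanly" idea, using plain C signal handling rather than PostgreSQL's background worker infrastructure. All names here (`shutdown_requested`, `install_cancel_handler()`, `process_blocks()`) are hypothetical and the patch's actual handler is not shown in this message; the sketch only assumes that query cancel arrives as SIGINT, which is what pg_cancel_backend() sends.

```c
#include <assert.h>
#include <signal.h>

/* Set from the signal handler, polled by the work loop. */
static volatile sig_atomic_t shutdown_requested = 0;

/* Handler does the minimum possible: record that a cancel arrived. */
static void
handle_cancel(int signo)
{
	(void) signo;
	shutdown_requested = 1;
}

/* Install the handler for SIGINT (the signal used for query cancel). */
void
install_cancel_handler(void)
{
	signal(SIGINT, handle_cancel);
}

/*
 * Process up to nblocks blocks, checking the flag between blocks so that
 * a cancel request ends the loop at a safe point instead of aborting in
 * the middle of a block. Returns the number of blocks processed.
 */
int
process_blocks(int nblocks)
{
	int			done = 0;

	assert(nblocks >= 0);

	while (done < nblocks && !shutdown_requested)
	{
		/* ... checksum one block here ... */
		done++;
	}
	return done;
}
```

The design choice is the usual one for signal handling: the handler only sets a flag, and the loop decides when it is safe to stop, so any cleanup can run in normal (non-signal) context.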

Please find attached a patch that does that, and is also rebased over the
datallowconn patch that just landed (which also removes some docs).

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Attachments:

online_checksums11.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 122f034f17..6257563eaa 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19540,6 +19540,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>in progress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4e01e5641c..7cd6ee85dc 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -211,6 +211,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgResetwal         SYSTEM "pg_resetwal.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY pgRewind           SYSTEM "pg_rewind.sgml">
+<!ENTITY pgVerifyChecksums  SYSTEM "pg_verify_checksums.sgml">
 <!ENTITY pgtestfsync        SYSTEM "pgtestfsync.sgml">
 <!ENTITY pgtesttiming       SYSTEM "pgtesttiming.sgml">
 <!ENTITY pgupgrade          SYSTEM "pgupgrade.sgml">
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 949b5a220f..826dd91f72 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
new file mode 100644
index 0000000000..463ecd5e1b
--- /dev/null
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -0,0 +1,112 @@
+<!--
+doc/src/sgml/ref/pg_verify_checksums.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="pgverifychecksums">
+ <indexterm zone="pgverifychecksums">
+  <primary>pg_verify_checksums</primary>
+ </indexterm>
+
+ <refmeta>
+  <refentrytitle><application>pg_verify_checksums</application></refentrytitle>
+  <manvolnum>1</manvolnum>
+  <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>pg_verify_checksums</refname>
+  <refpurpose>verify data checksums in an offline <productname>PostgreSQL</productname> database cluster</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+  <cmdsynopsis>
+   <command>pg_verify_checksums</command>
+   <arg choice="opt"><replaceable class="parameter">option</replaceable></arg>
+   <arg choice="opt"><arg choice="opt"><option>-D</option></arg> <replaceable class="parameter">datadir</replaceable></arg>
+  </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1 id="r1-app-pg_verify_checksums-1">
+  <title>Description</title>
+  <para>
+   <command>pg_verify_checksums</command> verifies data checksums in a PostgreSQL
+   cluster. It must be run against a cluster that's offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Options</title>
+
+   <para>
+    The following command-line options are available:
+
+    <variablelist>
+
+     <varlistentry>
+      <term><option>-r <replaceable>relfilenode</replaceable></option></term>
+      <listitem>
+       <para>
+        Only validate checksums in the relation with specified relfilenode.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-f</option></term>
+      <listitem>
+       <para>
+        Force check even if checksums are disabled on cluster.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-d</option></term>
+      <listitem>
+       <para>
+        Enable debug output. Lists all checked blocks and their checksum.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+       <term><option>-V</option></term>
+       <term><option>--version</option></term>
+       <listitem>
+       <para>
+       Print the <application>pg_verify_checksums</application> version and exit.
+       </para>
+       </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>-?</option></term>
+      <term><option>--help</option></term>
+       <listitem>
+        <para>
+         Show help about <application>pg_verify_checksums</application> command line
+         arguments, and exit.
+        </para>
+       </listitem>
+      </varlistentry>
+    </variablelist>
+   </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+  <para>
+    Can only be run when the server is offline.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>See Also</title>
+
+  <simplelist type="inline">
+   <member><xref linkend="checksums"/></member>
+  </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ef2270c467..78c214f1b0 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -284,6 +284,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc2d4161..cbb26785e6 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,87 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums;
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+    Information about open transactions and connections with temporary tables is
+    written to log.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. It is not possible to resume the work;
+    the process has to start from scratch.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b4fd8395b7..813b2afaac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)
 	{
@@ -4673,10 +4674,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4748,12 +4745,90 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7789,6 +7864,16 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9542,6 +9627,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9969,6 +10070,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 316edbe3c5..b76b268891 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -24,6 +24,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -698,3 +699,61 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	if (!StartChecksumHelperLauncher(cost_delay, cost_limit))
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper process")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e9e188682f..5d567d0cf9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1027,6 +1027,11 @@ CREATE OR REPLACE FUNCTION pg_stop_backup (
   RETURNS SETOF record STRICT VOLATILE LANGUAGE internal as 'pg_stop_backup_v2'
   PARALLEL RESTRICTED;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..3b6fd7c2bc
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,855 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a database at initdb time, no extra process
+ * is required as each page is checksummed, and verified, at accesses.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	pg_atomic_flag launcher_started;
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	bool		abort;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+
+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	if (ChecksumHelperShmem->abort)
+	{
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: has been cancelled")));
+	}
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	if (!pg_atomic_test_set_flag(&ChecksumHelperShmem->launcher_started))
+	{
+		/* Failed to set means somebody else started */
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: already running")));
+	}
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+		return false;
+	}
+
+	return true;
+}
+
+/*
+ * ShutdownChecksumHelperIfRunning
+ *		Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+		return;
+
+	/*
+	 * We don't need an atomic variable for aborting, setting it multiple
+	 * times will not change the handling.
+	 */
+	ChecksumHelperShmem->abort = true;
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort, the
+		 * abortion will bubble up from here.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %d", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there are no
+		 * pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %d as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %d: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	ChecksumHelperShmem->abort = false;
+	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	ChecksumHelperShmem->abort = true;
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = ShmemVariableCache->nextXid;
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		elog(DEBUG1, "Checking old transactions");
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char activity[128];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (waiting for %u)", waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_PG_SLEEP);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+	List	   *remaining = NIL;
+	ListCell   *lc,
+			   *lc2;
+	List	   *CurrentDatabases = NIL;
+	bool		found_failed = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	/*
+	 * Wait for all existing transactions to finish. This will make sure that
+	 * we can see all tables in all databases, so we don't miss any.
+	 * Anything created after this point is known to have checksums on
+	 * all pages already, so we don't have to care about those.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any database created after
+	 * this process started will be running with checksums turned on from the
+	 * start.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	/*
+	 * If there are no databases at all to checksum, we can exit immediately
+	 * as there is no work to do.
+	 */
+	if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+		return;
+
+	foreach(lc, DatabaseList)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		ChecksumHelperResult processing;
+
+		processing = ProcessDatabase(db);
+
+		if (processing == SUCCESSFUL)
+		{
+			pfree(db->dbname);
+			pfree(db);
+
+			if (ChecksumHelperShmem->process_shared_catalogs)
+
+				/*
+				 * Now that one database has completed shared catalogs, we
+				 * don't have to process them again.
+				 */
+				ChecksumHelperShmem->process_shared_catalogs = false;
+		}
+		else if (processing == FAILED)
+		{
+			/*
+			 * Put failed databases on the remaining list.
+			 */
+			remaining = lappend(remaining, db);
+		}
+		else
+			/* aborted */
+			return;
+	}
+	list_free(DatabaseList);
+
+	/*
+	 * remaining now has all databases not yet processed. This can be
+	 * because they failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process
+	 * it. Get a fresh list of databases to detect the second case where
+	 * the database was dropped before we had started processing it. If a
+	 * database still exists, but enabling checksums failed then we fail
+	 * the entire checksumming process and exit with an error.
+	 */
+	CurrentDatabases = BuildDatabaseList();
+
+	foreach(lc, remaining)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		bool found = false;
+
+		foreach(lc2, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+			if (db->dboid == db2->dboid)
+			{
+				found = true;
+				ereport(WARNING,
+						(errmsg("failed to enable checksums in \"%s\"",
+								db->dbname)));
+				break;
+			}
+		}
+
+		if (found)
+			found_failed = true;
+		else
+		{
+			ereport(LOG,
+					(errmsg("database \"%s\" has been dropped, skipping",
+							db->dbname)));
+		}
+
+		pfree(db->dbname);
+		pfree(db);
+	}
+	list_free(remaining);
+
+	/* Free the extra list of databases */
+	foreach(lc, CurrentDatabases)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+
+		pfree(db->dbname);
+		pfree(db);
+	}
+	list_free(CurrentDatabases);
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksumhelper failed to enable checksums in all databases, aborting")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(ChecksumHelperShmem, 0, ChecksumHelperShmemSize());
+		pg_atomic_init_flag(&ChecksumHelperShmem->launcher_started);
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared and local relations are returned,
+ * else only non-shared relations are returned.
+ * Temp tables are not included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip.
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in database
+ *
+ * Returns a List of oids.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != RELPERSISTENCE_TEMP)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, HeapTupleGetOid(tup));
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present as we start in this database.
+	 * We need to wait until they are all gone before we are done, since
+	 * we cannot access those files and modify them.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		ChecksumHelperShmem->success = ABORTED;
+		ereport(DEBUG1,
+				(errmsg("checksum worker aborted in database oid %u", dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums.
+	 * Any temp tables created after we started will already have checksums
+	 * in them (due to the inprogress state), so those are safe.
+	 */
+	while (true)
+	{
+		List *CurrentTempTables;
+		ListCell *lc;
+		int numleft;
+		char activity[64];
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table left to wait for */
+		snprintf(activity, sizeof(activity), "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT,
+						 5000,
+						 WAIT_EVENT_PG_SLEEP);
+	}
+
+	list_free(InitialTempTableList);
+
+	ChecksumHelperShmem->success = SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("checksum worker completed in database oid %u", dboid)));
+}
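
Note on the launcher's second pass in ChecksumHelperLauncherMain() above: a database that ended up on the "remaining" list only counts as a real failure if it still shows up in a fresh database list; otherwise it is treated as dropped and skipped. Below is a minimal standalone sketch of that reconciliation, assuming plain arrays of OIDs instead of PostgreSQL's List API. The local Oid typedef and the names oid_in_list() and count_real_failures() are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;		/* stand-in for PostgreSQL's Oid */

/* Return true if oid appears in the current[] snapshot of database OIDs. */
bool
oid_in_list(Oid oid, const Oid *current, size_t ncurrent)
{
	size_t		i;

	for (i = 0; i < ncurrent; i++)
		if (current[i] == oid)
			return true;
	return false;
}

/*
 * Count databases that failed processing and still exist.  Entries that
 * are missing from current[] are treated as dropped and ignored.
 */
size_t
count_real_failures(const Oid *remaining, size_t nremaining,
					const Oid *current, size_t ncurrent)
{
	size_t		i;
	size_t		failures = 0;

	assert(remaining != NULL || nremaining == 0);

	for (i = 0; i < nremaining; i++)
		if (oid_in_list(remaining[i], current, ncurrent))
			failures++;
	return failures;
}
```

In the patch itself, any nonzero count corresponds to found_failed being set, which turns checksums off again and raises an ERROR.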
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 96ba216387..83328a2766 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4125,6 +4125,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 1a0bae4c15..8ba29453b9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1383,7 +1383,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 6eb0d5527e..84183f8203 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -198,6 +198,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..f408d56270 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,8 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, and can also be turned on and off
+at runtime using pg_enable_data_checksums()/pg_disable_data_checksums().
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..790e4b860a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1168,7 +1168,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1195,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
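
Note on the bufpage.c hunks above: the old DataChecksumsEnabled() test is split in two, so that pages are checksummed on write while enabling is still in progress, but only verified on read once the state is "on". The following is a rough standalone sketch of that split and of the state transitions described in the cover mail. The enum, its numeric values and all sketch_* names are assumptions made for illustration; the patch's real definitions live in xlog.c and storage/checksum.h and may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the patch's checksum state; values assumed. */
typedef enum
{
	SKETCH_CHECKSUMS_OFF = 0,
	SKETCH_CHECKSUMS_ON = 1,
	SKETCH_CHECKSUMS_INPROGRESS = 2
} SketchChecksumState;

/* Checksums are written both while enabling is in progress and once on. */
bool
sketch_need_write(SketchChecksumState state)
{
	return state == SKETCH_CHECKSUMS_ON ||
		state == SKETCH_CHECKSUMS_INPROGRESS;
}

/* Checksums are verified only once every page is known to carry one. */
bool
sketch_need_verify(SketchChecksumState state)
{
	return state == SKETCH_CHECKSUMS_ON;
}

/* Allowed transitions: off->inprogress, inprogress->on/off, on->off. */
bool
sketch_transition_allowed(SketchChecksumState from, SketchChecksumState to)
{
	switch (from)
	{
		case SKETCH_CHECKSUMS_OFF:
			return to == SKETCH_CHECKSUMS_INPROGRESS;
		case SKETCH_CHECKSUMS_INPROGRESS:
			return to == SKETCH_CHECKSUMS_ON || to == SKETCH_CHECKSUMS_OFF;
		case SKETCH_CHECKSUMS_ON:
			return to == SKETCH_CHECKSUMS_OFF;
	}
	return false;
}

/* Invariant: a state that verifies checksums must also write them. */
void
sketch_check_invariant(SketchChecksumState state)
{
	assert(!sketch_need_verify(state) || sketch_need_write(state));
}
```

The point of the split is that "inprogress" carries the write cost without the verification benefit, which is why a cluster left in that state should either be driven to "on" or switched back to "off".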
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 260ae264d8..71c2b4eff1 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -68,6 +69,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -420,6 +422,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -514,7 +527,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -1684,17 +1697,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -4111,6 +4113,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/Makefile b/src/bin/Makefile
index 3b35835abe..8c11060a2f 100644
--- a/src/bin/Makefile
+++ b/src/bin/Makefile
@@ -26,6 +26,7 @@ SUBDIRS = \
 	pg_test_fsync \
 	pg_test_timing \
 	pg_upgrade \
+	pg_verify_checksums \
 	pg_waldump \
 	pgbench \
 	psql \
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..4bb2b7e6ec 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7e5e971294..449a703c47 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -226,7 +226,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_verify_checksums/.gitignore b/src/bin/pg_verify_checksums/.gitignore
new file mode 100644
index 0000000000..d1dcdaf0dd
--- /dev/null
+++ b/src/bin/pg_verify_checksums/.gitignore
@@ -0,0 +1 @@
+/pg_verify_checksums
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
new file mode 100644
index 0000000000..d16261571f
--- /dev/null
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -0,0 +1,36 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/bin/pg_verify_checksums
+#
+# Copyright (c) 1998-2018, PostgreSQL Global Development Group
+#
+# src/bin/pg_verify_checksums/Makefile
+#
+#-------------------------------------------------------------------------
+
+PGFILEDESC = "pg_verify_checksums - verify data checksums in an offline cluster"
+PGAPPICON=win32
+
+subdir = src/bin/pg_verify_checksums
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS= pg_verify_checksums.o $(WIN32RES)
+
+all: pg_verify_checksums
+
+pg_verify_checksums: $(OBJS) | submake-libpgport
+	$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X)
+
+install: all installdirs
+	$(INSTALL_PROGRAM) pg_verify_checksums$(X) '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+installdirs:
+	$(MKDIR_P) '$(DESTDIR)$(bindir)'
+
+uninstall:
+	rm -f '$(DESTDIR)$(bindir)/pg_verify_checksums$(X)'
+
+clean distclean maintainer-clean:
+	rm -f pg_verify_checksums$(X) $(OBJS)
+	rm -rf tmp_check
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
new file mode 100644
index 0000000000..e37f39bd2a
--- /dev/null
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -0,0 +1,315 @@
+/*
+ * pg_verify_checksums
+ *
+ * Verifies page level checksums in an offline cluster
+ *
+ *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
+ *
+ *	src/bin/pg_verify_checksums/pg_verify_checksums.c
+ */
+
+#define FRONTEND 1
+
+#include "postgres.h"
+#include "catalog/pg_control.h"
+#include "common/controldata_utils.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+
+#include "pg_getopt.h"
+
+
+static int64 files = 0;
+static int64 blocks = 0;
+static int64 badblocks = 0;
+static ControlFileData *ControlFile;
+
+static char *only_relfilenode = NULL;
+static bool debug = false;
+
+static const char *progname;
+
+static void
+usage(void)
+{
+	printf(_("%s verifies page level checksums in an offline PostgreSQL database cluster.\n\n"), progname);
+	printf(_("Usage:\n"));
+	printf(_("  %s [OPTION] [DATADIR]\n"), progname);
+	printf(_("\nOptions:\n"));
+	printf(_(" [-D] DATADIR    data directory\n"));
+	printf(_("  -f             force check even if checksums are disabled\n"));
+	printf(_("  -r relfilenode check only relation with specified relfilenode\n"));
+	printf(_("  -d             debug output, listing all checked blocks\n"));
+	printf(_("  -V, --version  output version information, then exit\n"));
+	printf(_("  -?, --help     show this help, then exit\n"));
+	printf(_("\nIf no data directory (DATADIR) is specified, "
+			 "the environment variable PGDATA\nis used.\n\n"));
+	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
+}
+
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
+
+static void
+scan_file(char *fn, int segmentno)
+{
+	char		buf[BLCKSZ];
+	PageHeader	header = (PageHeader) buf;
+	int			f;
+	int			blockno;
+
+	f = open(fn, O_RDONLY | PG_BINARY, 0);
+	if (f < 0)
+	{
+		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"), progname, fn, strerror(errno));
+		exit(1);
+	}
+
+	files++;
+
+	for (blockno = 0;; blockno++)
+	{
+		uint16		csum;
+		int			r = read(f, buf, BLCKSZ);
+
+		if (r == 0)
+			break;
+		if (r != BLCKSZ)
+		{
+			fprintf(stderr, _("%s: short read of block %d in file \"%s\", got only %d bytes\n"),
+					progname, blockno, fn, r);
+			exit(1);
+		}
+		blocks++;
+
+		csum = pg_checksum_page(buf, blockno + segmentno * RELSEG_SIZE);
+		if (csum != header->pd_checksum)
+		{
+			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
+						progname, fn, blockno, csum, header->pd_checksum);
+			badblocks++;
+		}
+		else if (debug)
+			fprintf(stderr, _("%s: checksum verified in file \"%s\", block %d: %X\n"),
+					progname, fn, blockno, csum);
+	}
+
+	close(f);
+}
+
+static void
+scan_directory(char *basedir, char *subdir)
+{
+	char		path[MAXPGPATH];
+	DIR		   *dir;
+	struct dirent *de;
+
+	snprintf(path, MAXPGPATH, "%s/%s", basedir, subdir);
+	dir = opendir(path);
+	if (!dir)
+	{
+		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
+				progname, path, strerror(errno));
+		exit(1);
+	}
+	while ((de = readdir(dir)) != NULL)
+	{
+		char		fn[MAXPGPATH];
+		struct stat st;
+
+		if (skipfile(de->d_name))
+			continue;
+
+		snprintf(fn, MAXPGPATH, "%s/%s", path, de->d_name);
+		if (lstat(fn, &st) < 0)
+		{
+			fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
+					progname, fn, strerror(errno));
+			exit(1);
+		}
+		if (S_ISREG(st.st_mode))
+		{
+			char	   *forkpath,
+					   *segmentpath;
+			int			segmentno = 0;
+
+			/*
+			 * Cut off at the segment boundary (".") to get the segment number
+			 * in order to mix it into the checksum. Then also cut off at the
+			 * fork boundary, to get the relfilenode the file belongs to for
+			 * filtering.
+			 */
+			segmentpath = strchr(de->d_name, '.');
+			if (segmentpath != NULL)
+			{
+				*segmentpath++ = '\0';
+				segmentno = atoi(segmentpath);
+				if (segmentno == 0)
+				{
+					fprintf(stderr, _("%s: invalid segment number %d in filename \"%s\"\n"),
+							progname, segmentno, fn);
+					exit(1);
+				}
+			}
+
+			forkpath = strchr(de->d_name, '_');
+			if (forkpath != NULL)
+				*forkpath++ = '\0';
+
+			if (only_relfilenode && strcmp(only_relfilenode, de->d_name) != 0)
+				/* Relfilenode not to be included */
+				continue;
+
+			scan_file(fn, segmentno);
+		}
+		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+			scan_directory(path, de->d_name);
+	}
+	closedir(dir);
+}
+
+int
+main(int argc, char *argv[])
+{
+	char	   *DataDir = NULL;
+	bool		force = false;
+	int			c;
+	bool		crc_ok;
+
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_verify_checksums"));
+
+	progname = get_progname(argv[0]);
+
+	if (argc > 1)
+	{
+		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+		{
+			usage();
+			exit(0);
+		}
+		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+		{
+			puts("pg_verify_checksums (PostgreSQL) " PG_VERSION);
+			exit(0);
+		}
+	}
+
+	while ((c = getopt(argc, argv, "D:fr:d")) != -1)
+	{
+		switch (c)
+		{
+			case 'd':
+				debug = true;
+				break;
+			case 'D':
+				DataDir = optarg;
+				break;
+			case 'f':
+				force = true;
+				break;
+			case 'r':
+				if (atoi(optarg) <= 0)
+				{
+					fprintf(stderr, _("%s: invalid relfilenode: %s\n"), progname, optarg);
+					exit(1);
+				}
+				only_relfilenode = pstrdup(optarg);
+				break;
+			default:
+				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+				exit(1);
+		}
+	}
+
+	if (DataDir == NULL)
+	{
+		if (optind < argc)
+			DataDir = argv[optind++];
+		else
+			DataDir = getenv("PGDATA");
+
+		/* If no DataDir was specified, and none could be found, error out */
+		if (DataDir == NULL)
+		{
+			fprintf(stderr, _("%s: no data directory specified\n"), progname);
+			fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
+			exit(1);
+		}
+	}
+
+	/* Complain if any arguments remain */
+	if (optind < argc)
+	{
+		fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
+				progname, argv[optind]);
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
+		exit(1);
+	}
+
+	/* Check if cluster is running */
+	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
+	if (!crc_ok)
+	{
+		fprintf(stderr, _("%s: pg_control CRC value is incorrect.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->state != DB_SHUTDOWNED &&
+		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
+	{
+		fprintf(stderr, _("%s: cluster must be shut down to verify checksums.\n"), progname);
+		exit(1);
+	}
+
+	if (ControlFile->data_checksum_version == 0 && !force)
+	{
+		fprintf(stderr, _("%s: data checksums are not enabled in cluster.\n"), progname);
+		exit(1);
+	}
+
+	/* Scan all files */
+	scan_directory(DataDir, "global");
+	scan_directory(DataDir, "base");
+	scan_directory(DataDir, "pg_tblspc");
+
+	printf(_("Checksum scan completed\n"));
+	printf(_("Data checksum version: %u\n"), ControlFile->data_checksum_version);
+	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
+	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+
+	if (badblocks > 0)
+		return 1;
+
+	return 0;
+}
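
Note on scan_directory() above: each data file name is cut at the first "." to get the segment number (which feeds into the block number passed to pg_checksum_page()) and then at the first "_" to get the relfilenode used for the -r filter. Here is a minimal standalone sketch of that parsing, pulled out into a helper so it can be exercised on its own. The function split_relation_filename() and its return convention are hypothetical and not part of the patch; the sketch also rejects negative segment numbers, where the patch only rejects zero.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split a relation file name such as "16384_fsm.2" in place.
 *
 * On success, name holds only the relfilenode, *fork points at the fork
 * suffix (NULL for the main fork) and *segmentno holds the segment number
 * (0 when there is no ".N" suffix).  Returns 0 on success, or -1 if a
 * ".N" suffix is present but does not parse to a positive number.
 */
int
split_relation_filename(char *name, char **fork, int *segmentno)
{
	char	   *dot;
	char	   *underscore;

	assert(name != NULL && fork != NULL && segmentno != NULL);

	*fork = NULL;
	*segmentno = 0;

	/* Cut off the segment suffix first, e.g. ".2" */
	dot = strchr(name, '.');
	if (dot != NULL)
	{
		*dot++ = '\0';
		*segmentno = atoi(dot);
		if (*segmentno <= 0)
			return -1;
	}

	/* Then cut off the fork suffix, e.g. "_fsm" */
	underscore = strchr(name, '_');
	if (underscore != NULL)
	{
		*underscore++ = '\0';
		*fork = underscore;
	}
	return 0;
}
```

With the segment number in hand, the block number mixed into the checksum is blockno + segmentno * RELSEG_SIZE, as in scan_file().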
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..f21870c644 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index a5c074642f..0530fd1a43 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when the checksum state is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index edf212fcf0..02be8a5fbd 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5583,6 +5583,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3996 ( pg_disable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 0 0 2278 "" _null_ _null_ _null_ _null_ _null_ disable_data_checksums _null_ _null_ _null_ ));
+DESCR("disable data checksums");
+DATA(insert OID = 3998 ( pg_enable_data_checksums		PGNSP PGUID 12 1 0 0 0 f f f t f v s 2 0 2278 "23 23" _null_ _null_ "{cost_delay,cost_limit}" _null_ _null_ enable_data_checksums _null_ _null_ _null_ ));
+DESCR("enable data checksums");
+
 /* collation management functions */
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f t f v u 1 0 23 "4089" _null_ _null_ _null_ _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..289bf2a935
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 433755e279..902ec29e2a 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..6a45356b6b
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,101 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
diff --git a/src/test/isolation/expected/checksum_cancel.out b/src/test/isolation/expected/checksum_cancel.out
new file mode 100644
index 0000000000..c449e7b6cc
--- /dev/null
+++ b/src/test/isolation/expected/checksum_cancel.out
@@ -0,0 +1,27 @@
+Parsed test spec with 2 sessions
+
+starting permutation: c_verify_checksums_off r_seqread c_enable_checksums c_verify_checksums_inprogress c_disable_checksums c_wait_checksums_off
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums(1000);
+pg_enable_data_checksums
+
+               
+step c_verify_checksums_inprogress: SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step c_disable_checksums: SELECT pg_disable_data_checksums();
+pg_disable_data_checksums
+
+               
+step c_wait_checksums_off: SELECT test_checksums_off();
+test_checksums_off
+
+t              
diff --git a/src/test/isolation/expected/checksum_enable.out b/src/test/isolation/expected/checksum_enable.out
new file mode 100644
index 0000000000..0a68f47023
--- /dev/null
+++ b/src/test/isolation/expected/checksum_enable.out
@@ -0,0 +1,27 @@
+Parsed test spec with 3 sessions
+
+starting permutation: c_verify_checksums_off w_insert100k r_seqread c_enable_checksums c_wait_for_checksums c_verify_checksums_on
+step c_verify_checksums_off: SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
+step w_insert100k: SELECT insert_1k(100);
+insert_1k      
+
+t              
+step r_seqread: SELECT * FROM reader_loop();
+reader_loop    
+
+t              
+step c_enable_checksums: SELECT pg_enable_data_checksums();
+pg_enable_data_checksums
+
+               
+step c_wait_for_checksums: SELECT test_checksums_on();
+test_checksums_on
+
+t              
+step c_verify_checksums_on: SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+?column?       
+
+t              
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 99dd7c6bdb..31900cb920 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -72,3 +72,7 @@ test: timeouts
 test: vacuum-concurrent-drop
 test: predicate-gist
 test: predicate-gin
+# The checksum_enable suite will enable checksums for the cluster so should
+# not run before anything expecting the cluster to have checksums turned off
+test: checksum_cancel
+test: checksum_enable
diff --git a/src/test/isolation/specs/checksum_cancel.spec b/src/test/isolation/specs/checksum_cancel.spec
new file mode 100644
index 0000000000..3466a749d2
--- /dev/null
+++ b/src/test/isolation/specs/checksum_cancel.spec
@@ -0,0 +1,47 @@
+setup
+{
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		PERFORM pg_sleep(1);
+		SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+		enabled boolean;
+	BEGIN
+		FOR counter IN 1..100 LOOP
+			PERFORM count(a) FROM t1;
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_off();
+
+	DROP TABLE t1;
+}
+
+session "reader"
+step "r_seqread"						{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"			{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"				{ SELECT pg_enable_data_checksums(1000); }
+step "c_disable_checksums"				{ SELECT pg_disable_data_checksums(); }
+step "c_verify_checksums_inprogress"	{ SELECT setting = 'inprogress' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_wait_checksums_off"				{ SELECT test_checksums_off(); }
+
+permutation "c_verify_checksums_off" "r_seqread" "c_enable_checksums" "c_verify_checksums_inprogress" "c_disable_checksums" "c_wait_checksums_off"
diff --git a/src/test/isolation/specs/checksum_enable.spec b/src/test/isolation/specs/checksum_enable.spec
new file mode 100644
index 0000000000..ba85dd6176
--- /dev/null
+++ b/src/test/isolation/specs/checksum_enable.spec
@@ -0,0 +1,70 @@
+setup
+{
+	CREATE TABLE t1 (a serial, b integer, c text);
+	INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');
+
+	CREATE OR REPLACE FUNCTION insert_1k(iterations int) RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..$1 LOOP
+			INSERT INTO t1 (b, c) VALUES (
+				generate_series(1, 1000),
+				array_to_string(array(select chr(97 + (random() * 25)::int) from generate_series(1,250)), '')
+			);
+			PERFORM pg_sleep(0.1);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
+	DECLARE
+		enabled boolean;
+	BEGIN
+		LOOP
+			SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
+			IF enabled THEN
+				EXIT;
+			END IF;
+			PERFORM pg_sleep(1);
+		END LOOP;
+		RETURN enabled;
+	END;
+	$$ LANGUAGE plpgsql;
+	
+	CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
+	DECLARE
+		counter integer;
+	BEGIN
+		FOR counter IN 1..30 LOOP
+			PERFORM count(a) FROM t1;
+			PERFORM pg_sleep(0.2);
+		END LOOP;
+		RETURN True;
+	END;
+	$$ LANGUAGE plpgsql;
+}
+
+teardown
+{
+	DROP FUNCTION reader_loop();
+	DROP FUNCTION test_checksums_on();
+	DROP FUNCTION insert_1k(int);
+
+	DROP TABLE t1;
+}
+
+session "writer"
+step "w_insert100k"				{ SELECT insert_1k(100); }
+
+session "reader"
+step "r_seqread"				{ SELECT * FROM reader_loop(); }
+
+session "checksums"
+step "c_verify_checksums_off"	{ SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+step "c_enable_checksums"		{ SELECT pg_enable_data_checksums(); }
+step "c_wait_for_checksums"		{ SELECT test_checksums_on(); }
+step "c_verify_checksums_on"	{ SELECT setting = 'on' FROM pg_catalog.pg_settings WHERE name = 'data_checksums'; }
+
+permutation "c_verify_checksums_off" "w_insert100k" "r_seqread" "c_enable_checksums" "c_wait_for_checksums" "c_verify_checksums_on"

#108

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Magnus Hagander (#107)

Re: Online enabling of checksums

On Thu, Apr 5, 2018 at 7:30 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Apr 5, 2018 at 5:08 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Apr 2018, at 19:58, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

This patch version seems fine to me. I'm inclined to mark it RFC.

+1
The patch works fine for me. I've tried different combinations of
backend cancellation, and the only suspicious thing I found is that you can
start multiple workers by cancelling the launcher and not cancelling the
worker. Is that problematic behavior? pg_enable_data_checksums() checks for
an existing launcher for a reason; maybe it should check for a worker too?

I don't think it's a problem in itself -- it will cause pointless work,
but not actually cause any problems I think (whereas duplicate launchers
could cause interesting things to happen).

How did you actually cancel the launcher to end up in this situation?

select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity
  where backend_type ~ 'checksumhelper launcher';
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity
  where backend_type ~ 'checksumhelper launcher';
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity
  where backend_type ~ 'checksumhelper launcher';

select pid,backend_type from pg_stat_activity where backend_type ~ 'checks';
  pid  |     backend_type
-------+-----------------------
 98587 | checksumhelper worker
 98589 | checksumhelper worker
 98591 | checksumhelper worker
(3 rows)

There is then a way to shoot yourself in the foot by calling
pg_disable_data_checksums(), but that would be an extremely stupid thing
for a user to do.

Ah, didn't consider query cancel. I'm not sure how much we should
actually care about it, but it's easy enough to trap that signal and just
do a clean shutdown on it, so I've done that.
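
For readers following along, the "trap that signal and do a clean shutdown" idea can be sketched roughly as below. This is a minimal standalone illustration under stated assumptions, not the committed code: run_launcher, handle_cancel and shutdown_requested are hypothetical names, and raise(SIGINT) stands in for pg_cancel_backend(), which delivers SIGINT to the target process.

```c
#include <assert.h>
#include <signal.h>

/* Set by the signal handler, polled by the work loop (hypothetical name). */
static volatile sig_atomic_t shutdown_requested = 0;

/* The handler only records the request; the loop does the actual cleanup. */
static void
handle_cancel(int signo)
{
	(void) signo;
	shutdown_requested = 1;
}

/*
 * Hypothetical stand-in for a launcher loop: process up to nitems work
 * items and stop early once a cancel has been seen.  When cancel_after is
 * positive, a cancel is simulated after that many items.  Returns the
 * number of items processed before the loop exited.
 */
static int
run_launcher(int nitems, int cancel_after)
{
	int			processed = 0;

	shutdown_requested = 0;
	signal(SIGINT, handle_cancel);

	for (int i = 0; i < nitems; i++)
	{
		if (shutdown_requested)
			break;				/* leave the loop, then shut down cleanly */

		processed++;

		if (processed == cancel_after)
			raise(SIGINT);		/* simulated cancel request */
	}

	/* A real launcher would stop its worker and release resources here. */
	return processed;
}
```

The point of the flag is that the cancel does not tear the process down mid-step: the loop notices it at a safe point, so any running sub-worker can be stopped as part of the same orderly exit.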

PFA a patch that does that, and is also rebased over the datallowconn patch
that just landed (which also removes some docs).

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

Thanks for all the reviews!

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#109

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#108)

Re: Online enabling of checksums

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

Thanks for all the reviews!

I want to be on the record that I think merging a nontrivial feature that
got submitted 2018-02-21, just before the start of the last CF, is
an abuse of process, and not cool. We've other people working hard to
follow the process, and circumventing it like this just signals to
people trying to follow the rules that they're fools.

Merging ~2kloc patches like that is going to cause pain. And even if
not, it causes procedural damage.

- Andres

#110

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Andres Freund (#109)

Re: Online enabling of checksums

On 2018-04-05 13:12:08 -0700, Andres Freund wrote:

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

Thanks for all the reviews!

I want to be on the record that I think merging a nontrivial feature that
got submitted 2018-02-21, just before the start of the last CF, is
an abuse of process, and not cool. We've other people working hard to
follow the process, and circumventing it like this just signals to
people trying to follow the rules that they're fools.

Merging ~2kloc patches like that is going to cause pain. And even if
not, it causes procedural damage.

And even worse, without even announcing an intent to commit and giving
people a chance to object.

Greetings,

Andres Freund

#111

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#110)

Re: Online enabling of checksums

On Thu, Apr 5, 2018 at 10:14 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-04-05 13:12:08 -0700, Andres Freund wrote:

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

Thanks for all the reviews!

I want to be on the record that I think merging a nontrivial feature that
got submitted 2018-02-21, just before the start of the last CF, is
an abuse of process, and not cool. We've other people working hard to
follow the process, and circumventing it like this just signals to
people trying to follow the rules that they're fools.

Merging ~2kloc patches like that is going to cause pain. And even if
not, it causes procedural damage.

I can understand those arguments, and if that's the view of others as well,
I can of course revert that.

And even worse, without even announcing an intent to commit and giving
people a chance to object.

At least this patch was posted on the lists before commit, unlike many
others from many different people. And AFAIK there has never been such a
rule.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#112

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#111)

Re: Online enabling of checksums

On April 5, 2018 1:20:52 PM PDT, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Apr 5, 2018 at 10:14 PM, Andres Freund <andres@anarazel.de>
wrote:

And even worse, without even announcing an intent to commit and giving
people a chance to object.

At least this patch was posted on the lists before commit, unlike many
others from many different people. And AFAIK there has never been such
a rule.

The more debatable a decision is, the more important it IMO becomes to give people a chance to object. Don't think there needs to be a hard rule to always announce an intent to commit.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#113

Joshua D. Drake

jd@commandprompt.com

almost 8 years ago

In reply to: Andres Freund (#109)

Re: Online enabling of checksums

On 04/05/2018 01:12 PM, Andres Freund wrote:

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

Thanks for all the reviews!

I want to be on the record that I think merging a nontrivial feature that
got submitted 2018-02-21, just before the start of the last CF, is
an abuse of process, and not cool. We've other people working hard to
follow the process, and circumventing it like this just signals to
people trying to follow the rules that they're fools.

Merging ~2kloc patches like that is going to cause pain. And even if
not, it causes procedural damage.

Perhaps I am missing something but there has been a lot of public
discussion on this feature for the last 7 weeks in which you barely
participated. I certainly understand wanting some notice before commit
but there has been lots of discussion, multiple people publicly
commenting on the patch and Magnus has been very receptive to all
feedback (that I have seen). Perhaps we are being sensitive because of
another patch that is actually ramrodding the process and we need to
take a step back?

Thanks,

- Andres

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
*** A fault and talent of mine is to tell it exactly how it is. ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
***** Unless otherwise stated, opinions are my own. *****

#114

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Joshua D. Drake (#113)

Re: Online enabling of checksums

On 2018-04-05 13:51:41 -0700, Joshua D. Drake wrote:

On 04/05/2018 01:12 PM, Andres Freund wrote:

I want to be on the record that I think merging a nontrivial feature that
got submitted 2018-02-21, just before the start of the last CF, is
an abuse of process, and not cool. We've other people working hard to
follow the process, and circumventing it like this just signals to
people trying to follow the rules that they're fools.

Merging ~2kloc patches like that is going to cause pain. And even if
not, it causes procedural damage.

Perhaps I am missing something but there has been a lot of public discussion
on this feature for the last 7 weeks in which you barely participated.

I've commented weeks ago about my doubts, and Robert concurred:
http://archives.postgresql.org/message-id/CA%2BTgmoZPRfMqZoK_Fbo_tD9OH9PdPFcPBsi-sdGZ6Jg8OMM2PA%40mail.gmail.com

I certainly understand wanting some notice before commit but there has
been lots of discussion, multiple people publicly commenting on the
patch and Magnus has been very receptive to all feedback (that I have
seen).

It's perfectly reasonable to continue review / improvement cycles of a
patch, even if it's not going to get in the current release. What does
that have to do with what I am concerned about?

Perhaps we are being sensitive because of another patch that is
actually ramrodding the process and we need to take a step back?

No. See link above.

Please don't use "we" in this childishness-implying fashion.

- Andres

#115

Joshua D. Drake

jd@commandprompt.com

almost 8 years ago

In reply to: Andres Freund (#114)

Re: Online enabling of checksums

On 04/05/2018 02:01 PM, Andres Freund wrote:

No. See link above.

Please don't use "we" in this childishness-implying fashion.

The term "we" was used on purpose because I too was annoyed and I was
trying to be objective, non-combative and productive.

#116

Peter Geoghegan

pg@bowt.ie

almost 8 years ago

In reply to: Andres Freund (#112)

Re: Online enabling of checksums

On Thu, Apr 5, 2018 at 1:27 PM, Andres Freund <andres@anarazel.de> wrote:

At least this patch was posted on the lists before commit, unlike many
others from many different people. And AFAIK there has never been such
a rule.

The rules cannot possibly anticipate every situation or subtlety. The
letter of the law is a slightly distinct thing to its spirit.

The more debatable a decision is, the more important it IMO becomes to give people a chance to object. Don't think there needs to be a hard rule to always announce an intent to commit.

Andres' remarks need to be seen in the context of the past couple of
weeks, and in the context of this being a relatively high risk patch
that was submitted quite late in the cycle.

--
Peter Geoghegan

#117

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Peter Geoghegan (#116)

Re: Online enabling of checksums

On Thu, Apr 5, 2018 at 11:09 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Apr 5, 2018 at 1:27 PM, Andres Freund <andres@anarazel.de> wrote:

At least this patch was posted on the lists before commit, unlike many
others from many different people. And AFAIK there has never been such
a rule.

The rules cannot possibly anticipate every situation or subtlety. The
letter of the law is a slightly distinct thing to its spirit.

The more debatable a decision is, the more important it IMO becomes to
give people a chance to object. Don't think there needs to be a hard rule
to always announce an intent to commit.

+1

Andres' remarks need to be seen in the context of the past couple of
weeks, and in the context of this being a relatively high risk patch
that was submitted quite late in the cycle.

I would argue that this is a pretty isolated patch. A large majority of the
code is completely isolated from the rest. And I would argue that this
reduces the risk of the patch substantially.

(And yes, we've noticed it's failing in isolationtester on a number of
boxes -- Daniel is currently investigating)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#118

Peter Geoghegan

pg@bowt.ie

almost 8 years ago

In reply to: Joshua D. Drake (#113)

Re: Online enabling of checksums

On Thu, Apr 5, 2018 at 1:51 PM, Joshua D. Drake <jd@commandprompt.com> wrote:

Perhaps I am missing something but there has been a lot of public discussion
on this feature for the last 7 weeks in which you barely participated. I
certainly understand wanting some notice before commit but there has been
lots of discussion, multiple people publicly commenting on the patch and
Magnus has been very receptive to all feedback (that I have seen). Perhaps
we are being sensitive because of another patch that is actually
ramrodding the process and we need to take a step back?

I wish it was just one patch. I can think of several.

--
Peter Geoghegan

#119

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#108)

Re: Online enabling of checksums

Hi,

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

Is there any sort of locking that guarantees that worker processes see
an up-to-date value of
DataChecksumsNeedWrite()/ControlFile->data_checksum_version? AFAICT
there's not. So you can AFAICT end up with checksums being computed by
the worker, but concurrent writes missing them. The window is going to
be at most one missed checksum per process (as the unlocking of the page
is a barrier) and is probably not easy to hit, but that's dangerous
enough.

Just plonking a barrier into DataChecksumsNeedWrite() etc is a
possibility, but it's also not free...
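
To make the barrier idea concrete, here is a minimal sketch of a shared checksum state that is written with release semantics and read with acquire semantics. It assumes plain C11 atomics in a single process as a stand-in for PostgreSQL's own barrier primitives and shared control data; checksum_state, shared_state, set_state, need_write and need_verify are hypothetical names, not the functions in the patch. The write/verify mapping follows the design described at the top of the thread: in progress means checksums are written but not verified.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mirror of the three-state checksum setting. */
typedef enum
{
	CHECKSUMS_OFF = 0,
	CHECKSUMS_ON,
	CHECKSUMS_INPROGRESS
} checksum_state;

/* Stand-in for the shared copy of the state. */
static _Atomic int shared_state = CHECKSUMS_OFF;

/*
 * Publish a new state.  The release store orders all earlier writes before
 * the state change becomes visible to acquire readers.
 */
static void
set_state(checksum_state s)
{
	atomic_store_explicit(&shared_state, (int) s, memory_order_release);
}

/* Checksums must be written while in progress and while on. */
static bool
need_write(void)
{
	int			s = atomic_load_explicit(&shared_state, memory_order_acquire);

	return s == CHECKSUMS_ON || s == CHECKSUMS_INPROGRESS;
}

/* Checksums are only verified once the state is fully on. */
static bool
need_verify(void)
{
	int			s = atomic_load_explicit(&shared_state, memory_order_acquire);

	return s == CHECKSUMS_ON;
}
```

Note that acquire/release only orders memory accesses around the load and store; it does not by itself bound how soon another process observes the new value, and the acquire load on every call is the "not free" part.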

Greetings,

Andres Freund

#120

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#119)

Re: Online enabling of checksums

On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

Is there any sort of locking that guarantees that worker processes see
an up2date value of
DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
there's not. So you can afaict end up with checksums being computed by
the worker, but concurrent writes missing them. The window is going to
be at most one missed checksum per process (as the unlocking of the page
is a barrier) and is probably not easy to hit, but that's dangerous
enough.

So just to be clear of the case you're worried about. It's basically:
Session #1 - sets checksums to inprogress
Session #1 - starts dynamic background worker ("launcher")
Launcher reads and enumerates pg_database
Launcher starts worker in first database
Worker processes first block of data in database
And at this point, Session #2 has still not seen the "checksums inprogress"
flag and continues to write without checksums?

That seems like quite a long time to me -- is that really a problem? I'm
guessing you're seeing a shorter path between the two that I can't see
right now (I'll blame the late evening...)?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#121

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#120)

Re: Online enabling of checksums

Hi,

On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:

On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:

Is there any sort of locking that guarantees that worker processes see
an up2date value of
DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
there's not. So you can afaict end up with checksums being computed by
the worker, but concurrent writes missing them. The window is going to
be at most one missed checksum per process (as the unlocking of the page
is a barrier) and is probably not easy to hit, but that's dangerous
enough.

So just to be clear of the case you're worried about. It's basically:
Session #1 - sets checksums to inprogress
Session #1 - starts dynamic background worker ("launcher")
Launcher reads and enumerates pg_database
Launcher starts worker in first database
Worker processes first block of data in database
And at this point, Session #2 has still not seen the "checksums inprogress"
flag and continues to write without checksums?

Yes. I think there are some variations of that, but yes, that's pretty
much it.

That seems like quite a long time to me -- is that really a problem?

We don't generally build locking models that are only correct based on
likelihood. Especially not without a lengthy comment explaining that
analysis.

I'm guessing you're seeing a shorter path between the two that I can't
see right now (I'll blame the late evening...)?

I don't think it matters terribly much how long that path is.

Greetings,

Andres Freund

#122

Tom Lane

tgl@sss.pgh.pa.us

almost 8 years ago

In reply to: Magnus Hagander (#108)

Re: Online enabling of checksums

Magnus Hagander <magnus@hagander.net> writes:

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

crake is not happy --- it's failing cross-version upgrade tests because:

Performing Consistency Checks
-----------------------------
Checking cluster versions ok

old cluster uses data checksums but the new one does not
Failure, exiting

This seems to indicate that you broke pg_upgrade's detection of
checksumming status, or that this patch changed the default
checksum state (which it surely isn't described as doing).

regards, tom lane

#123

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Tom Lane (#122)

Re: Online enabling of checksums

On Thu, Apr 5, 2018 at 11:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

crake is not happy --- it's failing cross-version upgrade tests because:

Performing Consistency Checks
-----------------------------
Checking cluster versions ok

old cluster uses data checksums but the new one does not
Failure, exiting

This seems to indicate that you broke pg_upgrade's detection of
checksumming status, or that this patch changed the default
checksum state (which it surely isn't described as doing).

It's not supposed to.

Without checking into it (just about off to bed now), one guess is that
it's actually a leftover from a previous stage -- what state is the cluster
actually in when it does that upgrade? Because the specific checksums tests
do leave their cluster with checksums on, which I think would perhaps be
the outcome of the testmodules-install-check-C test. The actual definition
of those tests is somewhere in the buildfarm client code, right?

In that case, the easy fix is probably to have the checksums tests actually
turn off the checksums again when they're done.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#124

Daniel Gustafsson

daniel@yesql.se

almost 8 years ago

In reply to: Magnus Hagander (#117)

Re: Online enabling of checksums

On 05 Apr 2018, at 23:13, Magnus Hagander <magnus@hagander.net> wrote:

(And yes, we've noticed it's failing in isolationtester on a number of boxes -- Daniel is currently investigating)

Looking into the isolationtester failure on piculet, which builds using
--disable-atomics, and locust which doesn’t have atomics, the code for
pg_atomic_test_set_flag seems a bit odd.

TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
defined to return True if it could set. When running without atomics, don’t we
need to do something like the below diff to make these APIs match? :

--- a/src/backend/port/atomics.c
+++ b/src/backend/port/atomics.c
@@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
 bool
 pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
 {
-       return TAS((slock_t *) &ptr->sema);
+       return TAS((slock_t *) &ptr->sema) == 0;
 }

Applying this makes the _cancel test pass, moving the failure instead to the
following _enable test (which matches what coypu and mylodon are seeing).

AFAICT there are no other callers of this than the online checksum code, and
this is not executed by the tests when running without atomics, which could
explain why nothing else is broken.
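
The return-value mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the PostgreSQL code: demo_tas() is a hypothetical stand-in for TAS() built on C11 atomic_flag, assuming only the convention stated above (zero means the lock was acquired), and the two wrappers mirror the "-" and "+" lines of the diff.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for TAS(): returns 0 if the lock was free and has
 * now been taken, nonzero if it was already held.
 */
static int
demo_tas(atomic_flag *lock)
{
    return atomic_flag_test_and_set(lock) ? 1 : 0;
}

/* Shape of the "-" line: passes the TAS()-style result straight through. */
static bool
demo_test_set_flag_unpatched(atomic_flag *flag)
{
    return demo_tas(flag);
}

/* Shape of the "+" line: true means "this caller managed to set the flag". */
static bool
demo_test_set_flag_patched(atomic_flag *flag)
{
    return demo_tas(flag) == 0;
}
```

With the unpatched shape, the first caller on a clear flag is told it failed and every later caller is told it succeeded, which is the inverse of what a test-and-set flag API is expected to report.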

Before continuing the debugging, does this theory hold any water? This isn’t
code I’m deeply familiar with so would appreciate any pointers.

cheers ./daniel

#125

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#121)

Re: Online enabling of checksums

On Thu, Apr 5, 2018 at 11:41 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:

On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:

Is there any sort of locking that guarantees that worker processes see
an up2date value of
DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
there's not. So you can afaict end up with checksums being computed by
the worker, but concurrent writes missing them. The window is going to
be at most one missed checksum per process (as the unlocking of the page
is a barrier) and is probably not easy to hit, but that's dangerous
enough.

So just to be clear of the case you're worried about. It's basically:
Session #1 - sets checksums to inprogress
Session #1 - starts dynamic background worker ("launcher")
Launcher reads and enumerates pg_database
Launcher starts worker in first database
Worker processes first block of data in database
And at this point, Session #2 has still not seen the "checksums inprogress"
flag and continues to write without checksums?

Yes. I think there are some variations of that, but yes, that's pretty
much it.

That seems like quite a long time to me -- is that really a problem?

We don't generally build locking models that are only correct based on
likelihood. Especially not without a lengthy comment explaining that
analysis.

Oh, that's not my intention either -- I just wanted to make sure I was
thinking about the same issue you were.

Since you know a lot more about that type of interlocks than I do :) We
already wait for all running transactions to finish before we start doing
anything. Obviously transactions != buffer writes (and we have things like
the checkpointer/bgwriter to consider). Is there something else that we
could safely just *wait* for? I have no problem whatsoever if this is a
long wait (given the total time). I mean to the point of "what if we just
stick a sleep(10) in there" level waiting.

Or can that somehow be cleanly solved using some of the new atomic
operators? Or is that likely to cause the same kind of overhead as throwing
a barrier in there?

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#126

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Magnus Hagander (#125)

Re: Online enabling of checksums

On 04/06/2018 11:25 AM, Magnus Hagander wrote:

On Thu, Apr 5, 2018 at 11:41 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:

On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:

Is there any sort of locking that guarantees that worker processes see
an up2date value of
DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
there's not. So you can afaict end up with checksums being computed by
the worker, but concurrent writes missing them. The window is going to
be at most one missed checksum per process (as the unlocking of the page
is a barrier) and is probably not easy to hit, but that's dangerous
enough.

So just to be clear of the case you're worried about. It's basically:
Session #1 - sets checksums to inprogress
Session #1 - starts dynamic background worker ("launcher")
Launcher reads and enumerates pg_database
Launcher starts worker in first database
Worker processes first block of data in database
And at this point, Session #2 has still not seen the "checksums inprogress"
flag and continues to write without checksums?

Yes. I think there are some variations of that, but yes, that's pretty
much it.

That seems like quite a long time to me -- is that really a problem?

We don't generally build locking models that are only correct based on
likelihood. Especially not without a lengthy comment explaining that
analysis.

Oh, that's not my intention either -- I just wanted to make sure I
was thinking about the same issue you were.

I agree we shouldn't rely on chance here - if we might read a stale
value, we need to fix that of course.

I'm not quite sure I fully understand the issue, though. I assume both
LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
happen the process would need to be already past LockBufHdr when the
checksum version is updated. In which case it can use a stale version
when writing the buffer out. Correct?

I wonder if that's actually a problem, considering the checksum worker
will then overwrite all data with correct checksums anyway. So the other
process would have to overwrite the buffer after checksum worker, at
which point it'll have to go through LockBufHdr.

So I'm not sure I see the problem here. But perhaps LockBufHdr is not a
memory barrier?

Since you know a lot more about that type of interlocks than I do :) We
already wait for all running transactions to finish before we start
doing anything. Obviously transactions != buffer writes (and we have
things like the checkpointer/bgwriter to consider). Is there something
else that we could safely just *wait* for? I have no problem whatsoever
if this is a long wait (given the total time). I mean to the point of
"what if we just stick a sleep(10) in there" level waiting.

Or can that somehow be cleanly solved using some of the new atomic
operators? Or is that likely to cause the same kind of overhead as
throwing a barrier in there?

Perhaps the easiest thing we could do is walk shared buffers and do
LockBufHdr/UnlockBufHdr, which would guarantee no session is in the process
of writing out a buffer with possibly stale checksum flag. Of course,
it's a bit brute-force-ish, but it's not that different from the waits
for running transactions and temporary tables.
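
The brute-force sweep suggested above could look roughly like the sketch below. It is an illustration only: DemoBufferDesc, demo_lock_hdr() and demo_unlock_hdr() are hypothetical stand-ins for the real buffer descriptors and LockBufHdr/UnlockBufHdr, modelled here as C11 atomic_flag spinlocks. The assumption is that acquiring and releasing each header lock once, after the checksum state has been changed, orders the sweep after any critical section that was already inside a header lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical buffer descriptor: only the header lock is modelled. */
typedef struct DemoBufferDesc
{
    atomic_flag hdr_lock;
} DemoBufferDesc;

/* Put every header lock into the "free" state. */
static void
demo_init_buffers(DemoBufferDesc *bufs, int nbuffers)
{
    for (int i = 0; i < nbuffers; i++)
        atomic_flag_clear(&bufs[i].hdr_lock);
}

/* Spin until the header lock is acquired (acquire ordering). */
static void
demo_lock_hdr(DemoBufferDesc *buf)
{
    while (atomic_flag_test_and_set_explicit(&buf->hdr_lock,
                                             memory_order_acquire))
        ;                       /* busy-wait; a real spinlock would back off */
}

/* Release the header lock (release ordering). */
static void
demo_unlock_hdr(DemoBufferDesc *buf)
{
    atomic_flag_clear_explicit(&buf->hdr_lock, memory_order_release);
}

/*
 * Lock and immediately unlock every buffer header once.  Returns the
 * number of headers visited.
 */
static int
demo_sweep_buffer_headers(DemoBufferDesc *bufs, int nbuffers)
{
    int visited = 0;

    for (int i = 0; i < nbuffers; i++)
    {
        demo_lock_hdr(&bufs[i]);
        demo_unlock_hdr(&bufs[i]);
        visited++;
    }
    return visited;
}
```

The sketch only shows the mechanics of the sweep; whether the header lock is the right interlock for the checksum state is a separate question.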

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#127

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#125)

Re: Online enabling of checksums

On 2018-04-06 11:25:59 +0200, Magnus Hagander wrote:

Since you know a lot more about that type of interlocks than I do :) We
already wait for all running transactions to finish before we start doing
anything. Obviously transactions != buffer writes (and we have things like
the checkpointer/bgwriter to consider). Is there something else that we
could safely just *wait* for? I have no problem whatsoever if this is a
long wait (given the total time). I mean to the point of "what if we just
stick a sleep(10) in there" level waiting.

I don't think anything just related to "time" is reasonable in any sort
of way. On an overloaded system you can see long, long stalls of processes
that have done a lot of work. Locking protocols should be correct, and
that's that.

Or can that somehow be cleanly solved using some of the new atomic
operators? Or is that likely to cause the same kind of overhead as throwing
a barrier in there?

Worse.

I wonder if we could introduce "MegaExpensiveRareMemoryBarrier()" that
goes through pgproc and signals every process with a signal that
requires the other side to do an operation implying a memory barrier.

That's actually not hard to do (e.g. every latch operation qualifies),
the problem is that signal delivery isn't synchronous. So you need some
acknowledgement protocol.

I think you could introduce a procsignal message that does a memory
barrier and then sets PGPROC->barrierGeneration to
ProcArrayStruct->barrierGeneration. MegaExpensiveRareMemoryBarrier()
increments ProcArrayStruct->barrierGeneration, signals everyone, and
then waits till every PGPROC->barrierGeneration has caught up with
ProcArrayStruct->barrierGeneration.
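
A minimal sketch of that acknowledgement protocol, under stated assumptions: the DemoProc/DemoProcArray types and all demo_* functions are hypothetical, signal delivery is left out (the handler is simply called), and a process counts as acknowledged once its generation is at least the target value. Only the field name barrierGeneration is taken from the description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NPROCS 4

/* Hypothetical per-process slot: last barrier generation acknowledged. */
typedef struct DemoProc
{
    atomic_uint_fast64_t barrierGeneration;
} DemoProc;

/* Hypothetical shared array: the generation everyone must catch up with. */
typedef struct DemoProcArray
{
    atomic_uint_fast64_t barrierGeneration;
    DemoProc    procs[DEMO_NPROCS];
} DemoProcArray;

static void
demo_init(DemoProcArray *arr)
{
    atomic_init(&arr->barrierGeneration, 0);
    for (int i = 0; i < DEMO_NPROCS; i++)
        atomic_init(&arr->procs[i].barrierGeneration, 0);
}

/* Requester, step 1: bump the shared generation and return the target. */
static uint_fast64_t
demo_begin_barrier(DemoProcArray *arr)
{
    /* Signalling every process would happen here; omitted in this sketch. */
    return atomic_fetch_add(&arr->barrierGeneration, 1) + 1;
}

/* What a process does when it handles the (hypothetical) barrier signal. */
static void
demo_handle_barrier_signal(DemoProcArray *arr, int procno)
{
    uint_fast64_t gen;

    atomic_thread_fence(memory_order_seq_cst);  /* the barrier itself */
    gen = atomic_load(&arr->barrierGeneration);
    atomic_store(&arr->procs[procno].barrierGeneration, gen);
}

/* Requester, step 2: true once every process has acknowledged the target. */
static bool
demo_barrier_complete(DemoProcArray *arr, uint_fast64_t target)
{
    for (int i = 0; i < DEMO_NPROCS; i++)
    {
        if (atomic_load(&arr->procs[i].barrierGeneration) < target)
            return false;
    }
    return true;
}
```

In a real implementation the requester would sleep and recheck demo_barrier_complete() rather than poll once, and would have to cope with processes that exit before acknowledging; both are left out here.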

Greetings,

Andres Freund

#128

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Tomas Vondra (#126)

Re: Online enabling of checksums

Hi,

On 2018-04-06 14:34:43 +0200, Tomas Vondra wrote:

Oh, that's not my intention either -- I just wanted to make sure I
was thinking about the same issue you were.

I agree we shouldn't rely on chance here - if we might read a stale
value, we need to fix that of course.

It's perfectly possible that some side-conditions mitigate this. What
concerns me is that
a) Nobody appears to have raised this issue beforehand, besides an
unlocked read of a critical variable being a fairly obvious
issue. This kind of thing needs to be carefully thought about.
b) If there's some "side channel" interlock, it's not documented.

I noticed the issue because of an IM question about the general feature,
and I did a three minute skim and saw the read without a comment.

I'm not quite sure I fully understand the issue, though. I assume both
LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
happen the process would need to be already past LockBufHdr when the
checksum version is updated. In which case it can use a stale version
when writing the buffer out. Correct?

Yes, they are memory barriers.

I wonder if that's actually a problem, considering the checksum worker
will then overwrite all data with correct checksums anyway. So the other
process would have to overwrite the buffer after checksum worker, at
which point it'll have to go through LockBufHdr.

Again, I'm not sure if there's some combination of issues that make this
not a problem in practice. I just *asked* if there's something
preventing this from being a problem.

The really problematic case would be if it is possible for some process
to wait long enough, without executing a barrier implying operation,
that it'd try to write out a page that the checksum worker has already
passed over.

Greetings,

Andres Freund

#129

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Andres Freund (#128)

Re: Online enabling of checksums

On 04/06/2018 07:22 PM, Andres Freund wrote:

Hi,

On 2018-04-06 14:34:43 +0200, Tomas Vondra wrote:

Oh, that's not my intention either -- I just wanted to make sure I
was thinking about the same issue you were.

I agree we shouldn't rely on chance here - if we might read a stale
value, we need to fix that of course.

It's perfectly possible that some side-conditions mitigate this. What
concerns me is that
a) Nobody appears to have raised this issue beforehand, besides an
unlocked read of a critical variable being a fairly obvious
issue. This kind of thing needs to be carefully thought about.
b) If there's some "side channel" interlock, it's not documented.

I noticed the issue because of an IM question about the general feature,
and I did a three minute skim and saw the read without a comment.

All I can say is that I did consider this issue while reviewing the
patch, and I've managed to convince myself it's not an issue (using the
logic that I've just outlined here). Which is why I haven't raised it as
an issue, because I don't think it is.

You're right it might have been mentioned explicitly, of course.

In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
interlock. It's a pretty direct and intentional interlock, I think.

I'm not quite sure I fully understand the issue, though. I assume both
LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
happen the process would need to be already past LockBufHdr when the
checksum version is updated. In which case it can use a stale version
when writing the buffer out. Correct?

Yes, they are memory barriers.

Phew! ;-)

I wonder if that's actually a problem, considering the checksum worker
will then overwrite all data with correct checksums anyway. So the other
process would have to overwrite the buffer after checksum worker, at
which point it'll have to go through LockBufHdr.

Again, I'm not sure if there's some combination of issues that make this
not a problem in practice. I just *asked* if there's something
preventing this from being a problem.

The really problematic case would be if it is possible for some process
to wait long enough, without executing a barrier implying operation,
that it'd try to write out a page that the checksum worker has already
passed over.

Sure. But what would that be? I can't think of anything. A process that
modifies a buffer (or any other piece of shared state) without holding
some sort of lock seems broken by default.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#130

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Tomas Vondra (#129)

Re: Online enabling of checksums

On 2018-04-06 19:40:59 +0200, Tomas Vondra wrote:

In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
interlock. It's a pretty direct and intentional interlock, I think.

I mean it's a side-channel as far as DataChecksumsNeedWrite() is
concerned. You're banking on all callers using a barrier implying
operation around it.

Sure. But what would that be? I can't think of anything. A process that
modifies a buffer (or any other piece of shared state) without holding
some sort of lock seems broken by default.

You can quite possibly already *hold* a lock if it's not an exclusive
one.

Greetings,

Andres Freund

#131

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Andres Freund (#130)

Re: Online enabling of checksums

On 04/06/2018 07:46 PM, Andres Freund wrote:

On 2018-04-06 19:40:59 +0200, Tomas Vondra wrote:

In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
interlock. It's a pretty direct and intentional interlock, I think.

I mean it's a side-channel as far as DataChecksumsNeedWrite() is
concerned. You're banking on all callers using a barrier implying
operation around it.

Ah, OK.

Sure. But what would that be? I can't think of anything. A process that
modifies a buffer (or any other piece of shared state) without holding
some sort of lock seems broken by default.

You can quite possibly already *hold* a lock if it's not an exclusive
one.

Sure, but if you're holding the buffer lock when the checksum version is
changed, then the checksumhelper is obviously not running yet. In which
case it will update the checksum on the buffer later.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#132

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Tomas Vondra (#131)

Re: Online enabling of checksums

On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:

On 04/06/2018 07:46 PM, Andres Freund wrote:

Sure. But what would that be? I can't think of anything. A process that
modifies a buffer (or any other piece of shared state) without holding
some sort of lock seems broken by default.

You can quite possibly already *hold* a lock if it's not an exclusive
one.

Sure, but if you're holding the buffer lock when the checksum version is
changed, then the checksumhelper is obviously not running yet. In which
case it will update the checksum on the buffer later.

The buffer content lock itself doesn't generally give any such guarantee
afaict, as it's required that the content lock is held in shared mode
during IO. ProcessSingleRelationFork() happens to use exclusive mode
(which could and possibly should be optimized), so that's probably
sufficient from that end though.

I'm mainly disconcerted this isn't well discussed & documented.

Greetings,

Andres Freund

#133

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Andres Freund (#114)

Re: Online enabling of checksums

On Thu, Apr 5, 2018 at 5:01 PM, Andres Freund <andres@anarazel.de> wrote:

I've commented weeks ago about my doubts, and Robert concurred:
http://archives.postgresql.org/message-id/CA%2BTgmoZPRfMqZoK_Fbo_tD9OH9PdPFcPBsi-sdGZ6Jg8OMM2PA%40mail.gmail.com

Yes, and I expressed some previous reservations as well. Granted, my
reservations about this patch are less than for MERGE, and granted,
too, what is Magnus supposed to do about a couple of committers
expressing doubts about whether something really ought to be
committed? Is that an absolute bar? It wasn't phrased as such, nor
do we really have the authority. At the same time, those concerns
didn't generate much discussion and, at least in my case, are not
withdrawn merely because time has passed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#134

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Robert Haas (#133)

Re: Online enabling of checksums

On 2018-04-06 14:14:40 -0400, Robert Haas wrote:

and granted, too, what is Magnus supposed to do about a couple of
committers expressing doubts about whether something really ought to
be committed? Is that an absolute bar? It wasn't phrased as such,
nor do we really have the authority. At the same time, those concerns
didn't generate much discussion and, at least in my case, are not
withdrawn merely because time has passed.

Yea, I don't think they're an absolute blocker. But imo more than
sufficient reason to give the list a heads-up of a day or two that the
patch is intended to be committed.

I'd only pointed the message out because JD said something about me not
having participated in the earlier discussion.

Greetings,

Andres Freund

#135

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Andres Freund (#132)

Re: Online enabling of checksums

On 04/06/2018 08:13 PM, Andres Freund wrote:

On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:

On 04/06/2018 07:46 PM, Andres Freund wrote:

Sure. But what would that be? I can't think of anything. A process that
modifies a buffer (or any other piece of shared state) without holding
some sort of lock seems broken by default.

You can quite possibly already *hold* a lock if it's not an exclusive
one.

Sure, but if you're holding the buffer lock when the checksum version is
changed, then the checksumhelper is obviously not running yet. In which
case it will update the checksum on the buffer later.

The buffer content lock itself doesn't generally give any such
guarantee afaict, as it's required that the content lock is held in
shared mode during IO. ProcessSingleRelationFork() happens to use
exclusive mode (which could and possibly should be optimized), so
that's probably sufficient from that end though.

Yes.

I'm mainly disconcerted this isn't well discussed & documented.

Agreed, no argument here.

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#136

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 8 years ago

In reply to: Andres Freund (#132)

Re: Online enabling of checksums

On 04/06/2018 08:13 PM, Andres Freund wrote:

On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:

On 04/06/2018 07:46 PM, Andres Freund wrote:

Sure. But what would that be? I can't think of anything. A process that
modifies a buffer (or any other piece of shared state) without holding
some sort of lock seems broken by default.

You can quite possibly already *hold* a lock if it's not an exclusive
one.

Sure, but if you're holding the buffer lock when the checksum version is
changed, then the checksumhelper is obviously not running yet. In which
case it will update the checksum on the buffer later.

The buffer content lock itself doesn't generally give any such guarantee
afaict, as it's required that the content lock is held in shared mode
during IO. ProcessSingleRelationFork() happens to use exclusive mode
(which could and possibly should be optimized), so that's probably
sufficient from that end though.

Oh, I've just realized the phrasing of my previous message was rather
confusing. What I meant to say is this:

Sure, but the checksum version is changed before the checksumhelper
launcher/worker is even started. So if you're holding the buffer lock
at that time, then the buffer is essentially guaranteed to be updated
by the worker later.

Sorry if it seemed I'm suggesting the buffer lock itself guarantees
something about the worker startup.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#137

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Daniel Gustafsson (#124)

Re: Online enabling of checksums

On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:

Looking into the isolationtester failure on piculet, which builds using
--disable-atomics, and locust which doesn’t have atomics, the code for
pg_atomic_test_set_flag seems a bit odd.

TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
defined to return True if it could set. When running without atomics, don’t we
need to do something like the below diff to make these APIs match? :
--- a/src/backend/port/atomics.c
+++ b/src/backend/port/atomics.c
@@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
 bool
 pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
 {
-       return TAS((slock_t *) &ptr->sema);
+       return TAS((slock_t *) &ptr->sema) == 0;
 }

Yes, this looks wrong.

Greetings,

Andres Freund

#138

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Daniel Gustafsson (#124)

Re: Online enabling of checksums

Hi,

On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:

Applying this makes the _cancel test pass, moving the failure instead to the
following _enable test (which matches what coypu and mylodon are seeing).

FWIW, I'm somewhat annoyed that I'm now spending time debugging this to
get the buildfarm green again.

I'm fairly certain that the bug here is a simple race condition in the
test (not the main code!):

The flag informing whether the worker has started is cleared via an
on_shmem_exit() hook:

static void
launcher_exit(int code, Datum arg)
{
    ChecksumHelperShmem->abort = false;
    pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
}

but the wait in the test is done via functions like:

CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
DECLARE
    enabled boolean;
BEGIN
    LOOP
        SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
        IF enabled THEN
            EXIT;
        END IF;
        PERFORM pg_sleep(1);
    END LOOP;
    RETURN enabled;
END;
$$ LANGUAGE plpgsql;

INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');

CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
DECLARE
    enabled boolean;
BEGIN
    PERFORM pg_sleep(1);
    SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
    RETURN enabled;
END;
$$ LANGUAGE plpgsql;

which just waits for setting checksums to have finished. It's
exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
make sure that a process has finished exiting. Then followup tests fail
because the process is still running.

Also:
CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
DECLARE
    counter integer;
BEGIN
    FOR counter IN 1..30 LOOP
        PERFORM count(a) FROM t1;
        PERFORM pg_sleep(0.2);
    END LOOP;
    RETURN True;
END;
$$ LANGUAGE plpgsql;
}

Really? Let's just force the test to take at least 6s purely from
sleeping?

Greetings,

Andres Freund

#139

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Andres Freund (#137)

Re: Online enabling of checksums

On 2018-04-06 14:33:48 -0700, Andres Freund wrote:

On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
Looking into the isolationtester failure on piculet, which builds using
--disable-atomics, and locust which doesn’t have atomics, the code for
pg_atomic_test_set_flag seems a bit odd.

TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
defined to return True if it could set. When running without atomics, don’t we
need to do something like the below diff to make these APIs match? :
--- a/src/backend/port/atomics.c
+++ b/src/backend/port/atomics.c
@@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
 bool
 pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
 {
-       return TAS((slock_t *) &ptr->sema);
+       return TAS((slock_t *) &ptr->sema) == 0;
 }
Yes, this looks wrong.

And the reason the tests fail reliably after is because the locking
model around ChecksumHelperShmem->launcher_started arguably is broken:

/* If the launcher isn't started, there is nothing to shut down */
if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
    return;

This uses a non-concurrency safe primitive. Which then spuriously
triggers:

#define PG_HAVE_ATOMIC_UNLOCKED_TEST_FLAG
static inline bool
pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
{
    /*
     * Can't do this efficiently in the semaphore based implementation - we'd
     * have to try to acquire the semaphore - so always return true. That's
     * correct, because this is only an unlocked test anyway. Do this in the
     * header so compilers can optimize the test away.
     */
    return true;
}

no one can entirely quibble with the rationale that this is ok (I'll
post a patch cleaning up the atomics simulation of flags in a bit), but
this is certainly not a correct locking strategy.
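
To make the failure mode concrete, here is a small sketch of how an always-true unlocked test turns the early return into an unconditional one. Everything named demo_* is hypothetical; the sketch assumes, in line with the "nothing to shut down" comment quoted above, that the unlocked test returns true when the flag is not set. The locked probe shown next to it is just one possible alternative built from test-and-set plus clear, not a proposed fix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Mirrors the fallback quoted above: always reports "flag not set". */
static bool
demo_unlocked_test_flag_fallback(atomic_flag *flag)
{
    (void) flag;
    return true;
}

/*
 * Hypothetical locked probe: returns true if the flag was not set.  It
 * briefly sets the flag when it was clear, so a concurrent test-and-set
 * caller can observe it as taken during that window.
 */
static bool
demo_flag_is_unset_locked(atomic_flag *flag)
{
    if (atomic_flag_test_and_set(flag))
        return false;           /* somebody else had already set it */
    atomic_flag_clear(flag);    /* we set it ourselves; undo that */
    return true;
}

typedef bool (*demo_flag_test_fn) (atomic_flag *);

/*
 * Shape of the shutdown path: bail out if the launcher is not started,
 * otherwise request an abort.  Returns true if the abort was requested.
 */
static bool
demo_request_abort(atomic_flag *launcher_started,
                   demo_flag_test_fn is_unset,
                   bool *abort_requested)
{
    if (is_unset(launcher_started))
        return false;           /* "nothing to shut down" */
    *abort_requested = true;
    return true;
}
```

With the fallback test, demo_request_abort() never requests an abort even though the launcher flag is set, which fits the symptom of a later test finding the process still running.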

Greetings,

Andres Freund

#140

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Andres Freund (#139)

Re: Online enabling of checksums

On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund <andres@anarazel.de> wrote:

One can entirely quibble with the rationale that this is ok (I'll
post a patch cleaning up the atomics simulation of flags in a bit), but
this is certainly not a correct locking strategy.

I think we have enough evidence at this point to conclude that this
patch, along with MERGE, should be reverted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#141

Daniel Gustafsson

daniel@yesql.se

almost 8 years ago

In reply to: Andres Freund (#138)

Re: Online enabling of checksums

On 07 Apr 2018, at 00:23, Andres Freund <andres@anarazel.de> wrote:

On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:

Applying this makes the _cancel test pass, moving the failure instead to the
following _enable test (which matches what coypu and mylodon are seeing).

FWIW, I'm somewhat annoyed that I'm now spending time debugging this to
get the buildfarm green again.

Sorry about that, I’m a bit slow due to various $family situations at the
moment.

I'm fairly certain that the bug here is a simple race condition in the
test (not the main code!):

I wonder if it may perhaps be a case of both?

It's
exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
make sure that a process has finished exiting. Then followup tests fail
because the process is still running

I can reproduce the error when building with --disable-atomics, and it seems
that all the failing members either do that, lack atomic.h, lack atomics or a
combination. checksumhelper cancellation uses pg_atomic_unlocked_test_flag() to
test if it’s running when asked to abort, something which seems unsafe to do in
semaphore simulation as it always returns true. If I for debugging synthesize
a flag test with testset/clear, the tests pass green (with the upstream patch
for pg_atomic_test_set_flag_impl() applied). Cancelling with semaphore sim is
thus doomed to never work IIUC. Or it’s a red herring.

As Magnus mentioned upstream, rewriting to not use an atomic flag is probably
the best option, once the current failure is understood.

really? Let's just force the test take at least 6s purely from
sleeping?

The test needs continuous reading in a session to try and trigger any bugs in
read access on the cluster during checksumming, is there a good way to do that
in the isolationtester? I have failed to find a good way to repeat a step like
that, but I might be missing something.

cheers ./daniel

#142

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Daniel Gustafsson (#141)

Re: Online enabling of checksums

On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:

I'm fairly certain that the bug here is a simple race condition in the
test (not the main code!):

I wonder if it may perhaps be a case of both?

See my other message about the atomic fallback bit.

It's
exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
make sure that a process has finished exiting. Then followup tests fail
because the process is still running

I can reproduce the error when building with --disable-atomics, and it seems
that all the failing members either do that, lack atomic.h, lack atomics or a
combination.

atomic.h isn't important, it's just relevant for solaris (IIRC). Only
one of the failing ones lacks atomics afaict. See

On 2018-04-06 14:19:09 -0700, Andres Freund wrote:

Is that an explanation for
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-06%2019%3A18%3A11
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-04-06%2016%3A03%3A01
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2018-04-06%2015%3A46%3A16
? Those all don't seem to fall under that? Having proper atomics?

So there it's the timing. Note that they didn't always fail either.

really? Let's just force the test take at least 6s purely from
sleeping?

The test needs continuous reading in a session to try and trigger any bugs in
read access on the cluster during checksumming, is there a good way to do that
in the isolationtester? I have failed to find a good way to repeat a step like
that, but I might be missing something.

IDK, I know this isn't right.

Greetings,

Andres Freund

#143

Daniel Gustafsson

daniel@yesql.se

almost 8 years ago

In reply to: Andres Freund (#142)

Re: Online enabling of checksums

On 07 Apr 2018, at 01:13, Andres Freund <andres@anarazel.de> wrote:

On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:

I'm fairly certain that the bug here is a simple race condition in the
test (not the main code!):

I wonder if it may perhaps be a case of both?

See my other message about the atomic fallback bit.

Yep, my MUA pulled it down just as I had sent this. Thanks for confirming my
suspicion.

cheers ./daniel

#144

Stephen Frost

sfrost@snowman.net

almost 8 years ago

In reply to: Robert Haas (#140)

Re: Online enabling of checksums

Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:

On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund <andres@anarazel.de> wrote:

One can entirely quibble with the rationale that this is ok (I'll
post a patch cleaning up the atomics simulation of flags in a bit), but
this is certainly not a correct locking strategy.

I think we have enough evidence at this point to conclude that this
patch, along with MERGE, should be reverted.

I'm not sure that I see some issues around getting the locking correct
when starting/stopping the process is really evidence of a major problem
with the patch- yes, it obviously needs to be fixed and it would have
been unfortunate if we hadn't caught it, but a good bit of effort
appears to have been taken to ensure that exactly this is tested (which
is in part why the buildfarm is failing) and this evidently found an
existing bug, which is hardly this patch's fault.

In short, I don't agree (yet..) that this needs reverting.

I'm quite sure that bringing up MERGE in this thread and saying it needs
to be reverted without even having the committer of that feature on the
CC list isn't terribly useful and conflates two otherwise unrelated
patches and efforts. Let's try to use the threads the way they're
intended and keep our responses to each on their respective threads.

Thanks!

Stephen

#145

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Stephen Frost (#144)

Re: Online enabling of checksums

On 2018-04-06 19:31:56 -0400, Stephen Frost wrote:

Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:

On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund <andres@anarazel.de> wrote:

One can entirely quibble with the rationale that this is ok (I'll
post a patch cleaning up the atomics simulation of flags in a bit), but
this is certainly not a correct locking strategy.

I think we have enough evidence at this point to conclude that this
patch, along with MERGE, should be reverted.

I'm not sure that I see some issues around getting the locking correct
when starting/stopping the process is really evidence of a major problem
with the patch-

Note that there've been several other things mentioned in the
thread. I'll add some more in a bit.

yes, it obviously needs to be fixed and it would have been unfortunate
if we hadn't caught it, but a good bit of effort appears to have been
taken to ensure that exactly this is tested (which is in part why the
buildfarm is failing) and this evidently found an existing bug, which
is hardly this patch's fault.

THAT is the problem. It costs people that haven't been involved in the
feature time. I've friggin started debugging this because nobody else
could be bothered. Even though I'd planned to spend that time on other
patches that have been submitted far ahead in time.

I'm quite sure that bringing up MERGE in this thread and saying it needs
to be reverted without even having the committer of that feature on the
CC list isn't terribly useful and conflates two otherwise unrelated
patches and efforts.

Robert also mentioned it on the other thread, so... And no, they're not
unrelated matters, in that it's pushing half baked stuff.

Greetings,

Andres Freund

#146

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Daniel Gustafsson (#143)

Re: Online enabling of checksums

On 2018-04-07 01:27:13 +0200, Daniel Gustafsson wrote:

On 07 Apr 2018, at 01:13, Andres Freund <andres@anarazel.de> wrote:

On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:

I'm fairly certain that the bug here is a simple race condition in the
test (not the main code!):

I wonder if it may perhaps be a case of both?

See my other message about the atomic fallback bit.

Yep, my MUA pulled it down just as I had sent this. Thanks for confirming my
suspicion.

But note it fails because the code using it is WRONG. There's a reason
there's "unlocked" in the name. But even leaving that aside, it'd probably
*still* be wrong if it were locked.

It seems *extremely* dubious that we'll allow to re-enable the checksums
while a worker is still doing stuff for the old cycle in the
background. Consider what happens if the checksum helper is currently
doing RequestCheckpoint() (something that can certainly take a *LONG*
while). Another process disables checksums. Pages get written out
without checksums. Yet another process re-enables checksums. Helper
process does SetDataChecksumsOn(). Which succeeds because

if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
{
LWLockRelease(ControlFileLock);
elog(ERROR, "Checksums not in inprogress mode");
}

succeeds. Boom. Cluster with partially set checksums but marked as
valid.
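
The interleaving above can be replayed single-threaded in a small sketch. Everything here is hypothetical illustration: demo_control stands in for the control file, and the cycle counter is not something proposed in this thread. It is included only to show that a state-only check cannot tell the old enable cycle from the new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state
{
	DEMO_OFF,
	DEMO_INPROGRESS,
	DEMO_ON
};

/* Hypothetical stand-in for the relevant control file fields. */
typedef struct
{
	enum demo_state state;
	unsigned	cycle;			/* assumption: bumped on every off -> inprogress */
} demo_control;

static void
demo_enable(demo_control *c)
{
	assert(c != NULL && c->state == DEMO_OFF);
	c->state = DEMO_INPROGRESS;
	c->cycle++;
}

static void
demo_disable(demo_control *c)
{
	assert(c != NULL);
	c->state = DEMO_OFF;
}

/* State-only check, the shape of the quoted SetDataChecksumsOn() test. */
static bool
demo_set_on_state_only(demo_control *c)
{
	if (c->state != DEMO_INPROGRESS)
		return false;
	c->state = DEMO_ON;
	return true;
}

/* Hypothetical variant: the helper must also belong to the current cycle. */
static bool
demo_set_on_same_cycle(demo_control *c, unsigned helper_cycle)
{
	if (c->state != DEMO_INPROGRESS || c->cycle != helper_cycle)
		return false;
	c->state = DEMO_ON;
	return true;
}
```

Replaying enable, disable, enable and then letting the helper from the first cycle finish, the state-only check flips the cluster to "on", whereas the cycle-aware variant refuses and leaves it in "inprogress".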

Greetings,

Andres Freund

#147

Stephen Frost

sfrost@snowman.net

almost 8 years ago

In reply to: Andres Freund (#145)

Re: Online enabling of checksums

Andres,

* Andres Freund (andres@anarazel.de) wrote:

On 2018-04-06 19:31:56 -0400, Stephen Frost wrote:

I'm quite sure that bringing up MERGE in this thread and saying it needs
to be reverted without even having the committer of that feature on the
CC list isn't terribly useful and conflates two otherwise unrelated
patches and efforts.

Robert also mentioned it on the other thread, so... And no, they're not
unrelated matters, in that it's pushing half baked stuff.

Apparently I've missed where he specifically called for it to be
reverted then, which is fine, and my apologies for missing it amongst
the depth of that particular thread. I do think that specifically
asking for it to be reverted is distinct from expressing concerns about
it.

Thanks!

Stephen

#148

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Andres Freund (#109)

Re: Online enabling of checksums

Here's a pass through the patch:
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());

if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)

Why does this need an in-progress specific addition? Given that we
unconditionally log FPWs for all pages, and

#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)

it's not clear what this achieves? At the very least needs a comment.

@@ -4748,12 +4745,90 @@ GetMockAuthenticationNonce(void)
* Are checksums enabled for data pages?
*/
bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
{
Assert(ControlFile != NULL);
return (ControlFile->data_checksum_version > 0);
}

+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}

As previously mentioned, the locking model around this is unclear. It's
probably fine due to surrounding memory barriers, but that needs to
be very very explicitly documented.

+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);

As I've explained in
/messages/by-id/20180406235126.d4sg4dtgicdpucnj@alap3.anarazel.de
this appears to be unsafe. There's no guarantee that the
ControlFile->data_checksum_version hasn't intermittently been set to 0.

@@ -7788,6 +7863,16 @@ StartupXLOG(void)
*/
CompleteCommitTsInitialization();

+	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+

Hm, so one manually has to take action here. Any reason we can't just
re-start the worker? Also, this'll be issued on standbys etc too, that
seems misleading?

+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();

Unsafe, see SetDataChecksumsOn comment above.

Shouldn't this be named with a pg_? We normally do that for SQL callable
functions, no? See the preceding functions.

This function is marked PROPARALLEL_SAFE. That can't be right?

Also, shouldn't this refuse to work if called in recovery mode? Erroring
out with
ERROR: XX000: cannot make new WAL entries during recovery
doesn't seem right.

+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)

This is PROPARALLEL_RESTRICTED. That doesn't strike me as right, shouldn't
they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
on it.

+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)

entry point sounds a bit like it's the bgw invoked routine...

+/*
+ * ShutdownChecksumHelperIfRunning
+ *		Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+		return;

Using an unlocked op here without docs seems wrong. See also mail
referenced above.

+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
...
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);

Shouldn't this have a comment explaining that this is safe because all
pages that concurrently get added to the end are going to be checksummed
by the extender?

+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

Hm. So we'll need exclusive locks on all pages in the database. On busy
inner tables that's going to be painful. It shouldn't be too hard to
reduce this to share locks.

+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);

Hm. So we always log buffers as non-standard ones? That's going to cause
quite the increase in FPW space.

+	elog(DEBUG2, "Checksumhelper done with relation %d: %s",
+		 relationId, (aborted ? "aborted" : "finished"));

Hm. Wouldn't it be advisable to include actual relation names? It's
pretty annoying to keep track this way.

+void
+ChecksumHelperLauncherMain(Datum arg)
....

+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any database created after
+	 * this process started will be running with checksums turned on from the
+	 * start.
+	 */

Why is this true? What if somebody runs CREATE DATABASE while the
launcher / worker are processing a different database? It'll copy the
template database on the filesystem level, and it very well might not
yet have checksums set? Afaict the second time we go through this list
that's not cought.

+		if (processing == SUCCESSFUL)
+		{
+			pfree(db->dbname);
+			pfree(db);
...

Why bother with that and other deallocations here and in other places?
Launcher's going to quit anyway...

+		else if (processing == FAILED)
+		{
+			/*
+			 * Put failed databases on the remaining list.
+			 */
+			remaining = lappend(remaining, db);

Uh. So we continue checksumming the entire cluster if one database
failed? Given there's no restart capability at all, that seems hard to
defend?

+	CurrentDatabases = BuildDatabaseList();
+
+	foreach(lc, remaining)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		bool		found = false;
+
+		foreach(lc2, CurrentDatabases)
+		{
+			ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+			if (db->dboid == db2->dboid)
+			{
+				found = true;

This is an O(N^2) comparison logic? There's clusters with tens of
thousands of databases... The comparison costs are low, but still.
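
One way the nested-loop comparison could be avoided, as a minimal sketch: sort the current OIDs once and binary-search each remaining entry, which is O((N+M) log M) instead of O(N*M). The demo_* names and the flat OID arrays are assumptions for illustration; the patch itself uses List structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int demo_oid;	/* hypothetical stand-in for Oid */

/* qsort/bsearch comparator that avoids overflow from subtraction */
static int
demo_oid_cmp(const void *a, const void *b)
{
	demo_oid	x = *(const demo_oid *) a;
	demo_oid	y = *(const demo_oid *) b;

	return (x > y) - (x < y);
}

/*
 * Count how many entries of remaining[] still exist in current[].
 * Sorts current[] in place as a side effect.
 */
static size_t
demo_count_still_present(const demo_oid *remaining, size_t nremaining,
						 demo_oid *current, size_t ncurrent)
{
	size_t		found = 0;

	assert(remaining != NULL && current != NULL);

	qsort(current, ncurrent, sizeof(demo_oid), demo_oid_cmp);

	for (size_t i = 0; i < nremaining; i++)
	{
		if (bsearch(&remaining[i], current, ncurrent,
					sizeof(demo_oid), demo_oid_cmp) != NULL)
			found++;
	}
	return found;
}
```

Whether this is worth it depends on how cheap the existing comparisons are in practice; for a handful of databases the nested loop is perfectly adequate.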

+		/*
+		 * Foreign tables have by definition no local storage that can be
+		 * checksummed, so skip.
+		 */
+		if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+

That strikes me as a dangerous form of test. Shouldn't we instead check
whether a relfilenode exists? I'll note that this test currently
includes plain views. It's just the smgrexist() test that makes it
work.

- Andres

#149

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Andres Freund (#148)

Re: Online enabling of checksums

On 2018-04-06 17:59:28 -0700, Andres Freund wrote:

+	/*
+	 * Create a database list.  We don't need to concern ourselves with
+	 * rebuilding this list during runtime since any database created after
+	 * this process started will be running with checksums turned on from the
+	 * start.
+	 */
Why is this true? What if somebody runs CREATE DATABASE while the
launcher / worker are processing a different database? It'll copy the
template database on the filesystem level, and it very well might not
yet have checksums set? Afaict the second time we go through this list
that's not cought.

*caught

It's indeed trivial to reproduce this, just slowing down a checksum run
and copying the database yields:
./pg_verify_checksums -D /srv/dev/pgdev-dev
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 0: calculated checksum 45A7 but expected 0
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 1: calculated checksum 8C7D but expected 0

further complaints:

The new isolation test cannot be re-run on an existing cluster. That's
because the first test expects checksums to be disabled. As even
remarked upon:
# The checksum_enable suite will enable checksums for the cluster so should
# not run before anything expecting the cluster to have checksums turned off

How's that ok? You can leave database wide objects around, but the
cluster-wide stuff needs to be cleaned up.

The tests don't actually make sure that no checksum launcher / apply is
running anymore. They just assume that it's gone once the GUC shows
checksums have been set. If you wanted to make the tests stable, you'd
need to wait for that to show true *and* then check that no workers are
around anymore.

If it's not obvious: This isn't ready, should be reverted, cleaned up,
and re-submitted for v12.

Greetings,

Andres Freund

#150

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#149)

Re: Online enabling of checksums

On Sat, Apr 7, 2018 at 6:26 AM, Andres Freund <andres@anarazel.de> wrote:

On 2018-04-06 17:59:28 -0700, Andres Freund wrote:
+     /*
+      * Create a database list.  We don't need to concern ourselves with
+      * rebuilding this list during runtime since any database created after
+      * this process started will be running with checksums turned on from the
+      * start.
+      */

Why is this true? What if somebody runs CREATE DATABASE while the
launcher / worker are processing a different database? It'll copy the
template database on the filesystem level, and it very well might not
yet have checksums set? Afaict the second time we go through this list
that's not cought.

*caught

It's indeed trivial to reproduce this, just slowing down a checksum run
and copying the database yields:
./pg_verify_checksums -D /srv/dev/pgdev-dev
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 0: calculated checksum 45A7 but expected 0
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 1: calculated checksum 8C7D but expected 0

further complaints:

The new isolation test cannot be re-run on an existing cluster. That's
because the first test expects checksums to be disabled. As even
remarked upon:
# The checksum_enable suite will enable checksums for the cluster so should
# not run before anything expecting the cluster to have checksums turned off

How's that ok? You can leave database wide objects around, but the
cluster-wide stuff needs to be cleaned up.

The tests don't actually make sure that no checksum launcher / apply is
running anymore. They just assume that it's gone once the GUC shows
checksums have been set. If you wanted to make the tests stable, you'd
need to wait for that to show true *and* then check that no workers are
around anymore.

If it's not obvious: This isn't ready, should be reverted, cleaned up,
and re-submitted for v12.

While I do think that it's still definitely fixable in time for 11, I won't
argue for it. Will revert.

Note however that I'm sans-laptop until Sunday, so I will revert it then or
possibly Monday.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#151

Michael Banck

michael.banck@credativ.de

almost 8 years ago

In reply to: Magnus Hagander (#150)

Re: Online enabling of checksums

Hi,

On Sat, Apr 07, 2018 at 08:57:03AM +0200, Magnus Hagander wrote:

On Sat, Apr 7, 2018 at 6:26 AM, Andres Freund <andres@anarazel.de> wrote:

If it's not obvious: This isn't ready, should be reverted, cleaned up,
and re-submitted for v12.

While I do think that it's still definitely fixable in time for 11, I won't
argue for it. Will revert.

Can the pg_verify_checksums command be kept at least, please?

AFAICT this one is not contentious, the code is isolated, it's really
useful, orthogonal to online checksum activation and arguably could've
been committed as a separate patch anyway.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

#152

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Michael Banck (#151)

Re: Online enabling of checksums

Hi,

On 2018-04-07 10:14:49 +0200, Michael Banck wrote:

Can the pg_verify_checksums command be kept at least, please?

AFAICT this one is not contentious, the code is isolated, it's really
useful, orthogonal to online checksum activation and arguably could've
been committed as a separate patch anyway.

I've not looked at it in any meaningful amount of detail, but it does
seem a lot lower risk from here.

Greetings,

Andres Freund

#153

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Magnus Hagander (#150)

Re: Online enabling of checksums

Hi,

On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:

Note however that I'm sans-laptop until Sunday, so I will revert it then or
possibly Monday.

I'll deactivate the isolationtester tests until then. They've been
intermittently broken for days now and prevent other tests from being
exercised.

Greetings,

Andres Freund

#154

Magnus Hagander

magnus@hagander.net

almost 8 years ago

In reply to: Andres Freund (#153)

Re: Online enabling of checksums

On Sat, Apr 7, 2018 at 6:22 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:

Note however that I'm sans-laptop until Sunday, so I will revert it then or
possibly Monday.

I'll deactivate the isolationtester tests until then. They've been
intermittently broken for days now and prevent other tests from being
exercised.

Thanks.

I've pushed the revert now, and left the pg_verify_checksums in place for
the time being.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#155

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Andres Freund (#148)

Re: Online enabling of checksums

On Fri, Apr 6, 2018 at 8:59 PM, Andres Freund <andres@anarazel.de> wrote:

This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
on it.

Just a fine-grained note on this particular point:

It's totally fine for parallel-restricted operations to write WAL,
write to the filesystem, or launch nukes at ${ENEMY_NATION}. Well, I
mean, the last one might be a bad idea for geopolitical reasons, but
it's not a problem for parallel query. It is a problem to insert or
update heap tuples because it might extend the relation; mutual
exclusion doesn't work properly there yet (there was a patch to fix
that, but you had some concerns and it didn't go in). It is a problem
to update or delete heap tuples which might create new combo CIDs; not
all workers will have the same view (there's no patch for this yet
AFAIK, but the fix probably doesn't look that different from
cc5f81366c36b3dd8f02bd9be1cf75b2cc8482bd and could probably use most
of the same infrastructure).

TL;DR: Writing pages (e.g. to set a checksum) doesn't make something
non-parallel-safe. Writing heap tuples makes it parallel-unsafe.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#156

Magnus Hagander

magnus@hagander.net

over 7 years ago

In reply to: Robert Haas (#155)

Re: Online enabling of checksums

On Tue, Apr 10, 2018 at 6:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Apr 6, 2018 at 8:59 PM, Andres Freund <andres@anarazel.de> wrote:

This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
on it.

Just a fine-grained note on this particular point:

It's totally fine for parallel-restricted operations to write WAL,
write to the filesystem, or launch nukes at ${ENEMY_NATION}. Well, I
mean, the last one might be a bad idea for geopolitical reasons, but
it's not a problem for parallel query. It is a problem to insert or
update heap tuples because it might extend the relation; mutual
exclusion doesn't work properly there yet (there was a patch to fix
that, but you had some concerns and it didn't go in). It is a problem
to update or delete heap tuples which might create new combo CIDs; not
all workers will have the same view (there's no patch for this yet
AFAIK, but the fix probably doesn't look that different from
cc5f81366c36b3dd8f02bd9be1cf75b2cc8482bd and could probably use most
of the same infrastructure).

TL;DR: Writing pages (e.g. to set a checksum) doesn't make something
non-parallel-safe. Writing heap tuples makes it parallel-unsafe.

That's a good summary, thanks!

Just to be clear in this case though -- the function itself doesn't write
out *anything*. It only starts a background worker that later does it. The
background worker itself is not parallelized, so the risk in this
particular usecase would be that we ended up starting multiple workers (or
just failed), I think.

But the summary is very good to have regardless! :)

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#157

Magnus Hagander

magnus@hagander.net

over 7 years ago

In reply to: Magnus Hagander (#154)

1 attachment(s)

Re: Online enabling of checksums

On Mon, Apr 9, 2018 at 7:22 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Sat, Apr 7, 2018 at 6:22 PM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:

Note however that I'm sans-laptop until Sunday, so I will revert it then or
possibly Monday.

I'll deactivate the isolationtester tests until then. They've been
intermittently broken for days now and prevent other tests from being
exercised.

Thanks.

I've pushed the revert now, and left the pg_verify_checksums in place for
the time being.

PFA an updated version of the patch for the next CF. We believe this one
takes care of all the things pointed out so far.

For this version, we "implemented" the MegaExpensiveRareMemoryBarrier() by
simply requiring a restart of PostgreSQL to initiate the conversion in the
background. That is definitely going to guarantee a memory barrier. It's
certainly not ideal, but restarting the cluster is still a *lot* better
than having to do the entire conversion offline. This can of course be
improved upon in the future, but for now we stuck to the safe way.

The concurrent create-database-from-one-that-had-no-checksums is handled by
simply looping over the list of databases as long as new databases show up,
and waiting for all open transactions to finish at the right moment to
ensure there is no concurrently running one as we get the database list.
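
As a rough illustration of the "keep looping while new databases show up" idea (not the patch's actual code): the sketch below uses a hypothetical in-memory demo_cluster and a hook that simulates a concurrent CREATE DATABASE. It leaves out the wait for open transactions, which the real patch needs to make the database list trustworthy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_DBS 16

/* Hypothetical in-memory stand-in for the set of databases. */
typedef struct
{
	unsigned	oids[DEMO_MAX_DBS];
	bool		processed[DEMO_MAX_DBS];
	size_t		count;
} demo_cluster;

/* Called after each database is processed; simulates concurrent activity. */
typedef void (*demo_hook) (demo_cluster *);

/*
 * Process databases pass by pass until a full pass finds nothing left to do.
 * Returns the number of passes made, including the final empty one.
 */
static int
demo_process_all(demo_cluster *cluster, demo_hook after_each)
{
	int			passes = 0;
	bool		did_work;

	assert(cluster != NULL);

	do
	{
		/* databases created during this pass are picked up by the next one */
		size_t		snapshot = cluster->count;

		did_work = false;
		for (size_t i = 0; i < snapshot; i++)
		{
			if (cluster->processed[i])
				continue;
			cluster->processed[i] = true;	/* "checksum" this database */
			did_work = true;
			if (after_each != NULL)
				after_each(cluster);
		}
		passes++;
	} while (did_work);

	return passes;
}

/* Example hook: one CREATE DATABASE arrives while two databases exist. */
static void
demo_create_db_once(demo_cluster *cluster)
{
	if (cluster->count == 2)
	{
		cluster->oids[cluster->count] = 16385;
		cluster->processed[cluster->count] = false;
		cluster->count++;
	}
}
```

Starting with two databases and the example hook, the first pass handles both, the second picks up the database created in the meantime, and the third finds nothing new and ends the loop.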

Since the worker is now a regular background worker started from
postmaster, the cost-delay parameters had to be made GUCs instead of
function arguments.

(And the more or less broken isolation tests are simply removed)

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Attachments:

online_checksums12.patch (text/x-patch; charset=US-ASCII)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7bfbc87109..108e049a85 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2011,6 +2011,42 @@ include_dir 'conf.d'
      </para>
     </sect2>
 
+    <sect2 id="runtime-config-online-checksum">
+     <title>Online Checksumming</title>
+
+     <variablelist>
+      <varlistentry id="guc-checksumhelper-cost-delay" xreflabel="checksumhelper_cost_delay">
+       <term><varname>checksumhelper_cost_delay</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>checksumhelper_cost_delay</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         The length of time, in milliseconds, that the process will sleep when
+         the cost limit has been exceeded. The default value is zero, which
+         disables the cost-based checksumming delay feature. Positive values
+         enable cost-based checksumming.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-checksumhelper-cost-limit" xreflabel="checksumhelper_cost_limit">
+       <term><varname>checksumhelper_cost_limit</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>checksumhelper_cost_limit</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         The accumulated cost that will cause the checksumming process to sleep.
+         It is turned off by default.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+
     <sect2 id="runtime-config-resource-async-behavior">
      <title>Asynchronous Behavior</title>
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5dce8ef178..154cf40cd3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19582,6 +19582,74 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>inprogress</literal>, but will not initiate the checksumming process.
+         In order to start checksumming the data pages the database must be restarted. Upon
+         restart a background worker will start processing all data in the database and enable
+         checksums for it. When all data pages have had checksums enabled, the cluster will
+         automatically switch to checksums <literal>on</literal>.
+        </para>
+        <para>
+         The <xref linkend="guc-checksumhelper-cost-delay"/> and
+         <xref linkend="guc-checksumhelper-cost-limit"/> GUCs are used to
+         <link linkend="runtime-config-online-checksum">throttle the
+         speed of the process</link>, using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster. This takes effect immediately.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 4489b585c7..be489e78b9 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -214,9 +214,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 8727f3c26b..29a2fdb449 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,86 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums;
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode,
+    but will not start the checksumming of data. In order to start checksumming
+    the data in the cluster, a restart is needed. When the cluster is restarted,
+    checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    The process will track all created databases so that it can be certain that
+    no database has been created from a non-checksummed template database. The
+    process won't set the checksum mode to <literal>on</literal> until no database
+    can be created from a non-checksummed template. If an application repeatedly
+    creates databases it may be necessary to terminate this application to allow
+    the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be started over. When the cluster is
+    restarted, the checksum process will again checksum all data in the cluster.
+    It is not possible to resume the work; the process has to start over and
+    re-process the cluster.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1a419aa49b..ccf8032847 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -856,6 +856,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(uint32 new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -4686,10 +4687,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4761,12 +4758,110 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsNeedVerifyLocked(void)
+{
+	bool ret;
+
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 * Make the check while holding the ControlFileLock, to make sure we are
+	 * looking at the latest version.
+	 */
+	LWLockAcquire(ControlFileLock, LW_SHARED);
+	ret = (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+	LWLockRelease(ControlFileLock);
+
+	return ret;
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -9555,6 +9650,22 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(uint32 new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9991,6 +10102,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 9731742978..4de8ae7f4d 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -23,6 +23,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -697,3 +698,65 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+pg_disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (RecoveryInProgress())
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("recovery is in progress"),
+				 errhint("Checksum state cannot be changed during recovery.")));
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * This sets the system into a pending state. To initiate the actual
+ * checksum updates, a restart is required to make sure there can be
+ * no parallel backends doing things we cannot work with here.
+ */
+Datum
+pg_enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (RecoveryInProgress())
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("recovery is in progress"),
+				 errhint("Checksum state cannot be changed during recovery.")));
+
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	if (DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already pending"),
+				 errhint("A restart may be required to complete the process.")));
+
+	SetDataChecksumsInProgress();
+
+	ereport(NOTICE,
+			(errmsg("data checksums set to pending"),
+			 errhint("To complete the operation, a restart of PostgreSQL is required.")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..42f6b8fa69
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,811 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a database at initdb time, no extra process
+ * is required as each page is checksummed, and verified, at accesses.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/hsearch.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	bool		abort;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+static void WaitForAllTransactionsToFinish(void);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void checksumhelper_sighup(SIGNAL_ARGS);
+
+/* GUCs */
+int			checksumhelper_cost_limit;
+int			checksumhelper_cost_delay;
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/*
+ * Register the checksumhelper launcher as a background worker.
+ */
+bool
+ChecksumHelperLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+
+	return true;
+}
+
+/*
+ * ShutdownChecksumHelperIfRunning
+ *		Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	ChecksumHelperShmem->abort = true;
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable checksums in a single relation/fork.
+ *
+ * Loops over all existing blocks in this fork and calculates the checksum on them,
+ * and writes them out. Any blocks added by another process extending this
+ * fork while we run will already have checksums set by the process extending
+ * it, so we don't need to care about those.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %u/%u)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. Full page writes should only happen
+		 * for relations that are actually logged (not unlogged or temp
+		 * tables), but we still need to mark their buffers as dirty so the
+		 * local file gets updated.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		if (RelationNeedsWAL(reln))
+			log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort, the
+		 * aborting will bubble up from here.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		/*
+		 * Update cost based delay parameters if changed, and then initiate
+		 * the cost delay point.
+		 */
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+			if (checksumhelper_cost_delay >= 0)
+				VacuumCostDelay = checksumhelper_cost_delay;
+			if (checksumhelper_cost_limit >= 0)
+				VacuumCostLimit = checksumhelper_cost_limit;
+			VacuumCostActive = (VacuumCostDelay > 0);
+		}
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there
+		 * are no pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %u as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	elog(DEBUG1, "started background worker for checksums in \"%s\"",
+		 db->dbname);
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	elog(DEBUG1, "background worker for checksums in \"%s\" completed",
+		 db->dbname);
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	ChecksumHelperShmem->abort = false;
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	ChecksumHelperShmem->abort = true;
+}
+
+static void
+checksumhelper_sighup(SIGNAL_ARGS)
+{
+	got_SIGHUP = true;
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = ShmemVariableCache->nextXid;
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		elog(DEBUG1, "Waiting for old transactions to finish");
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (waiting for %u)", waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_PG_SLEEP);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	List	   *FailedDatabases = NIL;
+	ListCell   *lc,
+			   *lc2;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	if (RecoveryInProgress())
+	{
+		elog(DEBUG1, "not starting checksumhelper launcher, recovery is in progress");
+		return;
+	}
+
+	/*
+	 * If a standby was restarted when in pending state, a background worker
+	 * was registered to start. If it's later promoted after the master has
+	 * completed enabling checksums, we need to terminate immediately and not
+	 * do anything. If the cluster is still in pending state when promoted,
+	 * the background worker should start to complete the job.
+	 */
+	if (DataChecksumsNeedVerifyLocked())
+	{
+		elog(DEBUG1, "not starting checksumhelper launcher, checksums already enabled");
+		return;
+	}
+
+	on_shmem_exit(launcher_exit, 0);
+
+	elog(DEBUG1, "checksumhelper launcher started");
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(ChecksumHelperResult);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions to finish, this way we can be
+		 * certain that there are no databases left without checksums.
+		 */
+
+		DatabaseList = BuildDatabaseList();
+
+		/*
+		 * If there are no databases at all to checksum, we can exit
+		 * immediately as there is no work to do. This probably can never
+		 * happen, but just in case.
+		 */
+		if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+			return;
+
+		processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult processing;
+
+			if (hash_search(ProcessedDatabases, (void *) &db->dboid, HASH_FIND, NULL))
+				/* This database has already been processed */
+				continue;
+
+			processing = ProcessDatabase(db);
+			hash_search(ProcessedDatabases, (void *) &db->dboid, HASH_ENTER, NULL);
+			processed_databases++;
+
+			if (processing == SUCCESSFUL)
+			{
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else if (processing == FAILED)
+			{
+				/*
+				 * Put failed databases on the list of failures.
+				 */
+				FailedDatabases = lappend(FailedDatabases, db);
+			}
+			else
+				/* Abort flag set, so exit the whole process */
+				return;
+		}
+
+		elog(DEBUG1, "Completed one loop of checksum enabling, %i databases processed", processed_databases);
+		if (processed_databases == 0)
+
+			/*
+			 * No databases processed in this run of the loop, we have now
+			 * finished all databases and no concurrently created ones can
+			 * exist.
+			 */
+			break;
+	}
+
+	/*
+	 * FailedDatabases now has all databases that failed one way or another.
+	 * This can be because they actually failed for some reason, or because
+	 * the database was dropped between us getting the database list and
+	 * trying to process it. Get a fresh list of databases to detect the
+	 * second case where the database was dropped before we had started
+	 * processing it. If a database still exists, but enabling checksums
+	 * failed then we fail the entire checksumming process and exit with an
+	 * error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, FailedDatabases)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		bool		found = false;
+
+		foreach(lc2, DatabaseList)
+		{
+			ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+			if (db->dboid == db2->dboid)
+			{
+				found = true;
+				ereport(WARNING,
+						(errmsg("failed to enable checksums in \"%s\"",
+								db->dbname)));
+				break;
+			}
+		}
+
+		if (found)
+			found_failed = true;
+		else
+		{
+			ereport(LOG,
+					(errmsg("database \"%s\" has been dropped, skipping",
+							db->dbname)));
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksumhelper failed to enable checksums in all databases, aborting")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. XXX: this should
+	 * probably not be an IMMEDIATE checkpoint, but leave it there for now for
+	 * testing
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(ChecksumHelperShmem, 0, ChecksumHelperShmemSize());
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared relations and local ones are returned,
+ * else only non-shared relations are returned.
+ * Temp tables are not included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relpersistence == 't')
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Only include relation types that have local storage.
+		 */
+		if (pgc->relkind == RELKIND_VIEW ||
+			pgc->relkind == RELKIND_COMPOSITE_TYPE ||
+			pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGHUP, checksumhelper_sighup);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	elog(DEBUG1, "checksum worker starting for database oid %u", dboid);
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	if (checksumhelper_cost_delay >= 0)
+		VacuumCostDelay = checksumhelper_cost_delay;
+	if (checksumhelper_cost_limit >= 0)
+		VacuumCostLimit = checksumhelper_cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+
+	if (aborted)
+	{
+		ChecksumHelperShmem->success = ABORTED;
+		elog(DEBUG1, "checksum worker aborted in database oid %u", dboid);
+		return;
+	}
+
+	ChecksumHelperShmem->success = SUCCESSFUL;
+	elog(DEBUG1, "checksum worker completed in database oid %u", dboid);
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 084573e77c..bcd6086cea 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4132,6 +4132,11 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
 	}
 
 	return backendDesc;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b33cd..8b8d4d5f49 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -110,6 +110,7 @@
 #include "port/pg_bswap.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/fork_process.h"
 #include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
@@ -988,6 +989,17 @@ PostmasterMain(int argc, char *argv[])
 	ApplyLauncherRegister();
 
 	/*
+	 * If checksums are set to pending, start the checksum helper launcher
+	 * to start enabling checksums.
+	 */
+	if (DataChecksumsInProgress())
+	{
+		ereport(LOG,
+				(errmsg("data checksums in pending state, starting background worker to enable")));
+		ChecksumHelperLauncherRegister();
+	}
+
+	/*
 	 * process any libraries that should be preloaded at postmaster start
 	 */
 	process_shared_preload_libraries();
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 3f1eae38a9..2ab6afe99c 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1386,7 +1386,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 59c003de9c..30d80e7c54 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -199,6 +199,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..5381016915 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,10 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, but can also be turned on and off
+using pg_enable_data_checksums()/pg_disable_data_checksums() at runtime. When
+enabled via pg_enable_data_checksums() the server must be restarted for the
+checksumming to take effect.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..3de09f03a1 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,12 +93,22 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
-				checksum_failure = true;
+			{
+				/*
+				 * It is possible we get this failure because the user
+				 * has disabled checksums, but we have not yet seen this
+				 * in pg_control and therefore think we should verify it.
+				 * To make sure we have seen any change, make a locked
+				 * access to verify it as well.
+				 */
+				if (DataChecksumsNeedVerifyLocked())
+					checksum_failure = true;
+			}
 		}
 
 		/*
@@ -1168,7 +1178,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1205,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 859ef931e7..f84b916393 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -59,6 +60,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "postmaster/walwriter.h"
@@ -68,6 +70,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -421,6 +424,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 };
 
 /*
+ * data_checksums used to be a boolean, but was only set by initdb so there is
+ * no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", PG_DATA_CHECKSUM_VERSION, true},
+	{"off", 0, true},
+	{"inprogress", PG_DATA_CHECKSUM_INPROGRESS_VERSION, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -515,7 +529,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -589,6 +603,8 @@ const char *const config_group_names[] =
 	gettext_noop("Resource Usage / Kernel Resources"),
 	/* RESOURCES_VACUUM_DELAY */
 	gettext_noop("Resource Usage / Cost-Based Vacuum Delay"),
+	/* RESOURCES_CHECKSUMHELPER */
+	gettext_noop("Resource Usage / Checksumhelper"),
 	/* RESOURCES_BGWRITER */
 	gettext_noop("Resource Usage / Background Writer"),
 	/* RESOURCES_ASYNCHRONOUS */
@@ -1706,17 +1722,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -2205,6 +2210,27 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"checksumhelper_cost_delay", PGC_SIGHUP, RESOURCES_CHECKSUMHELPER,
+			gettext_noop("Checksum helper cost delay in milliseconds."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&checksumhelper_cost_delay,
+		20, -1, 100,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"checksumhelper_cost_limit", PGC_SIGHUP, RESOURCES_CHECKSUMHELPER,
+			gettext_noop("Checksum helper cost amount available before napping."),
+			NULL
+		},
+		&checksumhelper_cost_limit,
+		-1, -1, 10000,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"max_files_per_process", PGC_POSTMASTER, RESOURCES_KERNEL,
 			gettext_noop("Sets the maximum number of simultaneously open files for each server process."),
 			NULL
@@ -4150,6 +4176,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		0, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9e39baf466..9159643634 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -154,6 +154,11 @@
 #vacuum_cost_page_dirty = 20		# 0-10000 credits
 #vacuum_cost_limit = 200		# 1-10000 credits
 
+# - Checksumhelper -
+
+#checksumhelper_cost_delay = 20		# 0-100 milliseconds, -1 to use vacuum_cost_delay
+#checksumhelper_cost_limit = -1		# 0-10000 credits, -1 to use vacuum_cost_limit
+
 # - Background Writer -
 
 #bgwriter_delay = 200ms			# 10-10000ms between rounds
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..1a82e1ddad 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -11,6 +11,8 @@
 
 #include "pg_upgrade.h"
 
+#include "storage/bufpage.h"
+
 #include <ctype.h>
 
 /*
@@ -591,6 +593,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7e5e971294..449a703c47 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -226,7 +226,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 28c975446e..c2e8b55109 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -314,7 +314,10 @@ main(int argc, char *argv[])
 	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
 	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
-	printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
 
 	if (badblocks > 0)
 		return 1;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..63438ec8f4 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,14 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsNeedVerifyLocked(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 7c766836db..043efc89ed 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	uint32		new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 40d54ed030..c57b1bb436 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10176,6 +10176,13 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '3996', descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v',
+  proparallel => 'u', prorettype => 'void', proargtypes => '', prosrc => 'pg_disable_data_checksums' },
+{ oid => '3998', descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v',
+  proparallel => 'u', prorettype => 'void', proargtypes => '', prosrc => 'pg_enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2f59239b..4ed9ed76cc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..a1ff73e31f
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,35 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool		ChecksumHelperLauncherRegister(void);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+/* GUCs */
+extern int checksumhelper_cost_limit;
+extern int checksumhelper_cost_delay;
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 668d9efd35..4d6bd12581 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -63,6 +63,7 @@ enum config_group
 	RESOURCES_DISK,
 	RESOURCES_KERNEL,
 	RESOURCES_VACUUM_DELAY,
+	RESOURCES_CHECKSUMHELPER,
 	RESOURCES_BGWRITER,
 	RESOURCES_ASYNCHRONOUS,
 	WAL,
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..71f4cb3d7c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,104 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay', $node_master->lsn('insert'));
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Restart master to trigger background worker to enable checksums
+$node_master->restart();
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay', $node_master->lsn('insert'));
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
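
The states this test walks through (off, inprogress, on, then off again) can be read as a small state machine. What follows is only an illustrative sketch in C: the enum and function names are hypothetical and not taken from the patch, which stores the state as a version number in the control file. The allowed transitions follow the base design described at the start of the thread, and the two predicates reflect the rule that checksums are written in both inprogress and on, but verified only in on.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
typedef enum
{
	CHECKSUMS_OFF,
	CHECKSUMS_INPROGRESS,
	CHECKSUMS_ON
} ChecksumState;

/*
 * Allowed transitions per the base design:
 * off -> inprogress, inprogress -> on or off, on -> off.
 */
bool
checksum_transition_allowed(ChecksumState from, ChecksumState to)
{
	switch (from)
	{
		case CHECKSUMS_OFF:
			return to == CHECKSUMS_INPROGRESS;
		case CHECKSUMS_INPROGRESS:
			return to == CHECKSUMS_ON || to == CHECKSUMS_OFF;
		case CHECKSUMS_ON:
			return to == CHECKSUMS_OFF;
	}
	return false;
}

/* Checksums are written whenever the state is not off. */
bool
checksums_need_write(ChecksumState s)
{
	return s != CHECKSUMS_OFF;
}

/* Checksums are verified only once they are fully enabled. */
bool
checksums_need_verify(ChecksumState s)
{
	return s == CHECKSUMS_ON;
}

/* Text form, matching the values the test reads from pg_settings. */
const char *
checksum_state_name(ChecksumState s)
{
	switch (s)
	{
		case CHECKSUMS_ON:
			return "on";
		case CHECKSUMS_INPROGRESS:
			return "inprogress";
		case CHECKSUMS_OFF:
			break;
	}
	return "off";
}
```

Under this reading, going straight from off to on is rejected, which is why the test expects to observe inprogress before on.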

#158

Sergei Kornilov

sk@zsrv.org

over 7 years ago

In reply to: Magnus Hagander (#157)

Re: Online enabling of checksums

Hello

I tried to build this patch and got an error during make docs:

postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-limit"
postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-delay"

Both new GUCs, checksumhelper_cost_delay and checksumhelper_cost_limit, are listed in postgresql.conf with the special value -1 (-1 to use vacuum_cost_limit), but this value is not mentioned in the docs. I also noticed that the code and the documentation describe different defaults.
Also, I found one "<literal>in progress</literal>" in the pg_enable_data_checksums() description. In other places the status is called "inprogress" (without a space).
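
Going by the postgresql.conf comment quoted above ("-1 to use vacuum_cost_limit"), a negative value presumably means "fall back to the corresponding vacuum setting". A minimal sketch of that assumed rule follows; the struct and function names are hypothetical and the exact behaviour in the patch may differ.

```c
#include <stdbool.h>

/* Hypothetical container for the effective throttling settings. */
typedef struct CostSettings
{
	int			delay_ms;	/* sleep length once the limit is exceeded */
	int			limit;		/* accumulated cost that triggers a sleep */
	bool		active;		/* is cost-based throttling in effect? */
} CostSettings;

/*
 * Assumption: a negative checksumhelper_* value means "use the vacuum
 * value instead"; zero or positive values override it.
 */
CostSettings
resolve_checksum_cost(int helper_delay, int helper_limit,
					  int vacuum_delay, int vacuum_limit)
{
	CostSettings s;

	s.delay_ms = (helper_delay >= 0) ? helper_delay : vacuum_delay;
	s.limit = (helper_limit >= 0) ? helper_limit : vacuum_limit;
	/* Throttling only applies when there is a positive delay. */
	s.active = (s.delay_ms > 0);
	return s;
}
```

If this is the intended behaviour, the documentation would need to describe -1 as "inherit from vacuum" rather than as "disabled".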

VacuumPageHit = 0;
VacuumPageMiss = 0;
VacuumPageDirty = 0;

Hm, why are these set to 0 in the checksumhelper process?

/*
* Force a checkpoint to get everything out to disk. XXX: this should
* probably not be an IMMEDIATE checkpoint, but leave it there for now for
* testing
*/
RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);

We must not forget that.

regards, Sergei

#159

Sergei Kornilov

sk@zsrv.org

over 7 years ago

In reply to: Sergei Kornilov (#158)

Re: Online enabling of checksums

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed

Hello
As I wrote a few weeks ago, I cannot build the documentation due to these errors:

postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-delay"
postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-limit"

After removing these xrefs for test purposes, the patch passes check-world.

regards, Sergei

The new status of this patch is: Waiting on Author

#160

Daniel Gustafsson

daniel@yesql.se

over 7 years ago

In reply to: Sergei Kornilov (#159)

1 attachment(s)

Re: Online enabling of checksums

On 24 Jul 2018, at 11:05, Sergei Kornilov <sk@zsrv.org> wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world: tested, failed
> Implements feature: not tested
> Spec compliant: not tested
> Documentation: tested, failed
>
> Hello
> As i wrote few weeks ago i can not build documentation due errors:
>
> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-delay"
> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-limit"
>
> After remove such xref for test purposes patch pass check-world.

Hi!

Thanks for reviewing. I’ve updated the patch to fix the incorrect linkends
mentioned above, and to address the comments you made in a previous review.

The CF-builder-bot is red, but it’s because it’s trying to apply the already
committed patch which is in the attached datallowconn thread.

cheers ./daniel

Attachments:

online_checksums13.patch (application/octet-stream)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4d48d93305..1ca2fe78c7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2011,6 +2011,42 @@ include_dir 'conf.d'
      </para>
     </sect2>
 
+    <sect2 id="runtime-config-online-checksum">
+     <title>Online Checksumming</title>
+
+     <variablelist>
+      <varlistentry id="guc-checksumhelper-cost-delay" xreflabel="checksumhelper_cost_delay">
+       <term><varname>checksumhelper_cost_delay</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>checksumhelper_cost_delay</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         The length of time, in milliseconds, that the process will sleep when
+         the cost limit has been exceeded. The default value is <literal>-1</literal>, which
+         disables the cost-based checksumming delay feature. Positive values
+         enable cost-based checksumming.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry id="guc-checksumhelper-cost-limit" xreflabel="checksumhelper_cost_limit">
+       <term><varname>checksumhelper_cost_limit</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>checksumhelper_cost_limit</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         The accumulated cost that will cause the checksumming process to sleep.
+         The default value is <literal>200</literal>.
+        </para>
+       </listitem>
+      </varlistentry>
+     </variablelist>
+    </sect2>
+
     <sect2 id="runtime-config-resource-async-behavior">
      <title>Asynchronous Behavior</title>
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index edc9be92a6..783a54675b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19582,6 +19582,74 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>inprogress</literal>, but will not initiate the checksumming process.
+         In order to start checksumming the data pages the database must be restarted. Upon
+         restart a background worker will start processing all data in the database and enable
+         checksums for it. When all data pages have had checksums enabled, the cluster will
+         automatically switch to checksums <literal>on</literal>.
+        </para>
+        <para>
+         The <xref linkend="guc-checksumhelper-cost-delay"/> and
+         <xref linkend="guc-checksumhelper-cost-limit"/> GUCs are used to
+         <link linkend="runtime-config-online-checksum">throttle the
+         speed of the process</link>, using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster. This takes effect immediately.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 4489b585c7..be489e78b9 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -214,9 +214,9 @@ PostgreSQL documentation
        <para>
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
-        may incur a noticeable performance penalty. This option can only
-        be set during initialization, and cannot be changed later. If
-        set, checksums are calculated for all objects, in all databases.
+        may incur a noticeable performance penalty. If set, checksums
+        are calculated for all objects, in all databases. See
+        <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 4eb8feb903..23640e378d 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,86 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums;
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode,
+    but will not start the checksumming of data. In order to start checksumming
+    the data in the cluster, a restart is needed. When the cluster is restarted,
+    checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    The process will track all created databases so that it can be certain that
+    no database has been created from a non-checksummed template database. The
+    process won't set the checksum mode to <literal>on</literal> until no database
+    can be created from a non-checksummed template. If an application repeatedly
+    creates databases it may be necessary to terminate this application to allow
+    the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be started over. When the cluster is
+    restarted, the checksum process will again checksum all data in the cluster.
+    It is not possible to resume the work; the process has to start over and
+    re-process the cluster.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 00741c7b09..a31f8b806a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -17,6 +17,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -137,6 +138,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -182,6 +195,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 493f1db7b9..17e50cccd0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -862,6 +862,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(uint32 new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 					TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -4740,10 +4741,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 void
@@ -4817,12 +4814,110 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsNeedVerifyLocked(void)
+{
+	bool ret;
+
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 * Make the check while holding the ControlFileLock, to make sure we are
+	 * looking at the latest version.
+	 */
+	LWLockAcquire(ControlFileLock, LW_SHARED);
+	ret = (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+	LWLockRelease(ControlFileLock);
+
+	return ret;
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "Checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -9665,6 +9760,22 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(uint32 new_type)
+{
+	xl_checksum_state xlrec;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10107,6 +10218,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 9731742978..4de8ae7f4d 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -23,6 +23,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
 #include "utils/builtins.h"
@@ -697,3 +698,65 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(xtime);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+pg_disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (RecoveryInProgress())
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("recovery is in progress"),
+				 errhint("checksum state cannot be changed during recovery.")));
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * This sets the system into a pending state. To initiate the actual
+ * checksum updates, a restart is required to make sure there can be
+ * no parallel backends doing things we cannot work with here.
+ */
+Datum
+pg_enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (RecoveryInProgress())
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("recovery is in progress"),
+				 errhint("checksum state cannot be changed during recovery.")));
+
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	if (DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already pending"),
+				 errhint("A restart may be required to complete the process")));
+
+	SetDataChecksumsInProgress();
+
+	ereport(NOTICE,
+			(errmsg("data checksums set to pending"),
+			 errhint("To complete the operation, a restart of PostgreSQL is required")));
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f651bb49b1..19529d77ad 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..47ad4d43c9
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,808 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a database at initdb time, no extra process
+ * is required as each page is checksummed, and verified, at accesses.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/hsearch.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	bool		abort;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+static void WaitForAllTransactionsToFinish(void);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void checksumhelper_sighup(SIGNAL_ARGS);
+
+/* GUCs */
+int			checksumhelper_cost_limit;
+int			checksumhelper_cost_delay;
+
+/* Flags set by signal handlers */
+static volatile sig_atomic_t got_SIGHUP = false;
+
+/*
+ * Register the checksumhelper launcher process as a background worker.
+ */
+bool
+ChecksumHelperLauncherRegister(void)
+{
+	BackgroundWorker bgw;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = 0;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	RegisterBackgroundWorker(&bgw);
+
+	return true;
+}
+
+/*
+ * ShutdownChecksumHelperIfRunning
+ *		Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	ChecksumHelperShmem->abort = true;
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable checksums in a single relation/fork.
+ *
+ * Loops over all existing blocks in this fork, calculates the checksum on them,
+ * and writes them out. Any blocks added by another process extending this
+ * fork while we run will already have checksums set by the process extending
+ * it, so we don't need to care about those.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. Full page writes should only happen
+		 * for relations that are actually logged (not unlogged or temp
+		 * tables), but we still need to mark their buffers as dirty so the
+		 * local file gets updated.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		if (RelationNeedsWAL(reln))
+			log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort, the
+		 * aborting will bubble up from here.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		/*
+		 * Update cost based delay parameters if changed, and then initiate
+		 * the cost delay point.
+		 */
+		if (got_SIGHUP)
+		{
+			got_SIGHUP = false;
+			ProcessConfigFile(PGC_SIGHUP);
+			if (checksumhelper_cost_delay >= 0)
+				VacuumCostDelay = checksumhelper_cost_delay;
+			if (checksumhelper_cost_limit >= 0)
+				VacuumCostLimit = checksumhelper_cost_limit;
+			VacuumCostActive = (VacuumCostDelay > 0);
+		}
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %d", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exist. We consider this a success, since there
+		 * are no pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %d as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %d: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	elog(DEBUG1, "started background worker for checksums in \"%s\"",
+		 db->dbname);
+
+	snprintf(activity, sizeof(activity),
+			 "Waiting for worker in database %s (pid %d)", db->dbname, (int) pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	elog(DEBUG1, "background worker for checksums in \"%s\" completed",
+		 db->dbname);
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	ChecksumHelperShmem->abort = false;
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	ChecksumHelperShmem->abort = true;
+}
+
+static void
+checksumhelper_sighup(SIGNAL_ARGS)
+{
+	got_SIGHUP = true;
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = ShmemVariableCache->nextXid;
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		elog(DEBUG1, "Waiting for old transactions to finish");
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[128];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (waiting for %u)", waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_PG_SLEEP);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	List	   *FailedDatabases = NIL;
+	ListCell   *lc,
+			   *lc2;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	if (RecoveryInProgress())
+	{
+		elog(DEBUG1, "not starting checksumhelper launcher, recovery is in progress");
+		return;
+	}
+
+	/*
+	 * If a standby was restarted when in pending state, a background worker
+	 * was registered to start. If it's later promoted after the master has
+	 * completed enabling checksums, we need to terminate immediately and not
+	 * do anything. If the cluster is still in pending state when promoted,
+	 * the background worker should start to complete the job.
+	 */
+	if (DataChecksumsNeedVerifyLocked())
+	{
+		elog(DEBUG1, "not starting checksumhelper launcher, checksums already enabled");
+		return;
+	}
+
+	on_shmem_exit(launcher_exit, 0);
+
+	elog(DEBUG1, "checksumhelper launcher started");
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(ChecksumHelperResult);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain that
+		 * there are no databases left without checksums.
+		 */
+
+		DatabaseList = BuildDatabaseList();
+
+		/*
+		 * If there are no databases at all to checksum, we can exit
+		 * immediately as there is no work to do. This probably can never
+		 * happen, but just in case.
+		 */
+		if (DatabaseList == NIL || list_length(DatabaseList) == 0)
+			return;
+
+		processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult processing;
+
+			if (hash_search(ProcessedDatabases, (void *) &db->dboid, HASH_FIND, NULL))
+				/* This database has already been processed */
+				continue;
+
+			processing = ProcessDatabase(db);
+			hash_search(ProcessedDatabases, (void *) &db->dboid, HASH_ENTER, NULL);
+			processed_databases++;
+
+			if (processing == SUCCESSFUL)
+			{
+				if (ChecksumHelperShmem->process_shared_catalogs)
+
+					/*
+					 * Now that one database has completed shared catalogs, we
+					 * don't have to process them again.
+					 */
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else if (processing == FAILED)
+			{
+				/*
+				 * Put failed databases on the list of failures.
+				 */
+				FailedDatabases = lappend(FailedDatabases, db);
+			}
+			else
+				/* Abort flag set, so exit the whole process */
+				return;
+		}
+
+		elog(DEBUG1, "Completed one loop of checksum enabling, %i databases processed", processed_databases);
+		if (processed_databases == 0)
+
+			/*
+			 * No databases processed in this run of the loop, we have now
+			 * finished all databases and no concurrently created ones can
+			 * exist.
+			 */
+			break;
+	}
+
+	/*
+	 * FailedDatabases now has all databases that failed one way or another.
+	 * This can be because they actually failed for some reason, or because
+	 * the database was dropped between us getting the database list and
+	 * trying to process it. Get a fresh list of databases to detect the
+	 * second case where the database was dropped before we had started
+	 * processing it. If a database still exists, but enabling checksums
+	 * failed then we fail the entire checksumming process and exit with an
+	 * error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, FailedDatabases)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		bool		found = false;
+
+		foreach(lc2, DatabaseList)
+		{
+			ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+			if (db->dboid == db2->dboid)
+			{
+				found = true;
+				ereport(WARNING,
+						(errmsg("failed to enable checksums in \"%s\"",
+								db->dbname)));
+				break;
+			}
+		}
+
+		if (found)
+			found_failed = true;
+		else
+		{
+			ereport(LOG,
+					(errmsg("database \"%s\" has been dropped, skipping",
+							db->dbname)));
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksumhelper failed to enable checksums in all databases, aborting")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. XXX: this should
+	 * probably not be an IMMEDIATE checkpoint, but leave it there for now for
+	 * testing
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(ChecksumHelperShmem, 0, ChecksumHelperShmemSize());
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = HeapTupleGetOid(tup);
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared relations and local ones are
+ * returned, else only non-shared relations are returned.
+ * Temp tables are not included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Only include relation types that have local storage.
+		 */
+		if (pgc->relkind == RELKIND_VIEW ||
+			pgc->relkind == RELKIND_COMPOSITE_TYPE ||
+			pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = HeapTupleGetOid(tup);
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	heap_endscan(scan);
+	heap_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGHUP, checksumhelper_sighup);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	elog(DEBUG1, "checksum worker starting for database oid %u", dboid);
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	if (checksumhelper_cost_delay >= 0)
+		VacuumCostDelay = checksumhelper_cost_delay;
+	if (checksumhelper_cost_limit >= 0)
+		VacuumCostLimit = checksumhelper_cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+
+	if (aborted)
+	{
+		ChecksumHelperShmem->success = ABORTED;
+		elog(DEBUG1, "checksum worker aborted in database oid %u", dboid);
+		return;
+	}
+
+	ChecksumHelperShmem->success = SUCCESSFUL;
+	elog(DEBUG1, "checksum worker completed in database oid %u", dboid);
+}
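The launcher loop above keeps rescanning the database list until a full pass finds nothing new. A minimal standalone sketch of that termination rule follows. It is not the patch's code: SeenSet, process_until_stable() and demo_list_databases() are hypothetical names, a fixed array stands in for the dynahash table, and the demo lister simulates a database created while the first pass was running.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEEN 64

/* Hypothetical stand-in for the ProcessedDatabases hash table. */
typedef struct SeenSet
{
	unsigned int oids[MAX_SEEN];
	size_t		count;
} SeenSet;

static bool
seen_contains(const SeenSet *s, unsigned int oid)
{
	size_t		i;

	for (i = 0; i < s->count; i++)
		if (s->oids[i] == oid)
			return true;
	return false;
}

static bool
seen_add(SeenSet *s, unsigned int oid)
{
	if (s->count >= MAX_SEEN)
		return false;
	s->oids[s->count++] = oid;
	return true;
}

/* Callback that fills "out" with the current database oids. */
typedef size_t (*ListDatabasesFn) (unsigned int *out, size_t max, void *arg);

/*
 * Repeat passes until one pass processes zero new databases.  Returns the
 * number of passes made, or -1 if the seen set overflowed.
 */
static int
process_until_stable(ListDatabasesFn list_dbs, void *arg, SeenSet *seen)
{
	int			passes = 0;

	for (;;)
	{
		unsigned int oids[MAX_SEEN];
		size_t		n = list_dbs(oids, MAX_SEEN, arg);
		size_t		processed = 0;
		size_t		i;

		passes++;
		for (i = 0; i < n; i++)
		{
			if (seen_contains(seen, oids[i]))
				continue;		/* already handled in an earlier pass */
			if (!seen_add(seen, oids[i]))
				return -1;
			processed++;
		}
		if (processed == 0)
			break;				/* nothing new showed up: we are done */
	}
	return passes;
}

/* Demo lister: two databases on the first call, three afterwards. */
static size_t
demo_list_databases(unsigned int *out, size_t max, void *arg)
{
	static const unsigned int before[] = {1, 13000};
	static const unsigned int after[] = {1, 13000, 16384};
	int		   *calls = (int *) arg;
	const unsigned int *src = (*calls == 0) ? before : after;
	size_t		n = (*calls == 0) ? 2 : 3;
	size_t		i;

	(*calls)++;
	if (n > max)
		n = max;
	for (i = 0; i < n; i++)
		out[i] = src[i];
	return n;
}
```

With the demo lister, the first pass handles two databases, the second picks up the late arrival, and the third finds nothing new and ends the loop.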
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index bbe73618c7..618c36dd18 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4135,6 +4135,11 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
 	}
 
 	return backendDesc;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a4b53b33cd..8b8d4d5f49 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -110,6 +110,7 @@
 #include "port/pg_bswap.h"
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/fork_process.h"
 #include "postmaster/pgarch.h"
 #include "postmaster/postmaster.h"
@@ -987,6 +988,17 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	ApplyLauncherRegister();
 
+	/*
+	 * If checksums are set to pending, start the checksum helper launcher
+	 * to start enabling checksums.
+	 */
+	if (DataChecksumsInProgress())
+	{
+		ereport(LOG,
+				(errmsg("data checksums in pending state, starting background worker to enable")));
+		ChecksumHelperLauncherRegister();
+	}
+
 	/*
 	 * process any libraries that should be preloaded at postmaster start
 	 */
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 3f1eae38a9..2ab6afe99c 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1386,7 +1386,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 59c003de9c..30d80e7c54 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -199,6 +199,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 0c86a581c0..853e1e472f 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -261,6 +262,7 @@ CreateSharedMemoryAndSemaphores(bool makePrivate, int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..5381016915 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,10 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, but can also be turned on and off
+using pg_enable_data_checksums()/pg_disable_data_checksums() at runtime. When
+enabled via pg_enable_data_checksums() the server must be restarted for the
+checksumming to take effect.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index dfbda5458f..3de09f03a1 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,12 +93,22 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
-				checksum_failure = true;
+			{
+				/*
+				 * It is possible we get this failure because the user
+				 * has disabled checksums, but we have not yet seen this
+				 * in pg_control and therefore think we should verify it.
+				 * To make sure we have seen any change, make a locked
+				 * access to verify it as well.
+				 */
+				if (DataChecksumsNeedVerifyLocked())
+					checksum_failure = true;
+			}
 		}
 
 		/*
@@ -1168,7 +1178,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1195,7 +1205,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
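The bufpage.c hunks above replace the single DataChecksumsEnabled() test with separate "need write" and "need verify" checks. A small standalone sketch of how the three states map onto those two predicates, and which transitions the cover mail allows, might look like the following. The enum and function names are hypothetical; only the numeric values 1 and 2 are taken from PG_DATA_CHECKSUM_VERSION and PG_DATA_CHECKSUM_INPROGRESS_VERSION in the patch.

```c
#include <stdbool.h>

/* Hypothetical names; the patch stores these values in pg_control. */
typedef enum ChecksumState
{
	CHECKSUMS_OFF = 0,
	CHECKSUMS_ON = 1,			/* PG_DATA_CHECKSUM_VERSION */
	CHECKSUMS_INPROGRESS = 2	/* PG_DATA_CHECKSUM_INPROGRESS_VERSION */
} ChecksumState;

/* Checksums are written both while "inprogress" and once "on". */
static bool
checksums_need_write(ChecksumState state)
{
	return state == CHECKSUMS_ON || state == CHECKSUMS_INPROGRESS;
}

/* Checksums are verified only once the whole cluster is "on". */
static bool
checksums_need_verify(ChecksumState state)
{
	return state == CHECKSUMS_ON;
}

/*
 * Transitions described in the cover mail: off -> inprogress,
 * inprogress -> on or off, on -> off.
 */
static bool
checksum_transition_allowed(ChecksumState from, ChecksumState to)
{
	switch (from)
	{
		case CHECKSUMS_OFF:
			return to == CHECKSUMS_INPROGRESS;
		case CHECKSUMS_INPROGRESS:
			return to == CHECKSUMS_ON || to == CHECKSUMS_OFF;
		case CHECKSUMS_ON:
			return to == CHECKSUMS_OFF;
	}
	return false;
}
```

Splitting the predicates this way is what lets pages written during the "inprogress" phase carry valid checksums while reads of not-yet-rewritten pages do not fail verification.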
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a88ea6cfc9..802db4f6f8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -32,6 +32,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -59,6 +60,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
 #include "postmaster/walwriter.h"
@@ -68,6 +70,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -427,6 +430,17 @@ static const struct config_enum_entry password_encryption_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * data_checksums used to be a boolean, but was only set by initdb so there
+ * is no need to support variants of boolean input.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", PG_DATA_CHECKSUM_VERSION, true},
+	{"off", 0, true},
+	{"inprogress", PG_DATA_CHECKSUM_INPROGRESS_VERSION, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -522,7 +536,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp; /* only accessed locally! */
 static bool integer_datetimes;
 static bool assert_enabled;
 
@@ -596,6 +610,8 @@ const char *const config_group_names[] =
 	gettext_noop("Resource Usage / Kernel Resources"),
 	/* RESOURCES_VACUUM_DELAY */
 	gettext_noop("Resource Usage / Cost-Based Vacuum Delay"),
+	/* RESOURCES_CHECKSUMHELPER */
+	gettext_noop("Resource Usage / Checksumhelper"),
 	/* RESOURCES_BGWRITER */
 	gettext_noop("Resource Usage / Background Writer"),
 	/* RESOURCES_ASYNCHRONOUS */
@@ -1712,17 +1728,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -2211,6 +2216,27 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"checksumhelper_cost_delay", PGC_SIGHUP, RESOURCES_CHECKSUMHELPER,
+			gettext_noop("Checksum helper cost delay in milliseconds."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&checksumhelper_cost_delay,
+		-1, -1, 100,
+		NULL, NULL, NULL
+	},
+
+	{
+		{"checksumhelper_cost_limit", PGC_SIGHUP, RESOURCES_CHECKSUMHELPER,
+			gettext_noop("Checksum helper cost amount available before napping."),
+			NULL
+		},
+		&checksumhelper_cost_limit,
+		200, -1, 10000,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"max_files_per_process", PGC_POSTMASTER, RESOURCES_KERNEL,
 			gettext_noop("Sets the maximum number of simultaneously open files for each server process."),
@@ -4157,6 +4183,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		0, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	{
 		{"plan_cache_mode", PGC_USERSET, QUERY_TUNING_OTHER,
 			gettext_noop("Controls the planner's selection of custom or generic plan."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c0d3fb8491..b813af8f98 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -153,6 +153,11 @@
 #vacuum_cost_page_dirty = 20		# 0-10000 credits
 #vacuum_cost_limit = 200		# 1-10000 credits
 
+# - Checksumhelper -
+
+#checksumhelper_cost_delay = -1		# 0-100 milliseconds, -1 to use vacuum_cost_delay
+#checksumhelper_cost_limit = 200	# 0-10000 credits, -1 to use vacuum_cost_limit
+
 # - Background Writer -
 
 #bgwriter_delay = 200ms			# 10-10000ms between rounds
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 0fe98a550e..1a82e1ddad 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -11,6 +11,8 @@
 
 #include "pg_upgrade.h"
 
+#include "storage/bufpage.h"
+
 #include <ctype.h>
 
 /*
@@ -590,6 +592,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		pg_fatal("transition to data checksums not completed in old cluster\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 7e5e971294..449a703c47 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -226,7 +226,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 28c975446e..c2e8b55109 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -314,7 +314,10 @@ main(int argc, char *argv[])
 	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 	printf(_("Files scanned:  %" INT64_MODIFIER "d\n"), files);
 	printf(_("Blocks scanned: %" INT64_MODIFIER "d\n"), blocks);
-	printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		printf(_("Blocks left in progress: %" INT64_MODIFIER "d\n"), badblocks);
+	else
+		printf(_("Bad checksums:  %" INT64_MODIFIER "d\n"), badblocks);
 
 	if (badblocks > 0)
 		return 1;
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..63438ec8f4 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -154,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -257,7 +257,14 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsNeedVerifyLocked(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 30610b3ea9..e65b4536ca 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -240,6 +241,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	uint32		new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..33c59f9a63 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a14651010f..fbb6b3fb67 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10176,6 +10176,13 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '3996', descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v',
+  proparallel => 'u', prorettype => 'void', proargtypes => '', prosrc => 'pg_disable_data_checksums' },
+{ oid => '3998', descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v',
+  proparallel => 'u', prorettype => 'void', proargtypes => '', prosrc => 'pg_enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d59c24ae23..9566335c9c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -710,7 +710,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..a1ff73e31f
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,35 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+bool		ChecksumHelperLauncherRegister(void);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+/* GUCs */
+extern int checksumhelper_cost_limit;
+extern int checksumhelper_cost_delay;
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..bd46bf2ce6 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/utils/guc_tables.h b/src/include/utils/guc_tables.h
index 668d9efd35..4d6bd12581 100644
--- a/src/include/utils/guc_tables.h
+++ b/src/include/utils/guc_tables.h
@@ -63,6 +63,7 @@ enum config_group
 	RESOURCES_DISK,
 	RESOURCES_KERNEL,
 	RESOURCES_VACUUM_DELAY,
+	RESOURCES_CHECKSUMHELPER,
 	RESOURCES_BGWRITER,
 	RESOURCES_ASYNCHRONOUS,
 	WAL,
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..f3ad9dfae1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..c1a13b6d22
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay', $node_master->lsn('insert'));
+
+# Ensure that the standby has switched to inprogress
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on standby_1');
+
+# Restart cluster to trigger background worker to enable checksums
+$node_master->restart();
+$node_standby_1->restart();
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay', $node_master->lsn('insert'));
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, "20000", 'ensure we can safely read all data without checksums');
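
The test above walks the cluster through off, inprogress, on and back to off. As a reading aid, here is a minimal C sketch of the transition rules described at the start of this thread. It is not code from the patch: the enum and function names are hypothetical, the values 1 and 2 mirror PG_DATA_CHECKSUM_VERSION and PG_DATA_CHECKSUM_INPROGRESS_VERSION from the bufpage.h hunk, and 0 for "off" is an assumption.

```c
#include <stdbool.h>

/* Hypothetical names; only the values 1 and 2 come from the bufpage.h hunk. */
typedef enum ChecksumState
{
	CHECKSUM_OFF = 0,			/* assumed value for "off" */
	CHECKSUM_ON = 1,			/* PG_DATA_CHECKSUM_VERSION */
	CHECKSUM_INPROGRESS = 2		/* PG_DATA_CHECKSUM_INPROGRESS_VERSION */
} ChecksumState;

/*
 * Allowed transitions as described in the first mail of the thread:
 * off -> inprogress, inprogress -> on or off, on -> off.
 */
static bool
checksum_transition_allowed(ChecksumState from, ChecksumState to)
{
	switch (from)
	{
		case CHECKSUM_OFF:
			return to == CHECKSUM_INPROGRESS;
		case CHECKSUM_INPROGRESS:
			return to == CHECKSUM_ON || to == CHECKSUM_OFF;
		case CHECKSUM_ON:
			return to == CHECKSUM_OFF;
	}
	return false;
}

/* While in progress, checksums are written but not verified. */
static bool
checksums_written(ChecksumState state)
{
	return state != CHECKSUM_OFF;
}

static bool
checksums_verified(ChecksumState state)
{
	return state == CHECKSUM_ON;
}
```

With these rules, going straight from off to on is rejected, which is why the test has to pass through inprogress before it can observe "on".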

#161

Sergei Kornilov

sk@zsrv.org

over 7 years ago

In reply to: Daniel Gustafsson (#160)

Re: Online enabling of checksums

Hello
Thank you for the update! I have only done a quick test so far: the patch applies and builds cleanly. But I get a reproducible error during check-world:

t/001_standby_checksum.pl .. 6/10
# Failed test 'ensure checksums are enabled on standby'
# at t/001_standby_checksum.pl line 84.
# got: 'inprogress'
# expected: 'on'

In the standby log I found this error:

2018-07-25 13:13:05.463 MSK [16544] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000003 has already been removed

Checksumhelper obviously writes a lot of WAL. The test passes if I change the restart order so that the standby is restarted first:

$node_standby_1->restart();
$node_master->restart();

Or we need to set up a replication slot.

Also, we get this log record after start:

data checksums in pending state, starting background worker to enable

even in recovery, although the background worker is actually started only at the end of recovery (or on promotion). I think it would be better to use a DEBUG ereport in the postmaster and LOG in checksumhelper.

regards, Sergei

25.07.2018, 12:35, "Daniel Gustafsson" <daniel@yesql.se>:

On 24 Jul 2018, at 11:05, Sergei Kornilov <sk@zsrv.org> wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed

Hello
As I wrote a few weeks ago, I cannot build the documentation due to errors:

postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-delay"
postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-limit"

After removing these xrefs for test purposes, the patch passes check-world.

Hi!,

Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
linkends as well as fixed the comments you made in a previous review.

The CF-builder-bot is red, but it’s because it’s trying to apply the already
committed patch which is in the attached datallowconn thread.

cheers ./daniel

#162

Robert Haas

robertmhaas@gmail.com

over 7 years ago

In reply to: Magnus Hagander (#157)

Re: Online enabling of checksums

On Tue, Jun 26, 2018 at 7:45 AM, Magnus Hagander <magnus@hagander.net> wrote:

PFA an updated version of the patch for the next CF. We believe this one
takes care of all the things pointed out so far.

For this version, we "implemented" the MegaExpensiveRareMemoryBarrier() by
simply requiring a restart of PostgreSQL to initiate the conversion
background. That is definitely going to guarantee a memory barrier. It's
certainly not ideal, but restarting the cluster is still a *lot* better than
having to do the entire conversion offline. This can of course be improved
upon in the future, but for now we stuck to the safe way.

Honestly, I feel like the bar for this feature ought to be higher than that.

(I half-expect a vigorous discussion of whether I have set the bar for
the features I've developed in the right place or not, but I think
that's not really a fair response. If somebody thinks some feature I
implemented should've been more baked, they might be right, but that's
not what this thread is about. I'm giving you MY opinion about THIS
patch, nothing more or less.)

Why can't we do better?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#163

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Robert Haas (#162)

Re: Online enabling of checksums

On July 26, 2018 10:03:39 AM PDT, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jun 26, 2018 at 7:45 AM, Magnus Hagander <magnus@hagander.net> wrote:

PFA an updated version of the patch for the next CF. We believe this one
takes care of all the things pointed out so far.

For this version, we "implemented" the MegaExpensiveRareMemoryBarrier() by
simply requiring a restart of PostgreSQL to initiate the conversion
background. That is definitely going to guarantee a memory barrier. It's
certainly not ideal, but restarting the cluster is still a *lot* better than
having to do the entire conversion offline. This can of course be improved
upon in the future, but for now we stuck to the safe way.

Honestly, I feel like the bar for this feature ought to be higher than
that.

(I half-expect a vigorous discussion of whether I have set the bar for
the features I've developed in the right place or not, but I think
that's not really a fair response. If somebody thinks some feature I
implemented should've been more baked, they might be right, but that's
not what this thread is about. I'm giving you MY opinion about THIS
patch, nothing more or less.)

Why can't we do better?

I don't think it's that hard to do better. IIRC I even outlined something before the freeze. If not, I certainly can (sketch: use a procsignal-based acknowledgment protocol, using a 64-bit integer. Useful for plenty of other things).
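
One possible reading of that one-line sketch, written out in C as an illustration only: a shared 64-bit generation counter is bumped by whoever changes the checksum state, every backend records the newest generation it has absorbed, and the initiator waits until all live backends have caught up. All names here are hypothetical, the fixed slot array stands in for PostgreSQL's per-backend shared memory, and the actual signal delivery (procsignal in the real server) is left out entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BACKENDS 4			/* hypothetical fixed number of slots */

typedef struct BackendSlot
{
	atomic_bool in_use;			/* slot belongs to a live backend */
	_Atomic uint64_t acked;		/* newest generation this backend absorbed */
	int			local_state;	/* backend-private copy of the setting */
} BackendSlot;

static _Atomic uint64_t barrier_generation;	/* starts at zero */
static atomic_int shared_state;
static BackendSlot slots[MAX_BACKENDS];

/*
 * Initiator: publish the new state, then bump the generation.  Returns the
 * generation that every backend must acknowledge.  A real implementation
 * would signal each backend here; this sketch assumes delivery happens.
 */
static uint64_t
emit_barrier(int new_state)
{
	atomic_store(&shared_state, new_state);
	return atomic_fetch_add(&barrier_generation, 1) + 1;
}

/*
 * Backend, at a safe point after being signalled: read the generation first,
 * then the state, so the state copied is at least as new as the generation
 * being acknowledged.
 */
static void
absorb_barrier(BackendSlot *slot)
{
	uint64_t	gen = atomic_load(&barrier_generation);

	slot->local_state = atomic_load(&shared_state);
	atomic_store(&slot->acked, gen);
}

/* Initiator: true once every live backend has acknowledged "gen". */
static bool
barrier_complete(uint64_t gen)
{
	for (int i = 0; i < MAX_BACKENDS; i++)
	{
		if (atomic_load(&slots[i].in_use) &&
			atomic_load(&slots[i].acked) < gen)
			return false;
	}
	return true;
}
```

The point of the 64-bit counter in this sketch is that acknowledgments never need to be reset: a backend that absorbs late simply jumps to the newest generation, and the initiator only compares against the value it was handed.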

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#164

Bruce Momjian

bruce@momjian.us

over 7 years ago

In reply to: Daniel Gustafsson (#160)

Re: Online enabling of checksums

On Wed, Jul 25, 2018 at 11:35:31AM +0200, Daniel Gustafsson wrote:

On 24 Jul 2018, at 11:05, Sergei Kornilov <sk@zsrv.org> wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed

Hello
As I wrote a few weeks ago, I cannot build the documentation due to errors:

postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-delay"
postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-limit"

After removing these xrefs for test purposes, the patch passes check-world.

Hi!,

Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
linkends as well as fixed the comments you made in a previous review.

The CF-builder-bot is red, but it’s because it’s trying to apply the already
committed patch which is in the attached datallowconn thread.

I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
                                                    ^

Is "helper" the right word?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

#165

Joshua D. Drake

jd@commandprompt.com

over 7 years ago

In reply to: Bruce Momjian (#164)

Re: Online enabling of checksums

On 07/31/2018 12:45 PM, Bruce Momjian wrote:

Hi!,

Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
linkends as well as fixed the comments you made in a previous review.

The CF-builder-bot is red, but it’s because it’s trying to apply the already
committed patch which is in the attached datallowconn thread.

I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
                                                    ^

Is "helper" the right word?

Based on other terminology within the postgresql.conf should it be
"checksum_worker_cost_delay"?

#166

Bruce Momjian

bruce@momjian.us

over 7 years ago

In reply to: Joshua D. Drake (#165)

Re: Online enabling of checksums

On Tue, Jul 31, 2018 at 12:52:40PM -0700, Joshua Drake wrote:

On 07/31/2018 12:45 PM, Bruce Momjian wrote:

Hi!,

Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
linkends as well as fixed the comments you made in a previous review.

The CF-builder-bot is red, but it’s because it’s trying to apply the already
committed patch which is in the attached datallowconn thread.

I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
                                                    ^

Is "helper" the right word?

Based on other terminology within the postgresql.conf should it be
"checksum_worker_cost_delay"?

+1

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

#167

Daniel Gustafsson

daniel@yesql.se

over 7 years ago

In reply to: Joshua D. Drake (#165)

Re: Online enabling of checksums

On 31 Jul 2018, at 21:52, Joshua D. Drake <jd@commandprompt.com> wrote:

On 07/31/2018 12:45 PM, Bruce Momjian wrote:

Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
linkends as well as fixed the comments you made in a previous review.

The CF-builder-bot is red, but it’s because it’s trying to apply the already
committed patch which is in the attached datallowconn thread.

I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
                                                    ^
Is "helper" the right word?

IIRC, “helper” was chosen to signal that it’s a single process where “worker”
may be thought of as a process of which there can be many.

Based on other terminology within the postgresql.conf should it be "checksum_worker_cost_delay"?

Yes, I think it makes sense to rename it “worker” to align better with the
postgres nomenclature. Will fix.

cheers ./daniel

#168

Daniel Gustafsson

daniel@yesql.se

over 7 years ago

In reply to: Andres Freund (#163)

Re: Online enabling of checksums

On 26 Jul 2018, at 19:35, Andres Freund <andres@anarazel.de> wrote:
On July 26, 2018 10:03:39 AM PDT, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:

Why can't we do better?

I don't think it's that hard to do better. IIRC I even outlined something before the freeze. If not, I certainly can (sketch: use a procsignal-based acknowledgment protocol, using a 64-bit integer. Useful for plenty of other things).

Not really arguing for or against, but just to understand the reasoning before
starting hacking. Why do we feel that a restart (intended for safety here) in
this case is a burden on a use-once process? Is it from a usability or
technical point of view? Just want to make sure we are on the same page before
digging in to not hack on this patch in a direction which isn’t what is
requested.

cheers ./daniel

#169

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Daniel Gustafsson (#168)

Re: Online enabling of checksums

Hi,

On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:

On 26 Jul 2018, at 19:35, Andres Freund <andres@anarazel.de> wrote:
On July 26, 2018 10:03:39 AM PDT, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:

Why can't we do better?

I don't think it's that hard to do better. IIRC I even outlined something before the freeze. If not, I certainly can (sketch: use a procsignal-based acknowledgment protocol, using a 64-bit integer. Useful for plenty of other things).

Not really arguing for or against, but just to understand the reasoning before
starting hacking. Why do we feel that a restart (intended for safety here) in
this case is a burden on a use-once process? Is it from a usability or
technical point of view? Just want to make sure we are on the same page before
digging in to not hack on this patch in a direction which isn’t what is
requested.

Having to restart the server at some seemingly arbitrary point in the
middle of enabling checksums makes it harder to use and to schedule.
The restart is only needed to fix a relatively small issue, and doesn't
save that much code.

Greetings,

Andres Freund

#170

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Andres Freund (#169)

Re: Online enabling of checksums

Andres Freund <andres@anarazel.de> writes:

On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:

Not really arguing for or against, but just to understand the reasoning before
starting hacking. Why do we feel that a restart (intended for safety here) in
this case is a burden on a use-once process? Is it from a usability or
technical point of view? Just want to make sure we are on the same page before
digging in to not hack on this patch in a direction which isn’t what is
requested.

Having to restart the server at some seemingly arbitrary point in the
middle of enabling checksums makes it harder to use and to schedule.
The restart is only needed to fix a relatively small issue, and doesn't
save that much code.

Without taking a position on the merits ... I don't see how you can
claim "it doesn't save that much code" when we don't have a patch to
compare to that doesn't require the restart. Maybe it will turn out
not to be much code, but we don't know that now.

regards, tom lane

#171

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tom Lane (#170)

Re: Online enabling of checksums

On 2018-07-31 17:28:41 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:

Not really arguing for or against, but just to understand the reasoning before
starting hacking. Why do we feel that a restart (intended for safety here) in
this case is a burden on a use-once process? Is it from a usability or
technical point of view? Just want to make sure we are on the same page before
digging in to not hack on this patch in a direction which isn’t what is
requested.

Having to restart the server at some seemingly arbitrary point in the
middle of enabling checksums makes it harder to use and to schedule.
The restart is only needed to fix a relatively small issue, and doesn't
save that much code.

Without taking a position on the merits ... I don't see how you can
claim "it doesn't save that much code" when we don't have a patch to
compare to that doesn't require the restart. Maybe it will turn out
not to be much code, but we don't know that now.

IIRC I outlined a solution around the feature freeze, and I've since
offered to go into further depth if needed. And I'd pointed out the
issue at hand. So while I'd obviously not want to predict a specific
linecount, I'm fairly sure I have a reasonable guesstimate about the
complexity.

- Andres

#172

Alvaro Herrera

alvherre@2ndquadrant.com

over 7 years ago

In reply to: Tom Lane (#170)

Re: Online enabling of checksums

On 2018-Jul-31, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:

Not really arguing for or against, but just to understand the reasoning before
starting hacking. Why do we feel that a restart (intended for safety here) in
this case is a burden on a use-once process? Is it from a usability or
technical point of view? Just want to make sure we are on the same page before
digging in to not hack on this patch in a direction which isn’t what is
requested.

Having to restart the server at some seemingly arbitrary point in the
middle of enabling checksums makes it harder to use and to schedule.
The restart is only needed to fix a relatively small issue, and doesn't
save that much code.

Without taking a position on the merits ... I don't see how you can
claim "it doesn't save that much code" when we don't have a patch to
compare to that doesn't require the restart. Maybe it will turn out
not to be much code, but we don't know that now.

The ability to get checksums enabled is a killer feature; the ability to
do it with no restart ... okay, it's better than requiring a restart,
but it's not *that* big a deal.

In the spirit of supporting incremental development, I think it's quite
sensible to get the current thing done, then see what it takes to get
the next thing done. Each is an improvement on its own merits. And it
doesn't have to be made by the same people.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#173

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Alvaro Herrera (#172)

Re: Online enabling of checksums

Hi,

On 2018-07-31 18:56:29 -0400, Alvaro Herrera wrote:

In the spirit of supporting incremental development, I think it's quite
sensible to get the current thing done, then see what it takes to get
the next thing done. Each is an improvement on its own merits. And it
doesn't have to be made by the same people.

I just don't buy this. An earlier version of this feature was committed
to v11 without the restart, over objections. There's now extra state in
the control file to support the restart based system, there's extra
tests, extra docs. And it'd not be much code to just make it work
without the restart. The process around this patchset is just plain
weird.

Greetings,

Andres Freund

#174

Michael Banck

michael.banck@credativ.de

over 7 years ago

In reply to: Alvaro Herrera (#172)

Re: Online enabling of checksums

Hi,

On Tuesday, 31.07.2018 at 18:56 -0400, Alvaro Herrera wrote:

The ability to get checksums enabled is a killer feature; the ability to
do it with no restart ... okay, it's better than requiring a restart,
but it's not *that* big a deal.

Well, it's a downtime and service interruption from the client's POV,
and arguably we should remove the 'online' from the patch subject then.
You can activate checksums on an offline instance already via the
pg_checksums extensions to pg_verify_checksums that Michael Paquier and
I wrote independently; of course, that downtime will be linearly
longer the more data you have.

If this was one week before feature freeze, I would agree with you that
it makes sense to ship it with the restart requirement rather than not
shipping it at all. But we're several commitfests away from v12, so
making an effort to have this work without a downtime looks like a
reasonable requirement to me.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz

#175

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 7 years ago

In reply to: Michael Banck (#174)

Re: Online enabling of checksums

On 08/01/2018 10:40 AM, Michael Banck wrote:

Hi,

On Tuesday, 31.07.2018 at 18:56 -0400, Alvaro Herrera wrote:

The ability to get checksums enabled is a killer feature; the
ability to do it with no restart ... okay, it's better than
requiring a restart, but it's not *that* big a deal.

Well, it's a downtime and service interruption from the client's POV,
and arguably we should remove the 'online' from the patch subject
then. You can activate checksums on an offline instance already via
the pg_checksums extensions to pg_verify_checksums that Michael
Paquier and I wrote independently; of course, that downtime will be
linearly longer the more data you have.

IMHO the main work still happens while the instance is running, so I
don't see why the restart would make it "not online".

But keeping or not keeping "online" is not the main dilemma faced by
this patch, I think. That is, if we drop "online" from the name I doubt
it'll make it any more acceptable for those objecting to having to
restart the cluster.

If this was one week before feature freeze, I would agree with you
that it makes sense to ship it with the restart requirement rather
than not shipping it at all. But we're several commitfests away from
v12, so making an effort to have this work without a downtime
looks like a reasonable requirement to me.

Why would all those pieces have to be committed at once? Why not
commit what we have now (with the restart) and then remove the
restriction in a later commit?

I understand the desire to be able to enable checksums without a
restart, but kinda agree with Alvaro regarding incremental development.

In a way, the question is how far can we reasonably push the patch
author(s) to implement stuff we consider desirable, but he/she/they
decided it's not worth the time investment at this point.

To me, it seems like an immensely useful feature even with the restart,
and I don't think the restart is a major burden for most systems (it can
be, if your system has no maintenance windows, of course).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#176

Sergei Kornilov

sk@zsrv.org

over 7 years ago

In reply to: Tomas Vondra (#175)

Re: Online enabling of checksums

Hello
I think one restart is acceptable for such a feature. I doubt users will want to disable and enable checksums often. In many cases checksums will be enabled once in the whole life of the cluster. We already need more downtime for minor updates (4/year) and for changes to PGC_POSTMASTER settings (max_connections or autovacuum workers, for example).

regards, Sergei

#177

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Michael Banck (#174)

Re: Online enabling of checksums

Hi,

On 2018-08-01 10:40:24 +0200, Michael Banck wrote:

If this was one week before feature freeze, I would agree with you that
it makes sense to ship it with the restart requirement rather than not
shipping it at all. But we're several commitfests away from v12, so
making an effort to have this work without a downtime looks like a
reasonable requirement to me.

My problem isn't just that I shouldn't think this should be committed
without at least a firm commitment to do better, my problem is that I
think the "restart" approach is just using the entirely wrong hammer to
solve the problem at hand. At the very least it's very problematic in
respect to replicas, which need to know about the setting too, and can
have similar problems the restart on the primary is supposed to
prevent.

Greetings,

Andres Freund

#178

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tomas Vondra (#175)

Re: Online enabling of checksums

Hi,

On 2018-08-01 11:15:38 +0200, Tomas Vondra wrote:

On 08/01/2018 10:40 AM, Michael Banck wrote:

If this was one week before feature freeze, I would agree with you
that it makes sense to ship it with the restart requirement rather
than not shipping it at all. But we're several commitfests away from
v12, so making an effort to have this work without a downtime
looks like a reasonable requirement to me.

Why would all those pieces have to be committed at once? Why not
commit what we have now (with the restart) and then remove the
restriction in a later commit?

Sure, if all the pieces existed in various degrees of solidness (with
the earlier pieces committable, but later ones needing work), I'd feel
*much* less concerned about it.

In a way, the question is how far can we reasonably push the patch
author(s) to implement stuff we consider desirable, but he/she/they
decided it's not worth the time investment at this point.

We push people to only implement something really consistent all the
time.

To me, it seems like an immensely useful feature even with the restart,
and I don't think the restart is a major burden for most systems (it can
be, if your system has no maintenance windows, of course).

I think it is a problem, but my problem is more that I don't think it's really
a solution for the problem.

Greetings,

Andres Freund

#179

Alvaro Herrera

alvherre@2ndquadrant.com

over 7 years ago

In reply to: Andres Freund (#177)

Re: Online enabling of checksums

Hello

On 2018-Aug-01, Andres Freund wrote:

My problem isn't just that I shouldn't think this should be committed
without at least a firm commitment to do better,

I take it "I think this shouldn't be committed" is what you meant.

I'm not sure I agree with this line of argument. The reality is that
real life or diverging priorities preclude people from working on
$stuff. This is a useful feature-1 we have here, and if we stall it
until we have feature-2, we may not get either until a year later.
That's not a great outcome. We didn't wait for partitioning, parallel
query, DDL progress reporting, logical replication, JIT, wait events (to
name but a few) to solve world's hunger in order to start getting
committed. We move forward step by step, and that's a good thing.

Firm commitments are great things to have, and if the firmness leads to
feature-2 being part of the same release, great, but if it's not firm
enough, we can have feature-2 the next release (or whenever). Even if
there's no such commitment, feature-1 is useful on its own.

my problem is that I think the "restart" approach is just using the
entirely wrong hammer to solve the problem at hand. At the very least
it's very problematic in respect to replicas, which need to know about
the setting too, and can have similar problems the restart on the
primary is supposed to prevent.

If we define "restart" to mean taking all the servers down
simultaneously, that can be planned. For users that cannot do that,
that's too bad, they'll have to wait for the next release in order to
enable checksums (assuming they fund the necessary development). But
there are many systems where it *is* possible to take everything down
for five seconds, then back up. They can definitely take advantage of
checksummed data.

Currently, the only way to enable checksums is to initdb and create a
new copy of the data from a logical backup, which could take hours or
even days if data is large, or use logical replication.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#180

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 7 years ago

In reply to: Andres Freund (#178)

Re: Online enabling of checksums

On 08/01/2018 05:58 PM, Andres Freund wrote:

Hi,

On 2018-08-01 11:15:38 +0200, Tomas Vondra wrote:

On 08/01/2018 10:40 AM, Michael Banck wrote:

If this was one week before feature freeze, I would agree with you
that it makes sense to ship it with the restart requirement rather
than not shipping it at all. But we're several commitfests away from
v12, so making an effort to having this work without a downtime
looks like a reasonable requirement to me.

Why would all those pieces had to be committed at once? Why not to
commit what we have now (with the restart) and then remove the
restriction in a later commit?

Sure, if all the pieces existed in various degrees of solidness (with
the earlier pieces committable, but later ones needing work), I'd feel
*much* less concerned about it.

That's not what I meant, sorry for not being clearer. My point was that
I see the "without restart" as desirable but optional, and would not
mind treating it as a future improvement.

In a way, the question is how far can we reasonably push the patch
author(s) to implement stuff we consider desirable, but he/she/they
decided it's not worth the time investment at this point.

We push people to only implement something really consistent all the
time.

Sure, but it's a somewhat subjective matter - to me this limitation does
not make this particular patch inconsistent. If we can remove it, great.
If not, it's still an immensely useful improvement.

To me, it seems like an immensely useful feature even with the restart,
and I don't think the restart is a major burden for most systems (it can
be, if your system has no maintenance windows, of course).

I think it a problem, my problem is more that I don't think it's really
a solution for the problem.

Sure, if there are issues with this approach, that would make it
unacceptable. I'm not sure why would it be an issue for replicas (which
is what you mention elsewhere), considering those don't write data and
so can't fail to update a checksum?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#181

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Alvaro Herrera (#179)

Re: Online enabling of checksums

On 2018-08-01 12:20:12 -0400, Alvaro Herrera wrote:

Hello

On 2018-Aug-01, Andres Freund wrote:

My problem isn't just that I shouldn't think this should be committed
without at least a firm commitment to do better,

I take "I think this shouldn't be committed" is what you meant.

Yep.

I'm not sure I agree with this line of argument. The reality is that
real life or diverging priorities preclude people from working on
$stuff.

Right.

This is a useful feature-1 we have here, and if we stall it
until we have feature-2, we may not get either until a year later.
That's not a great outcome. We didn't wait for partitioning, parallel
query, DDL progress reporting, logical replication, JIT, wait events (to
name but a few) to solve world's hunger in order to start getting
committed. We move forward step by step, and that's a good thing.

But we asked that they implement something consistent, and rejected many
that were not deemed to be that.

my problem is that I think the "restart" approach is just using the
entirely wrong hammer to solve the problem at hand. At the very least
it's very problematic in respect to replicas, which need to know about
the setting too, and can have similar problems the restart on the
primary is supposed to prevent.

If we define "restart" to mean taking all the servers down
simultaneously, that can be planned.

It really can't realistically. That'd essentially mean breaking PITR.
You'd have to schedule the restart of any replicas to happen after a
specific record. And what if there's replicas that are on a delay? What
if there's data centers that are currently offline?

And again, this isn't hard to do properly. I don't get why we're talking
about an at least operationally complex workaround when the proper
solution isn't hard.

Greetings,

Andres Freund

#182

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Tomas Vondra (#180)

Re: Online enabling of checksums

Hi,

On 2018-08-01 18:25:48 +0200, Tomas Vondra wrote:

Sure, if there are issues with this approach, that would make it
unacceptable. I'm not sure why would it be an issue for replicas (which is
what you mention elsewhere), considering those don't write data and so can't
fail to update a checksum?

Standbys compute checksums on writeout as well, no? We compute checksums
not at buffer modification, but at writeout time. And replay just marks
buffers dirty, it doesn't directly write to disk.
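The writeout-time behaviour described above can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: the class `ToyBufferPool`, its method names, and the use of `zlib.crc32` as a stand-in checksum are all assumptions made for illustration.

```python
import zlib
from typing import Dict, Optional, Tuple


class ToyBufferPool:
    """Toy model: pages are checksummed when written out, not when modified."""

    def __init__(self, checksums_enabled: bool) -> None:
        self.checksums_enabled = checksums_enabled
        self.dirty: Dict[int, bytes] = {}
        self.disk: Dict[int, Tuple[bytes, Optional[int]]] = {}

    def replay_change(self, page_no: int, data: bytes) -> None:
        # Replay only updates the in-memory page and marks it dirty.
        self.dirty[page_no] = data

    def write_out(self, page_no: int) -> None:
        # The checksum setting is consulted here, at writeout time.
        data = self.dirty.pop(page_no)
        checksum = zlib.crc32(data) if self.checksums_enabled else None
        self.disk[page_no] = (data, checksum)


standby = ToyBufferPool(checksums_enabled=False)  # stale view of the setting
standby.replay_change(1, b"page contents")
standby.write_out(1)
print(standby.disk[1][1])  # None: the page reached disk without a checksum
```

In this model a standby that still believes checksums are off writes pages without a checksum, which is why it needs to learn about the setting too.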

Architecturally there'd also be hint bits as a source, but I think we
probably neutered them enough for that not to be a problem during
replay.

And then there's also promotions.

Greetings,

Andres Freund

#183

Bruce Momjian

bruce@momjian.us

over 7 years ago

In reply to: Andres Freund (#173)

Re: Online enabling of checksums

On Tue, Jul 31, 2018 at 04:05:23PM -0700, Andres Freund wrote:

Hi,

On 2018-07-31 18:56:29 -0400, Alvaro Herrera wrote:

In the spirit of supporting incremental development, I think it's quite
sensible to get the current thing done, then see what it takes to get
the next thing done. Each is an improvement on its own merits. And it
doesn't have to be made by the same people.

I just don't buy this. An earlier version of this feature was committed
to v11 without the restart, over objections. There's now extra state in
the control file to support the restart based system, there's extra
tests, extra docs. And it'd not be much code to just make it work
without the restart. The process around this patchset is just plain
weird.

This patchset is weird because it is perhaps our first case of trying to
change the state of the server while it is running. We just don't have
an established protocol for how to orchestrate that, so we are limping
along toward a solution. Forcing a restart is probably part of that
primitive orchestration. We will probably have similar challenges if we
ever allowed Postgres to change its data format on the fly. These
challenges are one reason pg_upgrade only modifies the new cluster,
never the old one.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

#184

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Bruce Momjian (#183)

Re: Online enabling of checksums

On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:

This patchset is weird because it is perhaps our first case of trying to
change the state of the server while it is running. We just don't have
an established protocol for how to orchestrate that, so we are limping
along toward a solution.

There's a number of GUCs that do this? Even in related areas,
cf. full_page_writes.

Greetings,

Andres Freund

#185

Bruce Momjian

bruce@momjian.us

over 7 years ago

In reply to: Andres Freund (#184)

Re: Online enabling of checksums

On Wed, Aug 1, 2018 at 09:39:43AM -0700, Andres Freund wrote:

On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:

This patchset is weird because it is perhaps our first case of trying to
change the state of the server while it is running. We just don't have
an established protocol for how to orchestrate that, so we are limping
along toward a solution.

There's a number of GUCs that do this? Even in related areas,
cf. full_page_writes.

How is that taking the server from one state to the next in a
non-instantaneous way?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

#186

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Bruce Momjian (#185)

Re: Online enabling of checksums

Hi,

On 2018-08-01 12:45:27 -0400, Bruce Momjian wrote:

On Wed, Aug 1, 2018 at 09:39:43AM -0700, Andres Freund wrote:

On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:

This patchset is weird because it is perhaps our first case of trying to
change the state of the server while it is running. We just don't have
an established protocol for how to orchestrate that, so we are limping
along toward a solution.

There's a number of GUCs that do this? Even in related areas,
cf. full_page_writes.

How is that taking the server from one state to the next in a
non-instantaneous way?

Because it requires coordination around checkpoints, it can only really
fully take effect after the start of a checkpoint. And it even has both
primary and replica differences...

Greetings,

Andres Freund

#187

Joshua D. Drake

jd@commandprompt.com

over 7 years ago

In reply to: Alvaro Herrera (#179)

Re: Online enabling of checksums

On 08/01/2018 09:20 AM, Alvaro Herrera wrote:

my problem is that I think the "restart" approach is just using the
entirely wrong hammer to solve the problem at hand. At the very least
it's very problematic in respect to replicas, which need to know about
the setting too, and can have similar problems the restart on the
primary is supposed to prevent.

If we define "restart" to mean taking all the servers down
simultaneously, that can be planned.

People in mission critical environments do not "restart all servers".
They fail over to a secondary to do maintenance on a primary. When you
have a system where you literally lose thousands of dollars every minute
the database is down you can't do what you are proposing. When you have
a system that if the database is down for longer than X minutes, you
actually lose a whole day because all of the fabricators have to
revalidate before they begin work, you can't do that. Granted that is
not the majority (which you mention) but let's not forget them.

The one place where a restart does happen and will continue to happen
for around 5 (3 if you incorporate pg_logical and 9.6) more years is
upgrades. Although we have logical replication for upgrades now, we are
5 years away from the majority of users being on a version of PostgreSQL
that supports logical replication for upgrades. So, I can see an
argument for an incremental approach because people could enable
checksums as part of their upgrade restart.

For users that cannot do that,
that's too bad, they'll have to wait to the next release in order to
enable checksums (assuming they fund the necessary development). But

I have to say, as a proponent of funded development for longer than most
I like to see this refreshing take on the fact that this all does take
money.

there are many systems where it *is* possible to take everything down
for five seconds, then back up. They can definitely take advantage of
checksummed data.

This is a good point.

Currently, the only way to enable checksums is to initdb and create a
new copy of the data from a logical backup, which could take hours or
even days if data is large, or use logical replication.

Originally, I was going to -1 how this is being implemented. I too wish
we had the "ALTER DATABASE ENABLE CHECKSUM" or equivalent without a
restart. However, being able to just restart is a huge step forward from
what we have now.

Lastly, I think Alvaro has a point with the incremental development and
I also think some others on this thread need to, "show me the patch"
instead of being armchair directors of development.

#188

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Joshua D. Drake (#187)

Re: Online enabling of checksums

Hi,

On 2018-08-01 10:34:55 -0700, Joshua D. Drake wrote:

Lastly, I think Alvaro has a point with the incremental development and I
also think some others on this thread need to, "show me the patch" instead
of being armchair directors of development.

Oh, FFS. I pointed out the issue that led to the restart being
introduced (reminder, in the committed but then reverted version that
didn't exist). I explained how the much less intrusive version would
work. I think it's absurd to describe that as "armchair directors of
development".

Greetings,

Andres Freund

#189

Sergei Kornilov

sk@zsrv.org

over 7 years ago

In reply to: Joshua D. Drake (#187)

Re: Online enabling of checksums

They fail over to a secondary to do maintenance on a primary.

But this is not a problem even in the current patch state. We can restart the replica before failover and it works. I tested this behavior during my review.
We can:
- call pg_enable_data_checksums() on the master
- wait for data_checksums to change to inprogress on the replica
- restart the replica (we can restart a replica before promoting it, right?)
- promote this replica
- the checksum helper is now launched and working on the promoted cluster

regards, Sergei

#190

Andres Freund

andres@anarazel.de

over 7 years ago

In reply to: Sergei Kornilov (#189)

Re: Online enabling of checksums

Hi,

On 2018-08-01 21:03:22 +0300, Sergei Kornilov wrote:

They fail over to a secondary to do maintenance on a primary.

But this is not problem even in current patch state. We can restart replica before failover and it works. I tested this behavior during my review.
We can:
- call pg_enable_data_checksums() on master
- wait change data_checksums to inprogress on replica

That's *precisely* the problem. What if your replicas are delayed
(e.g. recovery_min_apply_delay)? How would you schedule that restart
properly? What if you later need to do PITR?

- restart replica - we can restart replica before promote, right?
- promote this replica
- checksum helper is launched now and working on this promoted cluster

This doesn't test the consequences of the restart being skipped, nor
does it review on a code level the correctness.

Greetings,

Andres Freund

#191

Sergei Kornilov

sk@zsrv.org

over 7 years ago

In reply to: Andres Freund (#190)

Re: Online enabling of checksums

This doesn't test the consequences of the restart being skipped, nor
does it review on a code level the correctness.

I checked more than one thing during review. I looked at the code too: the bgworker in checksumhelper.c is registered with:

bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;

It then processes the whole cluster (even if checksumhelper ran before but exited before it completed). Or does BgWorkerStart_RecoveryFinished not guarantee that the worker starts only after recovery has finished?
Before starting any real work (and after recovery ends), checksumhelper checks the current cluster status again:

+	 * If a standby was restarted when in pending state, a background worker
+	 * was registered to start. If it's later promoted after the master has
+	 * completed enabling checksums, we need to terminate immediately and not
+	 * do anything. If the cluster is still in pending state when promoted,
+	 * the background worker should start to complete the job.

What if your replicas are delayed (e.g. recovery_min_apply_delay)?
What if you later need to do PITR?

If we start after replaying pg_enable_data_checksums and before it completed, we plan to start the bgworker when recovery finishes.
If we replay the checksumhelper finish, we _can_ start checksumhelper again, and this is handled during checksumhelper start.

The behavior seems correct to me. Am I missing something badly wrong?
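The start-up check described in the quoted patch comment could be sketched roughly as follows. This is a simplified illustration and not the patch's code: the enum `ChecksumState` and the function `worker_action_after_recovery` are hypothetical names, and the real worker does considerably more.

```python
from enum import Enum


class ChecksumState(Enum):
    OFF = "off"
    INPROGRESS = "inprogress"
    ON = "on"


def worker_action_after_recovery(state: ChecksumState) -> str:
    """Decide what a worker registered during recovery does once it starts."""
    if state is ChecksumState.INPROGRESS:
        # Still pending at promotion: finish the job on the whole cluster.
        return "process_whole_cluster"
    # The master already completed (or checksums were turned off): do nothing.
    return "exit"


print(worker_action_after_recovery(ChecksumState.INPROGRESS))
print(worker_action_after_recovery(ChecksumState.ON))
```

The point of the sketch is only that the decision is made from the state observed after recovery ends, not from the state at the time the worker was registered.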

regards, Sergei

#192

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 7 years ago

In reply to: Daniel Gustafsson (#160)

Re: Online enabling of checksums

Hi,

While looking at the online checksum verification patch (which I guess
will get committed before this one), it occurred to me that disabling
checksums may need to be more elaborate, to protect against someone
using the stale flag value (instead of simply switching to "off"
assuming that's fine).

The signals etc. seem good enough for our internal stuff, but what if
someone uses the flag in a different way? E.g. the online checksum
verification runs as an independent process (i.e. not a backend) and
reads the control file to find out if the checksums are enabled or not.
So if we just switch from "on" to "off" that will break.

Of course, we may also say "Don't disable checksums while online
verification is running!" but that's not ideal.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#193

Stephen Frost

sfrost@snowman.net

over 7 years ago

In reply to: Tomas Vondra (#192)

Re: Online enabling of checksums

Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

While looking at the online checksum verification patch (which I guess
will get committed before this one), it occurred to me that disabling
checksums may need to be more elaborate, to protect against someone
using the stale flag value (instead of simply switching to "off"
assuming that's fine).

The signals etc. seem good enough for our internal stuff, but what if
someone uses the flag in a different way? E.g. the online checksum
verification runs as an independent process (i.e. not a backend) and
reads the control file to find out if the checksums are enabled or not.
So if we just switch from "on" to "off" that will break.

Of course, we may also say "Don't disable checksums while online
verification is running!" but that's not ideal.

I'm not really sure what else we could say here..? I don't particularly
see an issue with telling people that if they disable checksums while
they're running a tool that's checking the checksums that they're going
to get odd results.

Thanks!

Stephen

#194

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 7 years ago

In reply to: Stephen Frost (#193)

Re: Online enabling of checksums

On 09/29/2018 02:19 PM, Stephen Frost wrote:

Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

While looking at the online checksum verification patch (which I guess
will get committed before this one), it occurred to me that disabling
checksums may need to be more elaborate, to protect against someone
using the stale flag value (instead of simply switching to "off"
assuming that's fine).

The signals etc. seem good enough for our internal stuff, but what if
someone uses the flag in a different way? E.g. the online checksum
verification runs as an independent process (i.e. not a backend) and
reads the control file to find out if the checksums are enabled or not.
So if we just switch from "on" to "off" that will break.

Of course, we may also say "Don't disable checksums while online
verification is running!" but that's not ideal.

I'm not really sure what else we could say here..? I don't particularly
see an issue with telling people that if they disable checksums while
they're running a tool that's checking the checksums that they're going
to get odd results.

I don't know, to be honest. I was merely looking at the online
verification patch and realized that if someone disables checksums it
won't notice it (because it only reads the flag once, at the very
beginning) and will likely produce bogus errors.

Although, maybe it won't - it now uses a checkpoint LSN, so that might
fix it. The checkpoint LSN is read from the same controlfile as the
flag, so we know the checksums were enabled during that checkpoint. So
if we ignore failures with a newer LSN, that should do the trick, no?

So perhaps that's the right "protocol" to handle this?
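As a rough illustration of that idea (a sketch only, not what the verification patch actually does): a checksum failure is reported only when the page's LSN is not newer than the checkpoint LSN read together with the flag. The function names `parse_lsn` and `should_report_failure` are hypothetical.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN in the usual 'X/Y' hexadecimal text form to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def should_report_failure(page_lsn: int, checkpoint_lsn: int, checksum_ok: bool) -> bool:
    """Report a failure only for pages not modified since the checkpoint."""
    if checksum_ok:
        return False
    # A page with a newer LSN may have been written after the flag changed,
    # so a mismatch there is ignored rather than reported.
    return page_lsn <= checkpoint_lsn


checkpoint = parse_lsn("0/2000000")
print(should_report_failure(parse_lsn("0/1000000"), checkpoint, checksum_ok=False))
print(should_report_failure(parse_lsn("0/3000000"), checkpoint, checksum_ok=False))
```

Whether ignoring newer pages is sufficient in every case is exactly the open question above; the sketch just makes the proposed rule concrete.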

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#195

Stephen Frost

sfrost@snowman.net

over 7 years ago

In reply to: Tomas Vondra (#194)

Re: Online enabling of checksums

Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

On 09/29/2018 02:19 PM, Stephen Frost wrote:

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

While looking at the online checksum verification patch (which I guess
will get committed before this one), it occurred to me that disabling
checksums may need to be more elaborate, to protect against someone
using the stale flag value (instead of simply switching to "off"
assuming that's fine).

The signals etc. seem good enough for our internal stuff, but what if
someone uses the flag in a different way? E.g. the online checksum
verification runs as an independent process (i.e. not a backend) and
reads the control file to find out if the checksums are enabled or not.
So if we just switch from "on" to "off" that will break.

Of course, we may also say "Don't disable checksums while online
verification is running!" but that's not ideal.

I'm not really sure what else we could say here..? I don't particularly
see an issue with telling people that if they disable checksums while
they're running a tool that's checking the checksums that they're going
to get odd results.

I don't know, to be honest. I was merely looking at the online
verification patch and realized that if someone disables checksums it
won't notice it (because it only reads the flag once, at the very
beginning) and will likely produce bogus errors.

Although, maybe it won't - it now uses a checkpoint LSN, so that might
fix it. The checkpoint LSN is read from the same controlfile as the
flag, so we know the checksums were enabled during that checkpoint. So
if we ignore failures with a newer LSN, that should do the trick, no?

So perhaps that's the right "protocol" to handle this?

I certainly don't think we need to do anything more.

Thanks!

Stephen

#196

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 7 years ago

In reply to: Stephen Frost (#195)

Re: Online enabling of checksums

On 09/29/2018 06:51 PM, Stephen Frost wrote:

Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

On 09/29/2018 02:19 PM, Stephen Frost wrote:

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

While looking at the online checksum verification patch (which I guess
will get committed before this one), it occurred to me that disabling
checksums may need to be more elaborate, to protect against someone
using the stale flag value (instead of simply switching to "off"
assuming that's fine).

The signals etc. seem good enough for our internal stuff, but what if
someone uses the flag in a different way? E.g. the online checksum
verification runs as an independent process (i.e. not a backend) and
reads the control file to find out if the checksums are enabled or not.
So if we just switch from "on" to "off" that will break.

Of course, we may also say "Don't disable checksums while online
verification is running!" but that's not ideal.

I'm not really sure what else we could say here..? I don't particularly
see an issue with telling people that if they disable checksums while
they're running a tool that's checking the checksums that they're going
to get odd results.

I don't know, to be honest. I was merely looking at the online
verification patch and realized that if someone disables checksums it
won't notice it (because it only reads the flag once, at the very
beginning) and will likely produce bogus errors.

Although, maybe it won't - it now uses a checkpoint LSN, so that might
fix it. The checkpoint LSN is read from the same controlfile as the
flag, so we know the checksums were enabled during that checkpoint. So
if we ignore failures with a newer LSN, that should do the trick, no?

So perhaps that's the right "protocol" to handle this?

I certainly don't think we need to do anything more.

Not sure I agree. I'm not suggesting we absolutely have to write huge
amount of code to deal with this issue, but I hope we agree we need to
at least understand the issue so that we can put warnings into docs.

FWIW pg_basebackup (in the default "verify checksums") has this issue
too AFAICS, and it seems rather unfriendly to just start reporting
checksum errors during backup in that case.

But as I mentioned, maybe there's no problem at all and using the
checkpoint LSN deals with it automatically.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#197

Andres Freund

andres@anarazel.de

almost 7 years ago

In reply to: Tomas Vondra (#196)

Re: Online enabling of checksums

On 2018-09-30 10:48:36 +0200, Tomas Vondra wrote:

On 09/29/2018 06:51 PM, Stephen Frost wrote:

Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

On 09/29/2018 02:19 PM, Stephen Frost wrote:

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

While looking at the online checksum verification patch (which I guess
will get committed before this one), it occurred to me that disabling
checksums may need to be more elaborate, to protect against someone
using the stale flag value (instead of simply switching to "off"
assuming that's fine).

The signals etc. seem good enough for our internal stuff, but what if
someone uses the flag in a different way? E.g. the online checksum
verification runs as an independent process (i.e. not a backend) and
reads the control file to find out if the checksums are enabled or not.
So if we just switch from "on" to "off" that will break.

Of course, we may also say "Don't disable checksums while online
verification is running!" but that's not ideal.

I'm not really sure what else we could say here..? I don't particularly
see an issue with telling people that if they disable checksums while
they're running a tool that's checking the checksums that they're going
to get odd results.

I don't know, to be honest. I was merely looking at the online
verification patch and realized that if someone disables checksums it
won't notice it (because it only reads the flag once, at the very
beginning) and will likely produce bogus errors.

Although, maybe it won't - it now uses a checkpoint LSN, so that might
fix it. The checkpoint LSN is read from the same controlfile as the
flag, so we know the checksums were enabled during that checkpoint. So
if we ignore failures with a newer LSN, that should do the trick, no?

So perhaps that's the right "protocol" to handle this?

I certainly don't think we need to do anything more.

Not sure I agree. I'm not suggesting we absolutely have to write huge
amount of code to deal with this issue, but I hope we agree we need to
at least understand the issue so that we can put warnings into docs.

FWIW pg_basebackup (in the default "verify checksums") has this issue
too AFAICS, and it seems rather unfriendly to just start reporting
checksum errors during backup in that case.

But as I mentioned, maybe there's no problem at all and using the
checkpoint LSN deals with it automatically.

Given that this patch has not been developed in a few months, I don't
see why this has an active 2019-01 CF entry? I think we should mark this
as Returned With Feedback.

https://commitfest.postgresql.org/21/1535/

Greetings,

Andres Freund

#198

Magnus Hagander

magnus@hagander.net

almost 7 years ago

In reply to: Andres Freund (#197)

Re: Online enabling of checksums

On Thu, Jan 31, 2019 at 11:57 AM Andres Freund <andres@anarazel.de> wrote:

On 2018-09-30 10:48:36 +0200, Tomas Vondra wrote:

On 09/29/2018 06:51 PM, Stephen Frost wrote:

Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

On 09/29/2018 02:19 PM, Stephen Frost wrote:

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:

While looking at the online checksum verification patch (which I guess
will get committed before this one), it occurred to me that disabling
checksums may need to be more elaborate, to protect against someone
using the stale flag value (instead of simply switching to "off"
assuming that's fine).

The signals etc. seem good enough for our internal stuff, but what if
someone uses the flag in a different way? E.g. the online checksum
verification runs as an independent process (i.e. not a backend) and
reads the control file to find out if the checksums are enabled or not.
So if we just switch from "on" to "off" that will break.

Of course, we may also say "Don't disable checksums while online
verification is running!" but that's not ideal.

I'm not really sure what else we could say here..? I don't particularly
see an issue with telling people that if they disable checksums while
they're running a tool that's checking the checksums that they're going
to get odd results.

I don't know, to be honest. I was merely looking at the online
verification patch and realized that if someone disables checksums it
won't notice it (because it only reads the flag once, at the very
beginning) and will likely produce bogus errors.

Although, maybe it won't - it now uses a checkpoint LSN, so that might
fix it. The checkpoint LSN is read from the same controlfile as the
flag, so we know the checksums were enabled during that checkpoint. So
if we ignore failures with a newer LSN, that should do the trick, no?

So perhaps that's the right "protocol" to handle this?

I certainly don't think we need to do anything more.

Not sure I agree. I'm not suggesting we absolutely have to write huge
amount of code to deal with this issue, but I hope we agree we need to
at least understand the issue so that we can put warnings into docs.

FWIW pg_basebackup (in the default "verify checksums") has this issue
too AFAICS, and it seems rather unfriendly to just start reporting
checksum errors during backup in that case.

But as I mentioned, maybe there's no problem at all and using the
checkpoint LSN deals with it automatically.

Given that this patch has not been developed in a few months, I don't
see why this has an active 2019-01 CF entry? I think we should mark this
as Returned With Feedback.

https://commitfest.postgresql.org/21/1535/

Unfortunately, I agree. I wish that wasn't the case, but due to things
outside my control, that's what happened.

So yes, let's punt it.

//Magnus