Online checksums patch - once again

Started by Magnus Hagander over 6 years ago · 91 messages
#1 Magnus Hagander
magnus@hagander.net
2 attachment(s)

OK, let's try this again :)

This is work mainly based on the first version of the online checksums
patch, rebuilt on top of Andres' WIP patchset for global barriers (
/messages/by-id/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
)

Andres' patch has been enhanced with wait events per
/messages/by-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com
.

I'm including both those as attachments here to hopefully trick the cfbot
into being able to build things :)

Other than that, we believe the objections raised against the first version
are all covered by now, but it's been quite some time and many emails, so
it's possible we missed some. If you find any, please point them out!

The documentation needs another go-over, in particular based on changes made
since then, but the basic principles of how it works have not changed.
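
For anyone who hasn't followed the barrier discussions: the generation-based scheme the first attachment implements can be sketched roughly like this (a deliberately simplified, single-process model for illustration only; the names below are ours, not the patch's actual symbols, and the real code uses atomics, memory barriers and SIGUSR1 where this just uses plain variables):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPROCS 4
#define BARRIER_CHECKSUM (1u << 0)

static uint64_t global_barrier_gen = 1;
static struct
{
	bool		active;
	uint32_t	flags;			/* pending barrier kinds */
	uint64_t	gen;			/* last generation absorbed */
} procs[NPROCS];

/* Emit: flag every live proc first, then bump the shared generation. */
static uint64_t
emit_global_barrier(uint32_t kind)
{
	for (int i = 0; i < NPROCS; i++)
		if (procs[i].active)
			procs[i].flags |= kind;
	return ++global_barrier_gen;	/* pg_atomic_add_fetch_u64 in the patch */
}

/* Absorb: a proc clears and acts on its flags, then publishes the generation. */
static void
absorb_global_barrier(int i)
{
	uint32_t	flags = procs[i].flags;

	procs[i].flags = 0;
	if (flags & BARRIER_CHECKSUM)
	{
		/* refresh local checksum state here */
	}
	procs[i].gen = global_barrier_gen;
}

/* Wait: done once every live proc has published at least the emitted generation. */
static bool
barrier_reached(uint64_t generation)
{
	for (int i = 0; i < NPROCS; i++)
		if (procs[i].active && procs[i].gen < generation)
			return false;
	return true;
}
```

The ordering is the point: flags are set before the generation is bumped, so any backend that observes the new generation has either been flagged or started up with state that is already current.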

//Magnus & Daniel

Attachments:

0001-WIP-global-barriers.patch (text/x-patch; charset=US-ASCII)
From 14b20affc98ecb893531a683d83a3bef03fcff62 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 29 Oct 2018 10:14:15 -0700
Subject: [PATCH 1/2] WIP: global barriers

This is a squash of three patches from Andres:
* Use procsignal_sigusr1_handler for all shmem connected bgworkers.
* Use  procsignal_sigusr1_handler in all auxiliary processes.
* WIP: global barriers.

And one from Magnus:
* Wait event for global barriers
---
 src/backend/postmaster/autovacuum.c   |   3 +-
 src/backend/postmaster/bgworker.c     |  31 +++++---
 src/backend/postmaster/bgwriter.c     |  24 ++----
 src/backend/postmaster/checkpointer.c |  19 ++---
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/postmaster/startup.c      |  18 ++---
 src/backend/postmaster/walwriter.c    |  17 +---
 src/backend/replication/walreceiver.c |  20 +----
 src/backend/storage/buffer/bufmgr.c   |   4 +
 src/backend/storage/ipc/procsignal.c  | 141 ++++++++++++++++++++++++++++++++++
 src/backend/storage/lmgr/proc.c       |  20 +++++
 src/backend/tcop/postgres.c           |   7 ++
 src/include/pgstat.h                  |   1 +
 src/include/storage/proc.h            |   9 +++
 src/include/storage/procsignal.h      |  23 +++++-
 15 files changed, 255 insertions(+), 85 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 073f313337..24e28dd3a3 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -649,8 +649,9 @@ AutoVacLauncherMain(int argc, char *argv[])
 
 		ResetLatch(MyLatch);
 
-		/* Process sinval catchup interrupts that happened while sleeping */
+		/* Process pending interrupts. */
 		ProcessCatchupInterrupt();
+		ProcessGlobalBarrierIntterupt();
 
 		/* the normal shutdown case */
 		if (got_SIGTERM)
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index b66b517aca..f300f9285b 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -734,23 +734,32 @@ StartBackgroundWorker(void)
 	/*
 	 * Set up signal handlers.
 	 */
+
+
+	/*
+	 * SIGINT is used to signal canceling the current action for processes
+	 * able to run queries.
+	 */
 	if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
-	{
-		/*
-		 * SIGINT is used to signal canceling the current action
-		 */
 		pqsignal(SIGINT, StatementCancelHandler);
-		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
-		pqsignal(SIGFPE, FloatExceptionHandler);
-
-		/* XXX Any other handlers needed here? */
-	}
 	else
-	{
 		pqsignal(SIGINT, SIG_IGN);
+
+	/*
+	 * Everything with a PGPROC should be able to receive procsignal.h style
+	 * signals.
+	 */
+	if (worker->bgw_flags & (BGWORKER_BACKEND_DATABASE_CONNECTION |
+							 BGWORKER_SHMEM_ACCESS))
+		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+	else
 		pqsignal(SIGUSR1, bgworker_sigusr1_handler);
+
+	if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
+		pqsignal(SIGFPE, FloatExceptionHandler);
+	else
 		pqsignal(SIGFPE, SIG_IGN);
-	}
+
 	pqsignal(SIGTERM, bgworker_die);
 	pqsignal(SIGHUP, SIG_IGN);
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 8ec16a3fb8..80a8e3cf4b 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -51,6 +51,7 @@
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
 #include "storage/proc.h"
+#include "storage/procsignal.h"
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
@@ -97,7 +98,6 @@ static volatile sig_atomic_t shutdown_requested = false;
 static void bg_quickdie(SIGNAL_ARGS);
 static void BgSigHupHandler(SIGNAL_ARGS);
 static void ReqShutdownHandler(SIGNAL_ARGS);
-static void bgwriter_sigusr1_handler(SIGNAL_ARGS);
 
 
 /*
@@ -115,10 +115,7 @@ BackgroundWriterMain(void)
 	WritebackContext wb_context;
 
 	/*
-	 * Properly accept or ignore signals the postmaster might send us.
-	 *
-	 * bgwriter doesn't participate in ProcSignal signalling, but a SIGUSR1
-	 * handler is still needed for latch wakeups.
+	 * Properly accept or ignore signals that might be sent to us.
 	 */
 	pqsignal(SIGHUP, BgSigHupHandler);	/* set flag to read config file */
 	pqsignal(SIGINT, SIG_IGN);
@@ -126,7 +123,7 @@ BackgroundWriterMain(void)
 	pqsignal(SIGQUIT, bg_quickdie); /* hard crash time */
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, bgwriter_sigusr1_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, SIG_IGN);
 
 	/*
@@ -261,6 +258,10 @@ BackgroundWriterMain(void)
 			proc_exit(0);		/* done */
 		}
 
+		/* Process all pending interrupts. */
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
+
 		/*
 		 * Do one cycle of dirty-buffer writing.
 		 */
@@ -428,14 +429,3 @@ ReqShutdownHandler(SIGNAL_ARGS)
 
 	errno = save_errno;
 }
-
-/* SIGUSR1: used for latch wakeups */
-static void
-bgwriter_sigusr1_handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 61544f65ad..def9aa87d8 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -54,6 +54,7 @@
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
 #include "storage/proc.h"
+#include "storage/procsignal.h"
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
@@ -179,7 +180,6 @@ static void UpdateSharedMemoryConfig(void);
 static void chkpt_quickdie(SIGNAL_ARGS);
 static void ChkptSigHupHandler(SIGNAL_ARGS);
 static void ReqCheckpointHandler(SIGNAL_ARGS);
-static void chkpt_sigusr1_handler(SIGNAL_ARGS);
 static void ReqShutdownHandler(SIGNAL_ARGS);
 
 
@@ -211,7 +211,7 @@ CheckpointerMain(void)
 	pqsignal(SIGQUIT, chkpt_quickdie);	/* hard crash time */
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, chkpt_sigusr1_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, ReqShutdownHandler);	/* request shutdown */
 
 	/*
@@ -346,6 +346,10 @@ CheckpointerMain(void)
 		/* Clear any already-pending wakeups */
 		ResetLatch(MyLatch);
 
+		/* Process all pending interrupts. */
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
+
 		/*
 		 * Process any requests or signals received recently.
 		 */
@@ -853,17 +857,6 @@ ReqCheckpointHandler(SIGNAL_ARGS)
 	errno = save_errno;
 }
 
-/* SIGUSR1: used for latch wakeups */
-static void
-chkpt_sigusr1_handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
-
 /* SIGUSR2: set flag to run a shutdown checkpoint and exit */
 static void
 ReqShutdownHandler(SIGNAL_ARGS)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d362e7f7d7..a0631ee154 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3765,6 +3765,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
+		case WAIT_EVENT_GLOBAL_BARRIER:
+			event_name = "GlobalBarrier";
+			break;
 		case WAIT_EVENT_HASH_BATCH_ALLOCATING:
 			event_name = "Hash/Batch/Allocating";
 			break;
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 5048a2c2aa..da0a670bdf 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -30,6 +30,7 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/pmsignal.h"
+#include "storage/procsignal.h"
 #include "storage/standby.h"
 #include "utils/guc.h"
 #include "utils/timeout.h"
@@ -50,7 +51,6 @@ static volatile sig_atomic_t in_restore_command = false;
 
 /* Signal handlers */
 static void startupproc_quickdie(SIGNAL_ARGS);
-static void StartupProcSigUsr1Handler(SIGNAL_ARGS);
 static void StartupProcTriggerHandler(SIGNAL_ARGS);
 static void StartupProcSigHupHandler(SIGNAL_ARGS);
 
@@ -87,17 +87,6 @@ startupproc_quickdie(SIGNAL_ARGS)
 }
 
 
-/* SIGUSR1: let latch facility handle the signal */
-static void
-StartupProcSigUsr1Handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
-
 /* SIGUSR2: set flag to finish recovery */
 static void
 StartupProcTriggerHandler(SIGNAL_ARGS)
@@ -162,6 +151,9 @@ HandleStartupProcInterrupts(void)
 	 */
 	if (IsUnderPostmaster && !PostmasterIsAlive())
 		exit(1);
+
+	if (GlobalBarrierInterruptPending)
+		ProcessGlobalBarrierIntterupt();
 }
 
 
@@ -181,7 +173,7 @@ StartupProcessMain(void)
 	pqsignal(SIGQUIT, startupproc_quickdie);	/* hard crash time */
 	InitializeTimeouts();		/* establishes SIGALRM handler */
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, StartupProcSigUsr1Handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, StartupProcTriggerHandler);
 
 	/*
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index a6fdba3f41..19120aa6e1 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -55,6 +55,7 @@
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
 #include "storage/proc.h"
+#include "storage/procsignal.h"
 #include "storage/smgr.h"
 #include "utils/guc.h"
 #include "utils/hsearch.h"
@@ -86,7 +87,6 @@ static volatile sig_atomic_t shutdown_requested = false;
 static void wal_quickdie(SIGNAL_ARGS);
 static void WalSigHupHandler(SIGNAL_ARGS);
 static void WalShutdownHandler(SIGNAL_ARGS);
-static void walwriter_sigusr1_handler(SIGNAL_ARGS);
 
 /*
  * Main entry point for walwriter process
@@ -114,7 +114,7 @@ WalWriterMain(void)
 	pqsignal(SIGQUIT, wal_quickdie);	/* hard crash time */
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, walwriter_sigusr1_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, SIG_IGN); /* not used */
 
 	/*
@@ -255,6 +255,8 @@ WalWriterMain(void)
 			/* Normal exit from the walwriter is here */
 			proc_exit(0);		/* done */
 		}
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
 
 		/*
 		 * Do what we're here for; then, if XLogBackgroundFlush() found useful
@@ -337,14 +339,3 @@ WalShutdownHandler(SIGNAL_ARGS)
 
 	errno = save_errno;
 }
-
-/* SIGUSR1: used for latch wakeups */
-static void
-walwriter_sigusr1_handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 6abc780778..9acdbdd7c9 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -63,6 +63,7 @@
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
 #include "storage/procarray.h"
+#include "storage/procsignal.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/pg_lsn.h"
@@ -125,7 +126,6 @@ static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime);
 
 /* Signal handlers */
 static void WalRcvSigHupHandler(SIGNAL_ARGS);
-static void WalRcvSigUsr1Handler(SIGNAL_ARGS);
 static void WalRcvShutdownHandler(SIGNAL_ARGS);
 static void WalRcvQuickDieHandler(SIGNAL_ARGS);
 
@@ -147,9 +147,8 @@ void
 ProcessWalRcvInterrupts(void)
 {
 	/*
-	 * Although walreceiver interrupt handling doesn't use the same scheme as
-	 * regular backends, call CHECK_FOR_INTERRUPTS() to make sure we receive
-	 * any incoming signals on Win32.
+	 * The CHECK_FOR_INTERRUPTS() call ensures global barriers are handled,
+	 * and incoming signals on Win32 are received.
 	 */
 	CHECK_FOR_INTERRUPTS();
 
@@ -252,7 +251,7 @@ WalReceiverMain(void)
 	pqsignal(SIGQUIT, WalRcvQuickDieHandler);	/* hard crash time */
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, WalRcvSigUsr1Handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, SIG_IGN);
 
 	/* Reset some signals that are accepted by postmaster but not here */
@@ -766,17 +765,6 @@ WalRcvSigHupHandler(SIGNAL_ARGS)
 }
 
 
-/* SIGUSR1: used by latch mechanism */
-static void
-WalRcvSigUsr1Handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
-
 /* SIGTERM: set flag for ProcessWalRcvInterrupts */
 static void
 WalRcvShutdownHandler(SIGNAL_ARGS)
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 6f3a402854..36cd363bfb 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1885,6 +1885,10 @@ BufferSync(int flags)
 
 		cur_tsid = CkptBufferIds[i].tsId;
 
+		/* XXX: need a more principled approach here */
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
+
 		/*
 		 * Grow array of per-tablespace status structs, every time a new
 		 * tablespace is found.
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7605b2c367..9aed52df4a 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,8 +18,10 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/twophase.h"
 #include "commands/async.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
 #include "storage/ipc.h"
@@ -62,9 +64,11 @@ typedef struct
 
 static ProcSignalSlot *ProcSignalSlots = NULL;
 static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
+volatile sig_atomic_t GlobalBarrierInterruptPending = false;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
+static void HandleGlobalBarrierSignal(void);
 
 /*
  * ProcSignalShmemSize
@@ -262,6 +266,8 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
+	pg_read_barrier();
+
 	if (CheckProcSignal(PROCSIG_CATCHUP_INTERRUPT))
 		HandleCatchupInterrupt();
 
@@ -292,9 +298,144 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_GLOBAL_BARRIER))
+		HandleGlobalBarrierSignal();
+
 	SetLatch(MyLatch);
 
 	latch_sigusr1_handler();
 
 	errno = save_errno;
 }
+
+/*
+ * Emit a global barrier of the given kind, returning its generation.
+ */
+uint64
+EmitGlobalBarrier(GlobalBarrierKind kind)
+{
+	uint64 generation;
+
+	/*
+	 * Broadcast the flag, without incrementing the generation yet. This
+	 * ensures that all existing backends can learn about it.
+	 *
+	 * It's OK if the to-be-signalled backend enters after our check here. A
+	 * new backend should have current settings.
+	 */
+	for (int i = 0; i < (MaxBackends + max_prepared_xacts); i++)
+	{
+		PGPROC *proc = &ProcGlobal->allProcs[i];
+
+		if (proc->pid == 0)
+			continue;
+
+		pg_atomic_fetch_or_u32(&proc->barrierFlags, (uint32) kind);
+
+		elog(LOG, "setting flags for %u", proc->pid);
+	}
+
+	/*
+	 * Broadcast flag generation. If any backend joins after this, it's either
+	 * going to be signalled below, or has read a new enough generation that
+	 * WaitForGlobalBarrier() will not wait for it.
+	 */
+	generation = pg_atomic_add_fetch_u64(&ProcGlobal->globalBarrierGen, 1);
+
+	/* Wake up each backend (including ours) */
+	for (int i = 0; i < NumProcSignalSlots; i++)
+	{
+		ProcSignalSlot *slot = &ProcSignalSlots[i];
+
+		if (slot->pss_pid == 0)
+			continue;
+
+		/* Atomically set the proper flag */
+		slot->pss_signalFlags[PROCSIG_GLOBAL_BARRIER] = true;
+
+		pg_write_barrier();
+
+		/* Send signal */
+		kill(slot->pss_pid, SIGUSR1);
+	}
+
+	return generation;
+}
+
+/*
+ * Wait for all barriers to be absorbed.  This guarantees that all changes
+ * requested by a specific EmitGlobalBarrier() have taken effect.
+ */
+void
+WaitForGlobalBarrier(uint64 generation)
+{
+	pgstat_report_wait_start(WAIT_EVENT_GLOBAL_BARRIER);
+	for (int i = 0; i < (MaxBackends + max_prepared_xacts); i++)
+	{
+		PGPROC *proc = &ProcGlobal->allProcs[i];
+		uint64 oldval;
+
+		pg_memory_barrier();
+		oldval = pg_atomic_read_u64(&proc->barrierGen);
+
+		/*
+		 * Unused proc slots get their barrierGen set to UINT64_MAX, so we
+		 * need not care about that.
+		 */
+		while (oldval < generation)
+		{
+			CHECK_FOR_INTERRUPTS();
+			pg_usleep(10000);
+
+			pg_memory_barrier();
+			oldval = pg_atomic_read_u64(&proc->barrierGen);
+		}
+	}
+	pgstat_report_wait_end();
+}
+
+/*
+ * Absorb the global barrier procsignal.
+ */
+static void
+HandleGlobalBarrierSignal(void)
+{
+	InterruptPending = true;
+	GlobalBarrierInterruptPending = true;
+	SetLatch(MyLatch);
+}
+
+/*
+ * Perform global barrier related interrupt checking. Backend types that use
+ * CHECK_FOR_INTERRUPTS will have this called from there; those that don't
+ * have to call it explicitly.
+ */
+void
+ProcessGlobalBarrierIntterupt(void)
+{
+	if (GlobalBarrierInterruptPending)
+	{
+		uint64 generation;
+		uint32 flags;
+
+		GlobalBarrierInterruptPending = false;
+
+		generation = pg_atomic_read_u64(&ProcGlobal->globalBarrierGen);
+		pg_memory_barrier();
+		flags = pg_atomic_exchange_u32(&MyProc->barrierFlags, 0);
+		pg_memory_barrier();
+
+		if (flags & GLOBBAR_CHECKSUM)
+		{
+			/*
+			 * By virtue of getting here (i.e. interrupts being processed), we
+			 * know that this backend won't have any in-progress writes (which
+			 * might have missed the checksum change).
+			 */
+		}
+
+		pg_atomic_write_u64(&MyProc->barrierGen, generation);
+
+		elog(LOG, "processed interrupts for %u", MyProcPid);
+	}
+}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 498373fd0e..ae52b9e9ac 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -190,6 +190,7 @@ InitProcGlobal(void)
 	ProcGlobal->checkpointerLatch = NULL;
 	pg_atomic_init_u32(&ProcGlobal->procArrayGroupFirst, INVALID_PGPROCNO);
 	pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO);
+	pg_atomic_init_u64(&ProcGlobal->globalBarrierGen, 1);
 
 	/*
 	 * Create and initialize all the PGPROC structures we'll need.  There are
@@ -284,6 +285,9 @@ InitProcGlobal(void)
 		 */
 		pg_atomic_init_u32(&(procs[i].procArrayGroupNext), INVALID_PGPROCNO);
 		pg_atomic_init_u32(&(procs[i].clogGroupNext), INVALID_PGPROCNO);
+
+		pg_atomic_init_u32(&procs[i].barrierFlags, 0);
+		pg_atomic_init_u64(&procs[i].barrierGen, PG_UINT64_MAX);
 	}
 
 	/*
@@ -442,6 +446,12 @@ InitProcess(void)
 	MyProc->clogGroupMemberLsn = InvalidXLogRecPtr;
 	Assert(pg_atomic_read_u32(&MyProc->clogGroupNext) == INVALID_PGPROCNO);
 
+	/* pairs with globalBarrierGen increase */
+	pg_memory_barrier();
+	pg_atomic_write_u32(&MyProc->barrierFlags, 0);
+	pg_atomic_write_u64(&MyProc->barrierGen,
+						pg_atomic_read_u64(&ProcGlobal->globalBarrierGen));
+
 	/*
 	 * Acquire ownership of the PGPROC's latch, so that we can use WaitLatch
 	 * on it.  That allows us to repoint the process latch, which so far
@@ -585,6 +595,13 @@ InitAuxiliaryProcess(void)
 	MyProc->lwWaitMode = 0;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
+
+	/* pairs with globalBarrierGen increase */
+	pg_memory_barrier();
+	pg_atomic_write_u32(&MyProc->barrierFlags, 0);
+	pg_atomic_write_u64(&MyProc->barrierGen,
+						pg_atomic_read_u64(&ProcGlobal->globalBarrierGen));
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
@@ -883,6 +900,9 @@ ProcKill(int code, Datum arg)
 		LWLockRelease(leader_lwlock);
 	}
 
+	pg_atomic_write_u32(&MyProc->barrierFlags, 0);
+	pg_atomic_write_u64(&MyProc->barrierGen, PG_UINT64_MAX);
+
 	/*
 	 * Reset MyLatch to the process local one.  This is so that signal
 	 * handlers et al can continue using the latch after the shared latch
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e8d8e6f828..976e966565 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -612,6 +612,10 @@ ProcessClientWriteInterrupt(bool blocked)
 			SetLatch(MyLatch);
 	}
 
+	/* safe to handle during client communication */
+	if (GlobalBarrierInterruptPending)
+		ProcessGlobalBarrierIntterupt();
+
 	errno = save_errno;
 }
 
@@ -3159,6 +3163,9 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (GlobalBarrierInterruptPending)
+		ProcessGlobalBarrierIntterupt();
 }
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..c997add881 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -824,6 +824,7 @@ typedef enum
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
 	WAIT_EVENT_EXECUTE_GATHER,
+	WAIT_EVENT_GLOBAL_BARRIER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATING,
 	WAIT_EVENT_HASH_BATCH_ELECTING,
 	WAIT_EVENT_HASH_BATCH_LOADING,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 281e1db725..f108ac52c6 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -203,6 +203,13 @@ struct PGPROC
 	PGPROC	   *lockGroupLeader;	/* lock group leader, if I'm a member */
 	dlist_head	lockGroupMembers;	/* list of members, if I'm a leader */
 	dlist_node	lockGroupLink;	/* my member link, if I'm a member */
+
+	/*
+	 * Support for "super barriers". These can be used to e.g. make sure that
+	 * all backends have acknowledged a configuration change.
+	 */
+	pg_atomic_uint64 barrierGen;
+	pg_atomic_uint32 barrierFlags;
 };
 
 /* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */
@@ -272,6 +279,8 @@ typedef struct PROC_HDR
 	int			startupProcPid;
 	/* Buffer id of the buffer that Startup process waits for pin on, or -1 */
 	int			startupBufferPinWaitBufId;
+
+	pg_atomic_uint64 globalBarrierGen;
 } PROC_HDR;
 
 extern PGDLLIMPORT PROC_HDR *ProcGlobal;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 05b186a05c..a978db9b24 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -14,8 +14,9 @@
 #ifndef PROCSIGNAL_H
 #define PROCSIGNAL_H
 
-#include "storage/backendid.h"
+#include <signal.h>
 
+#include "storage/backendid.h"
 
 /*
  * Reasons for signalling a Postgres child process (a backend or an auxiliary
@@ -42,6 +43,8 @@ typedef enum
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
+	PROCSIG_GLOBAL_BARRIER,
+
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
 
@@ -57,4 +60,22 @@ extern int	SendProcSignal(pid_t pid, ProcSignalReason reason,
 
 extern void procsignal_sigusr1_handler(SIGNAL_ARGS);
 
+/*
+ * Barrier requests of the same kind collapse into one; the flag values must be distinct bits.
+ */
+typedef enum GlobalBarrierKind
+{
+	/*
+	 * Guarantee that all processes have the correct view of whether checksums
+	 * are enabled or disabled, and that no writes with the previous value(s) are in progress.
+	 */
+	GLOBBAR_CHECKSUM = 1 << 0
+} GlobalBarrierKind;
+
+extern uint64 EmitGlobalBarrier(GlobalBarrierKind kind);
+extern void WaitForGlobalBarrier(uint64 generation);
+extern void ProcessGlobalBarrierIntterupt(void);
+
+extern PGDLLIMPORT volatile sig_atomic_t GlobalBarrierInterruptPending;
+
 #endif							/* PROCSIGNAL_H */
-- 
2.11.0

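As an aside on the structure of the patch above: every auxiliary process gets essentially the same two-line addition to its main loop (check the pending flag, absorb the barrier). Stripped of the real latch and signal plumbing, and with helper names that are ours rather than the patch's, the pattern is:

```c
#include <assert.h>
#include <stdbool.h>

/* Set from the (simulated) SIGUSR1 handler, consumed in the main loop. */
static volatile bool global_barrier_pending = false;
static int	barriers_absorbed = 0;

static void
handle_global_barrier_signal(void)
{
	/* the real handler also sets InterruptPending and the process latch */
	global_barrier_pending = true;
}

static void
process_global_barrier_interrupt(void)
{
	/* the real function reads the barrier flags and publishes the generation */
	global_barrier_pending = false;
	barriers_absorbed++;
}

/* One iteration of an auxiliary-process main loop, as in bgwriter.c etc. */
static void
aux_process_iteration(void)
{
	/* ResetLatch(MyLatch) would go here */
	if (global_barrier_pending)
		process_global_barrier_interrupt();
	/* ... then one cycle of the process's normal work ... */
}
```

Because the handler only sets a flag and the loop does the actual work, absorption happens at a point where the process is known not to be in the middle of a write.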
0002-Online-checksums-patch-for-v13.patch (text/x-patch; charset=US-ASCII)
From 1b641b201c1745ca2a2628c21791329ebc87d1df Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 10 Jul 2019 11:25:25 +0200
Subject: [PATCH 2/2] Online checksums patch for v13

Updated from previous patches and now using the global barriers
---
 doc/src/sgml/func.sgml                      |  65 ++
 doc/src/sgml/ref/initdb.sgml                |   7 +-
 doc/src/sgml/reference.sgml                 |   1 +
 doc/src/sgml/wal.sgml                       |  81 +++
 src/backend/access/rmgrdesc/xlogdesc.c      |  16 +
 src/backend/access/transam/xlog.c           | 131 +++-
 src/backend/access/transam/xlogfuncs.c      |  57 ++
 src/backend/catalog/system_views.sql        |   5 +
 src/backend/postmaster/Makefile             |   5 +-
 src/backend/postmaster/bgworker.c           |   7 +
 src/backend/postmaster/checksumhelper.c     | 909 ++++++++++++++++++++++++++++
 src/backend/postmaster/pgstat.c             |   6 +
 src/backend/replication/basebackup.c        |   2 +-
 src/backend/replication/logical/decode.c    |   1 +
 src/backend/storage/ipc/ipci.c              |   2 +
 src/backend/storage/lmgr/lwlocknames.txt    |   1 +
 src/backend/storage/page/README             |   3 +-
 src/backend/storage/page/bufpage.c          |   6 +-
 src/backend/utils/adt/pgstatfuncs.c         |   4 +-
 src/backend/utils/misc/guc.c                |  36 +-
 src/bin/pg_upgrade/controldata.c            |   9 +
 src/bin/pg_upgrade/pg_upgrade.h             |   2 +-
 src/include/access/xlog.h                   |  10 +-
 src/include/access/xlog_internal.h          |   7 +
 src/include/catalog/pg_control.h            |   1 +
 src/include/catalog/pg_proc.dat             |  16 +
 src/include/pgstat.h                        |   4 +-
 src/include/postmaster/checksumhelper.h     |  31 +
 src/include/storage/bufpage.h               |   1 +
 src/include/storage/checksum.h              |   7 +
 src/test/Makefile                           |   3 +-
 src/test/checksum/.gitignore                |   2 +
 src/test/checksum/Makefile                  |  24 +
 src/test/checksum/README                    |  22 +
 src/test/checksum/t/001_standby_checksum.pl | 104 ++++
 35 files changed, 1553 insertions(+), 35 deletions(-)
 create mode 100644 src/backend/postmaster/checksumhelper.c
 create mode 100644 src/include/postmaster/checksumhelper.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_standby_checksum.pl

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a7abf8c2ee..d9ec3f2516 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20919,6 +20919,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksum enabling for the cluster. This switches the data
+         checksums mode to <literal>inprogress</literal> and starts a background worker
+         that processes all existing data in the cluster and enables checksums for it.
+         When all data pages have had checksums enabled, the cluster automatically
+         switches to checksums <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index da5c8f5307..b545ad73cb 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -217,9 +217,10 @@ PostgreSQL documentation
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
         may incur a noticeable performance penalty. If set, checksums
-        are calculated for all objects, in all databases. All checksum
-        failures will be reported in the
-        <xref linkend="pg-stat-database-view"/> view.
+        are calculated for all objects, in all databases. All
+        checksum failures will be reported in the <xref
+        linkend="pg-stat-database-view"/> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index cef09dd38b..938ebf8477 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -284,6 +284,7 @@
    &pgtestfsync;
    &pgtesttiming;
    &pgupgrade;
+   &pgVerifyChecksums;
    &pgwaldump;
    &postgres;
    &postmaster;
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 4eb8feb903..7838f3616a 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,87 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled
+   for a cluster. When enabled, each data page is assigned a checksum that is updated
+   when the page is written and verified every time the page is read. Only data pages
+   are protected by checksums; internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+    Information about open transactions and connections with temporary tables is
+    written to the server log.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. It is not possible to resume the work;
+    the process has to start over from scratch.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 33060f3042..ced4ab6d78 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +198,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e651a841bb..b6a073b82a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -867,6 +867,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1050,7 +1051,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4779,10 +4780,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 /*
@@ -4819,12 +4816,93 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress\" mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7746,6 +7824,18 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.
+	 * This is because we cannot launch a dynamic background worker directly
+	 * from here, it has to be launched from a regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9469,6 +9559,24 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9919,6 +10027,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 4795c6fa94..a1e0d93e28 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
@@ -776,3 +777,59 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	StartChecksumHelperLauncher(cost_delay, cost_limit);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..1d8cd7d3a9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1155,6 +1155,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
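
The function registered above accepts vacuum-style throttling parameters with
defaults of <literal>cost_delay = 0</literal> and
<literal>cost_limit = 100</literal>. A hypothetical invocation overriding
both:

```sql
-- Enable checksums with throttling: 10 ms cost delay, cost limit 200
SELECT pg_enable_data_checksums(cost_delay => 10, cost_limit => 200);
```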
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f300f9285b..f40b7044bd 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..06db05979c
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,909 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When data checksums are enabled at initdb time, no extra process is
+ * required as each page is checksummed, and verified, on access.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * ChecksumHelperLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * The other members can be accessed without a lock; although they are
+	 * in shared memory, they are never accessed concurrently.  While a
+	 * worker is running, the launcher only waits for that worker to
+	 * finish.
+	 */
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+
+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+void
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	if (ChecksumHelperShmem->abort)
+	{
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: has been canceled")));
+	}
+
+	if (ChecksumHelperShmem->launcher_started)
+	{
+		/* Somebody else has already started the launcher */
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: already running")));
+	}
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	ChecksumHelperShmem->launcher_started = true;
+	LWLockRelease(ChecksumHelperLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+		ChecksumHelperShmem->launcher_started = false;
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("failed to start checksumhelper launcher")));
+	}
+}
+
+/*
+ * ShutdownChecksumHelperIfRunning
+ *		Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately; it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	if (ChecksumHelperShmem->launcher_started)
+		ChecksumHelperShmem->abort = true;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort, the
+		 * abortion will bubble up from here.
+		 * It's safe to check this without a lock, because if we miss it being
+		 * set, we will try again soon.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %d", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there
+		 * are no pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %d as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %d: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	ChecksumHelperShmem->abort = false;
+	ChecksumHelperShmem->launcher_started = false;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	ChecksumHelperShmem->abort = true;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		elog(DEBUG1, "Checking old transactions");
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char activity[64];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (waiting for %u)", waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_PG_SLEEP);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	List	   *FailedDatabases = NIL;
+	ListCell   *lc,
+			   *lc2;
+	HASHCTL     hash_ctl;
+	bool		found_failed = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(ChecksumHelperResult);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set up so that the first database processed also takes care of the
+	 * shared catalogs; they do not need to be reprocessed for every
+	 * database.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain
+		 * that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		/*
+		 * If there are no databases at all to checksum, we can exit
+		 * immediately as there is no work to do. This can probably never
+		 * happen, but just in case.
+		 */
+		if (DatabaseList == NIL)
+			return;
+
+		processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult result;
+			Oid *oid;
+
+			/* Skip if this database has been processed already */
+			if (hash_search(ProcessedDatabases, (void *) &db->dboid, HASH_FIND, NULL))
+			{
+				pfree(db->dbname);
+				pfree(db);
+				continue;
+			}
+
+			result = ProcessDatabase(db);
+
+			/* Make a copy of the oid so we can free the rest of the structure */
+			oid = palloc(sizeof(Oid));
+			*oid = db->dboid;
+
+			/*
+			 * Free the entry now unless processing failed; failed databases
+			 * are kept on the FailedDatabases list, which still references
+			 * db and db->dbname.
+			 */
+			if (result != FAILED)
+			{
+				pfree(db->dbname);
+				pfree(db);
+			}
+
+			hash_search(ProcessedDatabases, (void *) oid, HASH_ENTER, NULL);
+			processed_databases++;
+
+			if (result == SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we
+				 * don't have to process them again.
+				 */
+				if (ChecksumHelperShmem->process_shared_catalogs)
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else if (result == FAILED)
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				FailedDatabases = lappend(FailedDatabases, db);
+			}
+			else
+				/* Abort flag set, so exit the whole process */
+				return;
+		}
+
+		elog(DEBUG1, "Completed one loop of checksum enabling, %d databases processed", processed_databases);
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * FailedDatabases now has all databases that failed one way or another.
+	 * This can be because they actually failed for some reason, or because the
+	 * database was dropped between us getting the database list and trying to
+	 * process it. Get a fresh list of databases to detect the second case
+	 * where the database was dropped before we had started processing it. If a
+	 * database still exists, but enabling checksums failed then we fail the
+	 * entire checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, FailedDatabases)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		bool found = false;
+
+		foreach(lc2, DatabaseList)
+		{
+			ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+			if (db->dboid == db2->dboid)
+			{
+				found = true;
+				ereport(WARNING,
+						(errmsg("failed to enable checksums in \"%s\"",
+								db->dbname)));
+				break;
+			}
+		}
+
+		if (found)
+			found_failed = true;
+		else
+		{
+			ereport(LOG,
+					(errmsg("database \"%s\" has been dropped, skipping",
+							db->dbname)));
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksumhelper failed to enable checksums in all databases, aborting")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. XXX: this should
+	 * probably not be an IMMEDIATE checkpoint, but leave it there for now
+	 * for testing.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(ChecksumHelperShmem, 0, ChecksumHelperShmemSize());
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE commands,
+	 * which could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared and local relations are returned,
+ * otherwise only non-shared relations are returned.
+ * Temp tables are never included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relpersistence == 't')
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Only include relation types that have local storage
+		 */
+		if (pgc->relkind == RELKIND_VIEW ||
+			pgc->relkind == RELKIND_COMPOSITE_TYPE ||
+			pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = pgc->oid;
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in the current database
+ *
+ * Returns a List of oids.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != 't')
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start.
+	 * We must wait until they are all gone before we can finish, since we
+	 * cannot access their files to add checksums to them.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		ChecksumHelperShmem->success = ABORTED;
+		ereport(DEBUG1,
+				(errmsg("checksum worker aborted in database oid %u", dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums.
+	 * Any temp tables created after we started will already have checksums
+	 * in them (due to the inprogress state), so those are safe.
+	 */
+	while (true)
+	{
+		List *CurrentTempTables;
+		ListCell *lc;
+		int numleft;
+		char activity[64];
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table left to wait for */
+		snprintf(activity, sizeof(activity), "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT,
+						 5000,
+						 WAIT_EVENT_PG_SLEEP);
+	}
+
+	list_free(InitialTempTableList);
+
+	ChecksumHelperShmem->success = SUCCESSFUL;
+	ereport(DEBUG1,
+				(errmsg("checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index a0631ee154..b29bab24d2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4310,6 +4310,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c91f66dcbe..836d8e1a32 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1384,7 +1384,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5315d93af0..fe620265a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -199,6 +199,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index d7d733530f..876c6279c5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -255,6 +256,7 @@ CreateSharedMemoryAndSemaphores(int port)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843229..d50b4b13e1 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,4 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+ChecksumHelperLock					45
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..f873fb0eea 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,8 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, but can also be turned on and off
+using pg_enable_data_checksums()/pg_disable_data_checksums() at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 6b49810e37..6e3bfa045a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -94,7 +94,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1171,7 +1171,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1198,7 +1198,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..61e856deac 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1527,7 +1527,7 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
@@ -1545,7 +1545,7 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 90ffd89339..bd07fe4170 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -71,6 +72,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -471,6 +473,16 @@ static struct config_enum_entry shared_memory_options[] = {
 };
 
 /*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -572,7 +584,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1824,17 +1836,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -4537,6 +4538,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 38236415be..9e196cc2dd 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -658,6 +658,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index f724ecf9ca..66758bbd7f 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -220,7 +220,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..ae02c09c84 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -189,7 +189,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -293,7 +293,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 3f0de6625d..d4e3a3eab2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -241,6 +242,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index ff98d9e91a..8177414854 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cf1f409351..c6101a07f2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10638,6 +10638,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c997add881..346de83de9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -727,7 +727,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..556f801668
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+void		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 4ef6d8ddd4..cf31f24b01 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 7ef32a3baa..2c414aa1e7 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..22a3b64dd8
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: In the case of "check", this creates a temporary installation with
+multiple nodes (a master and one or more standbys) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..891743fa6c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,104 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress or on
+# Normally it would be "inprogress", but it is theoretically possible for the master
+# to complete the checksum enabling *and* have the standby replay that record before
+# we reach the check below.
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress", "on"], 'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.11.0
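
As an aside for readers skimming the patch: its core is the three-valued
data_checksums state (off / inprogress / on) driving the split of the old
DataChecksumsEnabled() into DataChecksumsNeedWrite() and
DataChecksumsNeedVerify() seen in the bufpage.c hunks. A rough sketch of
those semantics, in Python rather than the patch's C, under the assumption
that "inprogress" means "stamp checksums on every page written, but do not
verify on read yet":

```python
# Hedged sketch (Python, not the patch's actual C) of the three-valued
# data_checksums state the patch introduces.  In "inprogress" every page
# written gets a checksum, but nothing is verified on read until the
# checksumhelper has rewritten all pre-existing pages and the state
# flips to "on".
from enum import Enum

class ChecksumState(Enum):
    OFF = "off"
    INPROGRESS = "inprogress"
    ON = "on"

def need_write(state):
    # Mirrors DataChecksumsNeedWrite(): checksums are stamped on pages
    # in both the inprogress and on states.
    return state in (ChecksumState.INPROGRESS, ChecksumState.ON)

def need_verify(state):
    # Mirrors DataChecksumsNeedVerify(): only verify once every page is
    # guaranteed to carry a valid checksum.
    return state is ChecksumState.ON

for s in ChecksumState:
    print(s.value, need_write(s), need_verify(s))
# prints:
#   off False False
#   inprogress True False
#   on True True
```

This is also why basebackup.c switches its verification check to
DataChecksumsNeedVerify(): during the inprogress window, pages without
checksums are still legitimate.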

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Magnus Hagander (#1)
Re: Online checksums patch - once again

On 2019-Aug-26, Magnus Hagander wrote:

OK, let's try this again :)

This is work mainly based in the first version of the online checksums
patch, but based on top of Andres WIP patchset for global barriers (
/messages/by-id/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
)

Andres patch has been enhanced with wait events per
/messages/by-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com
.

Travis says your SGML doesn't compile (maybe you just forgot to "git
add" and edit allfiles.sgml?):

/usr/bin/xmllint --path . --noout --valid postgres.sgml
reference.sgml:287: parser error : Entity 'pgVerifyChecksums' not defined
&pgVerifyChecksums;
^
reference.sgml:295: parser error : chunk is not well balanced
postgres.sgml:231: parser error : Failure to process entity reference
&reference;
^
postgres.sgml:231: parser error : Entity 'reference' not defined
&reference;
^

Other than bots, this patch doesn't seem to have attracted any reviewers
this time around. Perhaps you need to bribe someone? (Maybe "how sad
your committer SSH key stopped working" would do?)

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#2)
2 attachment(s)
Re: Online checksums patch - once again

On Thu, Sep 26, 2019 at 9:48 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

On 2019-Aug-26, Magnus Hagander wrote:

OK, let's try this again :)

This is work mainly based in the first version of the online checksums
patch, but based on top of Andres WIP patchset for global barriers (

/messages/by-id/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de

)

Andres patch has been enhanced with wait events per

/messages/by-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com

.

Travis says your SGML doesn't compile (maybe you just forgot to "git
add" and edit allfiles.sgml?):

Nope, even easier -- the reference pgVerifyChecksums was renamed to
pgChecksums and for some reason we missed that in the merge.
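
(For anyone curious about the failure mode itself: an entity that is
referenced but never declared is a hard parse error. A minimal standalone
illustration with Python's stdlib XML parser — not the actual SGML
toolchain, which uses xmllint:)

```python
# Minimal illustration (Python stdlib, not the SGML toolchain) of why a
# stale entity reference like &pgVerifyChecksums; breaks the docs build:
# an entity that is referenced but never declared is a hard parse error.
import xml.etree.ElementTree as ET

def parse_ok(doc):
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Stale reference, entity never declared -> parse failure.
print(parse_ok("<reference>&pgVerifyChecksums;</reference>"))   # False

# Same document with the (renamed) entity declared -> parses fine.
doc = ('<!DOCTYPE reference [<!ENTITY pgChecksums "pg_checksums">]>'
       '<reference>&pgChecksums;</reference>')
print(parse_ok(doc))                                            # True
```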

I've rebased again on top of todays master, but that was the only change I
had to make.

Other than bots, this patch doesn't seem to have attracted any reviewers

this time around. Perhaps you need to bribe someone? (Maybe "how sad
your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
work.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachments:

0001-WIP-global-barriers.patchtext/x-patch; charset=US-ASCII; name=0001-WIP-global-barriers.patchDownload
From a52a74193015c8701bf40bb87321dfd56e0dab76 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 29 Oct 2018 10:14:15 -0700
Subject: [PATCH 1/2] WIP: global barriers

This is a squash of three patches from Andres:
* Use procsignal_sigusr1_handler for all shmem connected bgworkers.
* Use  procsignal_sigusr1_handler in all auxiliary processes.
* WIP: global barriers.

And one from Magnus:
* Wait event for global barriers
---
 src/backend/postmaster/autovacuum.c   |   3 +-
 src/backend/postmaster/bgworker.c     |  31 +++++---
 src/backend/postmaster/bgwriter.c     |  24 ++----
 src/backend/postmaster/checkpointer.c |  19 ++---
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/postmaster/startup.c      |  18 ++---
 src/backend/postmaster/walwriter.c    |  17 +---
 src/backend/replication/walreceiver.c |  20 +----
 src/backend/storage/buffer/bufmgr.c   |   4 +
 src/backend/storage/ipc/procsignal.c  | 141 ++++++++++++++++++++++++++++++++++
 src/backend/storage/lmgr/proc.c       |  20 +++++
 src/backend/tcop/postgres.c           |   7 ++
 src/include/pgstat.h                  |   1 +
 src/include/storage/proc.h            |   9 +++
 src/include/storage/procsignal.h      |  23 +++++-
 15 files changed, 255 insertions(+), 85 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 073f313337..24e28dd3a3 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -649,8 +649,9 @@ AutoVacLauncherMain(int argc, char *argv[])
 
 		ResetLatch(MyLatch);
 
-		/* Process sinval catchup interrupts that happened while sleeping */
+		/* Process pending interrupts. */
 		ProcessCatchupInterrupt();
+		ProcessGlobalBarrierIntterupt();
 
 		/* the normal shutdown case */
 		if (got_SIGTERM)
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index b66b517aca..f300f9285b 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -734,23 +734,32 @@ StartBackgroundWorker(void)
 	/*
 	 * Set up signal handlers.
 	 */
+
+
+	/*
+	 * SIGINT is used to signal canceling the current action for processes
+	 * able to run queries.
+	 */
 	if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
-	{
-		/*
-		 * SIGINT is used to signal canceling the current action
-		 */
 		pqsignal(SIGINT, StatementCancelHandler);
-		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
-		pqsignal(SIGFPE, FloatExceptionHandler);
-
-		/* XXX Any other handlers needed here? */
-	}
 	else
-	{
 		pqsignal(SIGINT, SIG_IGN);
+
+	/*
+	 * Everything with a PGPROC should be able to receive procsignal.h style
+	 * signals.
+	 */
+	if (worker->bgw_flags & (BGWORKER_BACKEND_DATABASE_CONNECTION |
+							 BGWORKER_SHMEM_ACCESS))
+		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+	else
 		pqsignal(SIGUSR1, bgworker_sigusr1_handler);
+
+	if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
+		pqsignal(SIGFPE, FloatExceptionHandler);
+	else
 		pqsignal(SIGFPE, SIG_IGN);
-	}
+
 	pqsignal(SIGTERM, bgworker_die);
 	pqsignal(SIGHUP, SIG_IGN);
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 8ec16a3fb8..80a8e3cf4b 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -51,6 +51,7 @@
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
 #include "storage/proc.h"
+#include "storage/procsignal.h"
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
@@ -97,7 +98,6 @@ static volatile sig_atomic_t shutdown_requested = false;
 static void bg_quickdie(SIGNAL_ARGS);
 static void BgSigHupHandler(SIGNAL_ARGS);
 static void ReqShutdownHandler(SIGNAL_ARGS);
-static void bgwriter_sigusr1_handler(SIGNAL_ARGS);
 
 
 /*
@@ -115,10 +115,7 @@ BackgroundWriterMain(void)
 	WritebackContext wb_context;
 
 	/*
-	 * Properly accept or ignore signals the postmaster might send us.
-	 *
-	 * bgwriter doesn't participate in ProcSignal signalling, but a SIGUSR1
-	 * handler is still needed for latch wakeups.
+	 * Properly accept or ignore signals that might be sent to us.
 	 */
 	pqsignal(SIGHUP, BgSigHupHandler);	/* set flag to read config file */
 	pqsignal(SIGINT, SIG_IGN);
@@ -126,7 +123,7 @@ BackgroundWriterMain(void)
 	pqsignal(SIGQUIT, bg_quickdie); /* hard crash time */
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, bgwriter_sigusr1_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, SIG_IGN);
 
 	/*
@@ -261,6 +258,10 @@ BackgroundWriterMain(void)
 			proc_exit(0);		/* done */
 		}
 
+		/* Process all pending interrupts. */
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
+
 		/*
 		 * Do one cycle of dirty-buffer writing.
 		 */
@@ -428,14 +429,3 @@ ReqShutdownHandler(SIGNAL_ARGS)
 
 	errno = save_errno;
 }
-
-/* SIGUSR1: used for latch wakeups */
-static void
-bgwriter_sigusr1_handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 61544f65ad..def9aa87d8 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -54,6 +54,7 @@
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
 #include "storage/proc.h"
+#include "storage/procsignal.h"
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
@@ -179,7 +180,6 @@ static void UpdateSharedMemoryConfig(void);
 static void chkpt_quickdie(SIGNAL_ARGS);
 static void ChkptSigHupHandler(SIGNAL_ARGS);
 static void ReqCheckpointHandler(SIGNAL_ARGS);
-static void chkpt_sigusr1_handler(SIGNAL_ARGS);
 static void ReqShutdownHandler(SIGNAL_ARGS);
 
 
@@ -211,7 +211,7 @@ CheckpointerMain(void)
 	pqsignal(SIGQUIT, chkpt_quickdie);	/* hard crash time */
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, chkpt_sigusr1_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, ReqShutdownHandler);	/* request shutdown */
 
 	/*
@@ -346,6 +346,10 @@ CheckpointerMain(void)
 		/* Clear any already-pending wakeups */
 		ResetLatch(MyLatch);
 
+		/* Process all pending interrupts. */
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
+
 		/*
 		 * Process any requests or signals received recently.
 		 */
@@ -853,17 +857,6 @@ ReqCheckpointHandler(SIGNAL_ARGS)
 	errno = save_errno;
 }
 
-/* SIGUSR1: used for latch wakeups */
-static void
-chkpt_sigusr1_handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
-
 /* SIGUSR2: set flag to run a shutdown checkpoint and exit */
 static void
 ReqShutdownHandler(SIGNAL_ARGS)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 011076c3e3..819381a2ae 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3765,6 +3765,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
+		case WAIT_EVENT_GLOBAL_BARRIER:
+			event_name = "GlobalBarrier";
+			break;
 		case WAIT_EVENT_HASH_BATCH_ALLOCATING:
 			event_name = "Hash/Batch/Allocating";
 			break;
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 5048a2c2aa..da0a670bdf 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -30,6 +30,7 @@
 #include "storage/ipc.h"
 #include "storage/latch.h"
 #include "storage/pmsignal.h"
+#include "storage/procsignal.h"
 #include "storage/standby.h"
 #include "utils/guc.h"
 #include "utils/timeout.h"
@@ -50,7 +51,6 @@ static volatile sig_atomic_t in_restore_command = false;
 
 /* Signal handlers */
 static void startupproc_quickdie(SIGNAL_ARGS);
-static void StartupProcSigUsr1Handler(SIGNAL_ARGS);
 static void StartupProcTriggerHandler(SIGNAL_ARGS);
 static void StartupProcSigHupHandler(SIGNAL_ARGS);
 
@@ -87,17 +87,6 @@ startupproc_quickdie(SIGNAL_ARGS)
 }
 
 
-/* SIGUSR1: let latch facility handle the signal */
-static void
-StartupProcSigUsr1Handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
-
 /* SIGUSR2: set flag to finish recovery */
 static void
 StartupProcTriggerHandler(SIGNAL_ARGS)
@@ -162,6 +151,9 @@ HandleStartupProcInterrupts(void)
 	 */
 	if (IsUnderPostmaster && !PostmasterIsAlive())
 		exit(1);
+
+	if (GlobalBarrierInterruptPending)
+		ProcessGlobalBarrierIntterupt();
 }
 
 
@@ -181,7 +173,7 @@ StartupProcessMain(void)
 	pqsignal(SIGQUIT, startupproc_quickdie);	/* hard crash time */
 	InitializeTimeouts();		/* establishes SIGALRM handler */
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, StartupProcSigUsr1Handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, StartupProcTriggerHandler);
 
 	/*
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index a6fdba3f41..19120aa6e1 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -55,6 +55,7 @@
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
 #include "storage/proc.h"
+#include "storage/procsignal.h"
 #include "storage/smgr.h"
 #include "utils/guc.h"
 #include "utils/hsearch.h"
@@ -86,7 +87,6 @@ static volatile sig_atomic_t shutdown_requested = false;
 static void wal_quickdie(SIGNAL_ARGS);
 static void WalSigHupHandler(SIGNAL_ARGS);
 static void WalShutdownHandler(SIGNAL_ARGS);
-static void walwriter_sigusr1_handler(SIGNAL_ARGS);
 
 /*
  * Main entry point for walwriter process
@@ -114,7 +114,7 @@ WalWriterMain(void)
 	pqsignal(SIGQUIT, wal_quickdie);	/* hard crash time */
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, walwriter_sigusr1_handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, SIG_IGN); /* not used */
 
 	/*
@@ -255,6 +255,8 @@ WalWriterMain(void)
 			/* Normal exit from the walwriter is here */
 			proc_exit(0);		/* done */
 		}
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
 
 		/*
 		 * Do what we're here for; then, if XLogBackgroundFlush() found useful
@@ -337,14 +339,3 @@ WalShutdownHandler(SIGNAL_ARGS)
 
 	errno = save_errno;
 }
-
-/* SIGUSR1: used for latch wakeups */
-static void
-walwriter_sigusr1_handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 6abc780778..9acdbdd7c9 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -63,6 +63,7 @@
 #include "storage/ipc.h"
 #include "storage/pmsignal.h"
 #include "storage/procarray.h"
+#include "storage/procsignal.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/pg_lsn.h"
@@ -125,7 +126,6 @@ static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime);
 
 /* Signal handlers */
 static void WalRcvSigHupHandler(SIGNAL_ARGS);
-static void WalRcvSigUsr1Handler(SIGNAL_ARGS);
 static void WalRcvShutdownHandler(SIGNAL_ARGS);
 static void WalRcvQuickDieHandler(SIGNAL_ARGS);
 
@@ -147,9 +147,8 @@ void
 ProcessWalRcvInterrupts(void)
 {
 	/*
-	 * Although walreceiver interrupt handling doesn't use the same scheme as
-	 * regular backends, call CHECK_FOR_INTERRUPTS() to make sure we receive
-	 * any incoming signals on Win32.
+	 * The CHECK_FOR_INTERRUPTS() call ensures global barriers are handled,
+	 * and incoming signals on Win32 are received.
 	 */
 	CHECK_FOR_INTERRUPTS();
 
@@ -252,7 +251,7 @@ WalReceiverMain(void)
 	pqsignal(SIGQUIT, WalRcvQuickDieHandler);	/* hard crash time */
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
-	pqsignal(SIGUSR1, WalRcvSigUsr1Handler);
+	pqsignal(SIGUSR1, procsignal_sigusr1_handler);
 	pqsignal(SIGUSR2, SIG_IGN);
 
 	/* Reset some signals that are accepted by postmaster but not here */
@@ -766,17 +765,6 @@ WalRcvSigHupHandler(SIGNAL_ARGS)
 }
 
 
-/* SIGUSR1: used by latch mechanism */
-static void
-WalRcvSigUsr1Handler(SIGNAL_ARGS)
-{
-	int			save_errno = errno;
-
-	latch_sigusr1_handler();
-
-	errno = save_errno;
-}
-
 /* SIGTERM: set flag for ProcessWalRcvInterrupts */
 static void
 WalRcvShutdownHandler(SIGNAL_ARGS)
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 483f705305..c8c48d8497 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1885,6 +1885,10 @@ BufferSync(int flags)
 
 		cur_tsid = CkptBufferIds[i].tsId;
 
+		/* XXX: need a more principled approach here */
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
+
 		/*
 		 * Grow array of per-tablespace status structs, every time a new
 		 * tablespace is found.
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 7605b2c367..9aed52df4a 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,8 +18,10 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/twophase.h"
 #include "commands/async.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "replication/walsender.h"
 #include "storage/latch.h"
 #include "storage/ipc.h"
@@ -62,9 +64,11 @@ typedef struct
 
 static ProcSignalSlot *ProcSignalSlots = NULL;
 static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
+volatile sig_atomic_t GlobalBarrierInterruptPending = false;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
+static void HandleGlobalBarrierSignal(void);
 
 /*
  * ProcSignalShmemSize
@@ -262,6 +266,8 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
+	pg_read_barrier();
+
 	if (CheckProcSignal(PROCSIG_CATCHUP_INTERRUPT))
 		HandleCatchupInterrupt();
 
@@ -292,9 +298,144 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_GLOBAL_BARRIER))
+		HandleGlobalBarrierSignal();
+
 	SetLatch(MyLatch);
 
 	latch_sigusr1_handler();
 
 	errno = save_errno;
 }
+
+/*
+ * Emit a global barrier and return its generation number.
+ */
+uint64
+EmitGlobalBarrier(GlobalBarrierKind kind)
+{
+	uint64 generation;
+
+	/*
+	 * Broadcast the flag without incrementing the generation. This ensures
+	 * that all existing backends see the flag.
+	 *
+	 * It's OK if the to-be-signalled backend enters after our check here. A
+	 * new backend should have current settings.
+	 */
+	for (int i = 0; i < (MaxBackends + max_prepared_xacts); i++)
+	{
+		PGPROC *proc = &ProcGlobal->allProcs[i];
+
+		if (proc->pid == 0)
+			continue;
+
+		pg_atomic_fetch_or_u32(&proc->barrierFlags, (uint32) kind);
+
+		elog(LOG, "setting flags for %u", proc->pid);
+	}
+
+	/*
+	 * Broadcast flag generation. If any backend joins after this, it's either
+	 * going to be signalled below, or has read a new enough generation that
+	 * WaitForGlobalBarrier() will not wait for it.
+	 */
+	generation = pg_atomic_add_fetch_u64(&ProcGlobal->globalBarrierGen, 1);
+
+	/* Wake up each backend (including ours) */
+	for (int i = 0; i < NumProcSignalSlots; i++)
+	{
+		ProcSignalSlot *slot = &ProcSignalSlots[i];
+
+		if (slot->pss_pid == 0)
+			continue;
+
+		/* Atomically set the proper flag */
+		slot->pss_signalFlags[PROCSIG_GLOBAL_BARRIER] = true;
+
+		pg_write_barrier();
+
+		/* Send signal */
+		kill(slot->pss_pid, SIGUSR1);
+	}
+
+	return generation;
+}
+
+/*
+ * Wait for all barriers to be absorbed.  This guarantees that all changes
+ * requested by a specific EmitGlobalBarrier() have taken effect.
+ */
+void
+WaitForGlobalBarrier(uint64 generation)
+{
+	pgstat_report_wait_start(WAIT_EVENT_GLOBAL_BARRIER);
+	for (int i = 0; i < (MaxBackends + max_prepared_xacts); i++)
+	{
+		PGPROC *proc = &ProcGlobal->allProcs[i];
+		uint64 oldval;
+
+		pg_memory_barrier();
+		oldval = pg_atomic_read_u64(&proc->barrierGen);
+
+		/*
+		 * Unused proc slots get their barrierGen set to PG_UINT64_MAX, so we
+		 * need not care about that.
+		 */
+		while (oldval < generation)
+		{
+			CHECK_FOR_INTERRUPTS();
+			pg_usleep(10000);
+
+			pg_memory_barrier();
+			oldval = pg_atomic_read_u64(&proc->barrierGen);
+		}
+	}
+	pgstat_report_wait_end();
+}
+
+/*
+ * Absorb the global barrier procsignal.
+ */
+static void
+HandleGlobalBarrierSignal(void)
+{
+	InterruptPending = true;
+	GlobalBarrierInterruptPending = true;
+	SetLatch(MyLatch);
+}
+
+/*
+ * Perform global barrier related interrupt checking. If CHECK_FOR_INTERRUPTS
+ * is used, it'll be called by that, if a backend type doesn't do so, it has
+ * to be called explicitly.
+ */
+void
+ProcessGlobalBarrierIntterupt(void)
+{
+	if (GlobalBarrierInterruptPending)
+	{
+		uint64 generation;
+		uint32 flags;
+
+		GlobalBarrierInterruptPending = false;
+
+		generation = pg_atomic_read_u64(&ProcGlobal->globalBarrierGen);
+		pg_memory_barrier();
+		flags = pg_atomic_exchange_u32(&MyProc->barrierFlags, 0);
+		pg_memory_barrier();
+
+		if (flags & GLOBBAR_CHECKSUM)
+		{
+			/*
+			 * By virtue of getting here (i.e. interrupts being processed), we
+			 * know that this backend won't have any in-progress writes (which
+			 * might have missed the checksum change).
+			 */
+		}
+
+		pg_atomic_write_u64(&MyProc->barrierGen, generation);
+
+		elog(LOG, "processed interrupts for %u", MyProcPid);
+	}
+}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 498373fd0e..ae52b9e9ac 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -190,6 +190,7 @@ InitProcGlobal(void)
 	ProcGlobal->checkpointerLatch = NULL;
 	pg_atomic_init_u32(&ProcGlobal->procArrayGroupFirst, INVALID_PGPROCNO);
 	pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO);
+	pg_atomic_init_u64(&ProcGlobal->globalBarrierGen, 1);
 
 	/*
 	 * Create and initialize all the PGPROC structures we'll need.  There are
@@ -284,6 +285,9 @@ InitProcGlobal(void)
 		 */
 		pg_atomic_init_u32(&(procs[i].procArrayGroupNext), INVALID_PGPROCNO);
 		pg_atomic_init_u32(&(procs[i].clogGroupNext), INVALID_PGPROCNO);
+
+		pg_atomic_init_u32(&procs[i].barrierFlags, 0);
+		pg_atomic_init_u64(&procs[i].barrierGen, PG_UINT64_MAX);
 	}
 
 	/*
@@ -442,6 +446,12 @@ InitProcess(void)
 	MyProc->clogGroupMemberLsn = InvalidXLogRecPtr;
 	Assert(pg_atomic_read_u32(&MyProc->clogGroupNext) == INVALID_PGPROCNO);
 
+	/* pairs with globalBarrierGen increase */
+	pg_memory_barrier();
+	pg_atomic_write_u32(&MyProc->barrierFlags, 0);
+	pg_atomic_write_u64(&MyProc->barrierGen,
+						pg_atomic_read_u64(&ProcGlobal->globalBarrierGen));
+
 	/*
 	 * Acquire ownership of the PGPROC's latch, so that we can use WaitLatch
 	 * on it.  That allows us to repoint the process latch, which so far
@@ -585,6 +595,13 @@ InitAuxiliaryProcess(void)
 	MyProc->lwWaitMode = 0;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
+
+	/* pairs with globalBarrierGen increase */
+	pg_memory_barrier();
+	pg_atomic_write_u32(&MyProc->barrierFlags, 0);
+	pg_atomic_write_u64(&MyProc->barrierGen,
+						pg_atomic_read_u64(&ProcGlobal->globalBarrierGen));
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
@@ -883,6 +900,9 @@ ProcKill(int code, Datum arg)
 		LWLockRelease(leader_lwlock);
 	}
 
+	pg_atomic_write_u32(&MyProc->barrierFlags, 0);
+	pg_atomic_write_u64(&MyProc->barrierGen, PG_UINT64_MAX);
+
 	/*
 	 * Reset MyLatch to the process local one.  This is so that signal
 	 * handlers et al can continue using the latch after the shared latch
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e8d8e6f828..976e966565 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -612,6 +612,10 @@ ProcessClientWriteInterrupt(bool blocked)
 			SetLatch(MyLatch);
 	}
 
+	/* safe to handle during client communication */
+	if (GlobalBarrierInterruptPending)
+		ProcessGlobalBarrierIntterupt();
+
 	errno = save_errno;
 }
 
@@ -3159,6 +3163,9 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (GlobalBarrierInterruptPending)
+		ProcessGlobalBarrierIntterupt();
 }
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..c997add881 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -824,6 +824,7 @@ typedef enum
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
 	WAIT_EVENT_EXECUTE_GATHER,
+	WAIT_EVENT_GLOBAL_BARRIER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATING,
 	WAIT_EVENT_HASH_BATCH_ELECTING,
 	WAIT_EVENT_HASH_BATCH_LOADING,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 281e1db725..f108ac52c6 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -203,6 +203,13 @@ struct PGPROC
 	PGPROC	   *lockGroupLeader;	/* lock group leader, if I'm a member */
 	dlist_head	lockGroupMembers;	/* list of members, if I'm a leader */
 	dlist_node	lockGroupLink;	/* my member link, if I'm a member */
+
+	/*
+	 * Support for "super barriers". These can be used to e.g. make sure that
+	 * all backends have acknowledged a configuration change.
+	 */
+	pg_atomic_uint64 barrierGen;
+	pg_atomic_uint32 barrierFlags;
 };
 
 /* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */
@@ -272,6 +279,8 @@ typedef struct PROC_HDR
 	int			startupProcPid;
 	/* Buffer id of the buffer that Startup process waits for pin on, or -1 */
 	int			startupBufferPinWaitBufId;
+
+	pg_atomic_uint64 globalBarrierGen;
 } PROC_HDR;
 
 extern PGDLLIMPORT PROC_HDR *ProcGlobal;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 05b186a05c..a978db9b24 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -14,8 +14,9 @@
 #ifndef PROCSIGNAL_H
 #define PROCSIGNAL_H
 
-#include "storage/backendid.h"
+#include <signal.h>
 
+#include "storage/backendid.h"
 
 /*
  * Reasons for signalling a Postgres child process (a backend or an auxiliary
@@ -42,6 +43,8 @@ typedef enum
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
+	PROCSIG_GLOBAL_BARRIER,
+
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
 
@@ -57,4 +60,22 @@ extern int	SendProcSignal(pid_t pid, ProcSignalReason reason,
 
 extern void procsignal_sigusr1_handler(SIGNAL_ARGS);
 
+/*
+ * Requests of the same kind collapse; the flag values must be distinct bits.
+ */
+typedef enum GlobalBarrierKind
+{
+	/*
+	 * Guarantee that all processes see the current checksum state, and that
+	 * no writes with a previous state are still in progress.
+	 */
+	GLOBBAR_CHECKSUM = 1 << 0
+} GlobalBarrierKind;
+
+extern uint64 EmitGlobalBarrier(GlobalBarrierKind kind);
+extern void WaitForGlobalBarrier(uint64 generation);
+extern void ProcessGlobalBarrierIntterupt(void);
+
+extern PGDLLIMPORT volatile sig_atomic_t GlobalBarrierInterruptPending;
+
 #endif							/* PROCSIGNAL_H */
-- 
2.11.0

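For readers skimming the patch, the emit/ack protocol in procsignal.c can be
sketched outside of PostgreSQL. The following is a toy Python model, not the
patch's actual code: names mirror the patch (barrierFlags, barrierGen,
globalBarrierGen), but Python threads and a single lock stand in for
PostgreSQL processes, SIGUSR1 delivery, and the patch's atomics and explicit
memory barriers. It shows the ordering that matters: flags are broadcast
before the generation is bumped, and the waiter polls each process's
acknowledged generation.

```python
import threading
import time

GLOBBAR_CHECKSUM = 1 << 0  # mirrors the patch's GlobalBarrierKind bit

class BarrierState:
    """Shared state: a global generation, per-process pending flags,
    and the last generation each process has acknowledged."""
    def __init__(self, nprocs):
        self.lock = threading.Lock()   # stand-in for atomics/memory barriers
        self.global_gen = 1            # ProcGlobal->globalBarrierGen
        self.flags = [0] * nprocs      # PGPROC.barrierFlags
        self.acked_gen = [1] * nprocs  # PGPROC.barrierGen

def emit_global_barrier(state, kind):
    """Set the flag on every process first, then bump the generation, so no
    process can acknowledge the new generation without seeing the flag."""
    with state.lock:
        for i in range(len(state.flags)):
            state.flags[i] |= kind
        state.global_gen += 1
        return state.global_gen

def process_barrier_interrupt(state, i, absorbed):
    """What process i does from its interrupt-processing path: read the
    generation, consume its flags, act on them, then acknowledge."""
    with state.lock:
        gen = state.global_gen
        flags, state.flags[i] = state.flags[i], 0
        if flags & GLOBBAR_CHECKSUM:
            absorbed.append(i)  # the real code absorbs the checksum change here
        state.acked_gen[i] = gen

def wait_for_global_barrier(state, generation):
    """Poll until every process has acknowledged the target generation
    (the patch sleeps 10ms between polls)."""
    while True:
        with state.lock:
            if all(g >= generation for g in state.acked_gen):
                return
        time.sleep(0.001)

state = BarrierState(3)
absorbed = []
gen = emit_global_barrier(state, GLOBBAR_CHECKSUM)
threads = [threading.Thread(target=process_barrier_interrupt,
                            args=(state, i, absorbed)) for i in range(3)]
for t in threads:
    t.start()
wait_for_global_barrier(state, gen)
for t in threads:
    t.join()
```

Note that in the real patch the two loops in EmitGlobalBarrier() are not under
one lock; the write barrier between setting pss_signalFlags and kill() is what
makes the signal visible, which the lock here papers over.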
0002-Online-checksums-patch-for-v13.patchtext/x-patch; charset=US-ASCII; name=0002-Online-checksums-patch-for-v13.patchDownload
From 08ea831dc3e6b22ce914fea50436e0dba0345816 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 10 Jul 2019 11:25:25 +0200
Subject: [PATCH 2/2] Online checksums patch for v13

Updated from previous patches and now using the global barriers
---
 doc/src/sgml/func.sgml                      |  65 ++
 doc/src/sgml/ref/initdb.sgml                |   7 +-
 doc/src/sgml/wal.sgml                       |  81 +++
 src/backend/access/rmgrdesc/xlogdesc.c      |  16 +
 src/backend/access/transam/xlog.c           | 131 +++-
 src/backend/access/transam/xlogfuncs.c      |  57 ++
 src/backend/catalog/system_views.sql        |   5 +
 src/backend/postmaster/Makefile             |   5 +-
 src/backend/postmaster/bgworker.c           |   7 +
 src/backend/postmaster/checksumhelper.c     | 909 ++++++++++++++++++++++++++++
 src/backend/postmaster/pgstat.c             |   6 +
 src/backend/replication/basebackup.c        |   2 +-
 src/backend/replication/logical/decode.c    |   1 +
 src/backend/storage/ipc/ipci.c              |   2 +
 src/backend/storage/lmgr/lwlocknames.txt    |   1 +
 src/backend/storage/page/README             |   3 +-
 src/backend/storage/page/bufpage.c          |   6 +-
 src/backend/utils/adt/pgstatfuncs.c         |   4 +-
 src/backend/utils/misc/guc.c                |  36 +-
 src/bin/pg_upgrade/controldata.c            |   9 +
 src/bin/pg_upgrade/pg_upgrade.h             |   2 +-
 src/include/access/xlog.h                   |  10 +-
 src/include/access/xlog_internal.h          |   7 +
 src/include/catalog/pg_control.h            |   1 +
 src/include/catalog/pg_proc.dat             |  16 +
 src/include/pgstat.h                        |   4 +-
 src/include/postmaster/checksumhelper.h     |  31 +
 src/include/storage/bufpage.h               |   1 +
 src/include/storage/checksum.h              |   7 +
 src/test/Makefile                           |   3 +-
 src/test/checksum/.gitignore                |   2 +
 src/test/checksum/Makefile                  |  24 +
 src/test/checksum/README                    |  22 +
 src/test/checksum/t/001_standby_checksum.pl | 104 ++++
 34 files changed, 1552 insertions(+), 35 deletions(-)
 create mode 100644 src/backend/postmaster/checksumhelper.c
 create mode 100644 src/include/postmaster/checksumhelper.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_standby_checksum.pl

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0aa399dc2f..bc1f128574 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21272,6 +21272,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates enabling of data checksums for the cluster. This switches the data
+         checksums mode to <literal>inprogress</literal> and starts a background worker
+         that processes all data in the cluster and enables checksums for it. When all
+         data pages have had checksums enabled, the cluster automatically switches to
+         checksums <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index da5c8f5307..b545ad73cb 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -217,9 +217,10 @@ PostgreSQL documentation
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
         may incur a noticeable performance penalty. If set, checksums
-        are calculated for all objects, in all databases. All checksum
-        failures will be reported in the
-        <xref linkend="pg-stat-database-view"/> view.
+        are calculated for all objects, in all databases. All
+        checksum failures will be reported in the <xref
+        linkend="pg-stat-database-view"/> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 4eb8feb903..7838f3616a 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,87 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page is assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums;
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables created inside
+    a transaction that has not yet committed, which would not be visible to the
+    process enabling checksums. It will also, for each database, wait for all
+    pre-existing temporary tables to be removed before it finishes. If long-lived
+    temporary tables are used in the application, it may be necessary to terminate
+    those application connections to allow the process to complete. Information
+    about open transactions and connections with temporary tables is written to
+    the server log.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. It is not possible to resume the work;
+    the process has to start from scratch.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 33060f3042..ced4ab6d78 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfoString(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfoString(buf, "inprogress");
+		else
+			appendStringInfoString(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +198,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61ba6b852e..a7adae44dd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -867,6 +867,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1049,7 +1050,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4779,10 +4780,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 /*
@@ -4819,12 +4816,93 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress\" mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7761,6 +7839,18 @@ StartupXLOG(void)
 	CompleteCommitTsInitialization();
 
 	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums. This is because we cannot launch a dynamic background
+	 * worker directly from here; it has to be launched from a regular
+	 * backend.
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
 	 * All done with end-of-recovery actions.
 	 *
 	 * Now allow backends to write WAL and update the control file status in
@@ -9484,6 +9574,24 @@ XLogReportParameters(void)
 }
 
 /*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
+/*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
@@ -9934,6 +10042,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 8f179887ab..4bd87e887c 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/smgr.h"
@@ -785,3 +786,59 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	StartChecksumHelperLauncher(cost_delay, cost_limit);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 9fe4a4794a..6649c9afe2 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1155,6 +1155,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 71c23211b2..ee8f8c1cd3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -12,7 +12,8 @@ subdir = src/backend/postmaster
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o fork_process.o \
-	pgarch.o pgstat.o postmaster.o startup.o syslogger.o walwriter.o
+OBJS = autovacuum.o bgworker.o bgwriter.o checkpointer.o checksumhelper.o \
+	fork_process.o pgarch.o pgstat.o postmaster.o startup.o syslogger.o \
+	walwriter.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index f300f9285b..f40b7044bd 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..06db05979c
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,909 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a database at initdb time, no extra process
+ * is required as each page is checksummed, and verified, on access.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * ChecksumHelperLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Access to the other members can be done without a lock: although they
+	 * are in shared memory, they are never accessed concurrently. When a
+	 * worker is running, the launcher is only waiting for that worker to
+	 * finish.
+	 */
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+
+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+void
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	if (ChecksumHelperShmem->abort)
+	{
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: has been canceled")));
+	}
+
+	if (ChecksumHelperShmem->launcher_started)
+	{
+		/* Somebody else has already started the launcher */
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: already running")));
+	}
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	ChecksumHelperShmem->launcher_started = true;
+	LWLockRelease(ChecksumHelperLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+		ChecksumHelperShmem->launcher_started = false;
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("failed to start checksumhelper launcher")));
+	}
+}
+
+/*
+ * ShutdownChecksumHelperIfRunning
+ *		Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately; it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	if (ChecksumHelperShmem->launcher_started)
+		ChecksumHelperShmem->abort = true;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort;
+		 * the abort will bubble up from here. It is safe to check this
+		 * without a lock, because if we miss it being set, we will try
+		 * again soon.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there
+		 * are no pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %u as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	ChecksumHelperShmem->abort = false;
+	ChecksumHelperShmem->launcher_started = false;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	ChecksumHelperShmem->abort = true;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		elog(DEBUG1, "Checking old transactions");
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char activity[64];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (waiting for %u)", waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_PG_SLEEP);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	List	   *FailedDatabases = NIL;
+	ListCell   *lc,
+			   *lc2;
+	HASHCTL     hash_ctl;
+	bool		found_failed = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(ChecksumHelperResult);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set things up so that the first run processes the shared catalogs;
+	 * they do not need to be reprocessed in every database.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain that
+		 * no databases are left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		/*
+		 * If there are no databases at all to checksum, we can exit
+		 * immediately as there is no work to do. This can probably never
+		 * happen, but just in case.
+		 */
+		if (DatabaseList == NIL)
+			return;
+
+		processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult result;
+			Oid *oid;
+
+			/* Skip if this database has been processed already */
+			if (hash_search(ProcessedDatabases, (void *) &db->dboid, HASH_FIND, NULL))
+			{
+				pfree(db->dbname);
+				pfree(db);
+				continue;
+			}
+
+			result = ProcessDatabase(db);
+
+			/* Make a copy of the oid so we can free the rest of the structure */
+			oid = palloc(sizeof(Oid));
+			*oid = db->dboid;
+
+			hash_search(ProcessedDatabases, (void *) oid, HASH_ENTER, NULL);
+			processed_databases++;
+
+			if (result == SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we
+				 * don't have to process them again.
+				 */
+				if (ChecksumHelperShmem->process_shared_catalogs)
+					ChecksumHelperShmem->process_shared_catalogs = false;
+
+				pfree(db->dbname);
+				pfree(db);
+			}
+			else if (result == FAILED)
+			{
+				/*
+				 * Put failed databases on the remaining list. The entry
+				 * must not be freed here, since FailedDatabases keeps a
+				 * reference to it for the retry pass below.
+				 */
+				FailedDatabases = lappend(FailedDatabases, db);
+			}
+			else
+				/* Abort flag set, so exit the whole process */
+				return;
+		}
+
+		elog(DEBUG1, "Completed one loop of checksum enabling, %i databases processed", processed_databases);
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * FailedDatabases now holds all databases that failed in one way or
+	 * another. This can be because they actually failed for some reason, or
+	 * because the database was dropped between us getting the database list
+	 * and trying to process it. Get a fresh list of databases to detect the
+	 * second case, where the database was dropped before we had started
+	 * processing it. If a database still exists but enabling checksums
+	 * failed, then we fail the entire checksumming process and exit with an
+	 * error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, FailedDatabases)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		bool found = false;
+
+		foreach(lc2, DatabaseList)
+		{
+			ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+			if (db->dboid == db2->dboid)
+			{
+				found = true;
+				ereport(WARNING,
+						(errmsg("failed to enable checksums in \"%s\"",
+								db->dbname)));
+				break;
+			}
+		}
+
+		if (found)
+			found_failed = true;
+		else
+		{
+			ereport(LOG,
+					(errmsg("database \"%s\" has been dropped, skipping",
+							db->dbname)));
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksumhelper failed to enable checksums in all databases, aborting")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. XXX: this should
+	 * probably not be an IMMEDIATE checkpoint, but leave it there for now
+	 * for testing.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(ChecksumHelperShmem, 0, ChecksumHelperShmemSize());
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared and local relations are returned,
+ * otherwise only non-shared relations are returned.
+ * Temp tables are not included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relpersistence == 't')
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Only include relations types that have local storage
+		 */
+		if (pgc->relkind == RELKIND_VIEW ||
+			pgc->relkind == RELKIND_COMPOSITE_TYPE ||
+			pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = pgc->oid;
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in database
+ *
+ * Returns a List of oids.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != 't')
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present as we start in this database.
+	 * We need to wait until they are all gone before we can finish, since
+	 * we cannot access those files to modify them.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		ChecksumHelperShmem->success = ABORTED;
+		ereport(DEBUG1,
+				(errmsg("checksum worker aborted in database oid %u", dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums.
+	 * Any temp tables created after we started will already have checksums
+	 * in them (due to the inprogress state), so those are safe.
+	 */
+	while (true)
+	{
+		List *CurrentTempTables;
+		ListCell *lc;
+		int numleft;
+		char activity[64];
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table left to wait for */
+		snprintf(activity, sizeof(activity), "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT,
+						 5000,
+						 WAIT_EVENT_PG_SLEEP);
+	}
+
+	list_free(InitialTempTableList);
+
+	ChecksumHelperShmem->success = SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("checksum worker completed in database oid %u", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 819381a2ae..b2aadc1a40 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4310,6 +4310,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index d0f210de8c..5e04c9bbe8 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1398,7 +1398,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c53e7e2279..2f71e1d1f9 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -199,6 +199,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 885370698f..7f299f8e50 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/slot.h"
@@ -255,6 +256,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843229..d50b4b13e1 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,4 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+ChecksumHelperLock					45
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..f873fb0eea 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,8 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, but can also be turned on and off
+using pg_enable_data_checksums()/pg_disable_data_checksums() at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 6b49810e37..6e3bfa045a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -94,7 +94,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1171,7 +1171,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1198,7 +1198,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..61e856deac 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1527,7 +1527,7 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
@@ -1545,7 +1545,7 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2178e1cf5e..dc4402189d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -71,6 +72,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/standby.h"
 #include "storage/fd.h"
@@ -471,6 +473,16 @@ static struct config_enum_entry shared_memory_options[] = {
 };
 
 /*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
+/*
  * Options for enum values stored in other modules
  */
 extern const struct config_enum_entry wal_level_options[];
@@ -572,7 +584,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1824,17 +1836,6 @@ static struct config_bool ConfigureNamesBool[] =
 	},
 
 	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
-	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
 			NULL
@@ -4537,6 +4538,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 38236415be..9e196cc2dd 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -658,6 +658,15 @@ check_control_data(ControlData *oldctrl,
 	 */
 
 	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
+	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
 	 */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index f724ecf9ca..66758bbd7f 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -220,7 +220,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..ae02c09c84 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -189,7 +189,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -293,7 +293,13 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 3f0de6625d..d4e3a3eab2 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -241,6 +242,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index ff98d9e91a..8177414854 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 58ea5b982b..4fc08d19d6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10665,6 +10665,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c997add881..346de83de9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -727,7 +727,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..556f801668
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+void		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 4ef6d8ddd4..cf31f24b01 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 7ef32a3baa..2c414aa1e7 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..22a3b64dd8
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: In the case of "check", this creates a temporary installation
+with multiple nodes (a master and one or more standbys) for the
+purpose of the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..891743fa6c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,104 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress or on
+# Normally it would be "inprogress", but it is theoretically possible for the master
+# to complete the checksum enabling *and* have the standby replay that record before
+# we reach the check below.
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress", "on"], 'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.11.0
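The bufpage.c, basebackup.c and xlog.h hunks above split DataChecksumsEnabled() into DataChecksumsNeedWrite() and DataChecksumsNeedVerify(). The intended semantics during the "inprogress" transition can be sketched as follows; this is an illustrative model inferred from the call sites, not code from the patch:

```python
# Illustrative model (NOT code from the patch) of the data_checksums state
# machine the patch introduces.  The function names mirror the new C
# functions DataChecksumsNeedWrite()/DataChecksumsNeedVerify(); the real
# logic lives in xlog.c and is assumed here from the call-site changes.

OFF, INPROGRESS, ON = "off", "inprogress", "on"

def need_write(state):
    # While enabling is in progress, every page written out must carry a
    # checksum, so the cluster converges to a fully checksummed state as
    # the background worker rewrites existing pages.
    return state in (INPROGRESS, ON)

def need_verify(state):
    # Verifying on read is only safe once *all* pages are known to carry
    # checksums, i.e. after the transition has completed.
    return state == ON

if __name__ == "__main__":
    assert not need_write(OFF) and not need_verify(OFF)
    assert need_write(INPROGRESS) and not need_verify(INPROGRESS)
    assert need_write(ON) and need_verify(ON)
```

The asymmetry is the point of the inprogress state: pages are written with checksums so the worker's rewrite converges, while verification waits until every page is known to carry one.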

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Magnus Hagander (#3)
Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:

On Thu, Sep 26, 2019 at 9:48 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

On 2019-Aug-26, Magnus Hagander wrote:

OK, let's try this again :)

This is work mainly based in the first version of the online checksums
patch, but based on top of Andres WIP patchset for global barriers (

/messages/by-id/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de

)

Andres patch has been enhanced with wait events per

/messages/by-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com

.

Travis says your SGML doesn't compile (maybe you just forgot to "git
add" and edit allfiles.sgml?):

Nope, even easier -- the reference pgVerifyChecksums was renamed to
pgChecksums and for some reason we missed that in the merge.

I've rebased again on top of todays master, but that was the only change I
had to make.

Other than bots, this patch doesn't seem to have attracted any reviewers

this time around. Perhaps you need to bribe someone? (Maybe "how sad
your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
work.

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

And now please uncomment my commit SSH key again, please ;-)

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Bruce Momjian
bruce@momjian.us
In reply to: Tomas Vondra (#4)
Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 02:49:44PM +0200, Tomas Vondra wrote:

On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:

Other than bots, this patch doesn't seem to have attracted any reviewers

this time around. Perhaps you need to bribe someone? (Maybe "how sad
your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
work.

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

And now please uncomment my commit SSH key again, please ;-)

For adding cluster-level encryption to Postgres, the plan is to create a
standby that has encryption enabled, then switchover to it. Is that a
method we support now for adding checksums to Postgres? Do we need the
ability to do it in-place too?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#6Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#5)
Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 4:53 PM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Sep 30, 2019 at 02:49:44PM +0200, Tomas Vondra wrote:

On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:

Other than bots, this patch doesn't seem to have attracted any

reviewers

this time around. Perhaps you need to bribe someone? (Maybe "how

sad

your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it

will

work.

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

And now please uncomment my commit SSH key again, please ;-)

For adding cluster-level encryption to Postgres, the plan is to create a
standby that has encryption enabled, then switchover to it. Is that a
method we support now for adding checksums to Postgres? Do we need the
ability to do it in-place too?

I definitely think we need the ability to do it in-place as well, yes.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#7Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#6)
Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 04:57:41PM +0200, Magnus Hagander wrote:

On Mon, Sep 30, 2019 at 4:53 PM Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Sep 30, 2019 at 02:49:44PM +0200, Tomas Vondra wrote:

On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:

Other than bots, this patch doesn't seem to have attracted any

reviewers

this time around. Perhaps you need to bribe someone? (Maybe "how

sad

your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it

will

work.

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

And now please uncomment my commit SSH key again, please ;-)

For adding cluster-level encryption to Postgres, the plan is to create a
standby that has encryption enabled, then switchover to it. Is that a
method we support now for adding checksums to Postgres? Do we need the
ability to do it in-place too?

I definitely think we need the ability to do it in-place as well, yes.

OK, just had to ask. I think for encryption, for the first version, we
will do just replica-only changes.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#8Magnus Hagander
magnus@hagander.net
In reply to: Tomas Vondra (#4)
Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 2:49 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

On Mon, Sep 30, 2019 at 01:03:20PM +0200, Magnus Hagander wrote:

On Thu, Sep 26, 2019 at 9:48 PM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

On 2019-Aug-26, Magnus Hagander wrote:

OK, let's try this again :)

This is work mainly based in the first version of the online checksums
patch, but based on top of Andres WIP patchset for global barriers (

/messages/by-id/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de

)

Andres patch has been enhanced with wait events per

/messages/by-id/CABUevEwy4LUFqePC5YzanwtzyDDpYvgrj6R5WNznwrO5ouVg1w@mail.gmail.com

.

Travis says your SGML doesn't compile (maybe you just forgot to "git
add" and edit allfiles.sgml?):

Nope, even easier -- the reference pgVerifyChecksums was renamed to
pgChecksums and for some reason we missed that in the merge.

I've rebased again on top of todays master, but that was the only change I
had to make.

Other than bots, this patch doesn't seem to have attracted any reviewers

this time around. Perhaps you need to bribe someone? (Maybe "how sad
your committer SSH key stopped working" would do?)

Hmm. I don't think that's a bribe, that's a threat. However, maybe it will
work.

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

Not to downvalue your review, but I'd really appreciate a review from
someone who was one of the ones who spotted the issue initially.

Especially -- Andres, any chance I can bribe you to take another look?

And now please uncomment my commit SSH key again, please ;-)

I'll consider it...

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#9Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#8)
Re: Online checksums patch - once again

Hi,

On 2019-09-30 16:59:00 +0200, Magnus Hagander wrote:

On Mon, Sep 30, 2019 at 2:49 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware of.
So +1 from me to go ahead and push it.

I don't think the global barrier part is necessarily ready. I wrote it
on a flight back from a conference, to allow Magnus to make some
progress. And I don't think it has received meaningful review so far.

Especially -- Andres, any chance I can bribe you to take another look?

I'll try to take a look.

Greetings,

Andres Freund

#10Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#9)
Re: Online checksums patch - once again

On Mon, Sep 30, 2019 at 6:11 PM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-09-30 16:59:00 +0200, Magnus Hagander wrote:

On Mon, Sep 30, 2019 at 2:49 PM Tomas Vondra <

tomas.vondra@2ndquadrant.com>

wrote:

IMHO the patch is ready to go - I think the global barrier solves the
issue in the previous version, and that's the only problem I'm aware

of.

So +1 from me to go ahead and push it.

I don't think the global barrier part is necessarily ready. I wrote it
on a flight back from a conference, to allow Magnus to make some
progress. And I don't think it has received meaningful review so far.

I don't believe it has, no. I wouldn't trust my own level of review at
least :)

Especially -- Andres, any chance I can bribe you to take another look?

I'll try to take a look.

Much appreciated!

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#11Michael Paquier
michael@paquier.xyz
In reply to: Magnus Hagander (#10)
Re: Online checksums patch - once again

On Wed, Oct 02, 2019 at 08:59:27PM +0200, Magnus Hagander wrote:

Much appreciated!

The latest patch does not apply, could you send a rebase? Moved it to
next CF, waiting on author.
--
Michael

#12Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Paquier (#11)
2 attachment(s)
Re: Online checksums patch - once again

On 1 Dec 2019, at 03:32, Michael Paquier <michael@paquier.xyz> wrote:

The latest patch does not apply, could you send a rebase? Moved it to
next CF, waiting on author.

Attached is a rebased v14 patchset on top of master. The Global Barriers patch
is left as a prerequisite, but it will obviously be dropped, or be
significantly changed, once the work Robert is doing with ProcSignalBarrier
lands.

cheers ./daniel

Attachments:

0001-Global-Barriers.patchapplication/octet-stream; name=0001-Global-Barriers.patch; x-unix-mode=0644Download
From f9845f70b90859f1816df53cfd0a692896cf842c Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Tue, 3 Dec 2019 19:00:40 +0100
Subject: [PATCH 1/2] Global Barriers

---
 src/backend/postmaster/autovacuum.c   |   3 +-
 src/backend/postmaster/bgworker.c     |  31 ++++--
 src/backend/postmaster/bgwriter.c     |   3 +
 src/backend/postmaster/checkpointer.c |   3 +
 src/backend/postmaster/pgstat.c       |   3 +
 src/backend/postmaster/startup.c      |   3 +
 src/backend/postmaster/walwriter.c    |   2 +
 src/backend/replication/walreceiver.c |   9 +-
 src/backend/storage/buffer/bufmgr.c   |   4 +
 src/backend/storage/ipc/procsignal.c  | 141 ++++++++++++++++++++++++++
 src/backend/storage/lmgr/proc.c       |  20 ++++
 src/backend/tcop/postgres.c           |   7 ++
 src/include/pgstat.h                  |   1 +
 src/include/storage/proc.h            |   9 ++
 src/include/storage/procsignal.h      |  23 ++++-
 15 files changed, 244 insertions(+), 18 deletions(-)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c1dd8168ca..623934e084 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -649,8 +649,9 @@ AutoVacLauncherMain(int argc, char *argv[])
 
 		ResetLatch(MyLatch);
 
-		/* Process sinval catchup interrupts that happened while sleeping */
+		/* Process pending interrupts. */
 		ProcessCatchupInterrupt();
+		ProcessGlobalBarrierIntterupt();
 
 		/* the normal shutdown case */
 		if (got_SIGTERM)
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 5f8a007e73..51612257c3 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -734,23 +734,32 @@ StartBackgroundWorker(void)
 	/*
 	 * Set up signal handlers.
 	 */
+
+
+	/*
+	 * SIGINT is used to signal canceling the current action for processes
+	 * able to run queries.
+	 */
 	if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
-	{
-		/*
-		 * SIGINT is used to signal canceling the current action
-		 */
 		pqsignal(SIGINT, StatementCancelHandler);
-		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
-		pqsignal(SIGFPE, FloatExceptionHandler);
-
-		/* XXX Any other handlers needed here? */
-	}
 	else
-	{
 		pqsignal(SIGINT, SIG_IGN);
+
+	/*
+	 * Everything with a PGPROC should be able to receive procsignal.h style
+	 * signals.
+	 */
+	if (worker->bgw_flags & (BGWORKER_BACKEND_DATABASE_CONNECTION |
+							 BGWORKER_SHMEM_ACCESS))
+		pqsignal(SIGUSR1, procsignal_sigusr1_handler);
+	else
 		pqsignal(SIGUSR1, bgworker_sigusr1_handler);
+
+	if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
+		pqsignal(SIGFPE, FloatExceptionHandler);
+	else
 		pqsignal(SIGFPE, SIG_IGN);
-	}
+
 	pqsignal(SIGTERM, bgworker_die);
 	pqsignal(SIGHUP, SIG_IGN);
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 2fa631ea7a..39e2a5bb21 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -256,6 +256,9 @@ BackgroundWriterMain(void)
 			/* Normal exit from the bgwriter is here */
 			proc_exit(0);		/* done */
 		}
+		/* Process all pending interrupts. */
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
 
 		/*
 		 * Do one cycle of dirty-buffer writing.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index d93c941871..a78facce50 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -346,6 +346,9 @@ CheckpointerMain(void)
 		/* Clear any already-pending wakeups */
 		ResetLatch(MyLatch);
 
+		/* Process all pending interrupts. */
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
 		/*
 		 * Process any requests or signals received recently.
 		 */
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index fabcf31de8..8ff66e0c13 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3763,6 +3763,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
+		case WAIT_EVENT_GLOBAL_BARRIER:
+			event_name = "GlobalBarrier";
+			break;
 		case WAIT_EVENT_HASH_BATCH_ALLOCATING:
 			event_name = "Hash/Batch/Allocating";
 			break;
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index f43e57dadb..da0a670bdf 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -151,6 +151,9 @@ HandleStartupProcInterrupts(void)
 	 */
 	if (IsUnderPostmaster && !PostmasterIsAlive())
 		exit(1);
+
+	if (GlobalBarrierInterruptPending)
+		ProcessGlobalBarrierIntterupt();
 }
 
 
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index cce9713408..19120aa6e1 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -255,6 +255,8 @@ WalWriterMain(void)
 			/* Normal exit from the walwriter is here */
 			proc_exit(0);		/* done */
 		}
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
 
 		/*
 		 * Do what we're here for; then, if XLogBackgroundFlush() found useful
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index c1e439adb4..2a6617876a 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -146,11 +146,10 @@ static void WalRcvQuickDieHandler(SIGNAL_ARGS);
 void
 ProcessWalRcvInterrupts(void)
 {
-	/*
-	 * Although walreceiver interrupt handling doesn't use the same scheme as
-	 * regular backends, call CHECK_FOR_INTERRUPTS() to make sure we receive
-	 * any incoming signals on Win32.
-	 */
+  	/*
+	 * The CHECK_FOR_INTERRUPTS() call ensures global barriers are handled,
+	 * and incoming signals on Win32 are received.
+  	 */
 	CHECK_FOR_INTERRUPTS();
 
 	if (got_SIGTERM)
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 7ad10736d5..92b5bbfcf3 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1885,6 +1885,10 @@ BufferSync(int flags)
 
 		cur_tsid = CkptBufferIds[i].tsId;
 
+		/* XXX: need a more principled approach here */
+		if (GlobalBarrierInterruptPending)
+			ProcessGlobalBarrierIntterupt();
+
 		/*
 		 * Grow array of per-tablespace status structs, every time a new
 		 * tablespace is found.
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index fde97a1036..434fb17c33 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,8 +18,10 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/twophase.h"
 #include "commands/async.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "replication/walsender.h"
 #include "storage/ipc.h"
 #include "storage/latch.h"
@@ -61,9 +63,11 @@ typedef struct
 
 static ProcSignalSlot *ProcSignalSlots = NULL;
 static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
+volatile sig_atomic_t GlobalBarrierInterruptPending = false;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
+static void HandleGlobalBarrierSignal(void);
 
 /*
  * ProcSignalShmemSize
@@ -261,6 +265,8 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 {
 	int			save_errno = errno;
 
+	pg_read_barrier();
+
 	if (CheckProcSignal(PROCSIG_CATCHUP_INTERRUPT))
 		HandleCatchupInterrupt();
 
@@ -291,9 +297,144 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_GLOBAL_BARRIER))
+		HandleGlobalBarrierSignal();
+
 	SetLatch(MyLatch);
 
 	latch_sigusr1_handler();
 
 	errno = save_errno;
 }
+
+/*
+ *
+ */
+uint64
+EmitGlobalBarrier(GlobalBarrierKind kind)
+{
+	uint64 generation;
+
+	/*
+	 * Broadcast flag, without incrementing generation. This ensures that all
+	 * backends could know about this.
+	 *
+	 * It's OK if the to-be-signalled backend enters after our check here. A
+	 * new backend should have current settings.
+	 */
+	for (int i = 0; i < (MaxBackends + max_prepared_xacts); i++)
+	{
+		PGPROC *proc = &ProcGlobal->allProcs[i];
+
+		if (proc->pid == 0)
+			continue;
+
+		pg_atomic_fetch_or_u32(&proc->barrierFlags, (uint32) kind);
+
+		elog(LOG, "setting flags for %u", proc->pid);
+	}
+
+	/*
+	 * Broadcast flag generation. If any backend joins after this, it's either
+	 * going to be signalled below, or has read a new enough generation that
+	 * WaitForGlobalBarrier() will not wait for it.
+	 */
+	generation = pg_atomic_add_fetch_u64(&ProcGlobal->globalBarrierGen, 1);
+
+	/* Wake up each backend (including ours) */
+	for (int i = 0; i < NumProcSignalSlots; i++)
+	{
+		ProcSignalSlot *slot = &ProcSignalSlots[i];
+
+		if (slot->pss_pid == 0)
+			continue;
+
+		/* Atomically set the proper flag */
+		slot->pss_signalFlags[PROCSIG_GLOBAL_BARRIER] = true;
+
+		pg_write_barrier();
+
+		/* Send signal */
+		kill(slot->pss_pid, SIGUSR1);
+	}
+
+	return generation;
+}
+
+/*
+ * Wait for all barriers to be absorbed.  This guarantees that all changes
+ * requested by a specific EmitGlobalBarrier() have taken effect.
+ */
+void
+WaitForGlobalBarrier(uint64 generation)
+{
+	pgstat_report_wait_start(WAIT_EVENT_GLOBAL_BARRIER);
+	for (int i = 0; i < (MaxBackends + max_prepared_xacts); i++)
+	{
+		PGPROC *proc = &ProcGlobal->allProcs[i];
+		uint64 oldval;
+
+		pg_memory_barrier();
+		oldval = pg_atomic_read_u64(&proc->barrierGen);
+
+		/*
+		 * Unused proc slots get their barrierGen set to UINT64_MAX, so we
+		 * need not care about that.
+		 */
+		while (oldval < generation)
+		{
+			CHECK_FOR_INTERRUPTS();
+			pg_usleep(10000);
+
+			pg_memory_barrier();
+			oldval = pg_atomic_read_u64(&proc->barrierGen);
+		}
+	}
+	pgstat_report_wait_end();
+}
+
+/*
+ * Absorb the global barrier procsignal.
+ */
+static void
+HandleGlobalBarrierSignal(void)
+{
+	InterruptPending = true;
+	GlobalBarrierInterruptPending = true;
+	SetLatch(MyLatch);
+}
+
+/*
+ * Perform global barrier related interrupt checking. If CHECK_FOR_INTERRUPTS
+ * is used, it'll be called by that, if a backend type doesn't do so, it has
+ * to be called explicitly.
+ */
+void
+ProcessGlobalBarrierIntterupt(void)
+{
+	if (GlobalBarrierInterruptPending)
+	{
+		uint64 generation;
+		uint32 flags;
+
+		GlobalBarrierInterruptPending = false;
+
+		generation = pg_atomic_read_u64(&ProcGlobal->globalBarrierGen);
+		pg_memory_barrier();
+		flags = pg_atomic_exchange_u32(&MyProc->barrierFlags, 0);
+		pg_memory_barrier();
+
+		if (flags & GLOBBAR_CHECKSUM)
+		{
+			/*
+			 * By virtue of getting here (i.e. interrupts being processed), we
+			 * know that this backend won't have any in-progress writes (which
+			 * might have missed the checksum change).
+			 */
+		}
+
+		pg_atomic_write_u64(&MyProc->barrierGen, generation);
+
+		elog(LOG, "processed interrupts for %u", MyProcPid);
+	}
+}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index fff0628e58..27d8a20fca 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -189,6 +189,7 @@ InitProcGlobal(void)
 	ProcGlobal->checkpointerLatch = NULL;
 	pg_atomic_init_u32(&ProcGlobal->procArrayGroupFirst, INVALID_PGPROCNO);
 	pg_atomic_init_u32(&ProcGlobal->clogGroupFirst, INVALID_PGPROCNO);
+	pg_atomic_init_u64(&ProcGlobal->globalBarrierGen, 1);
 
 	/*
 	 * Create and initialize all the PGPROC structures we'll need.  There are
@@ -283,6 +284,9 @@ InitProcGlobal(void)
 		 */
 		pg_atomic_init_u32(&(procs[i].procArrayGroupNext), INVALID_PGPROCNO);
 		pg_atomic_init_u32(&(procs[i].clogGroupNext), INVALID_PGPROCNO);
+
+		pg_atomic_init_u32(&procs[i].barrierFlags, 0);
+		pg_atomic_init_u64(&procs[i].barrierGen, PG_UINT64_MAX);
 	}
 
 	/*
@@ -441,6 +445,12 @@ InitProcess(void)
 	MyProc->clogGroupMemberLsn = InvalidXLogRecPtr;
 	Assert(pg_atomic_read_u32(&MyProc->clogGroupNext) == INVALID_PGPROCNO);
 
+	/* pairs with globalBarrierGen increase */
+	pg_memory_barrier();
+	pg_atomic_write_u32(&MyProc->barrierFlags, 0);
+	pg_atomic_write_u64(&MyProc->barrierGen,
+						pg_atomic_read_u64(&ProcGlobal->globalBarrierGen));
+
 	/*
 	 * Acquire ownership of the PGPROC's latch, so that we can use WaitLatch
 	 * on it.  That allows us to repoint the process latch, which so far
@@ -584,6 +594,13 @@ InitAuxiliaryProcess(void)
 	MyProc->lwWaitMode = 0;
 	MyProc->waitLock = NULL;
 	MyProc->waitProcLock = NULL;
+
+	/* pairs with globalBarrierGen increase */
+	pg_memory_barrier();
+	pg_atomic_write_u32(&MyProc->barrierFlags, 0);
+	pg_atomic_write_u64(&MyProc->barrierGen,
+						pg_atomic_read_u64(&ProcGlobal->globalBarrierGen));
+
 #ifdef USE_ASSERT_CHECKING
 	{
 		int			i;
@@ -882,6 +899,9 @@ ProcKill(int code, Datum arg)
 		LWLockRelease(leader_lwlock);
 	}
 
+	pg_atomic_write_u32(&MyProc->barrierFlags, 0);
+	pg_atomic_write_u64(&MyProc->barrierGen, PG_UINT64_MAX);
+
 	/*
 	 * Reset MyLatch to the process local one.  This is so that signal
 	 * handlers et al can continue using the latch after the shared latch
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3b85e48333..b721b5a929 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -606,6 +606,10 @@ ProcessClientWriteInterrupt(bool blocked)
 			SetLatch(MyLatch);
 	}
 
+	/* safe to handle during client communication */
+	if (GlobalBarrierInterruptPending)
+		ProcessGlobalBarrierIntterupt();
+
 	errno = save_errno;
 }
 
@@ -3181,6 +3185,9 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (GlobalBarrierInterruptPending)
+		ProcessGlobalBarrierIntterupt();
 }
 
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..c997add881 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -824,6 +824,7 @@ typedef enum
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
 	WAIT_EVENT_EXECUTE_GATHER,
+	WAIT_EVENT_GLOBAL_BARRIER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATING,
 	WAIT_EVENT_HASH_BATCH_ELECTING,
 	WAIT_EVENT_HASH_BATCH_LOADING,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 281e1db725..f108ac52c6 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -203,6 +203,13 @@ struct PGPROC
 	PGPROC	   *lockGroupLeader;	/* lock group leader, if I'm a member */
 	dlist_head	lockGroupMembers;	/* list of members, if I'm a leader */
 	dlist_node	lockGroupLink;	/* my member link, if I'm a member */
+
+	/*
+	 * Support for "super barriers". These can be used to e.g. make sure that
+	 * all backends have acknowledged a configuration change.
+	 */
+	pg_atomic_uint64 barrierGen;
+	pg_atomic_uint32 barrierFlags;
 };
 
 /* NOTE: "typedef struct PGPROC PGPROC" appears in storage/lock.h. */
@@ -272,6 +279,8 @@ typedef struct PROC_HDR
 	int			startupProcPid;
 	/* Buffer id of the buffer that Startup process waits for pin on, or -1 */
 	int			startupBufferPinWaitBufId;
+
+	pg_atomic_uint64 globalBarrierGen;
 } PROC_HDR;
 
 extern PGDLLIMPORT PROC_HDR *ProcGlobal;
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 05b186a05c..a978db9b24 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -14,8 +14,9 @@
 #ifndef PROCSIGNAL_H
 #define PROCSIGNAL_H
 
-#include "storage/backendid.h"
+#include <signal.h>
 
+#include "storage/backendid.h"
 
 /*
  * Reasons for signalling a Postgres child process (a backend or an auxiliary
@@ -42,6 +43,8 @@ typedef enum
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
+	PROCSIG_GLOBAL_BARRIER,
+
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
 
@@ -57,4 +60,22 @@ extern int	SendProcSignal(pid_t pid, ProcSignalReason reason,
 
 extern void procsignal_sigusr1_handler(SIGNAL_ARGS);
 
+/*
+ * These collapse. The flag values better be distinct bits.
+ */
+typedef enum GlobalBarrierKind
+{
+	/*
+	 * Guarantee that all processes have the correct view of whether checksums
+	 * enabled/disabled, and no writes are in-progress with previous value(s).
+	 */
+	GLOBBAR_CHECKSUM = 1 << 0
+} GlobalBarrierKind;
+
+extern uint64 EmitGlobalBarrier(GlobalBarrierKind kind);
+extern void WaitForGlobalBarrier(uint64 generation);
+extern void ProcessGlobalBarrierIntterupt(void);
+
+extern PGDLLIMPORT volatile sig_atomic_t GlobalBarrierInterruptPending;
+
 #endif							/* PROCSIGNAL_H */
-- 
2.21.0 (Apple Git-122.2)

0002-Online-Checksums-v14.patchapplication/octet-stream; name=0002-Online-Checksums-v14.patch; x-unix-mode=0644Download
From 8d275f4091958d4197cbc43b684b5b233b0bec1d Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 4 Dec 2019 00:22:02 +0100
Subject: [PATCH 2/2] Online Checksums v14

---
 doc/src/sgml/func.sgml                      |  65 ++
 doc/src/sgml/ref/initdb.sgml                |   7 +-
 doc/src/sgml/wal.sgml                       |  81 ++
 src/backend/access/rmgrdesc/xlogdesc.c      |  16 +
 src/backend/access/transam/xlog.c           | 131 ++-
 src/backend/access/transam/xlogfuncs.c      |  57 ++
 src/backend/catalog/system_views.sql        |   5 +
 src/backend/postmaster/Makefile             |   1 +
 src/backend/postmaster/bgworker.c           |   7 +
 src/backend/postmaster/checksumhelper.c     | 909 ++++++++++++++++++++
 src/backend/postmaster/pgstat.c             |   6 +
 src/backend/replication/basebackup.c        |   2 +-
 src/backend/replication/logical/decode.c    |   1 +
 src/backend/storage/ipc/ipci.c              |   2 +
 src/backend/storage/lmgr/lwlocknames.txt    |   1 +
 src/backend/storage/page/README             |   3 +-
 src/backend/storage/page/bufpage.c          |   6 +-
 src/backend/utils/adt/pgstatfuncs.c         |   4 +-
 src/backend/utils/misc/guc.c                |  36 +-
 src/bin/pg_upgrade/controldata.c            |   9 +
 src/bin/pg_upgrade/pg_upgrade.h             |   2 +-
 src/include/access/xlog.h                   |  10 +-
 src/include/access/xlog_internal.h          |   7 +
 src/include/catalog/pg_control.h            |   1 +
 src/include/catalog/pg_proc.dat             |  16 +
 src/include/pgstat.h                        |   4 +-
 src/include/postmaster/checksumhelper.h     |  31 +
 src/include/storage/bufpage.h               |   1 +
 src/include/storage/checksum.h              |   7 +
 src/test/Makefile                           |   3 +-
 src/test/checksum/.gitignore                |   2 +
 src/test/checksum/Makefile                  |  24 +
 src/test/checksum/README                    |  22 +
 src/test/checksum/t/001_standby_checksum.pl | 104 +++
 34 files changed, 1550 insertions(+), 33 deletions(-)
 create mode 100644 src/backend/postmaster/checksumhelper.c
 create mode 100644 src/include/postmaster/checksumhelper.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_standby_checksum.pl

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..dca1745716 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21265,6 +21265,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>in progress</literal> and start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch to checksums
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index da5c8f5307..b545ad73cb 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -217,9 +217,10 @@ PostgreSQL documentation
         Use checksums on data pages to help detect corruption by the
         I/O system that would otherwise be silent. Enabling checksums
         may incur a noticeable performance penalty. If set, checksums
-        are calculated for all objects, in all databases. All checksum
-        failures will be reported in the
-        <xref linkend="pg-stat-database-view"/> view.
+        are calculated for all objects, in all databases. All
+        checksum failures will be reported in the <xref
+        linkend="pg-stat-database-view"/> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 4eb8feb903..7838f3616a 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,87 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be enabled for a cluster.
+   When enabled, each data page will be assigned a checksum that is updated when the page is
+   written and verified every time the page is read. Only data pages are protected by checksums,
+   internal data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using
+   <link linkend="app-initdb-data-checksums"><application>initdb</application></link>. They
+   can also be enabled or disabled at runtime. In all cases, checksums are enabled or disabled
+   at the full cluster level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the value
+   of the read-only configuration variable <xref linkend="guc-data-checksums" /> by
+   issuing the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
+   protection in order to recover data. To do this, temporarily set the configuration parameter
+   <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line enabling of checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
+    During this time, checksums will be written but not verified. In addition to
+    this, a background worker process is started that enables checksums on all
+    existing data in the cluster. Once this worker has completed processing all
+    databases in the cluster, the checksum mode will automatically switch to
+    <literal>on</literal>.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+    Information about open transactions and connections with temporary tables is
+    written to log.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. It is not possible to resume the work,
+    the process has to start from scratch.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 33060f3042..ced4ab6d78 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +198,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6bc1a6b46d..d6c716a773 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -867,6 +867,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1049,7 +1050,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4776,10 +4777,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 /*
@@ -4816,12 +4813,93 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In the
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForGlobalBarrier(EmitGlobalBarrier(GLOBBAR_CHECKSUM));
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7782,6 +7860,18 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in the inprogress state, notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums.  This is because we cannot launch a dynamic background
+	 * worker directly from here; it has to be launched from a regular
+	 * backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9505,6 +9595,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -9956,6 +10064,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 1fccf29a36..02dd3cf383 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,59 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	StartChecksumHelperLauncher(cost_delay, cost_limit);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f7800f01a6..1d4660597a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1160,6 +1160,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index 03e3d3650a..25f1c5e745 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	checksumhelper.o \
 	fork_process.o \
 	pgarch.o \
 	pgstat.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 51612257c3..4a0ee811fe 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -20,6 +20,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
@@ -129,6 +130,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..06db05979c
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,909 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When data checksums are enabled at initdb time, no extra process is
+ * required, as each page is checksummed, and verified, on access.  When
+ * enabling checksums on an already running cluster, which was not initialized
+ * with checksums, this helper worker ensures that all pages are checksummed
+ * before verification of the checksums is turned on.
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * ChecksumHelperLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * The other members can be accessed without a lock: although they live
+	 * in shared memory, they are never accessed concurrently, since while a
+	 * worker is running, the launcher only waits for that worker to finish.
+	 */
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+
+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+void
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	if (ChecksumHelperShmem->abort)
+	{
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: has been canceled")));
+	}
+
+	if (ChecksumHelperShmem->launcher_started)
+	{
+		/* Somebody else has already started the launcher */
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: already running")));
+	}
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	ChecksumHelperShmem->launcher_started = true;
+	LWLockRelease(ChecksumHelperLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+		ChecksumHelperShmem->launcher_started = false;
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper launcher")));
+	}
+}
+
+/*
+ * ShutdownChecksumHelperIfRunning
+ *		Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	if (ChecksumHelperShmem->launcher_started)
+		ChecksumHelperShmem->abort = true;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*.  Errors are raised at
+ * lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check whether we have been asked
+		 * to abort; the abort will bubble up from here.  It's safe to check
+		 * this without a lock, because if we miss it being set, we will try
+		 * again soon.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*.  Errors are raised at
+ * lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %u", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there
+		 * are no pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %u as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	ChecksumHelperShmem->abort = false;
+	ChecksumHelperShmem->launcher_started = false;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	ChecksumHelperShmem->abort = true;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		elog(DEBUG1, "Checking old transactions");
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char activity[64];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (waiting for %u)", waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_PG_SLEEP);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	List	   *FailedDatabases = NIL;
+	ListCell   *lc,
+			   *lc2;
+	HASHCTL     hash_ctl;
+	bool		found_failed = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(ChecksumHelperResult);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set up so that the first database to be processed also takes care of
+	 * the shared catalogs; they should not be processed once per database.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we
+		 * wait for all pre-existing transactions to finish, we can be
+		 * certain that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		/*
+		 * If there are no databases at all to checksum, we can exit
+		 * immediately as there is no work to do. This can probably never
+		 * happen, but just in case.
+		 */
+		if (DatabaseList == NIL)
+			return;
+
+		processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult result;
+			Oid *oid;
+
+			/* Skip if this database has already been processed */
+			if (hash_search(ProcessedDatabases, (void *) &db->dboid, HASH_FIND, NULL))
+			{
+				pfree(db->dbname);
+				pfree(db);
+				continue;
+			}
+
+			result = ProcessDatabase(db);
+
+			/* Make a copy of the oid to use as the hash key */
+			oid = palloc(sizeof(Oid));
+			*oid = db->dboid;
+
+			hash_search(ProcessedDatabases, (void *) oid, HASH_ENTER, NULL);
+			processed_databases++;
+
+			if (result == SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we
+				 * don't have to process them again.
+				 */
+				if (ChecksumHelperShmem->process_shared_catalogs)
+					ChecksumHelperShmem->process_shared_catalogs = false;
+
+				pfree(db->dbname);
+				pfree(db);
+			}
+			else if (result == FAILED)
+			{
+				/*
+				 * Put failed databases on the remaining list. Keep the
+				 * ChecksumHelperDatabase struct alive, since its name and
+				 * oid are needed when reporting failures below.
+				 */
+				FailedDatabases = lappend(FailedDatabases, db);
+			}
+			else
+				/* Abort flag set, so exit the whole process */
+				return;
+		}
+
+		elog(DEBUG1, "Completed one loop of checksum enabling, %i databases processed", processed_databases);
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * FailedDatabases now has all databases that failed one way or another.
+	 * This can be because they actually failed for some reason, or because the
+	 * database was dropped between us getting the database list and trying to
+	 * process it. Get a fresh list of databases to detect the second case,
+	 * where the database was dropped before we had started processing it. If
+	 * a database still exists but enabling checksums failed, we fail the
+	 * entire checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, FailedDatabases)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		bool found = false;
+
+		foreach(lc2, DatabaseList)
+		{
+			ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+			if (db->dboid == db2->dboid)
+			{
+				found = true;
+				ereport(WARNING,
+						(errmsg("failed to enable checksums in \"%s\"",
+								db->dbname)));
+				break;
+			}
+		}
+
+		if (found)
+			found_failed = true;
+		else
+		{
+			ereport(LOG,
+					(errmsg("database \"%s\" has been dropped, skipping",
+							db->dbname)));
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksumhelper failed to enable checksums in all databases, aborting")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. XXX: this should
+	 * probably not be an IMMEDIATE checkpoint, but leave it there for now
+	 * for testing.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(ChecksumHelperShmem, 0, ChecksumHelperShmemSize());
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared and non-shared relations are
+ * returned, otherwise only non-shared relations are returned.
+ * Temp tables are not included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relpersistence == 't')
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Only include relation types that have local storage
+		 */
+		if (pgc->relkind == RELKIND_VIEW ||
+			pgc->relkind == RELKIND_COMPOSITE_TYPE ||
+			pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = pgc->oid;
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in the database
+ *
+ * Returns a List of oids.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != 't')
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %u", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start.
+	 * We must wait until they are all gone before we are done, since we
+	 * cannot access their files to modify them.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		ChecksumHelperShmem->success = ABORTED;
+		ereport(DEBUG1,
+				(errmsg("checksum worker aborted in database oid %u", dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums.
+	 * Any temp tables created after we started will already have checksums
+	 * in them (due to the inprogress state), so those are safe.
+	 */
+	while (true)
+	{
+		List *CurrentTempTables;
+		ListCell *lc;
+		int numleft;
+		char activity[64];
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table left to wait for */
+		snprintf(activity, sizeof(activity), "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT,
+						 5000,
+						 WAIT_EVENT_PG_SLEEP);
+	}
+
+	list_free(InitialTempTableList);
+
+	ChecksumHelperShmem->success = SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("checksum worker completed in database oid %d", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 8ff66e0c13..ca633f272c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4308,6 +4308,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 1fa4551eff..c18b0b4ecb 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1397,7 +1397,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index bc532d027b..5951f3250d 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -196,6 +196,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 4829953ee6..4a9822a19d 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -255,6 +256,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843229..d50b4b13e1 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,4 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+ChecksumHelperLock					45
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..f873fb0eea 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,8 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, but can also be turned on and off
+using pg_enable_data_checksums()/pg_disable_data_checksums() at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 6b49810e37..6e3bfa045a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -94,7 +94,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1171,7 +1171,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1198,7 +1198,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..61e856deac 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1527,7 +1527,7 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
@@ -1545,7 +1545,7 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5fccc9683e..c065318af3 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -72,6 +73,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -473,6 +475,16 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -579,7 +591,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1830,17 +1842,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4590,6 +4591,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 93f3c34b74..4fed2450fc 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -657,6 +657,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 729f86aa32..97fc7b4de7 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -220,7 +220,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9b588c87a5..dda9c9abad 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -189,7 +189,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -292,7 +292,13 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index e295dc65fb..a588cb7ed9 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -245,6 +246,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index cf7d4485e9..c79b3fa365 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ac8f64b219..5ea54f968d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10663,6 +10663,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c997add881..346de83de9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -727,7 +727,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..556f801668
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+void		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 4ef6d8ddd4..cf31f24b01 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 7ef32a3baa..2c414aa1e7 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..22a3b64dd8
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: In the case of "check", this creates a temporary installation
+with multiple nodes (a master and one or more standbys) for the
+purpose of the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..891743fa6c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,104 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress or on
+# Normally it would be "inprogress", but it is theoretically possible for the master
+# to complete the checksum enabling *and* have the standby replay that record before
+# we reach the check below.
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress", "on"], 'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.21.0 (Apple Git-122.2)

#13Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Gustafsson (#12)
Re: Online checksums patch - once again

On Tue, Dec 3, 2019 at 6:41 PM Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a rebased v14 patchset on top of master. The Global Barriers patch
is left as a prerequisite, but it will obviously be dropped, or be
significantly changed, once the work Robert is doing with ProcSignalBarrier
lands.

Any chance you and/or Magnus could offer opinions on some of those
patches? I am reluctant to start committing things with nobody having
replied.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Daniel Gustafsson
daniel@yesql.se
In reply to: Robert Haas (#13)
Re: Online checksums patch - once again

On 5 Dec 2019, at 16:13, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 3, 2019 at 6:41 PM Daniel Gustafsson <daniel@yesql.se> wrote:
Attached is a rebased v14 patchset on top of master. The Global Barriers patch
is left as a prerequisite, but it will obviously be dropped, or be
significantly changed, once the work Robert is doing with ProcSignalBarrier
lands.

Any chance you and/or Magnus could offer opinions on some of those
patches? I am reluctant to start committing things with nobody having
replied.

I am currently reviewing your latest patchset, but need a bit more time.

cheers ./daniel

#15Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Gustafsson (#14)
Re: Online checksums patch - once again

On Thu, Dec 5, 2019 at 10:28 AM Daniel Gustafsson <daniel@yesql.se> wrote:

I am currently reviewing your latest patchset, but need a bit more time.

Oh, great, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Daniel Gustafsson
daniel@yesql.se
In reply to: Robert Haas (#13)
1 attachment(s)
Re: Online checksums patch - once again

On 5 Dec 2019, at 16:13, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 3, 2019 at 6:41 PM Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a rebased v14 patchset on top of master. The Global Barriers patch
is left as a prerequisite, but it will obviously be dropped, or be
significantly changed, once the work Robert is doing with ProcSignalBarrier
lands.

Any chance you and/or Magnus could offer opinions on some of those
patches? I am reluctant to start committing things with nobody having
replied.

Attached is a v15 of the online checksums patchset (minus 0005), rebased on top
of your v3 ProcSignalBarrier patch rather than Andres' PoC GlobalBarrier patch.
It takes the perhaps controversial approach of replacing the SAMPLE
barrier with the CHECKSUM barrier. The cfbot will be angry since this email
doesn't contain the procsignalbarrier patch, but it sounded like that would
go in shortly, so I opted for that.

This version also contains touchups to the documentation part, as well as a
pgindent run.

If reviewers think this version is nearing completion, then a v16 should
address the comment below, but as this version switches its underlying
infrastructure it still seemed useful for testing.

+   /*
+    * Force a checkpoint to get everything out to disk. XXX: this should
+    * probably not be an IMMEDIATE checkpoint, but leave it there for now for
+    * testing.
+    */
+   RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
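For anyone trying out the patchset, the new interface is exercised roughly as below. This is a minimal sketch: the function names and the data_checksums states are taken from the patch's pg_proc.dat and GUC entries, and the throttling values here are arbitrary examples following the cost-based vacuum settings.

```sql
-- Kick off background checksumming, throttled like cost-based vacuum
SELECT pg_enable_data_checksums(10, 200);

-- The cluster-wide state moves from 'off' to 'inprogress', and to 'on'
-- once the checksumhelper has rewritten all pages
SHOW data_checksums;

-- Turn checksums back off again
SELECT pg_disable_data_checksums();
```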

cheers ./daniel

Attachments:

online_checksums15.patch (application/octet-stream; x-unix-mode=0644)
From 5f771fcc396ba21171bff8b15e7c49d437f6bf85 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 16 Dec 2019 15:06:25 +0100
Subject: [PATCH] Online enable and disable of data checksums

This allows data checksums to be enabled, or disabled, in a running
cluster without requiring a restart. The data checksum mode will be
set to "inprogress" while the checksumhelper background worker walks
the database and rewrites all the data pages to include checksums.
Once all pages have been checksummed, the data checksum mode turns
to "on". The synchronization of the checksumhelper is based on the
procbarrier infrastructure.

This adds a new section in the documentation for checksums, covering
the basics of how to enable and disable checksums via the various
means possible. More information can probably be added later covering
other aspects of data checksums.

In passing, this patch removes the SAMPLE procbarrier and replaces
it with the CHECKSUM procbarrier, as there is now a consumer of the
procbarrier API and the "dummy" barrier can be removed (it was only
needed to keep the enum non-empty).
---
 doc/src/sgml/func.sgml                      |  65 ++
 doc/src/sgml/ref/initdb.sgml                |   1 +
 doc/src/sgml/wal.sgml                       |  96 +++
 src/backend/access/rmgrdesc/xlogdesc.c      |  16 +
 src/backend/access/transam/xlog.c           | 130 ++-
 src/backend/access/transam/xlogfuncs.c      |  57 ++
 src/backend/catalog/system_views.sql        |   5 +
 src/backend/postmaster/Makefile             |   1 +
 src/backend/postmaster/bgworker.c           |   7 +
 src/backend/postmaster/checksumhelper.c     | 909 ++++++++++++++++++++
 src/backend/postmaster/pgstat.c             |   6 +
 src/backend/replication/basebackup.c        |   2 +-
 src/backend/replication/logical/decode.c    |   1 +
 src/backend/storage/ipc/ipci.c              |   2 +
 src/backend/storage/ipc/procsignal.c        |  20 +-
 src/backend/storage/lmgr/lwlocknames.txt    |   1 +
 src/backend/storage/page/README             |   3 +-
 src/backend/storage/page/bufpage.c          |   6 +-
 src/backend/utils/adt/pgstatfuncs.c         |   4 +-
 src/backend/utils/misc/guc.c                |  36 +-
 src/bin/pg_upgrade/controldata.c            |   9 +
 src/bin/pg_upgrade/pg_upgrade.h             |   2 +-
 src/include/access/xlog.h                   |  10 +-
 src/include/access/xlog_internal.h          |   7 +
 src/include/catalog/pg_control.h            |   1 +
 src/include/catalog/pg_proc.dat             |  16 +
 src/include/pgstat.h                        |   4 +-
 src/include/postmaster/checksumhelper.h     |  31 +
 src/include/storage/bufpage.h               |   1 +
 src/include/storage/checksum.h              |   7 +
 src/include/storage/procsignal.h            |   2 +-
 src/test/Makefile                           |   3 +-
 src/test/checksum/.gitignore                |   2 +
 src/test/checksum/Makefile                  |  24 +
 src/test/checksum/README                    |  22 +
 src/test/checksum/t/001_standby_checksum.pl | 104 +++
 36 files changed, 1570 insertions(+), 43 deletions(-)
 create mode 100644 src/backend/postmaster/checksumhelper.c
 create mode 100644 src/include/postmaster/checksumhelper.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_standby_checksum.pl

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..05a1c393ed 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21265,6 +21265,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>inprogress</literal> as well as start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch data checksums mode to
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index da5c8f5307..69b7f91cbc 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -220,6 +220,7 @@ PostgreSQL documentation
         are calculated for all objects, in all databases. All checksum
         failures will be reported in the
         <xref linkend="pg-stat-database-view"/> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 4eb8feb903..505c422af2 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,102 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums; internal
+   data structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be inspected by viewing
+   the read-only configuration variable <xref linkend="guc-data-checksums" />,
+   for example with the command <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data, it may be necessary to
+   bypass the checksum protection. To do this, temporarily set the
+   configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish
+    before it starts, to ensure that no table created in a transaction that
+    has not yet committed would be invisible to the process enabling
+    checksums. For each database, it will also wait for all pre-existing
+    temporary tables to be removed before it finishes. If the application
+    uses long-lived temporary tables, it may be necessary to terminate those
+    connections to allow the process to complete. Information about open
+    transactions and connections with temporary tables is written to the
+    server log.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. It is not possible to resume the work,
+    the process has to start over from the beginning.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O load, as most of the
+     database pages need to be rewritten, both in the data files and in
+     the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="pg_checksums"><application>pg_checksums</application></link>
+    application can be used to enable, disable, or verify data checksums on
+    an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
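The enable/disable lifecycle documented above (off → inprogress → on) can be sketched as a small state machine. This is a hypothetical Python model for illustration only, not part of the patch; the state names match the documented modes, the function names are ours:

```python
# Hypothetical model of the data-checksums state machine described above.
OFF, INPROGRESS, ON = "off", "inprogress", "on"

def enable(state):
    # pg_enable_data_checksums(): allowed from "off" or "inprogress"
    # (the latter restarts the background worker after a restart).
    if state == ON:
        raise ValueError("data checksums already enabled")
    return INPROGRESS

def worker_finished(state):
    # The launcher flips the cluster to "on" only from "inprogress".
    if state != INPROGRESS:
        raise ValueError("checksums not in inprogress mode")
    return ON

def disable(state):
    # pg_disable_data_checksums(): immediate effect, from any writing state.
    if state == OFF:
        raise ValueError("data checksums already disabled")
    return OFF
```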
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 33060f3042..ced4ab6d78 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfoString(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfoString(buf, "inprogress");
+		else
+			appendStringInfoString(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +198,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6bc1a6b46d..5d5b2ea2d9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -867,6 +867,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1049,7 +1050,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4776,10 +4777,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 /*
@@ -4816,12 +4813,93 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM));
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress\" mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM));
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7782,6 +7860,17 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums. This is because we cannot launch a dynamic background worker
+	 * directly from here, it has to be launched from a regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9505,6 +9594,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -9956,6 +10063,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
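The split of `DataChecksumsEnabled()` into `DataChecksumsNeedWrite()` / `DataChecksumsNeedVerify()` in the xlog.c hunk above follows one rule: checksums are written in both "inprogress" and "on" mode, but only verified once fully "on". A hedged Python restatement of those predicates (`PG_DATA_CHECKSUM_VERSION` is 1 in bufpage.h; the inprogress value of 2 is an assumption about the constant this patch introduces):

```python
# Restates the Need-Write / Need-Verify split from the patch above.
DATA_CHECKSUM_VERSION = 1             # fully enabled
DATA_CHECKSUM_INPROGRESS_VERSION = 2  # assumed value of the new constant

def need_write(version):
    # Write checksums whenever any checksum mode is active.
    return version > 0

def need_verify(version):
    # Verify only once every page is guaranteed to carry a checksum.
    return version == DATA_CHECKSUM_VERSION

def in_progress(version):
    return version == DATA_CHECKSUM_INPROGRESS_VERSION
```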
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 1fccf29a36..02dd3cf383 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,59 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	StartChecksumHelperLauncher(cost_delay, cost_limit);
+
+	PG_RETURN_VOID();
+}
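`pg_enable_data_checksums()` above validates `cost_delay`/`cost_limit` and then throttles the worker with vacuum-style cost accounting. As a rough illustration of how such cost-based throttling behaves — a simplified sketch, not the actual `vacuum_delay_point()` implementation (which charges different costs for buffer hits, misses, and dirties):

```python
# Simplified vacuum-style cost accounting: accumulate per-page costs and
# sleep for cost_delay_ms each time the running balance reaches cost_limit.
def delay_points(page_costs, cost_limit, cost_delay_ms):
    """Return the total milliseconds the worker would sleep."""
    balance, total_sleep = 0, 0
    for cost in page_costs:
        balance += cost
        if cost_delay_ms > 0 and balance >= cost_limit:
            total_sleep += cost_delay_ms
            balance = 0
    return total_sleep
```

With a zero delay (the function's default), no sleeping ever happens and the worker runs unthrottled.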
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f7800f01a6..1d4660597a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1160,6 +1160,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..73df17e5f3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	checksumhelper.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index a4c347d524..1bf1e1a49e 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..f5c9ac3283
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,909 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, at accesses.  When enabling checksums on an already running
+ * cluster, which was not initialized with checksums, this helper worker will
+ * ensure that all pages are checksummed before verification of the checksums
+ * is turned on.
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+
+typedef enum
+{
+	SUCCESSFUL = 0,
+	ABORTED,
+	FAILED
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * ChecksumHelperLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Access to other members can be done without a lock, as while they are
+	 * in shared memory, they are never concurrently accessed. When a worker
+	 * is running, the launcher is only waiting for that worker to finish.
+	 */
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+
+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+void
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	if (ChecksumHelperShmem->abort)
+	{
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: has been canceled")));
+	}
+
+	if (ChecksumHelperShmem->launcher_started)
+	{
+		/* Someone else has already started the launcher */
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("could not start checksumhelper: already running")));
+	}
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	ChecksumHelperShmem->launcher_started = true;
+	LWLockRelease(ChecksumHelperLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+		ChecksumHelperShmem->launcher_started = false;
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("failed to start checksum helper launcher")));
+	}
+}
+
+/*
+ * ShutdownChecksumHelperIfRunning
+ *		Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	if (ChecksumHelperShmem->launcher_started)
+		ChecksumHelperShmem->abort = true;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we have been asked to
+		 * abort; the abort bubbles up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
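The block loop in `ProcessSingleRelationFork()` above has three beats: progress reporting every 100 blocks, the page rewrite itself, and a per-block abort check. A minimal Python sketch of that control flow (names are ours, not from the patch):

```python
# Sketch of the per-fork block loop: process each block, report progress
# every report_every blocks, and honour an abort request between blocks.
def process_fork(numblocks, abort_at=None, report_every=100):
    reports, processed = 0, 0
    for b in range(numblocks):
        if b % report_every == 0:
            reports += 1            # stands in for pgstat_report_activity()
        # LockBuffer / MarkBufferDirty / log_newpage_buffer happen here
        processed += 1
        if abort_at is not None and processed >= abort_at:
            return False, reports   # aborted, like the shmem abort flag
    return True, reports
```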
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "Checksumhelper starting to process relation %d", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there
+		 * are no pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "Checksumhelper skipping relation %d as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "Checksumhelper done with relation %d: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(LOG,
+				(errmsg("failed to start worker for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status != BGWH_STARTED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker startup for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	ereport(DEBUG1,
+			(errmsg("started background worker for checksums in \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status != BGWH_STOPPED)
+	{
+		ereport(LOG,
+				(errmsg("failed to wait for worker shutdown for checksumhelper in \"%s\"",
+						db->dbname)));
+		return FAILED;
+	}
+
+	if (ChecksumHelperShmem->success == ABORTED)
+		ereport(LOG,
+				(errmsg("checksumhelper was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker for checksums in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	ChecksumHelperShmem->abort = false;
+	ChecksumHelperShmem->launcher_started = false;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	ChecksumHelperShmem->abort = true;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		elog(DEBUG1, "Checking old transactions");
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (waiting for %u)", waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_PG_SLEEP);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
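`WaitForAllTransactionsToFinish()` above relies on `TransactionIdPrecedes()`, which compares 32-bit xids in modular arithmetic so the test stays correct across xid wraparound. A hedged Python restatement of that comparison:

```python
# Restates TransactionIdPrecedes for normal xids: 32-bit xids compare
# modulo 2**32, so "a precedes b" means the signed difference (a - b),
# taken as an int32, is negative.
def xid_precedes(a, b):
    diff = (a - b) & 0xFFFFFFFF
    return diff != 0 and diff >= 0x80000000  # i.e. (int32)(a - b) < 0
```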
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	List	   *FailedDatabases = NIL;
+	ListCell   *lc,
+			   *lc2;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("checksumhelper launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER), "", "", "");
+
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(ChecksumHelperResult);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we
+		 * wait for all pre-existing transactions to finish, we can be
+		 * certain that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		/*
+		 * If there are no databases at all to checksum, we can exit
+		 * immediately as there is no work to do. This can probably never
+		 * happen, but just in case.
+		 */
+		if (DatabaseList == NIL)
+			return;
+
+		processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult result;
+			Oid		   *oid;
+
+			/* Skip if this database has been processed already */
+			if (hash_search(ProcessedDatabases, (void *) &db->dboid, HASH_FIND, NULL))
+			{
+				pfree(db->dbname);
+				pfree(db);
+				continue;
+			}
+
+			result = ProcessDatabase(db);
+
+			/*
+			 * Make a copy of the oid for the hash table; the rest of the
+			 * structure can only be freed if the database did not fail,
+			 * since failed entries are referenced again below.
+			 */
+			oid = palloc(sizeof(Oid));
+			*oid = db->dboid;
+
+			hash_search(ProcessedDatabases, (void *) oid, HASH_ENTER, NULL);
+			processed_databases++;
+
+			if (result == SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (ChecksumHelperShmem->process_shared_catalogs)
+					ChecksumHelperShmem->process_shared_catalogs = false;
+				pfree(db->dbname);
+				pfree(db);
+			}
+			else if (result == FAILED)
+			{
+				/*
+				 * Put failed databases on the remaining list.
+				 */
+				FailedDatabases = lappend(FailedDatabases, db);
+			}
+			else
+				/* Abort flag set, so exit the whole process */
+				return;
+		}
+
+		elog(DEBUG1, "Completed one loop of checksum enabling, %i databases processed", processed_databases);
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * FailedDatabases now has all databases that failed one way or another.
+	 * This can be because they actually failed for some reason, or because
+	 * the database was dropped between us getting the database list and
+	 * trying to process it. Get a fresh list of databases to detect the
+	 * second case where the database was dropped before we had started
+	 * processing it. If a database still exists, but enabling checksums
+	 * failed then we fail the entire checksumming process and exit with an
+	 * error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, FailedDatabases)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		bool		found = false;
+
+		foreach(lc2, DatabaseList)
+		{
+			ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+			if (db->dboid == db2->dboid)
+			{
+				found = true;
+				ereport(WARNING,
+						(errmsg("failed to enable checksums in \"%s\"",
+								db->dbname)));
+				break;
+			}
+		}
+
+		if (found)
+			found_failed = true;
+		else
+		{
+			ereport(LOG,
+					(errmsg("database \"%s\" has been dropped, skipping",
+							db->dbname)));
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksumhelper failed to enable checksums in all databases, aborting")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. XXX: this should
+	 * probably not be an IMMEDIATE checkpoint, but leave it there for now for
+	 * testing.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled, checksumhelper launcher shutting down")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(ChecksumHelperShmem, 0, ChecksumHelperShmemSize());
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared and local relations are returned,
+ * otherwise only non-shared relations are returned.
+ * Temp tables are not included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relpersistence == 't')
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Only include relations types that have local storage
+		 */
+		if (pgc->relkind == RELKIND_VIEW ||
+			pgc->relkind == RELKIND_COMPOSITE_TYPE ||
+			pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = pgc->oid;
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in database
+ *
+ * Returns a List of oids.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != 't')
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("checksum worker starting for database oid %d", dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present as we start in this database. We
+	 * need to wait until they are all gone before we can finish, since we
+	 * cannot access and modify those files.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		ChecksumHelperShmem->success = ABORTED;
+		ereport(DEBUG1,
+				(errmsg("checksum worker aborted in database oid %d", dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the inprogress state), so those are safe.
+	 */
+	while (true)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table left to wait for */
+		snprintf(activity, sizeof(activity), "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT,
+						 5000,
+						 WAIT_EVENT_PG_SLEEP);
+	}
+
+	list_free(InitialTempTableList);
+
+	ChecksumHelperShmem->success = SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("checksum worker completed in database oid %d", dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7410b2ff5e..0008879acc 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4303,6 +4303,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 1fa4551eff..c18b0b4ecb 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1397,7 +1397,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index bc532d027b..5951f3250d 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -196,6 +196,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 4829953ee6..4a9822a19d 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -255,6 +256,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 9399ab0be4..96d257c9f6 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -456,8 +456,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_SAMPLE))
-		ProcessBarrierSample();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM))
+	{
+		/*
+		 * By virtue of getting here (i.e. interrupts being processed), we
+		 * know that this backend won't have any in-progress writes (which
+		 * might have missed the checksum change).
+		 */
+	}
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -469,16 +475,6 @@ ProcessProcSignalBarrier(void)
 	pg_atomic_write_u64(&MyProcSignalSlot->pss_barrierGeneration, generation);
 }
 
-static void
-ProcessBarrierSample(void)
-{
-	/*
-	 * XXX. This should be something no-fail, which elog() is not, but this is
-	 * just for testing purposes.
-	 */
-	elog(LOG, "ProcessBarrierSample");
-}
-
 /*
  * CheckProcSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called after receiving
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843229..d50b4b13e1 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,4 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+ChecksumHelperLock					45
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..f873fb0eea 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,8 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, but can also be turned on and off
+using pg_enable_data_checksums()/pg_disable_data_checksums() at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 6b49810e37..6e3bfa045a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -94,7 +94,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1171,7 +1171,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1198,7 +1198,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..61e856deac 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1527,7 +1527,7 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
@@ -1545,7 +1545,7 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 8d951ce404..7a8228be4b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -72,6 +73,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -473,6 +475,16 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -580,7 +592,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1840,17 +1852,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4600,6 +4601,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 93f3c34b74..4fed2450fc 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -657,6 +657,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 729f86aa32..97fc7b4de7 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -220,7 +220,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9b588c87a5..dda9c9abad 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -189,7 +189,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -292,7 +292,13 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsInProgress(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index e295dc65fb..a588cb7ed9 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -245,6 +246,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index cf7d4485e9..c79b3fa365 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ac8f64b219..5ea54f968d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10663,6 +10663,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f2e873d048..8704b68de8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -727,7 +727,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..556f801668
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+void		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 4ef6d8ddd4..cf31f24b01 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 7ef32a3baa..2c414aa1e7 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 90eda29d15..02e22f38f5 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -47,7 +47,7 @@ typedef enum
 
 typedef enum
 {
-	PROCSIGNAL_BARRIER_SAMPLE = 0
+	PROCSIGNAL_BARRIER_CHECKSUM = 0
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..22a3b64dd8
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes (a master and one or more standbys) for the purpose
+of the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..891743fa6c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,104 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress or on
+# Normally it would be "inprogress", but it is theoretically possible for the master
+# to complete the checksum enabling *and* have the standby replay that record before
+# we reach the check below.
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress", "on"], 'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.21.0 (Apple Git-122.2)

In reply to: Daniel Gustafsson (#16)
Re: Online checksums patch - once again

Hello

Attached is a v15 of the online checksums patchset (minus 0005), rebased on top
of your v3 ProcSignalBarrier patch rather than Andres' PoC GlobalBarrier patch.
It does take the, perhaps, controversial approach of replacing the SAMPLE
barrier with the CHECKSUM barrier. The cfbot will be angry since this email
doesn't contain the procsignalbarrier patch, but it sounded like that would go
in shortly so I opted for that.

ProcSignalBarrier was committed, so the online checksums patchset has no other pending dependencies and should apply cleanly on master, right? In that case the patchset needs another rebase, as it does not currently apply...

regards, Sergei

#18Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Gustafsson (#16)
Re: Online checksums patch - once again

On Mon, Dec 16, 2019 at 10:16 AM Daniel Gustafsson <daniel@yesql.se> wrote:

If reviewers think this version is nearing completion, then a v16 should
address the comment below, but as this version switches its underlying
infrastructure it seemed useful for testing still.

I think this patch still needs a lot of work.

- doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+ doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites ||
DataChecksumsInProgress());

This will have a small performance cost in a pretty hot code path. Not
sure that it's enough to worry about, though.

-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
{
Assert(ControlFile != NULL);
return (ControlFile->data_checksum_version > 0);
}

This seems troubling, because data_checksum_version can now change,
but you're still accessing it without a lock. This complaint applies
likewise to a bunch of related functions in xlog.c as well.

+ elog(ERROR, "Checksums not in inprogress mode");

Questionable capitalization and punctuation.

+void
+SetDataChecksumsOff(void)
+{
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+ ControlFile->data_checksum_version = 0;
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+ WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM));
+
+ XlogChecksums(0);
+}

This looks racey. Suppose that checksums are on. Other backends will
see that checksums are disabled as soon as
ControlFile->data_checksum_version = 0 happens, and they will feel
free to write blocks without checksums. Now we crash, and those blocks
exist on disk even though the on-disk state still otherwise shows
checksums fully enabled. It's a little better if we stop reading
data_checksum_version without a lock, because now nobody else can see
the updated state until we've actually updated the control file. But
even then, isn't it strange that writes of non-checksummed stuff could
appear or be written to disk before XlogChecksums(0) happens? If
that's safe, it seems like it deserves some kind of comment.

+ /*
+ * If we reach this point with checksums in inprogress state, we notify
+ * the user that they need to manually restart the process to enable
+ * checksums. This is because we cannot launch a dynamic background worker
+ * directly from here, it has to be launched from a regular backend.
+ */
+ if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+ ereport(WARNING,
+ (errmsg("checksum state is \"inprogress\" with no worker"),
+ errhint("Either disable or enable checksums by calling the
pg_disable_data_checksums() or pg_enable_data_checksums()
functions.")));

This seems pretty half-baked.

+ (errmsg("could not start checksumhelper: has been canceled")));
+ (errmsg("could not start checksumhelper: already running")));
+ (errmsg("failed to start checksum helper launcher")));

These seem rough around the edges. Using an internal term like
'checksumhelper' in a user-facing error message doesn't seem great.
Generally primary error messages are phrased as a single utterance
where we can, rather than colon-separated fragments like this. The
third message calls it 'checksum helper launcher' whereas the other
two call it 'checksumhelper'. It also isn't very helpful; I don't
think most people like a message saying that something failed with no
explanation given.

+ elog(DEBUG1, "Checksumhelper skipping relation %d as it no longer
exists", relationId);

Here's another way to spell 'checksumhelper', and this time it refers
to the worker rather than the launcher. Also, relation IDs are OIDs,
so need to be printed with %u, and usually we try to print names if
possible. Also, this message, like a lot of messages in this patch,
begins with a capital letter and does not end with a period. That is
neither the style for primary messages nor the style for detail
messages. As these are primary messages, the primary message style
should be used. That style is no capital and no period.

+ if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+ {
+ ereport(LOG,
+ (errmsg("failed to start worker for checksumhelper in \"%s\"",
+ db->dbname)));
+ return FAILED;
+ }

I don't think having constants with names like
SUCCESSFUL/ABORTED/FAILED is a very good idea. Too much chance of name
collisions. I suggest adding a prefix.

Also, the retry logic here doesn't look particularly robust.
RegisterDynamicBackgroundWorker will fail if no slots are available;
if that happens twice for the same database, once on first attempting
it and again when retrying it, the whole process fails, all state is
lost, and all work has to be redone. That seems neither particularly
unlikely nor pleasant.

+ if (DatabaseList == NIL || list_length(DatabaseList) == 0)

I don't think that the second half of this test serves any purpose.

+ snprintf(activity, sizeof(activity), "Waiting for current
transactions to finish (waiting for %d)", waitforxid);

%u here too.

+ if (pgc->relpersistence == 't')

Use the relevant constant.

+ /*
+ * Wait for all temp tables that existed when we started to go away. This
+ * is necessary since we cannot "reach" them to enable checksums. Any temp
+ * tables created after we started will already have checksums in them
+ * (due to the inprogress state), so those are safe.
+ */

This does not seem very nice. It just leaves a worker running more or
less forever. It's essentially waiting for temp-table using sessions
to go away, but that could take a really long time.

+ WAIT_EVENT_PG_SLEEP);

You need to invent a new wait event and add docs for it.

+ if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM))
+ {
+ /*
+ * By virtue of getting here (i.e. interrupts being processed), we
+ * know that this backend won't have any in-progress writes (which
+ * might have missed the checksum change).
+ */
+ }

I don't believe this. I already wrote some about this over here:

/messages/by-id/CA+TgmobORrsgSUydZ3MsSw9L5MBUGz7jRK+973uPZgiyCQ81ag@mail.gmail.com

As a general point, I think that the idea of the ProcSignalBarrier
mechanism is that every backend has some private state that needs to
be updated, and when it absorbs the barrier you know that it's updated
that state, and so when everybody's absorbed the barrier you know that
all the state has been updated. Here, however, there actually is no
backend-private state. The only state that everyone's consulting is
the shared state stored in ControlFile->data_checksum_version. So what
does absorbing the barrier prove? Only that we've reached a
CHECK_FOR_INTERRUPTS(). But that is a useful guarantee only if we
never check for interrupts between the time we examine
ControlFile->data_checksum_version and the time we use it, and I see
no particular reason to believe that should always be true, and I
suspect it isn't, and even if it happens to be true today I think it
could get broken in the future pretty easily. There are no particular
restrictions documented in terms of where DataChecksumsNeedWrite(),
XLogHintBitIsNeeded(), etc. can be checked or what can be done between
checking the value and using it. The issue doesn't arise for today's
DataChecksumsEnabled() because the value can't ever change, but with
this patch things can change, and to me what the patch does about that
doesn't really look adequate.

I'm sort of out of time for right now, but I think this patch needs a
lot more work on the concurrency end of things. It seems to me that it
probably mostly works in practice, but that the whole concurrency
mechanism is not very solid and probably has a lot of rare cases where
it can just misbehave if you get unlucky. I'll try to spend some more
time thinking about this next week. I also think that
the mechanism for getting from 'inprogress' to 'on' seems fragile and
under-engineered. It still bothers me that there's no mechanism for
persisting the progress that we've made in enabling checksums; but
even apart from that, I think that there just hasn't been enough
thought given here to error/problem cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Daniel Gustafsson
daniel@yesql.se
In reply to: Robert Haas (#18)
1 attachment(s)
Re: Online checksums patch - once again

On 3 Jan 2020, at 23:07, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Dec 16, 2019 at 10:16 AM Daniel Gustafsson <daniel@yesql.se> wrote:

If reviewers think this version is nearing completion, then a v16 should
address the comment below, but as this version switches its underlying
infrastructure it seemed useful for testing still.

I think this patch still needs a lot of work.

Thanks a lot for your thorough review, much appreciated! Also, sorry for being
slow to respond. Below are fixes and responses to most of the feedback, but I
need a bit more time to think about the concurrency aspects that you brought
up. However, in the spirit of showing work early/often I opted for still
sending the partial response, to perhaps be able to at least close some of the
raised issues in the meantime.

- doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+ doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites ||
DataChecksumsInProgress());

This will have a small performance cost in a pretty hot code path. Not
sure that it's enough to worry about, though.

Not sure either, and/or how clever compilers are about inlining this. As a
test, I've switched over this to be a static inline function, as it's only
consumer is in xlog.c. Now, as mentioned later in this review, reading the
version unlocked has issues so do consider this a WIP test, not a final
suggestion.

-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
{
Assert(ControlFile != NULL);
return (ControlFile->data_checksum_version > 0);
}

This seems troubling, because data_checksum_version can now change,
but you're still accessing it without a lock. This complaint applies
likewise to a bunch of related functions in xlog.c as well.

Right, let me do some more thinking on this before addressing in a next version
of the patch. Simply wrapping it in a SHARED lock still has TOCTOU problems so
a bit more work/thinking is needed.

+ elog(ERROR, "Checksums not in inprogress mode");

Questionable capitalization and punctuation.

Fixed capitalization, but elogs shouldn't end with a period so left that.

+void
+SetDataChecksumsOff(void)
+{
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+ ControlFile->data_checksum_version = 0;
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+ WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM));
+
+ XlogChecksums(0);
+}

This looks racey. Suppose that checksums are on. Other backends will
see that checksums are disabled as soon as
ControlFile->data_checksum_version = 0 happens, and they will feel
free to write blocks without checksums. Now we crash, and those blocks
exist on disk even though the on-disk state still otherwise shows
checksums fully enabled. It's a little better if we stop reading
data_checksum_version without a lock, because now nobody else can see
the updated state until we've actually updated the control file. But
even then, isn't it strange that writes of non-checksummed stuff could
appear or be written to disk before XlogChecksums(0) happens? If
that's safe, it seems like it deserves some kind of comment.

As mentioned above, I would like to address this in the next version. I'm
working on it, just need a little more time and wanted to share progress on the
other bits.

+ /*
+ * If we reach this point with checksums in inprogress state, we notify
+ * the user that they need to manually restart the process to enable
+ * checksums. This is because we cannot launch a dynamic background worker
+ * directly from here, it has to be launched from a regular backend.
+ */
+ if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+ ereport(WARNING,
+ (errmsg("checksum state is \"inprogress\" with no worker"),
+ errhint("Either disable or enable checksums by calling the
pg_disable_data_checksums() or pg_enable_data_checksums()
functions.")));

This seems pretty half-baked.

I don't disagree with that. However, given that enabling checksums is a pretty
intensive operation it seems somewhat unfriendly to automatically restart. As
a DBA I wouldn't want that to kick off without manual intervention, but there
is also the risk of this being missed due to assumptions that it would restart.
Any ideas on how to treat this?

If/when we can restart the processing where it left off, without the need to go
over all data again, things might be different wrt the default action.

+ (errmsg("could not start checksumhelper: has been canceled")));
+ (errmsg("could not start checksumhelper: already running")));
+ (errmsg("failed to start checksum helper launcher")));

These seem rough around the edges. Using an internal term like
'checksumhelper' in a user-facing error message doesn't seem great.
Generally primary error messages are phrased as a single utterance
where we can, rather than colon-separated fragments like this. The
third message calls it 'checksum helper launcher' whereas the other
two call it 'checksumhelper'. It also isn't very helpful; I don't
think most people like a message saying that something failed with no
explanation given.

+ elog(DEBUG1, "Checksumhelper skipping relation %d as it no longer
exists", relationId);

Here's another way to spell 'checksumhelper', and this time it refers
to the worker rather than the launcher. Also, relation IDs are OIDs,
so need to be printed with %u, and usually we try to print names if
possible. Also, this message, like a lot of messages in this patch,
begins with a capital letter and does not end with a period. That is
neither the style for primary messages nor the style for detail
messages. As these are primary messages, the primary message style
should be used. That style is no capital and no period.

I've removed checksumhelper from all user facing strings, and only kept them in
the DEBUG strings (which to some extent probably will be removed before a final
version of the patch, so didn't spend too much time on those just now). The
bgworker name is still checksumhelper launcher and checksumhelper worker, but
that could perhaps do with a better name too.

+ if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+ {
+ ereport(LOG,
+ (errmsg("failed to start worker for checksumhelper in \"%s\"",
+ db->dbname)));
+ return FAILED;
+ }

I don't think having constants with names like
SUCCESSFUL/ABORTED/FAILED is a very good idea. Too much chance of name
collisions. I suggest adding a prefix.

Fixed.

Also, the retry logic here doesn't look particularly robust.
RegisterDynamicBackgroundWorker will fail if no slots are available;
if that happens twice for the same database, once on first attempting
it and again when retrying it, the whole process fails, all state is
lost, and all work has to be redone. That seems neither particularly
unlikely nor pleasant.

Agreed, this was a brick or two shy of a load. I've rewritten this logic to
cope better with the conditions around startup/shutdown of bgworkers. I think
there should be some form of backoff mechanism as well in case there
temporarily aren't any slots, to avoid running through all the databases in
short order only to run up the retry counter. Something like if X databases in
succession fail on no slot being available, back off a little before trying X+1
to allow for operations that consume the slot(s) to finish. Or something. It
won't help for systems which are permanently starved with a too low
max_worker_processes, but nothing short of raising it will. For the latter, I've added a
note to the documentation.

+ if (DatabaseList == NIL || list_length(DatabaseList) == 0)

I don't think that the second half of this test serves any purpose.

True, but I think the code is clearer if the second half is the one we keep, so
went with that.

+ snprintf(activity, sizeof(activity), "Waiting for current
transactions to finish (waiting for %d)", waitforxid);

%u here too.

Fixed.

+ if (pgc->relpersistence == 't')

Use the relevant constant.

Fixed.

+ /*
+ * Wait for all temp tables that existed when we started to go away. This
+ * is necessary since we cannot "reach" them to enable checksums. Any temp
+ * tables created after we started will already have checksums in them
+ * (due to the inprogress state), so those are safe.
+ */

This does not seem very nice. It just leaves a worker running more or
less forever. It's essentially waiting for temp-table using sessions
to go away, but that could take a really long time.

It can, but is there a realistic alternative? I can't think of one but if you
have ideas I'd love for this requirement to go away, or be made less blocking.

At the same time, enabling checksums is hardly the kind of operation one does
casually in a busy database, but probably a more planned operation. This
requirement is mentioned in the documentation such that a DBA can plan for when
to start the processing.

+ WAIT_EVENT_PG_SLEEP);

You need to invent a new wait event and add docs for it.

Done. I failed to figure out a (IMO) good name though, and welcome suggestions
that are more descriptive. CHECKSUM_ENABLE_STARTCONDITION was what I settled on
but I'm not too excited about it.

+ if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM))
+ {
+ /*
+ * By virtue of getting here (i.e. interrupts being processed), we
+ * know that this backend won't have any in-progress writes (which
+ * might have missed the checksum change).
+ */
+ }

I don't believe this. I already wrote some about this over here:

/messages/by-id/CA+TgmobORrsgSUydZ3MsSw9L5MBUGz7jRK+973uPZgiyCQ81ag@mail.gmail.com

As a general point, I think that the idea of the ProcSignalBarrier
mechanism is that every backend has some private state that needs to
be updated, and when it absorbs the barrier you know that it's updated
that state, and so when everybody's absorbed the barrier you know that
all the state has been updated. Here, however, there actually is no
backend-private state. The only state that everyone's consulting is
the shared state stored in ControlFile->data_checksum_version. So what
does absorbing the barrier prove? Only that we've reached a
CHECK_FOR_INTERRUPTS(). But that is a useful guarantee only if we
never check for interrupts between the time we examine
ControlFile->data_checksum_version and the time we use it, and I see
no particular reason to believe that should always be true, and I
suspect it isn't, and even if it happens to be true today I think it
could get broken in the future pretty easily. There are no particular
restrictions documented in terms of where DataChecksumsNeedWrite(),
XLogHintBitIsNeeded(), etc. can be checked or what can be done between
checking the value and using it. The issue doesn't arise for today's
DataChecksumsEnabled() because the value can't ever change, but with
this patch things can change, and to me what the patch does about that
doesn't really look adequate.

I don't disagree with this, but I need to do a bit more thinking before
presenting a suggested fix for this concurrency issue.

I'm sort of out of time for right now, but I think this patch needs a
lot more work on the concurrency end of things. It seems to me that it
probably mostly works in practice, but that the whole concurrency
mechanism is not very solid and probably has a lot of rare cases where
it can just misbehave if you get unlucky. I'll try to spend some more
time thinking about this next week. I also think that
the mechanism for getting from 'inprogress' to 'on' seems fragile and
under-engineered. It still bothers me that there's no mechanism for
persisting the progress that we've made in enabling checksums; but
even apart from that, I think that there just hasn't been enough
thought given here to error/problem cases.

Thanks again for reviewing (and working on the infrastructure required for this
patch to begin with)! Regarding persisting the progress: that would be a
really neat feature but I don't have any suggestion on how to do that safely
for real use-cases.

Attached is a v16 rebased on top of current master which addresses the above
commented points, and which I am basing the concurrency work on.

cheers ./daniel

Attachments:

online_checksums16.patchapplication/octet-stream; name=online_checksums16.patch; x-unix-mode=0644Download
From 71cf350965e596aaf0bf7bb74395a35eef3c0f9d Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 16 Dec 2019 15:06:25 +0100
Subject: [PATCH] Online enable and disable of data checksums

This allows data checksums to be enabled, or disabled, in a running
cluster without requiring a restart. The data checksum mode will be
set to "inprogress" while the checksumhelper background worker walks
the database and rewrites all the data pages to include checksums.
Once all pages have been checksummed, the data checksum mode turns
to "on". The synchronization of the checksumhelper is based on the
procbarrier infrastructure.

This adds a new section in the documentation for checksums, covering
the basics on how to enable and disable with the various means that
are possible. More information can probably be added covering other
aspects of data checksums.

In passing, this patch removes the SAMPLE procbarrier and replaces
it with the CHECKSUM procbarrier, as there is now a consumer of the
procbarrier API and the "dummy" barrier can be removed (it was needed
to keep the enum with something).
---
 doc/src/sgml/func.sgml                      |  65 ++
 doc/src/sgml/monitoring.sgml                |   4 +
 doc/src/sgml/ref/initdb.sgml                |   1 +
 doc/src/sgml/wal.sgml                       |  99 ++
 src/backend/access/rmgrdesc/xlogdesc.c      |  16 +
 src/backend/access/transam/xlog.c           | 131 ++-
 src/backend/access/transam/xlogfuncs.c      |  57 ++
 src/backend/catalog/system_views.sql        |   5 +
 src/backend/postmaster/Makefile             |   1 +
 src/backend/postmaster/bgworker.c           |   7 +
 src/backend/postmaster/checksumhelper.c     | 950 ++++++++++++++++++++
 src/backend/postmaster/pgstat.c             |   9 +
 src/backend/replication/basebackup.c        |   2 +-
 src/backend/replication/logical/decode.c    |   1 +
 src/backend/storage/ipc/ipci.c              |   2 +
 src/backend/storage/ipc/procsignal.c        |  24 +-
 src/backend/storage/lmgr/lwlocknames.txt    |   1 +
 src/backend/storage/page/README             |   3 +-
 src/backend/storage/page/bufpage.c          |   6 +-
 src/backend/utils/adt/pgstatfuncs.c         |   4 +-
 src/backend/utils/misc/guc.c                |  36 +-
 src/bin/pg_upgrade/controldata.c            |   9 +
 src/bin/pg_upgrade/pg_upgrade.h             |   2 +-
 src/include/access/xlog.h                   |   9 +-
 src/include/access/xlog_internal.h          |   7 +
 src/include/catalog/pg_control.h            |   1 +
 src/include/catalog/pg_proc.dat             |  16 +
 src/include/pgstat.h                        |   5 +-
 src/include/postmaster/checksumhelper.h     |  31 +
 src/include/storage/bufpage.h               |   1 +
 src/include/storage/checksum.h              |   7 +
 src/include/storage/procsignal.h            |   7 +-
 src/test/Makefile                           |   3 +-
 src/test/checksum/.gitignore                |   2 +
 src/test/checksum/Makefile                  |  24 +
 src/test/checksum/README                    |  22 +
 src/test/checksum/t/001_standby_checksum.pl | 104 +++
 37 files changed, 1622 insertions(+), 52 deletions(-)
 create mode 100644 src/backend/postmaster/checksumhelper.c
 create mode 100644 src/include/postmaster/checksumhelper.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_standby_checksum.pl

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 72072e7545..823a5fdf41 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21316,6 +21316,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>inprogress</literal> as well as start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch data checksums mode to
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0bfd6151c4..628580dfd1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1339,6 +1339,10 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
          <entry><literal>CheckpointStart</literal></entry>
          <entry>Waiting for a checkpoint to start.</entry>
         </row>
+        <row>
+         <entry><literal>ChecksumEnableStartcondition</literal></entry>
+         <entry>Waiting for transactions to finish before starting to enable checksums.</entry>
+        </row>
         <row>
          <entry><literal>ClogGroupUpdate</literal></entry>
          <entry>Waiting for group leader to update transaction status at transaction end.</entry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index da5c8f5307..69b7f91cbc 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -220,6 +220,7 @@ PostgreSQL documentation
         are calculated for all objects, in all databases. All checksum
         failures will be reported in the
         <xref linkend="pg-stat-database-view"/> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 4eb8feb903..2d8693def1 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,105 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums, internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process; make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+    Information about open transactions and connections with temporary tables is
+    written to log.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. It is not possible to resume the
+    work; the process has to start over from the beginning.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-enable-disable-offline">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="pg_checksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as to
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1cd97852e8..779567736d 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfoString(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+			appendStringInfoString(buf, "inprogress");
+		else
+			appendStringInfoString(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +198,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7f4f784c0e..7ad1f02321 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -866,6 +866,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -940,6 +941,8 @@ static void WALInsertLockAcquireExclusive(void);
 static void WALInsertLockRelease(void);
 static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
 
+static bool DataChecksumsInProgress(void);
+
 /*
  * Insert an XLOG record represented by an already-constructed chain of data
  * chunks.  This is a low-level routine; to construct the WAL record header
@@ -1048,7 +1051,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4775,10 +4778,6 @@ ReadControlFile(void)
 		(SizeOfXLogLongPHD - SizeOfXLogShortPHD);
 
 	CalculateCheckpointSegments();
-
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
 /*
@@ -4815,12 +4814,92 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
 	Assert(ControlFile != NULL);
 	return (ControlFile->data_checksum_version > 0);
 }
 
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(ControlFile != NULL);
+
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+static inline bool
+DataChecksumsInProgress(void)
+{
+	return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}
+
+void
+SetDataChecksumsInProgress(void)
+{
+	Assert(ControlFile != NULL);
+	if (ControlFile->data_checksum_version > 0)
+		return;
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM));
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM));
+
+	XlogChecksums(0);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		return "inprogress";
+	else
+		return "off";
+}
+
 /*
  * Returns a fake LSN for unlogged relations.
  *
@@ -7777,6 +7856,17 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in inprogress state, we notify
+	 * the user that they need to manually restart the process to enable
+	 * checksums. This is because we cannot launch a dynamic background worker
+	 * directly from here; it has to be launched from a regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+		ereport(WARNING,
+				(errmsg("checksum state is \"inprogress\" with no worker"),
+				 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9500,6 +9590,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -9951,6 +10059,17 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index d4e21f90a8..89ef077d59 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/checksumhelper.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -770,3 +771,59 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		ereport(ERROR,
+				(errmsg("data checksums already disabled")));
+
+	ShutdownChecksumHelperIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	/*
+	 * Allow state change from "off" or from "inprogress", since this is how
+	 * we restart the worker if necessary.
+	 */
+	if (DataChecksumsNeedVerify())
+		ereport(ERROR,
+				(errmsg("data checksums already enabled")));
+
+	SetDataChecksumsInProgress();
+	StartChecksumHelperLauncher(cost_delay, cost_limit);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c9e75f4370..34be3b1246 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1187,6 +1187,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..73df17e5f3 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	checksumhelper.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 75fc0d5d33..fb1bd0dace 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"ChecksumHelperLauncherMain", ChecksumHelperLauncherMain
+	},
+	{
+		"ChecksumHelperWorkerMain", ChecksumHelperWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
new file mode 100644
index 0000000000..fba7f4cfdb
--- /dev/null
+++ b/src/backend/postmaster/checksumhelper.c
@@ -0,0 +1,950 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.c
+ *	  Background worker to walk the database and write checksums to pages
+ *
+ * When enabling data checksums on a cluster at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed,
+ * and verified, on access.  When enabling checksums on an already running
+ * cluster, which was not initialized with checksums, this helper worker will
+ * ensure that all pages are checksummed before verification of the checksums
+ * is turned on.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/checksumhelper.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+
+#define CHECKSUMHELPER_MAX_DB_RETRIES 5
+
+typedef enum
+{
+	CHECKSUMHELPER_SUCCESSFUL = 0,
+	CHECKSUMHELPER_ABORTED,
+	CHECKSUMHELPER_FAILED,
+	CHECKSUMHELPER_RETRYDB,
+}			ChecksumHelperResult;
+
+typedef struct ChecksumHelperShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * ChecksumHelperLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Access to other members can be done without a lock, as while they are
+	 * in shared memory, they are never concurrently accessed. When a worker
+	 * is running, the launcher is only waiting for that worker to finish.
+	 */
+	ChecksumHelperResult success;
+	bool		process_shared_catalogs;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+}			ChecksumHelperShmemStruct;
+
+/* Shared memory segment for checksumhelper */
+static ChecksumHelperShmemStruct * ChecksumHelperShmem;
+
+/* Bookkeeping for work to do */
+typedef struct ChecksumHelperDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			ChecksumHelperDatabase;
+
+typedef struct ChecksumHelperRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			ChecksumHelperRelation;
+
+typedef struct ChecksumHelperResultEntry
+{
+	Oid						dboid;
+	ChecksumHelperResult	result;
+	int						retries;
+}			ChecksumHelperResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static ChecksumHelperResult ProcessDatabase(ChecksumHelperDatabase * db);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+
+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+void
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	if (ChecksumHelperShmem->abort)
+	{
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("checksum enabling has been aborted")));
+	}
+
+	if (ChecksumHelperShmem->launcher_started)
+	{
+		/* Somebody else already started the launcher */
+		LWLockRelease(ChecksumHelperLock);
+		ereport(NOTICE,
+				(errmsg("checksums are already being enabled")));
+		return;
+	}
+
+	ChecksumHelperShmem->cost_delay = cost_delay;
+	ChecksumHelperShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	ChecksumHelperShmem->launcher_started = true;
+	LWLockRelease(ChecksumHelperLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+		ChecksumHelperShmem->launcher_started = false;
+		LWLockRelease(ChecksumHelperLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to enable checksums")));
+	}
+}
+
+/*
+ * ShutdownChecksumHelperIfRunning
+ *		Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately; it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+	/* If the launcher isn't started, there is nothing to shut down */
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	if (ChecksumHelperShmem->launcher_started)
+		ChecksumHelperShmem->abort = true;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks (so as not to "spam")
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort will bubble up from here.  It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (ChecksumHelperShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual error
+ * is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2, "background worker \"checksumhelper\" starting to process relation %u", relationId);
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We consider this a success, since there
+		 * are no pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1, "background worker \"checksumhelper\" skipping relation %u as it no longer exists", relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2, "background worker \"checksumhelper\" done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static ChecksumHelperResult
+ProcessDatabase(ChecksumHelperDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	ChecksumHelperShmem->success = CHECKSUMHELPER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ChecksumHelperWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "checksumhelper worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "checksumhelper worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the checksumhelper move on to the next
+	 * database and quite likely fail with the same problem. Maybe we need a
+	 * backoff to avoid running through all the databases here in short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling checksums in \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return CHECKSUMHELPER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling checksums in \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return CHECKSUMHELPER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed, we cannot end up with checksums enabled
+	 * clusterwide, so we have no alternative other than exiting.
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable checksums without the postmaster process"),
+				 errhint("Restart the database and restart the checksumming process by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("started background worker \"checksumhelper\" in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart the checksumming process by calling pg_enable_data_checksums().")));
+
+	if (ChecksumHelperShmem->success == CHECKSUMHELPER_ABORTED)
+		ereport(LOG,
+				(errmsg("background worker for enabling checksums was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"checksumhelper\" in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return ChecksumHelperShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	ChecksumHelperShmem->abort = false;
+	ChecksumHelperShmem->launcher_started = false;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(ChecksumHelperLock, LW_EXCLUSIVE);
+	ChecksumHelperShmem->abort = true;
+	LWLockRelease(ChecksumHelperLock);
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	LWLockRelease(XidGenLock);
+
+	while (true)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity, sizeof(activity), "Waiting for current transactions to finish (waiting for %u)", waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			(void) WaitLatch(MyLatch,
+							 WL_LATCH_SET | WL_TIMEOUT,
+							 5000,
+							 WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+ChecksumHelperLauncherMain(Datum arg)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"checksumhelper\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_LAUNCHER),
+					"", "", "");
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(ChecksumHelperResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set up so that the first run processes shared catalogs, and later
+	 * runs do not repeat them in every database.
+	 */
+	ChecksumHelperShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we
+		 * wait for all pre-existing transactions to finish, we can then be
+		 * certain that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		/*
+		 * If there are no databases at all to checksum, we can exit
+		 * immediately as there is no work to do. This can probably never
+		 * happen, but just in case.
+		 */
+		if (list_length(DatabaseList) == 0)
+			return;
+
+		foreach(lc, DatabaseList)
+		{
+			ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+			ChecksumHelperResult result;
+			ChecksumHelperResultEntry *entry;
+			bool			found;
+
+			elog(DEBUG1, "Starting processing of database %s with oid %u", db->dbname, db->dboid);
+
+			entry = (ChecksumHelperResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+								HASH_FIND, NULL);
+
+			/* Skip if this database has been processed already */
+			if (entry)
+			{
+				if (entry->result == CHECKSUMHELPER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > CHECKSUMHELPER_MAX_DB_RETRIES)
+						entry->result = CHECKSUMHELPER_FAILED;
+				}
+
+				if (entry->result != CHECKSUMHELPER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db);
+			processed_databases++;
+
+			if (result == CHECKSUMHELPER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (ChecksumHelperShmem->process_shared_catalogs)
+					ChecksumHelperShmem->process_shared_catalogs = false;
+			}
+			else if (result == CHECKSUMHELPER_ABORTED)
+				/* Abort flag set, so exit the whole process */
+				return;
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "completed one pass over all databases for checksum enabling, %i databases processed",
+			 processed_databases);
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can be because
+	 * they actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists, but enabling checksums failed then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+		ChecksumHelperResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases that still exist.
+		 */
+		if (found && entry->result == CHECKSUMHELPER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable checksums in \"%s\"",
+							db->dbname)));
+			found_failed = found;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. XXX: this should
+	 * probably not be an IMMEDIATE checkpoint, but leave it there for now for
+	 * testing.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	/*
+	 * Everything has been processed, so flag checksums enabled.
+	 */
+	SetDataChecksumsOn();
+
+	ereport(LOG,
+			(errmsg("checksums enabled clusterwide")));
+}
+
+/*
+ * ChecksumHelperShmemSize
+ *		Compute required space for checksumhelper-related shared memory
+ */
+Size
+ChecksumHelperShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(ChecksumHelperShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * ChecksumHelperShmemInit
+ *		Allocate and initialize checksumhelper-related shared memory
+ */
+void
+ChecksumHelperShmemInit(void)
+{
+	bool		found;
+
+	ChecksumHelperShmem = (ChecksumHelperShmemStruct *)
+		ShmemInitStruct("ChecksumHelper Data",
+						ChecksumHelperShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(ChecksumHelperShmem, 0, ChecksumHelperShmemSize());
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the checksumhelper workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * ensures that no CREATE DATABASE is running concurrently, which could
+	 * otherwise cause us to miss a database that was copied without
+	 * checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		ChecksumHelperDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared and non-shared relations are
+ * returned; otherwise only non-shared relations are returned.
+ * Temp tables are never included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		ChecksumHelperRelation *relentry;
+
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		/*
+		 * Only include relations types that have local storage
+		 */
+		if (pgc->relkind == RELKIND_VIEW ||
+			pgc->relkind == RELKIND_COMPOSITE_TYPE ||
+			pgc->relkind == RELKIND_FOREIGN_TABLE)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (ChecksumHelperRelation *) palloc(sizeof(ChecksumHelperRelation));
+
+		relentry->reloid = pgc->oid;
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in the database
+ *
+ * Returns a List of oids.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != RELPERSISTENCE_TEMP)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+ChecksumHelperWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	init_ps_display(pgstat_get_backend_desc(B_CHECKSUMHELPER_WORKER), "", "", "");
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"checksumhelper\" starting for database oid %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we can finish, since we
+	 * cannot access those files to modify them.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = ChecksumHelperShmem->cost_delay;
+	VacuumCostLimit = ChecksumHelperShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(ChecksumHelperShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		ChecksumHelperRelation *rel = (ChecksumHelperRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		ChecksumHelperShmem->success = CHECKSUMHELPER_ABORTED;
+		ereport(DEBUG1,
+				(errmsg("background worker \"checksumhelper\" aborted in database oid %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the inprogress state), so those are safe.
+	 */
+	while (true)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table left to wait for */
+		snprintf(activity, sizeof(activity), "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT,
+						 5000,
+						 WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+	}
+
+	list_free(InitialTempTableList);
+
+	ChecksumHelperShmem->success = CHECKSUMHELPER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("background worker \"checksumhelper\" completed in database oid %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 51c486bebd..3754a7b0ae 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3752,6 +3752,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartcondition";
+			break;
 		case WAIT_EVENT_CLOG_GROUP_UPDATE:
 			event_name = "ClogGroupUpdate";
 			break;
@@ -4303,6 +4306,12 @@ pgstat_get_backend_desc(BackendType backendType)
 		case B_WAL_WRITER:
 			backendDesc = "walwriter";
 			break;
+		case B_CHECKSUMHELPER_LAUNCHER:
+			backendDesc = "checksumhelper launcher";
+			break;
+		case B_CHECKSUMHELPER_WORKER:
+			backendDesc = "checksumhelper worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index dea8aab45e..c54e059041 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1385,7 +1385,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5e1dc8a651..39ef348cc4 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -196,6 +196,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..be422954f2 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/checksumhelper.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -255,6 +256,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	ChecksumHelperShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 65d3946386..f2088a9c8e 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -92,7 +92,6 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
 
 /*
  * ProcSignalShmemSize
@@ -456,8 +455,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM))
+	{
+		/*
+		 * By virtue of getting here (i.e. interrupts being processed), we
+		 * know that this backend won't have any in-progress writes (which
+		 * might have missed the checksum change).
+		 */
+	}
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -469,19 +474,6 @@ ProcessProcSignalBarrier(void)
 	pg_atomic_write_u64(&MyProcSignalSlot->pss_barrierGeneration, generation);
 }
 
-static void
-ProcessBarrierPlaceholder(void)
-{
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live
-	 * in the file pertaining to that subsystem, rather than here.
-	 */
-}
-
 /*
  * CheckProcSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called after receiving
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index db47843229..d50b4b13e1 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -49,3 +49,4 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 CLogTruncationLock					44
+ChecksumHelperLock					45
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 5127d98da3..f873fb0eea 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -9,7 +9,8 @@ have a very low measured incidence according to research on large server farms,
 http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
-Current implementation requires this be enabled system-wide at initdb time.
+Checksums can be enabled at initdb time, but can also be turned on and off
+using pg_enable_data_checksums()/pg_disable_data_checksums() at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index f47176753d..b7a4760633 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -94,7 +94,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1171,7 +1171,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1198,7 +1198,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 74f899f24d..654ef89663 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1533,7 +1533,7 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
@@ -1551,7 +1551,7 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e44f71e991..177206e58d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -72,6 +73,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -477,6 +479,16 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -584,7 +596,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1844,17 +1856,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4613,6 +4614,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		check_ssl_max_protocol_version, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 00d71e3a8a..04a6999cf9 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -657,6 +657,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * checksumhelper has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index b156b516cc..3f033a6854 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -220,7 +220,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 98b033fc20..69df34cc3b 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -189,7 +189,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -292,7 +292,12 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern void SetDataChecksumsInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern const char *show_data_checksums(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 087918d41d..56d2efd5b0 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -245,6 +246,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index de5670e538..73a5495335 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fcf2a1214c..2d81d17948 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10682,6 +10682,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 36b530bc27..2f30138a26 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -727,7 +727,9 @@ typedef enum BackendType
 	B_STARTUP,
 	B_WAL_RECEIVER,
 	B_WAL_SENDER,
-	B_WAL_WRITER
+	B_WAL_WRITER,
+	B_CHECKSUMHELPER_LAUNCHER,
+	B_CHECKSUMHELPER_WORKER
 } BackendType;
 
 
@@ -823,6 +825,7 @@ typedef enum
 	WAIT_EVENT_CLOG_GROUP_UPDATE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATING,
 	WAIT_EVENT_HASH_BATCH_ELECTING,
diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
new file mode 100644
index 0000000000..888269f3ac
--- /dev/null
+++ b/src/include/postmaster/checksumhelper.h
@@ -0,0 +1,31 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumhelper.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/checksumhelper.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUMHELPER_H
+#define CHECKSUMHELPER_H
+
+/* Shared memory */
+extern Size ChecksumHelperShmemSize(void);
+extern void ChecksumHelperShmemInit(void);
+
+/* Start the background processes for enabling checksums */
+void		StartChecksumHelperLauncher(int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownChecksumHelperIfRunning(void);
+
+/* Background worker entrypoints */
+void		ChecksumHelperLauncherMain(Datum arg);
+void		ChecksumHelperWorkerMain(Datum arg);
+
+#endif							/* CHECKSUMHELPER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 870ecb51b7..22c5e02175 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION		2
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 6e77744cbc..c991dec6a4 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,13 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 90607df106..a99eb1fe37 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -47,12 +47,7 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM = 0
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..22a3b64dd8
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
+
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..e3fbd2bdb5
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster with streaming replication.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: In the case of "check", this creates a temporary installation
+with multiple nodes (a master and one or more standbys) for the
+purpose of the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_standby_checksum.pl b/src/test/checksum/t/001_standby_checksum.pl
new file mode 100644
index 0000000000..891743fa6c
--- /dev/null
+++ b/src/test/checksum/t/001_standby_checksum.pl
@@ -0,0 +1,104 @@
+# Test suite for testing enabling data checksums with streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+my $MAX_TRIES = 30;
+
+# Initialize master node
+my $node_master = get_new_node('master');
+$node_master->init(allows_streaming => 1);
+$node_master->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_master->backup($backup_name);
+
+# Create streaming standby linking to master
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_master, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on master to have un-checksummed data in the cluster
+$node_master->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_master->wait_for_catchup($node_standby_1, 'replay',
+	$node_master->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on master');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_master->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the master has switched to inprogress immediately
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "inprogress", 'ensure checksums are in progress on master');
+
+# Wait for checksum enable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress or on
+# Normally it would be "inprogress", but it is theoretically possible for the master
+# to complete the checksum enabling *and* have the standby replay that record before
+# we reach the check below.
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+ok(($result eq 'inprogress' or $result eq 'on'), 'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_master->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the master
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_master->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on master');
+
+# Wait for checksums enabled on the standby
+for (my $i = 0; $i < $MAX_TRIES; $i++)
+{
+	$result = $node_standby_1->safe_psql('postgres',
+		"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+	last if ($result eq 'on');
+	sleep(1);
+}
+is ($result, "on", 'ensure checksums are enabled on standby');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_master->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_master->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on master');
+
+# Wait for checksum disable to be replayed
+$node_master->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on standby_1');
+
+$result = $node_master->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.21.0 (Apple Git-122.2)

#20Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Gustafsson (#19)
Re: Online checksums patch - once again

On Sat, Jan 18, 2020 at 6:18 PM Daniel Gustafsson <daniel@yesql.se> wrote:

Thanks again for reviewing (and working on the infrastructure required for this
patch to begin with)! Regarding the persisting the progress; that would be a
really neat feature but I don't have any suggestion on how to do that safely
for real use-cases.

Leaving to one side the question of how much work is involved, could
we do something conceptually similar to relfrozenxid/datfrozenxid,
i.e. use catalog state to keep track of which objects have been
handled and which not?

Very rough sketch:

* set a flag indicating that checksums must be computed for all page writes
* use barriers and other magic to make sure everyone has gotten the
memo from the previous step
* use new catalog fields pg_class.relhaschecksums and
pg_database.dathaschecksums to track whether checksums are enabled
* keep launching workers for databases where !pg_database.dathaschecksums
until none remain
* mark checksums as fully enabled
* party
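The sketched flow can be modeled as a small simulation. All names here (the databases dict standing in for pg_database.dathaschecksums, the rewrite step) are hypothetical stand-ins for the proposed catalog fields and background workers, not the patch's actual code:

```python
# Hypothetical model of the sketched enabling flow. The databases dict
# plays the role of pg_database.dathaschecksums; rewrite_database() stands
# in for a background worker pass over one database.

CLUSTER_STATE = "off"
databases = {"postgres": False, "app": False}  # dathaschecksums per db

def rewrite_database(name):
    """Placeholder for rewriting every page of every relation in `name`."""

def enable_checksums():
    global CLUSTER_STATE
    CLUSTER_STATE = "enabling"   # all page writes now compute checksums
    # (barrier step elided: wait until every backend has seen the flag)
    while not all(databases.values()):
        db = next(n for n, done in databases.items() if not done)
        rewrite_database(db)     # worker checksums all pages in db
        databases[db] = True     # mark database as handled
    CLUSTER_STATE = "on"         # checksums now verified on read, too

enable_checksums()
assert CLUSTER_STATE == "on" and all(databases.values())
```

The key property is that the loop can be re-entered after an interruption and will only launch workers for databases still marked false.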

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#20)
Re: Online checksums patch - once again

On Mon, Jan 20, 2020 at 12:14 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jan 18, 2020 at 6:18 PM Daniel Gustafsson <daniel@yesql.se> wrote:

Thanks again for reviewing (and working on the infrastructure required for this
patch to begin with)! Regarding the persisting the progress; that would be a
really neat feature but I don't have any suggestion on how to do that safely
for real use-cases.

Leaving to one side the question of how much work is involved, could
we do something conceptually similar to relfrozenxid/datfrozenxid,
i.e. use catalog state to keep track of which objects have been
handled and which not?

Very rough sketch:

* set a flag indicating that checksums must be computed for all page writes
* use barriers and other magic to make sure everyone has gotten the
memo from the previous step
* use new catalog fields pg_class.relhaschecksums and
pg_database.dathaschecksums to track whether checksums are enabled
* keep launching workers for databases where !pg_database.dathaschecksums
until none remain
* mark checksums as fully enabled
* party

We did discuss this back when we started work on this (I can't
remember if it was just me and Daniel and someone else or on a list --
but that's not important atm).

The reasoning that led us to *not* doing that is that it's a one-off
operation. That, along with the fact that we hope to at some point be
able to change the default to checksums on (and it wouldn't be
necessary for the transition on->off as that is very fast), means it
would become an increasingly rare one-off operation. And by adding
these flags to the catalogs, everybody is paying the overhead for this
one-off rare operation. Another option would be to add the flag on the
pg_database level, which would decrease the overhead, but our guess was
that this would also decrease the usefulness in most cases enough to
make it not worth it (most people with big databases don't have many
big databases in the same cluster -- it's usually just one or two, so
in the end the results would be more or less the same as we have now,
as it would have to keep re-doing the big ones).

Unless we actually want to support running systems more or less
permanently with some tables with checksums and other tables without
checksums. But that's going to have an effect on the validation of
checksums that would generate a huge overhead (since each buffer check
would have to look up the pg_class entry).

FYI, Daniel is working on an update that will include this -- so we
can see what the actual outcome is of it in the case of complexity as
well. Should hopefully be ready soon.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#22Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#21)
Re: Online checksums patch - once again

On Wed, Jan 22, 2020 at 2:50 PM Magnus Hagander <magnus@hagander.net> wrote:

The reasoning that led us to *not* doing that is that it's a one-off
operation. That, along with the fact that we hope to at some point be
able to change the default to checksums on (and it wouldn't be
necessary for the transition on->off as that is very fast), means it
would become an increasingly rare one-off operation. And by adding
these flags to the catalogs, everybody is paying the overhead for this
one-off rare operation. Another option would be to add the flag on the
pg_database level, which would decrease the overhead, but our guess was
that this would also decrease the usefulness in most cases enough to
make it not worth it (most people with big databases don't have many
big databases in the same cluster -- it's usually just one or two, so
in the end the results would be more or less the same as we have now,
as it would have to keep re-doing the big ones).

I understand, but the point for me is that the patch does not seem
robust as written. Nobody's going to be happy if there are reasonably
high-probability scenarios where it turns checksums part way on and
then just stops. Now, that can probably be improved to some degree
without adding catalog flags, but I bet it can be improved more and
for less effort if we do add catalog flags. Maybe being able to
survive a cluster restart without losing track of progress is not a
hard requirement for this feature, but it certainly seems nice. And I
would venture to say that continuing to run without giving up if there
happen to be no background workers available for a while IS a hard
requirement, because that can easily happen due to normal use of
parallel query. We do not normally commit features if, without any
error occurring, they might just give up part way through the
operation.

I think the argument about adding catalog flags adding overhead is
pretty much bogus. Fixed-width fields in catalogs are pretty cheap.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#23Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#22)
Re: Online checksums patch - once again

On Wed, Jan 22, 2020 at 12:20 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 22, 2020 at 2:50 PM Magnus Hagander <magnus@hagander.net> wrote:

The reasoning that led us to *not* doing that is that it's a one-off
operation. That, along with the fact that we hope to at some point be
able to change the default to checksums on (and it wouldn't be
necessary for the transition on->off as that is very fast), means it
would become an increasingly rare one-off operation. And by adding
these flags to the catalogs, everybody is paying the overhead for this
one-off rare operation. Another option would be to add the flag on the
pg_database level, which would decrease the overhead, but our guess was
that this would also decrease the usefulness in most cases enough to
make it not worth it (most people with big databases don't have many
big databases in the same cluster -- it's usually just one or two, so
in the end the results would be more or less the same as we have now,
as it would have to keep re-doing the big ones).

I understand, but the point for me is that the patch does not seem
robust as written. Nobody's going to be happy if there are reasonably
high-probability scenarios where it turns checksums part way on and
then just stops. Now, that can probably be improved to some degree
without adding catalog flags, but I bet it can be improved more and
for less effort if we do add catalog flags. Maybe being able to
survive a cluster restart without losing track of progress is not a
hard requirement for this feature, but it certainly seems nice. And I

It's certainly nice, but that is of course a cost/benefit tradeoff
calculation. Our thoughts on that were that the cost was higher than
the benefit -- which may of course be wrong, and in that case it's
better to have it changed.

would venture to say that continuing to run without giving up if there
happen to be no background workers available for a while IS a hard
requirement, because that can easily happen due to normal use of
parallel query. We do not normally commit features if, without any
error occurring, they might just give up part way through the
operation.

That part I agree with, but I don't think that in itself requires
per-relation level tracking.

I think the argument about adding catalog flags adding overhead is
pretty much bogus. Fixed-width fields in catalogs are pretty cheap.

If that's the general view, then yeah our "cost calculations" were
off. I guess I may have been colored by the cost of adding statistics
counters, and had that influence the thinking. Incorrect judgement on
that cost certainly contributed to the decision back then.

But as noted, work is being done on adding it, so let's see what that
ends up looking like in reality.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#24Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#23)
Re: Online checksums patch - once again

On Wed, Jan 22, 2020 at 3:28 PM Magnus Hagander <magnus@hagander.net> wrote:

I think the argument about adding catalog flags adding overhead is
pretty much bogus. Fixed-width fields in catalogs are pretty cheap.

If that's the general view, then yeah our "cost calculations" were
off. I guess I may have been colored by the cost of adding statistics
counters, and had that influence the thinking. Incorrect judgement on
that cost certainly contributed to the decision back then.

For either statistics or for pg_class, the amount of data that we have
to manage is proportional to the number of relations (which could be
big) multiplied by the data stored for each relation. But the
difference is that the stats file has to be rewritten, at least on a
per-database basis, very frequently, while pg_class goes through
shared-buffers and so doesn't provoke the same stupid
write-the-whole-darn-thing behavior. That is a pretty key difference,
IMHO.

Now, it would be nice to fix the stats system, but until we do, here we are.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#25Daniel Gustafsson
daniel@yesql.se
In reply to: Robert Haas (#24)
Re: Online checksums patch - once again

On 22 Jan 2020, at 23:07, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 22, 2020 at 3:28 PM Magnus Hagander <magnus@hagander.net> wrote:

I think the argument about adding catalog flags adding overhead is
pretty much bogus. Fixed-width fields in catalogs are pretty cheap.

If that's the general view, then yeah our "cost calculations" were
off. I guess I may have been colored by the cost of adding statistics
counters, and had that influence the thinking. Incorrect judgement on
that cost certainly contributed to the decision back then.

For either statistics or for pg_class, the amount of data that we have
to manage is proportional to the number of relations (which could be
big) multiplied by the data stored for each relation. But the
difference is that the stats file has to be rewritten, at least on a
per-database basis, very frequently, while pg_class goes through
shared-buffers and so doesn't provoke the same stupid
write-the-whole-darn-thing behavior. That is a pretty key difference,
IMHO.

I think the cost is less about performance and more about carrying around an
attribute which won't be terribly interesting during the cluster lifetime,
except for the transition. But, as you say, it's probably a manageable expense.

A bigger question is how to handle the offline capabilities. pg_checksums can
enable or disable checksums in an offline cluster, which will put the cluster
in a state where the pg_control file and the catalog don't match at startup.
One strategy could be to always trust the pg_control file and alter the catalog
accordingly, but that still leaves a window of inconsistent cluster state.

cheers ./daniel

#26Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Gustafsson (#25)
Re: Online checksums patch - once again

On Thu, Jan 23, 2020 at 6:19 AM Daniel Gustafsson <daniel@yesql.se> wrote:

A bigger question is how to handle the offline capabilities. pg_checksums can
enable or disable checksums in an offline cluster, which will put the cluster
in a state where the pg_control file and the catalog don't match at startup.
One strategy could be to always trust the pg_control file and alter the catalog
accordingly, but that still leaves a window of inconsistent cluster state.

I suggest that we define things so that the catalog state is only
meaningful during a state transition. That is, suppose the cluster
state is either "on", "enabling", or "off". When it's "on", checksums
are written and verified. When it is "off", checksums are not written
and not verified. When it's "enabling", checksums are written but not
verified. Also, when and only when the state is "enabling", the
background workers that try to rewrite relations to add checksums run,
and those workers look at the catalog state to figure out what to do.
Once the state changes to "on", those workers don't run any more, and
so the catalog state does not make any difference.
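The three-state behavior Robert describes can be summarized in a tiny sketch: which page operations compute or verify checksums in each cluster state. This is purely illustrative of the proposal, not PostgreSQL's actual page-verification code:

```python
# Illustrative mapping of proposed cluster states to checksum behavior:
#   "on"       -> checksums written and verified
#   "enabling" -> checksums written but not verified
#   "off"      -> checksums neither written nor verified

def checksum_behavior(state):
    behaviors = {
        "on":       {"write": True,  "verify": True},
        "enabling": {"write": True,  "verify": False},
        "off":      {"write": False, "verify": False},
    }
    if state not in behaviors:
        raise ValueError(f"unknown cluster state: {state}")
    return behaviors[state]

assert checksum_behavior("enabling") == {"write": True, "verify": False}
```

Writing-but-not-verifying during "enabling" is what lets old, unchecksummed pages coexist with rewritten ones until the workers finish.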

A tricky problem is handling the case where the state is switched
from "enabling" to "on" and then back to "off" and then to "enabling"
again. You don't want to confuse the state from the previous round of
enabling with the state for the current round of enabling. Suppose in
addition to storing the cluster-wide state of on/off/enabling, we also
store an "enable counter" which is incremented every time the state
goes from "off" to "enabling". Then, for each database and relation,
we store a counter that indicates the value of the enable counter at
the time we last scanned/rewrote that relation to set checksums. Now,
you're covered. And, to save space, it can probably be a 32-bit
counter, since 4 billion disable/reenable cycles ought to be enough
for anybody.
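The enable-counter scheme can be modeled as follows; a relation counts as checksummed for the current cycle only if its stored counter matches the cluster's counter, so a later disable/re-enable naturally invalidates earlier work. All names are hypothetical:

```python
# Hypothetical model of the proposed 32-bit enable counter. The per-relation
# counter plays the role of a pg_class field recording which enabling cycle
# last processed that relation.

class Cluster:
    def __init__(self):
        self.state = "off"
        self.enable_counter = 0   # bumped on every off -> enabling transition
        self.rel_counter = {}     # relation -> counter value when processed

    def start_enabling(self):
        self.state = "enabling"
        self.enable_counter = (self.enable_counter + 1) % 2**32

    def mark_done(self, rel):
        self.rel_counter[rel] = self.enable_counter

    def needs_rewrite(self, rel):
        return self.rel_counter.get(rel) != self.enable_counter

c = Cluster()
c.start_enabling()       # cycle 1
c.mark_done("t1")        # t1 rewritten with checksums
c.state = "on"
c.state = "off"          # checksums disabled again
c.start_enabling()       # cycle 2: t1's stale counter no longer matches
assert c.needs_rewrite("t1")
```

A newly created table would call mark_done() with the current counter immediately, matching Robert's note that new tables start with valid checksums.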

It would not be strictly necessary to store this in pg_class. Another
thing that could be done is to store it in a separate system table
that could even be truncated when enabling is not in progress - though
it would be unwise to assume that it's always truncated at the
beginning of an enabling cycle, since it would be hard to guarantee
that the previous enabling cycle didn't fail when trying to truncate.
So you'd probably still end up with something like the counter
approach. I am inclined to think that inventing a whole new catalog
for this is over-engineering, but someone might think differently.
Note that creating a table while enabling is in progress needs to set
the enabling counter for the new table to the new value of the
enabling counter, not the old one, because the new table starts empty
and won't end up with any pages that don't have valid checksums.
Similarly, TRUNCATE, CLUSTER, VACUUM FULL, and rewriting variants of
ALTER TABLE can set the new value for the enabling counter as a side
effect. That's probably easier and more efficient if it's just a value
in pg_class than if they have to go poking around in another catalog.
So I am tentatively inclined to think that just putting it in pg_class
makes more sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#26)
Re: Online checksums patch - once again

Hi,

On 2020-01-23 12:23:09 -0500, Robert Haas wrote:

On Thu, Jan 23, 2020 at 6:19 AM Daniel Gustafsson <daniel@yesql.se> wrote:

A bigger question is how to handle the offline capabilities. pg_checksums can
enable or disable checksums in an offline cluster, which will put the cluster
in a state where the pg_control file and the catalog don't match at startup.
One strategy could be to always trust the pg_control file and alter the catalog
accordingly, but that still leaves a window of inconsistent cluster state.

I suggest that we define things so that the catalog state is only
meaningful during a state transition. That is, suppose the cluster
state is either "on", "enabling", or "off". When it's "on", checksums
are written and verified. When it is "off", checksums are not written
and not verified. When it's "enabling", checksums are written but not
verified. Also, when and only when the state is "enabling", the
background workers that try to rewrite relations to add checksums run,
and those workers look at the catalog state to figure out what to do.
Once the state changes to "on", those workers don't run any more, and
so the catalog state does not make any difference.

A tricky problem is handling the case where the state is switched
from "enabling" to "on" and then back to "off" and then to "enabling"
again. You don't want to confuse the state from the previous round of
enabling with the state for the current round of enabling. Suppose in
addition to storing the cluster-wide state of on/off/enabling, we also
store an "enable counter" which is incremented every time the state
goes from "off" to "enabling". Then, for each database and relation,
we store a counter that indicates the value of the enable counter at
the time we last scanned/rewrote that relation to set checksums. Now,
you're covered. And, to save space, it can probably be a 32-bit
counter, since 4 billion disable/reenable cycles ought to be enough
for anybody.

It would not be strictly necessary to store this in pg_class. Another
thing that could be done is to store it in a separate system table
that could even be truncated when enabling is not in progress - though
it would be unwise to assume that it's always truncated at the
beginning of an enabling cycle, since it would be hard to guarantee
that the previous enabling cycle didn't fail when trying to truncate.
So you'd probably still end up with something like the counter
approach. I am inclined to think that inventing a whole new catalog
for this is over-engineering, but someone might think differently.
Note that creating a table while enabling is in progress needs to set
the enabling counter for the new table to the new value of the
enabling counter, not the old one, because the new table starts empty
and won't end up with any pages that don't have valid checksums.
Similarly, TRUNCATE, CLUSTER, VACUUM FULL, and rewriting variants of
ALTER TABLE can set the new value for the enabling counter as a side
effect. That's probably easier and more efficient if it's just a value
in pg_class than if they have to go poking around in another catalog.
So I am tentatively inclined to think that just putting it in pg_class
makes more sense.

I'm somewhat inclined to think that it's worth first making this robust
without catalog state - even though I find restartability
important. Especially due to not having convenient ways to have cross
database state that we can reset without again needing background
workers. I also wonder if it's not worthwhile to design the feature in a
way that, *in the future*, checksums could be separately set on the
standby/primary - without needing to ship the whole database through
WAL.

Oh, if all relation types had a metapage with a common header, this
would be so much easier...

It'd also be a lot easier if we could map from relfilenode back to a
relation oid, without needing catalog access. That'd allow us to acquire
locks on the relation for a filenode, without needing to be connected to
a database. Again, with a metapage, that'd be quite doable.

Probably not worth doing just for this, but I'm wondering about solving
the metapage issue by just adding a metadata relation fork. Sucks to
increase the number of files further, but I don't really see a path
towards having a metapage for everything, given pg_upgrade compat
requirements. Such a metadata fork, in contrast, could easily be filled
by pg_upgrade. That metadata file could perhaps replace init forks too.

For debuggability storing some information about the relation in that
metadata fork would be great. Being able to identify the type of
relation etc from there, and perhaps even the relname at creation, would
certainly be helpful for cases the database doesn't start up anymore.

With a bit of care, we could allow AMs to store additional information
in there, by having a offset pointer for am information in the common
header.

E.g. for tables it'd be feasible to have the types of columns in there
(since it's associated with a relfilenode, rather than relation, there's
no problem with rewrites), allowing to correctly interpret data without
catalog access when shit has hit the fan.
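Andres's metadata-fork idea, a common header plus an offset to AM-specific data, could look something like the following byte-layout sketch. This is entirely speculative; no such fork or header exists in PostgreSQL, and every field name here is invented for illustration:

```python
# Speculative layout for a per-relfilenode metadata header: relation oid,
# access-method oid, relfilenode, offset to AM-specific data, and the
# relname at creation time for debuggability. Hypothetical, not real.
import struct

META_FMT = "<IIQI64s"  # reloid, relam, relfilenode, am_data_off, relname

def pack_meta(reloid, relam, relfilenode, am_data_off, relname):
    return struct.pack(META_FMT, reloid, relam, relfilenode,
                       am_data_off, relname.encode()[:64])

def unpack_meta(buf):
    reloid, relam, relfilenode, off, name = struct.unpack(META_FMT, buf)
    return reloid, relam, relfilenode, off, name.rstrip(b"\x00").decode()

header_size = struct.calcsize(META_FMT)
buf = pack_meta(16384, 2, 16384, header_size, "accounts")
assert unpack_meta(buf)[4] == "accounts"
```

The am_data_off pointer is what would let each AM append its own information after the common header, as suggested above.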

Greetings,

Andres Freund

#28David Steele
david@pgmasters.net
In reply to: Daniel Gustafsson (#19)
Re: Online checksums patch - once again

On 1/18/20 6:18 PM, Daniel Gustafsson wrote:

Attached is a v16 rebased on top of current master which addresses the above
commented points, and which I am basing the concurrency work on.

This patch no longer applies cleanly:
http://cfbot.cputube.org/patch_27_2260.log

The CF entry has been updated to Waiting on Author.

Regards,

--
-David
david@pgmasters.net

#29David Steele
david@pgmasters.net
In reply to: David Steele (#28)
Re: Online checksums patch - once again

On 4/1/20 11:30 AM, David Steele wrote:

On 1/18/20 6:18 PM, Daniel Gustafsson wrote:

Attached is a v16 rebased on top of current master which addresses the
above
commented points, and which I am basing the concurrency work on.

This patch no longer applies cleanly:
http://cfbot.cputube.org/patch_27_2260.log

The CF entry has been updated to Waiting on Author.

Regards,

There has been review on this patch but no updates in some time. As far
as I can see there's debate on how to mark relations as fully
checksummed and/or how to resume.

I'm marking this patch Returned with Feedback. Please feel free to
resubmit when it is again ready for review.

Regards,
--
-David
david@pgmasters.net

#30Daniel Gustafsson
daniel@yesql.se
In reply to: Robert Haas (#26)
1 attachment(s)
Re: Online checksums patch - once again

On 23 Jan 2020, at 18:23, Robert Haas <robertmhaas@gmail.com> wrote:

..That's probably easier and more efficient if it's just value
in pg_class than if they have to go poking around in another catalog.
So I am tentatively inclined to think that just putting it in pg_class
makes more sense.

..which is what I did, but more on that later.

Attached is a new version of the online checksums patch which, I hope, address
most of the concerns raised in previous reviews. There has been a fair amount
of fiddling done, so below is a summary of what has been done.

Error handling and synchronization around pg_control have been overhauled as
well as the absorption of the process barriers. The comment there suggested
that the absorbing function shouldn't reside with the procsignal code, input is
gladly received on where it makes the most sense since I can see merit to quite
a few places.

The checksumhelper is renamed datachecksumsworker, since the checksumhelper
name is already in use as of c12e43a2e0d45a6b59f2. I think there is room for better casing here
and there on this.

Restartability is implemented by keeping state in pg_class. I opted for a bool
which is cleared as the first step of checksum enable, since it offers fewer
synchronization corner cases, I think. The field is only useful during
processing, and is not guaranteed to reflect reality outside of processing.
The current name I came up with does not convey that, better suggestions are
more than welcome. For now, the process must be restarted manually by running
pg_enable_data_checksums() again, which I sort of like but that might just be
Stockholm syndrome from having enabled/disabled checksums locally a gazillion
times.
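The restartable pass Daniel describes can be modeled in miniature: the per-relation flag is cleared up front on the initial invocation, set as each relation is processed, and a rerun after a crash simply skips relations already flagged. Names and structure here are hypothetical, not the patch's code:

```python
# Miniature model of restartable checksum enabling. relhaschecksums plays
# the role of the new pg_class bool; it would be cleared for all relations
# at the start of the initial run (not shown), then set per relation.

relhaschecksums = {"t1": False, "t2": False, "t3": False}

def enable_pass(crash_after=None):
    """Process unflagged relations; optionally 'crash' after N of them."""
    done = 0
    for rel, flagged in relhaschecksums.items():
        if flagged:
            continue                  # resume: relation already checksummed
        # ... rewrite all pages of rel with checksums here ...
        relhaschecksums[rel] = True
        done += 1
        if crash_after is not None and done == crash_after:
            return False              # simulate a restart mid-run
    return True

assert not enable_pass(crash_after=2)  # first run stops after two relations
assert enable_pass()                   # manual rerun only processes the rest
assert all(relhaschecksums.values())
```

This matches the described behavior that the process must be restarted manually with pg_enable_data_checksums(), at which point prior progress is reused.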

Testing has been extended to cover basics but also restartability. Testing a
resumed restart is a tad tricky while still avoiding timing related tests, so
I've (possibly ab-)used an interactive psql session to act as a blocker to keep
processing from finishing. I did extend poll_query_until to take a non-default
timeout to make sure these tests finish in a reasonable time during hacking,
but that's left out of this version.

There are a few TODO markers left where I'd appreciate input from reviewers,
for example what to do for disabling already disabled checksums (is it a LOG,
NOTICE, ERROR or silent return?)

This is an executive summary of hacking done, and I have most likely forgotten
to mention something important, but I hope it covers most things. Few, if any,
changes are made to the interface of this, changes are contained under the
hood. I will stick this patch in the upcoming commitfest.

cheers ./daniel

Attachments:

online_checksums18.patchapplication/octet-stream; name=online_checksums18.patch; x-unix-mode=0644Download
From a11c59b790202f9fa7c768be6c8bdd724ac3488b Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Thu, 26 Mar 2020 14:40:23 +0100
Subject: [PATCH] Support checksum enable/disable in running cluster v18

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A new value "inprogress" is added for data_checksums during which
writes will set the checksum but reads won't enforce it. When all pages
have been checksummed the value will change to "on" which will enforce
the checksums on read. At this point, the cluster has the same state
as if checksums were enabled via initdb. If the cluster is restarted
during processing, the worker will attempt to resume to avoid doing
the same relation twice.

Checksums are being added via a background worker DatachecksumsWorker
which will process all pages in all databases. Pages accessed via
concurrent write operations will be checksummed with the normal process.

Daniel Gustafsson, Magnus Hagander
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   65 +
 doc/src/sgml/ref/initdb.sgml                 |    1 +
 doc/src/sgml/wal.sgml                        |   97 ++
 src/backend/access/rmgrdesc/xlogdesc.c       |   16 +
 src/backend/access/transam/xlog.c            |  173 ++-
 src/backend/access/transam/xlogfuncs.c       |   99 ++
 src/backend/catalog/heap.c                   |    1 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1197 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    2 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/ipc/ipci.c               |    2 +
 src/backend/storage/ipc/procsignal.c         |   43 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |    6 +-
 src/backend/utils/adt/pgstatfuncs.c          |    4 +-
 src/backend/utils/cache/relcache.c           |    4 +
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   36 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   15 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   42 +
 src/include/storage/bufpage.h                |    2 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |    9 +-
 src/include/utils/rel.h                      |    7 +
 src/test/Makefile                            |    3 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   86 ++
 src/test/checksum/t/002_restarts.pl          |   97 ++
 src/test/checksum/t/003_standby_checksum.pl  |   96 ++
 47 files changed, 2204 insertions(+), 48 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 5a66115df1..854eb283cd 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2157,6 +2157,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if the relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b7c450ea29..7dc8d9c21d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25077,6 +25077,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksums for the cluster. This will switch the data checksums mode
+         to <literal>inprogress</literal> as well as start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch data checksums mode to
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 1635fcb1fd..365e4acb69 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index bd9fae544c..fdc0fc2080 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums; internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in
+    <literal>inprogress</literal> mode.  During this time, checksums will be
+    written but not verified. In addition, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has finished processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume one background worker process; make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to be removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode for
+    any reason, this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="pg_checksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1cd97852e8..f5b75a843d 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +198,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a1256a103b..8ca8aa4f4a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -251,6 +252,11 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -892,6 +898,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1077,7 +1084,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4888,9 +4895,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4927,10 +4932,116 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+bool
+DataChecksumsNeedVerify(void)
+{
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+void
+SetDataChecksumsOnInProgress(void)
+{
+	Assert(ControlFile != NULL);
+
+	if (LocalDataChecksumVersion > 0)
+		return;
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+void
+SetDataChecksumsOn(void)
 {
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	XlogChecksums(0);
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress";
+	else
+		return "off";
 }
 
 /*
@@ -7916,6 +8027,18 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in an inprogress state, we
+	 * notify the user that they need to manually restart the process to
+	 * enable checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here; it has to be launched from a
+	 * regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9759,6 +9882,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10214,6 +10355,26 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..6c7b674f90 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,101 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled. TODO: it could be argued that this should be a NOTICE, LOG
+	 * or perhaps even an error; or maybe nothing at all with a silent return.
+	 * For now we LOG and return, but this needs to be revisited.
+	 */
+	if (!DataChecksumsNeedWrite())
+	{
+		ereport(LOG,
+				(errmsg("data checksums already disabled")));
+		PG_RETURN_VOID();
+	}
+
+	/*
+	 * Requesting shutdown of a concurrently running datachecksumsworker does
+	 * not block awaiting its exit, but we can continue turning off checksums
+	 * anyway since it will at most finish the block it had already started
+	 * and then abort.
+	 */
+	ShutdownDatachecksumsWorkerIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR, (errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR, (errmsg("cost limit must be a positive value")));
+
+	if (DataChecksumWorkerStarted())
+	{
+		ereport(NOTICE,
+				(errmsg("data checksum worker already running"),
+				 errhint("Retry the operation later to allow time for the worker to finish.")));
+		PG_RETURN_VOID();
+	}
+
+	/*
+	 * data checksums on -> on is not a valid state transition as there is
+	 * nothing to do, but it's debatable whether it should be an ERROR, a
+	 * LOG/NOTICE or just returning VOID silently. Figuring this out is a TODO
+	 * much like for the inverse case of disabling disabled checksums.
+	 */
+	if (DataChecksumsNeedVerify())
+	{
+		ereport(NOTICE,
+				(errmsg("data checksums already enabled")));
+	}
+	/*
+	 * If the state is set to inprogress but the worker isn't running, then
+	 * the data checksumming was prematurely terminated. Attempt to continue
+	 * processing data pages where we left off based on state stored in the
+	 * catalog.
+	 */
+	else if (DataChecksumsOnInProgress())
+	{
+		ereport(LOG,
+				(errmsg("data checksums partly enabled, continuing processing")));
+
+		StartDatachecksumsWorkerLauncher(ENABLE_CHECKSUMS, cost_delay, cost_limit);
+	}
+	/*
+	 * We are starting a checksumming process from scratch, and need to start
+	 * by clearing the state in pg_class in case checksums have ever been
+	 * enabled before (either fully or partly). As soon as we set the checksum
+	 * state to inprogress, new relations will set relhaschecksums in
+	 * pg_class, so it must be done first.
+	 */
+	else
+	{
+		StartDatachecksumsWorkerLauncher(RESET_STATE, cost_delay, cost_limit);
+	}
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9c45544815..07d70aa9b6 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -921,6 +921,7 @@ InsertPgClassTuple(Relation pg_class_desc,
 	values[Anum_pg_class_relispopulated - 1] = BoolGetDatum(rd_rel->relispopulated);
 	values[Anum_pg_class_relreplident - 1] = CharGetDatum(rd_rel->relreplident);
 	values[Anum_pg_class_relispartition - 1] = BoolGetDatum(rd_rel->relispartition);
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	values[Anum_pg_class_relrewrite - 1] = ObjectIdGetDatum(rd_rel->relrewrite);
 	values[Anum_pg_class_relfrozenxid - 1] = TransactionIdGetDatum(rd_rel->relfrozenxid);
 	values[Anum_pg_class_relminmxid - 1] = MultiXactIdGetDatum(rd_rel->relminmxid);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..f9745cc09c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1222,6 +1222,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index beb5e85434..2212b19b86 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumStateInDatabase", ResetDataChecksumStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..4295215d5e
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1197 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which was not initialized with checksums, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the controlfile only; no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * inprogress, which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to an incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of those which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * inprogress state and will automatically be checksummed, as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it again when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from where it left off, given that pg_class.relhaschecksums tracks the state
+ * of processed relations and the inprogress state ensures all new writes are
+ * performed with checksums. Each database will be reprocessed, but relations
+ * where pg_class.relhaschecksums is true are skipped.
+ *
+ * In case checksums have been enabled and later disabled, when re-enabling
+ * pg_class.relhaschecksums will be reset to false before entering inprogress
+ * mode to ensure that all relations are re-processed.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * Disabling checksums is done as an immediate operation as it only updates
+ * the controlfile and accompanying local state in the backends. No changes
+ * to pg_class.relhaschecksums are performed as it only tracks state during
+ * enabling.
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Access to other members can be done without a lock, as while they are
+	 * in shared memory, they are never concurrently accessed. When a worker
+	 * is running, the launcher is only waiting for that worker to finish.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+	DataChecksumOperation	operation;
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			DatachecksumsWorkerRelation;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid						dboid;
+	DatachecksumsWorkerResult	result;
+	int						retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase * db);
+static bool ProcessAllDatabases(bool already_connected);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+
+bool
+DataChecksumWorkerStarted(void)
+{
+	bool		started = false;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort)
+		started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+/*
+ * Main entry point for datachecksumsworker launcher process.
+ */
+void
+StartDatachecksumsWorkerLauncher(DataChecksumOperation op,
+								 int cost_delay, int cost_limit)
+{
+	BackgroundWorker		bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * This can be hit during a short window while the worker is shutting
+	 * down. Once done, the worker will clear the abort flag and
+	 * re-processing can be performed.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->abort)
+	{
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("data checksums worker has been aborted")));
+	}
+
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		/* Somebody else already started the launcher */
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(NOTICE,
+				(errmsg("data checksums worker is already running")));
+		return;
+	}
+
+	/* Whether to enable or disable checksums */
+	DatachecksumsWorkerShmem->operation = op;
+
+	/* Backoff parameters to throttle the load during enabling */
+	DatachecksumsWorkerShmem->cost_delay = cost_delay;
+	DatachecksumsWorkerShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumworker
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "background worker \"datachecksumsworker\" starting to process relation %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need checksums, and thus return true.
+		 */
+		elog(DEBUG1,
+			 "background worker \"datachecksumsworker\" skipping relation %u as it no longer exists",
+			 relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "background worker \"datachecksumsworker\" done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation		rel;
+	Form_pg_class	pg_class_tuple;
+	HeapTuple		tuple;
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase *db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	if (DatachecksumsWorkerShmem->operation == ENABLE_CHECKSUMS)
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerMain");
+	else if (DatachecksumsWorkerShmem->operation == RESET_STATE)
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ResetDataChecksumStateInDatabase");
+	else
+		elog(ERROR, "invalid datachecksumsworker operation requested: %d",
+			 DatachecksumsWorkerShmem->operation);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the next
+	 * database and quite likely fail with the same problem. TODO: Maybe we
+	 * need a backoff to avoid running through all the databases here in short
+	 * order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling checksums in \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling checksums in \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot end up with a processed database,
+	 * so we have no alternative other than exiting. When enabling checksums
+	 * we won't at this point have changed the pg_control version to enabled,
+	 * so when the cluster comes back up processing will have to be resumed.
+	 * When disabling, the pg_control version will have been set to off before
+	 * this, so when the cluster comes up checksums will be off as expected.
+	 * In the latter case we might have stale relhaschecksums flags in
+	 * pg_class which need to be handled in some way. TODO
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable checksums without the postmaster process"),
+				 errhint("Restart the database and restart the checksumming process by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("started background worker \"datachecksumsworker\" in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity),
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart the checksumming process by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("background worker for enabling checksums was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId	waitforxid;
+	bool			aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId	oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * clusterwide, so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool connected = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	if (DatachecksumsWorkerShmem->operation == RESET_STATE)
+	{
+		if (!ProcessAllDatabases(connected))
+		{
+			/*
+			 * Before we error out, make sure we clear the state, since it
+			 * may otherwise render the worker stuck without the possibility
+			 * of a restart.
+			 */
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+			DatachecksumsWorkerShmem->launcher_started = false;
+			DatachecksumsWorkerShmem->abort = false;
+			LWLockRelease(DatachecksumsWorkerLock);
+			ereport(ERROR,
+					(errmsg("unable to finish processing")));
+		}
+
+		connected = true;
+		SetDataChecksumsOnInProgress();
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->operation = ENABLE_CHECKSUMS;
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	/*
+	 * Prepare for datachecksumsworker shutdown: once we signal that checksums
+	 * are enabled we want the worker to have finished and exited, to avoid
+	 * races with immediately disabling/enabling again.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * If processing succeeds for ENABLE_CHECKSUMS, then everything has been
+	 * processed, so mark checksums as enabled clusterwide.
+	 */
+	if (ProcessAllDatabases(connected))
+	{
+		SetDataChecksumsOn();
+		ereport(LOG,
+				(errmsg("checksums enabled clusterwide")));
+	}
+}
+
+static bool
+ProcessAllDatabases(bool already_connected)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set up so that the first run processes the shared catalogs, but
+	 * subsequent runs in each database skip them.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we
+		 * wait for all pre-existing transactions to finish, we can then be
+		 * certain that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool			found;
+
+			elog(DEBUG1, "starting processing of database %s with oid %u", db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+								HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+				/* Abort flag set, so exit the whole process */
+				return false;
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "completed one pass over all databases for checksum enabling, %d databases processed",
+			 processed_databases);
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can mean that
+	 * processing actually failed for some reason, or that the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists, but enabling checksums failed then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist,
+		 * since a dropped database no longer needs checksums.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable checksums in \"%s\"",
+							db->dbname)));
+			found_failed = true;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+		/*
+		 * Even though this assignment is redundant after the MemSet, we want
+		 * to be explicit about the intent for readability, since this state
+		 * is queried when determining whether processing can be restarted.
+		 */
+		DatachecksumsWorkerShmem->launcher_started = false;
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before scanning, wait for all pending transactions to finish. This
+	 * ensures there is no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared and non-shared relations are
+ * returned, otherwise only non-shared ones. Temp tables are not included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		DatachecksumsWorkerRelation *relentry;
+
+		if (!RELKIND_HAS_STORAGE(pgc->relkind) ||
+			pgc->relpersistence == RELPERSISTENCE_TEMP)
+			continue;
+
+		if (pgc->relhaschecksums)
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (DatachecksumsWorkerRelation *) palloc(sizeof(DatachecksumsWorkerRelation));
+
+		relentry->reloid = pgc->oid;
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in the database
+ *
+ * Returns a List of oids.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != RELPERSISTENCE_TEMP)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+void
+ResetDataChecksumStateInDatabase(Datum arg)
+{
+	Relation		rel;
+	HeapTuple		tuple;
+	Oid				dboid = DatumGetObjectId(arg);
+	TableScanDesc	scan;
+	Form_pg_class	pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" starting for database oid %u to reset state",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" completed resetting state in database oid %u",
+					dboid)));
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" starting for database oid %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start.
+	 * We need to wait until they are all gone before we are done, since we
+	 * cannot access and modify their files.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		DatachecksumsWorkerRelation *rel = (DatachecksumsWorkerRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		ereport(DEBUG1,
+				(errmsg("background worker \"datachecksumsworker\" aborted in database oid %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the in-progress state), so there is no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table left to wait for */
+		snprintf(activity, sizeof(activity), "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * clusterwide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" completed in database oid %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c022597bc0..4b211e8298 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3770,6 +3770,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 096b0fcef0..5f81bbb78d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1595,7 +1595,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c2e5e3abf8..1e0226166f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -196,6 +196,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..ddfcbdd61a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -255,6 +256,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 4fa385b0ec..91340192c5 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,10 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +499,18 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+	{
+		ProcessBarrierChecksumOnInProgress();
+	}
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+	{
+		ProcessBarrierChecksumOn();
+	}
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+	{
+		ProcessBarrierChecksumOff();
+	}
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +523,21 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6985e8eed..42f1b23aec 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,4 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 XactTruncationLock					44
+DatachecksumsWorkerLock				45
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 4e45bd92ab..3ab61abd0a 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index d708117a40..4c6deaae8b 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -94,7 +94,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1167,7 +1167,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1194,7 +1194,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2aff739466..6f04af6c4c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1559,7 +1559,7 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
@@ -1577,7 +1577,7 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 0b9eb00d2d..aceebf58ad 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1875,6 +1875,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3483,6 +3485,8 @@ RelationBuildLocalRelation(const char *relname,
 	else
 		rel->rd_rel->relispopulated = true;
 
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+
 	/* set replica identity -- system catalogs and non-tables don't have one */
 	if (!IsCatalogNamespace(relnamespace) &&
 		(relkind == RELKIND_RELATION ||
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index cca9704d2d..09d36c507b 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -247,6 +247,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index f4247ea70d..06443b08b1 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -617,6 +617,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 75fc6f11d6..f9adfc3bc2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -73,6 +74,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -494,6 +496,16 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -602,7 +614,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1903,17 +1915,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4755,6 +4756,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 1daa5aed0e..f5468d0cd9 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -597,7 +597,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 00d71e3a8a..4bbf7b36a5 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -657,6 +657,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker have yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 8b90cefbe0..a806cc6d0e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 347a38f57c..3cffe4f828 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -199,7 +199,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -315,7 +315,18 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index c8869d5226..b180ca7b0f 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -245,6 +246,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 78b33b2a7f..1b8b291d2b 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index de5670e538..73a5495335 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 61f2c2f5b4..287f618197 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10862,6 +10862,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 18bc8a7b90..41d2082b29 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -322,6 +322,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1387201382..701bde851b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -852,6 +852,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..62aeabf9f6
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,42 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 0,
+	RESET_STATE
+}			DataChecksumOperation;
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(DataChecksumOperation op,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void 		ResetDataChecksumStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 3f88683a05..7f2dbbf630 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,8 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 6e77744cbc..f6ae955f58 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 5cb39697f3..05f85861e3 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,9 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 0b5957ba02..ab30511d84 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -611,6 +611,13 @@ typedef struct ViewOptions
  */
 #define RelationIsPopulated(relation) ((relation)->rd_rel->relispopulated)
 
+/*
+ * RelationHasDataChecksums
+ *		True if all data pages of the relation have data checksums.
+ */
+#define RelationHasDataChecksums(relation) \
+	((relation)->rd_rel->relhaschecksums)
+
 /*
  * RelationIsAccessibleInLogicalDecoding
  *		True if we need to log enough information to have access via
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..558a8135f1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: In the case of "check", this creates a temporary installation
+with one or more nodes (primary and/or standby) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..9dbb660937
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class has checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we still can process data fine
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums when already disabled, which is also a no-op so we mainly
+# want to run this to make sure the backend isn't crashing or erroring out
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..d908b95561
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,97 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
is($result, '1', 'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..b276027453
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,96 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to inprogress
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress or on
+# Normally it would be "inprogress", but it is theoretically possible for the primary
+# to complete the checksum enabling *and* have the standby replay that record before
+# we reach the check below.
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress", "on"], 'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure data checksums are disabled on the primary');
+
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.21.1 (Apple Git-122.3)

#31Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Gustafsson (#30)
Re: Online checksums patch - once again

On Mon, Jun 22, 2020 at 8:27 AM Daniel Gustafsson <daniel@yesql.se> wrote:

Restartability is implemented by keeping state in pg_class. I opted for a bool
which is cleared as the first step of checksum enable, since it offers fewer
synchronization corner cases I think.

Unless you take AccessExclusiveLock on the table, this probably needs
to be three-valued. Or maybe I am misunderstanding the design...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#32Daniel Gustafsson
daniel@yesql.se
In reply to: Robert Haas (#31)
1 attachment(s)
Re: Online checksums patch - once again

On 22 Jun 2020, at 18:29, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jun 22, 2020 at 8:27 AM Daniel Gustafsson <daniel@yesql.se> wrote:

Restartability is implemented by keeping state in pg_class. I opted for a bool
which is cleared as the first step of checksum enable, since it offers fewer
synchronization corner cases I think.

Unless you take AccessExclusiveLock on the table, this probably needs
to be three-valued. Or maybe I am misunderstanding the design...

Sorry for being a bit thick, but can you elaborate on which case you're thinking about?
CREATE TABLE sets the attribute according to the value of data_checksums, and
before enabling checksums (and before changing data_checksums to inprogress)
the bgworker will update all relhaschecksums from true (if any) to false. Once
the state is set to inprogress all new relations will set relhaschecksums to
true.
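
For readers trying to follow the state transitions, here is a minimal
standalone sketch of the read/write predicates the patch introduces. The
lowercase function names are ours, not the patch's, and the behaviour for
DATA_CHECKSUMS_INPROGRESS_OFF (both predicates false) is an assumption,
since that code path is not shown in this excerpt:

```c
#include <stdbool.h>

/* Mirror of the ChecksumType enum this patch adds to storage/checksum.h. */
typedef enum ChecksumType
{
	DATA_CHECKSUMS_OFF = 0,
	DATA_CHECKSUMS_ON,
	DATA_CHECKSUMS_INPROGRESS_ON,
	DATA_CHECKSUMS_INPROGRESS_OFF
} ChecksumType;

/*
 * While enabling is in progress, every write must already stamp a
 * checksum (so the worker never has to revisit a page it has passed),
 * but reads must not verify yet, because pages written before the
 * transition may still lack a valid checksum.
 */
static bool
data_checksums_need_write(ChecksumType state)
{
	return state == DATA_CHECKSUMS_ON ||
		state == DATA_CHECKSUMS_INPROGRESS_ON;
}

/* Verifying is only safe once every page is known to carry a checksum. */
static bool
data_checksums_need_verify(ChecksumType state)
{
	return state == DATA_CHECKSUMS_ON;
}
```

The asymmetry between the two predicates in the "inprogress" state is the
whole point of the mechanism: writes get ahead of verification, and only
when the worker has confirmed every page is checksummed does the cluster
flip to enforcing reads.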

The attached v19 fixes a few doc issues I had missed.

cheers ./daniel

Attachments:

online_checksums19.patch (application/octet-stream; x-unix-mode=0644)
From ba00b283f2000dcb82551010140ff84832fdcb94 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Thu, 26 Mar 2020 14:40:23 +0100
Subject: [PATCH] Support checksum enable/disable in running cluster v19

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A new value "inprogress" is added for data_checksums during which
writes will set the checksum but reads wont enforce it. When all pages
have been checksummed the value will change to "on" which will enforce
the checksums on read. At this point, the cluster has the same state
as if checksums were enabled via initdb.

Checksums are added by a background worker, DatachecksumsWorker, which
processes all pages in all databases. Pages touched by concurrent write
operations are checksummed by the normal write path.

Daniel Gustafsson, Magnus Hagander
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   65 +
 doc/src/sgml/ref/initdb.sgml                 |    1 +
 doc/src/sgml/wal.sgml                        |   97 ++
 src/backend/access/rmgrdesc/xlogdesc.c       |   16 +
 src/backend/access/transam/xlog.c            |  173 ++-
 src/backend/access/transam/xlogfuncs.c       |   99 ++
 src/backend/catalog/heap.c                   |    1 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1197 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    2 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/ipc/ipci.c               |    2 +
 src/backend/storage/ipc/procsignal.c         |   43 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |    6 +-
 src/backend/utils/adt/pgstatfuncs.c          |    4 +-
 src/backend/utils/cache/relcache.c           |    4 +
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   36 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   15 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   42 +
 src/include/storage/bufpage.h                |    2 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |    9 +-
 src/include/utils/rel.h                      |    7 +
 src/test/Makefile                            |    3 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   86 ++
 src/test/checksum/t/002_restarts.pl          |   97 ++
 src/test/checksum/t/003_standby_checksum.pl  |   96 ++
 47 files changed, 2204 insertions(+), 48 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 49a881b262..1896e39acb 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2157,6 +2157,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if the relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b7c450ea29..7dc8d9c21d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25077,6 +25077,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Checksum <acronym>SQL</acronym> Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+      </row>
+     </thead>
+     <tbody>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_enable_data_checksums(<optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional>)</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        <para>
+         Initiates data checksum processing for the cluster. This will switch the data checksums mode
+         to <literal>inprogress</literal> as well as start a background worker that will process
+         all data in the database and enable checksums for it. When all data pages have had
+         checksums enabled, the cluster will automatically switch data checksums mode to
+         <literal>on</literal>.
+        </para>
+        <para>
+         If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+         specified, the speed of the process is throttled using the same principles as
+         <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+        </para>
+       </entry>
+      </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <literal><function>pg_disable_data_checksums()</function></literal>
+       </entry>
+       <entry>
+        void
+       </entry>
+       <entry>
+        Disables data checksums for the cluster.
+       </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 1635fcb1fd..365e4acb69 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index bd9fae544c..fde104350d 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums; internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be inspected by viewing
+   the value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" />, for example by issuing the command
+   <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in <literal>inprogress</literal>
+    checksum mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process; make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application, it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped for any reason while in
+    <literal>inprogress</literal> mode, this process must be restarted
+    manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as 
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 1cd97852e8..f5b75a843d 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +198,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e455384b5b..bf8d3647e1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -251,6 +252,11 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -895,6 +901,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1080,7 +1087,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4891,9 +4898,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4930,10 +4935,116 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+bool
+DataChecksumsNeedVerify(void)
+{
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * inprogress state they are only updated, not verified.
+	 */
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+void
+SetDataChecksumsOnInProgress(void)
+{
+	Assert(ControlFile != NULL);
+
+	if (LocalDataChecksumVersion > 0)
+		return;
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+void
+SetDataChecksumsOn(void)
 {
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in inprogress mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	XlogChecksums(0);
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress";
+	else
+		return "off";
 }
 
 /*
@@ -7919,6 +8030,18 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in an inprogress state, we notify
+	 * the user that they need to
+	 * manually restart the process to enable checksums. This is because we
+	 * cannot launch a dynamic background worker directly from here, it has to
+	 * be launched from a regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9767,6 +9890,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10222,6 +10363,26 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..6c7b674f90 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,101 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled. TODO: it could be argued that this should be a NOTICE, LOG
+	 * or perhaps even an error; or maybe nothing at all with a silent return.
+	 * For now we LOG and return, but this needs to be revisited.
+	 */
+	if (!DataChecksumsNeedWrite())
+	{
+		ereport(LOG,
+				(errmsg("data checksums already disabled")));
+		PG_RETURN_VOID();
+	}
+
+	/*
+	 * Shutting down a concurrently running datachecksumsworker will not block
+	 * waiting for the shutdown, but we can continue turning off checksums anyway
+	 * since it will at most finish the block it had already started and then
+	 * abort.
+	 */
+	ShutdownDatachecksumsWorkerIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR, (errmsg("cost delay cannot be less than zero")));
+	if (cost_limit <= 0)
+		ereport(ERROR, (errmsg("cost limit must be a positive value")));
+
+	if (DataChecksumWorkerStarted())
+	{
+		ereport(NOTICE,
+				(errmsg("data checksum worker already running"),
+				 errhint("Retry the operation later to allow time for the worker to finish.")));
+		PG_RETURN_VOID();
+	}
+
+	/*
+	 * data checksums on -> on is not a valid state transition as there is
+	 * nothing to do, but it's debatable whether it should be an ERROR, a
+	 * LOG/NOTICE or just return VOID silently. Figuring this out is a TODO,
+	 * much like for the inverse case of disabling already disabled checksums.
+	 */
+	if (DataChecksumsNeedVerify())
+	{
+		ereport(NOTICE,
+				(errmsg("data checksums already enabled")));
+	}
+	/*
+	 * If the state is set to inprogress but the worker isn't running, then
+	 * the data checksumming was prematurely terminated. Attempt to continue
+	 * processing data pages where we left off based on state stored in the
+	 * catalog.
+	 */
+	else if (DataChecksumsOnInProgress())
+	{
+		ereport(LOG,
+				(errmsg("data checksums partly enabled, continuing processing")));
+
+		StartDatachecksumsWorkerLauncher(ENABLE_CHECKSUMS, cost_delay, cost_limit);
+	}
+	/*
+	 * We are starting the checksumming process from scratch, and need to
+	 * start by clearing the state in pg_class in case checksums have ever
+	 * been enabled before (either fully or partly). As soon as we set the
+	 * checksum state to inprogress, new relations will set relhaschecksums
+	 * in pg_class, so the reset must be done first.
+	 */
+	else
+	{
+		StartDatachecksumsWorkerLauncher(RESET_STATE, cost_delay, cost_limit);
+	}
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 3c83fe6bab..835bea230a 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -921,6 +921,7 @@ InsertPgClassTuple(Relation pg_class_desc,
 	values[Anum_pg_class_relispopulated - 1] = BoolGetDatum(rd_rel->relispopulated);
 	values[Anum_pg_class_relreplident - 1] = CharGetDatum(rd_rel->relreplident);
 	values[Anum_pg_class_relispartition - 1] = BoolGetDatum(rd_rel->relispartition);
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	values[Anum_pg_class_relrewrite - 1] = ObjectIdGetDatum(rd_rel->relrewrite);
 	values[Anum_pg_class_relfrozenxid - 1] = TransactionIdGetDatum(rd_rel->relfrozenxid);
 	values[Anum_pg_class_relminmxid - 1] = MultiXactIdGetDatum(rd_rel->relminmxid);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5314e9348f..f9745cc09c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1222,6 +1222,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index beb5e85434..2212b19b86 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumStateInDatabase", ResetDataChecksumStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..4295215d5e
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1197 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which was not initialized with checksums, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the controlfile; no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled clusterwide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * inprogress which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to an incorrect checksum.
+ * The DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * inprogress state and will automatically be checksummed, as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it again when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum; this will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from where it left off, given that pg_class.relhaschecksums tracks the
+ * state of processed relations and the inprogress state ensures all new
+ * writes are performed with checksums. Each database will be reprocessed,
+ * but relations
+ * where pg_class.relhaschecksums is true are skipped.
+ *
+ * In case checksums have been enabled and later disabled, when re-enabling,
+ * pg_class.relhaschecksums will be reset to false before entering inprogress
+ * mode to ensure that all relations are re-processed.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * Disabling checksums is done as an immediate operation as it only updates
+ * the controlfile and accompanying local state in the backends. No changes
+ * to pg_class.relhaschecksums are performed as it only tracks state during
+ * enabling.
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Access to other members can be done without a lock, as while they are
+	 * in shared memory, they are never concurrently accessed. When a worker
+	 * is running, the launcher is only waiting for that worker to finish.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+	/* Parameter values set on start */
+	int			cost_delay;
+	int			cost_limit;
+	DataChecksumOperation	operation;
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			DatachecksumsWorkerRelation;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid						dboid;
+	DatachecksumsWorkerResult	result;
+	int						retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase * db);
+static bool ProcessAllDatabases(bool already_connected);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+
+bool
+DataChecksumWorkerStarted(void)
+{
+	bool		started = false;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort)
+		started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+/*
+ * Main entry point for datachecksumsworker launcher process.
+ */
+void
+StartDatachecksumsWorkerLauncher(DataChecksumOperation op,
+								 int cost_delay, int cost_limit)
+{
+	BackgroundWorker		bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * This can be hit during the short window in which the worker is
+	 * shutting down. Once the shutdown completes, the worker clears the
+	 * abort flag and processing can be requested again.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->abort)
+	{
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("data checksums worker has been aborted")));
+	}
+
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		/* Somebody else already started the launcher */
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(NOTICE,
+				(errmsg("data checksums worker is already running")));
+		return;
+	}
+
+	/* Whether to enable or disable checksums */
+	DatachecksumsWorkerShmem->operation = op;
+
+	/* Backoff parameters to throttle the load during enabling */
+	DatachecksumsWorkerShmem->cost_delay = cost_delay;
+	DatachecksumsWorkerShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumworker
+ *
+ * This does not stop processing immediately; it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber b;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	for (b = 0; b < numblocks; b++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((b % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], b, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check whether we have been asked
+		 * to abort; the abort will bubble up from here. It's safe to check
+		 * this without a lock, because if we miss it being set, we will
+		 * try again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "background worker \"datachecksumsworker\" starting to process relation %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * The relation no longer exists. We don't consider this an error
+		 * since there are no pages in it that need checksums, and thus
+		 * return true.
+		 */
+		elog(DEBUG1,
+			 "background worker \"datachecksumsworker\" skipping relation %u as it no longer exists",
+			 relationId);
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "background worker \"datachecksumsworker\" done with relation %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation		rel;
+	Form_pg_class	pg_class_tuple;
+	HeapTuple		tuple;
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	if (DatachecksumsWorkerShmem->operation == ENABLE_CHECKSUMS)
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerMain");
+	else if (DatachecksumsWorkerShmem->operation == RESET_STATE)
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ResetDataChecksumStateInDatabase");
+	else
+		elog(ERROR, "invalid datachecksumworker operation requested: %d",
+			 DatachecksumsWorkerShmem->operation);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry
+	 * processing this database. The launcher will then move on to the
+	 * next database, which will quite likely fail with the same problem.
+	 * TODO: Maybe we need a backoff to avoid running through all the
+	 * databases here in short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling checksums in \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling checksums in \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot end up with a processed
+	 * database, so we have no alternative other than exiting. When
+	 * enabling checksums we will not yet have changed the pg_control
+	 * version to enabled, so processing will have to be resumed when the
+	 * cluster comes back up. When disabling, the pg_control version will
+	 * be set to off before this, so checksums will be off as expected
+	 * when the cluster comes up. In the latter case we might have stale
+	 * relhaschecksums flags in pg_class which need to be handled in some
+	 * way. TODO
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable checksums without the postmaster process"),
+				 errhint("Restart the database and restart the checksumming process by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("started background worker \"datachecksumsworker\" in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart the checksumming process by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("background worker for enabling checksums was aborted during processing in \"%s\"",
+						db->dbname)));
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" in \"%s\" completed",
+					db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId	waitforxid;
+	bool			aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId	oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * clusterwide, so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool connected = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	if (DatachecksumsWorkerShmem->operation == RESET_STATE)
+	{
+		if (!ProcessAllDatabases(connected))
+		{
+			/*
+			 * Before we error out make sure we clear state since this may
+			 * otherwise render the worker stuck without possibility of a
+			 * restart.
+			 */
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+			DatachecksumsWorkerShmem->launcher_started = false;
+			DatachecksumsWorkerShmem->abort = false;
+			LWLockRelease(DatachecksumsWorkerLock);
+			ereport(ERROR,
+					(errmsg("unable to finish processing")));
+		}
+
+		connected = true;
+		SetDataChecksumsOnInProgress();
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->operation = ENABLE_CHECKSUMS;
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	/*
+	 * Prepare for datachecksumsworker shutdown: once we signal that
+	 * checksums are enabled, we want the worker to be done and exited to
+	 * avoid races with immediate disabling/enabling.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * If processing succeeds for ENABLE_CHECKSUMS, then everything has been
+	 * processed so set checksums as enabled clusterwide
+	 */
+	if (ProcessAllDatabases(connected))
+	{
+		SetDataChecksumsOn();
+		ereport(LOG,
+				(errmsg("checksums enabled clusterwide")));
+	}
+}
+
+static bool
+ProcessAllDatabases(bool already_connected)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set things up so that the first database processed also handles the
+	 * shared catalogs; they are then skipped in every subsequent
+	 * database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions finish, this way we can be
+		 * certain that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool			found;
+
+			elog(DEBUG1, "starting processing of database %s with oid %u", db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+								HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite
+					 * looping in case there simply won't be enough workers
+					 * in the cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+				/* Abort flag set, so exit the whole process */
+				return false;
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "completed one pass over all databases for checksum enabling, %d databases processed",
+			 processed_databases);
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can be because
+	 * they actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists, but enabling checksums failed then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases that still exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable checksums in \"%s\"",
+							db->dbname)));
+			found_failed = found;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+		/*
+		 * Even though this assignment is redundant after the MemSet, we
+		 * want to be explicit about our intent for readability, since this
+		 * state is queried when resuming after a restart.
+		 */
+		DatachecksumsWorkerShmem->launcher_started = false;
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to add
+ * checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared and non-shared relations are
+ * returned; otherwise only non-shared relations are. Temp tables are
+ * never included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		DatachecksumsWorkerRelation *relentry;
+
+		if (!RELKIND_HAS_STORAGE(pgc->relkind) ||
+			pgc->relpersistence == RELPERSISTENCE_TEMP)
+			continue;
+
+		if (pgc->relhaschecksums)
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (DatachecksumsWorkerRelation *) palloc(sizeof(DatachecksumsWorkerRelation));
+
+		relentry->reloid = pgc->oid;
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in the database
+ *
+ * Returns a List of oids.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != RELPERSISTENCE_TEMP)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+void
+ResetDataChecksumStateInDatabase(Datum arg)
+{
+	Relation		rel;
+	HeapTuple		tuple;
+	Oid				dboid = DatumGetObjectId(arg);
+	TableScanDesc	scan;
+	Form_pg_class	pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" starting for database oid %u to reset state",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" completed resetting state in database oid %u",
+					dboid)));
+}
+
+/*
+ * Main function for enabling checksums in a single database
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" starting for database oid %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start.
+	 * We must wait until they are all gone before we can finish, since we
+	 * cannot access and modify their files.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		DatachecksumsWorkerRelation *rel = (DatachecksumsWorkerRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		ereport(DEBUG1,
+				(errmsg("background worker \"datachecksumsworker\" aborted in database oid %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away.
+	 * This is necessary since we cannot "reach" them to enable checksums.
+	 * Any temp tables created after we started will already have
+	 * checksums (due to the in-progress state), so there is no need to
+	 * wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table left to wait for */
+		snprintf(activity, sizeof(activity), "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * clusterwide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" completed in database oid %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c022597bc0..4b211e8298 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3770,6 +3770,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 096b0fcef0..5f81bbb78d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1595,7 +1595,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index c2e5e3abf8..1e0226166f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -196,6 +196,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 427b0d59cd..ddfcbdd61a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -27,6 +27,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -255,6 +256,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 4fa385b0ec..91340192c5 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,10 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +499,18 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+	{
+		ProcessBarrierChecksumOnInProgress();
+	}
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+	{
+		ProcessBarrierChecksumOn();
+	}
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+	{
+		ProcessBarrierChecksumOff();
+	}
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +523,21 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index e6985e8eed..42f1b23aec 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -50,3 +50,4 @@ MultiXactTruncationLock				41
 OldSnapshotTimeMapLock				42
 LogicalRepWorkerLock				43
 XactTruncationLock					44
+DatachecksumsWorkerLock				45
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index 4e45bd92ab..3ab61abd0a 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index d708117a40..4c6deaae8b 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -94,7 +94,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1167,7 +1167,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1194,7 +1194,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2aff739466..6f04af6c4c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1559,7 +1559,7 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
@@ -1577,7 +1577,7 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 0b9eb00d2d..aceebf58ad 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1875,6 +1875,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3483,6 +3485,8 @@ RelationBuildLocalRelation(const char *relname,
 	else
 		rel->rd_rel->relispopulated = true;
 
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+
 	/* set replica identity -- system catalogs and non-tables don't have one */
 	if (!IsCatalogNamespace(relnamespace) &&
 		(relkind == RELKIND_RELATION ||
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index cca9704d2d..09d36c507b 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -247,6 +247,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index f4247ea70d..06443b08b1 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -617,6 +617,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 75fc6f11d6..f9adfc3bc2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -73,6 +74,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -494,6 +496,16 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -602,7 +614,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1903,17 +1915,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4755,6 +4756,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 1daa5aed0e..f5468d0cd9 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -597,7 +597,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 00d71e3a8a..4bbf7b36a5 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -657,6 +657,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The user
+	 * should either let the process finish, or turn off checksums, before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 8b90cefbe0..a806cc6d0e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 77ac4e785f..f46816d6f9 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -199,7 +199,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -317,7 +317,18 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index c8869d5226..b180ca7b0f 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -245,6 +246,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 78b33b2a7f..1b8b291d2b 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled? */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index de5670e538..73a5495335 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 61f2c2f5b4..287f618197 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10862,6 +10862,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 18bc8a7b90..41d2082b29 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -322,6 +322,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1387201382..701bde851b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -852,6 +852,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..62aeabf9f6
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,42 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 0,
+	RESET_STATE
+}			DataChecksumOperation;
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(DataChecksumOperation op,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 3f88683a05..7f2dbbf630 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,8 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 6e77744cbc..f6ae955f58 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 5cb39697f3..05f85861e3 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,9 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 0b5957ba02..ab30511d84 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -611,6 +611,13 @@ typedef struct ViewOptions
  */
 #define RelationIsPopulated(relation) ((relation)->rd_rel->relispopulated)
 
+/*
+ * RelationHasDataChecksums
+ *		True if all data pages of the relation have data checksums.
+ */
+#define RelationHasDataChecksums(relation) \
+	((relation)->rd_rel->relhaschecksums)
+
 /*
  * RelationIsAccessibleInLogicalDecoding
  *		True if we need to log enough information to have access via
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..558a8135f1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: "make check" creates a temporary installation with multiple
+nodes (a primary and one or more standbys) for the purpose of the
+tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..9dbb660937
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class has checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again, which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we still can process data fine
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure data can still be processed after no-op enable');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums when already disabled, which is also a no-op; we mainly
+# run this to make sure the backend isn't crashing or erroring out
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..d908b95561
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,97 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..b276027453
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,96 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to inprogress
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to inprogress or on
+# Normally it would be "inprogress", but it is theoretically possible for the primary
+# to complete the checksum enabling *and* have the standby replay that record before
+# we reach the check below.
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress", "on"], 'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure data checksums are disabled on the primary');
+
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.21.1 (Apple Git-122.3)

#33Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Gustafsson (#32)
Re: Online checksums patch - once again

On Thu, Jun 25, 2020 at 5:43 AM Daniel Gustafsson <daniel@yesql.se> wrote:

Sorry for being a bit thick, can you elaborate on which case you're thinking about?
CREATE TABLE sets the attribute according to the value of data_checksums, and
before enabling checksums (and before changing data_checksums to inprogress)
the bgworker will update all relhaschecksums from true (if any) to false. Once
the state is set to inprogress all new relations will set relhaschecksums to
true.

Oh, I think I was the one who was confused. I guess relhaschecksums
only really has meaning when we're in the process of enabling
checksums? So if we're in that state, then the Boolean tells us
whether a particular relation is done, and otherwise it doesn't
matter?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#34Daniel Gustafsson
daniel@yesql.se
In reply to: Robert Haas (#33)
Re: Online checksums patch - once again

On 26 Jun 2020, at 14:12, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 25, 2020 at 5:43 AM Daniel Gustafsson <daniel@yesql.se> wrote:

Sorry for being a bit thick, can you elaborate on which case you're thinking about?
CREATE TABLE sets the attribute according to the value of data_checksums, and
before enabling checksums (and before changing data_checksums to inprogress)
the bgworker will update all relhaschecksums from true (if any) to false. Once
the state is set to inprogress all new relations will set relhaschecksums to
true.

Oh, I think I was the one who was confused. I guess relhaschecksums
only really has meaning when we're in the process of enabling
checksums? So if we're in that state, then the Boolean tells us
whether a particular relation is done, and otherwise it doesn't
matter?

That is correct (which is why the name is terrible since it doesn't convey
that at all).

cheers ./daniel

#35Justin Pryzby
pryzby@telsasoft.com
In reply to: Daniel Gustafsson (#32)
Re: Online checksums patch - once again

On Thu, Jun 25, 2020 at 11:43:00AM +0200, Daniel Gustafsson wrote:

The attached v19 fixes a few doc issues I had missed.

+ They can also be enabled or disabled at a later timne, either as an offline
=> time

+ * awaiting shutdown, but we can continue turning off checksums anyway
=> a waiting

+ * We are starting a checksumming process scratch, and need to start by
=> FROM scratch

+ * to inprogress new relations will set relhaschecksums in pg_class so it
=> inprogress COMMA

+ * Relation no longer exist. We don't consider this an error since
=> exists

+ * so when the cluster comes back up processing will habe to be resumed.
=> have

+ "completed one pass over all databases for checksum enabling, %i databases processed",
=> I think it will be confusing to hardcode "one" here. It'll say "one" over
and over.

+ * still exist.
=> exists

In many places, you refer to "datachecksumsworker" (sums) but in nine places
you refer to datachecksumworker (sum).

+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);

=> I think looping over numblocks is safe since new blocks are intended to be
written with checksum, right? Maybe it's good to say that here.

+ BlockNumber b;

blknum will be easier to grep for

+ (errmsg("background worker \"datachecksumsworker\" starting for database oid %d",
=> Should be %u or similar (several of these)

Some questions:

It looks like you rewrite every page, even if it already has correct checksum,
to handle replicas. I wonder if it's possible/reasonable/good to skip pages
with correct checksum when wal_level=minimal ?

It looks like it's not possible to change the checksum delay while a checksum
worker is already running. That may be important to allow: 1) decreased delay
during slow periods; 2) increased delay if the process is significantly done,
but needs to be throttled to avoid disrupting production environment.

Have you collaborated with Julien about this one? His patch adds new GUCs:
/messages/by-id/20200714090808.GA20780@nol
checksum_cost_delay
checksum_cost_page
checksum_cost_limit

Maybe you'd say that Julien's pg_check_relation() should accept parameters
instead of adding GUCs. I think you should be in agreement on that. It'd be
silly if the verification function added three GUCs and allowed adjusting
throttle midcourse, but the checksum writer process didn't use them.

If you used something like that, I guess you'd also want to distinguish
checksum_cost_page_read vs write. Possibly, the GUCs part should be a
preliminary shared patch 0001 that you both used.

--
Justin

#36Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Gustafsson (#30)
Re: Online checksums patch - once again

On Mon, Jun 22, 2020 at 8:27 AM Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a new version of the online checksums patch which, I hope, addresses
most of the concerns raised in previous reviews. There has been a fair amount
of fiddling done, so below is a summary of what has been done.

Here are a bunch of comments based on a partial read-through of this
patch. The most serious concerns, around synchronization, are down
toward the bottom. Sorry this is a bit eclectic as a review, but I
wrote things down as I read through the patch more or less in the
order I ran across them.

Regarding disable_data_checksums(), I disagree with ereport(LOG, ...)
here. If you want to indicate to the caller whether or not a state
change occurred, you could consider returning a Boolean instead of
void. If you want to do it with log messages, I vote for NOTICE, not
LOG. Maybe NOTICE is better, because enable_data_checksums() seems to
want to convey more information than you can represent in a Boolean,
but then it should use NOTICE consistently, not a mix of NOTICE and
LOG.

Formatting needs work for project style: typically no braces around
single statements, "ereport(WHATEVER," should always have a line break
at that point.

+ * cluster, which was not initialized with checksums, this worker will ensure

"which was not initialized with checksums" => "that does not running
with checksums enabled"?

+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and controlfile, no changes are performed
+ * on the data pages or in the catalog.

Comma splice. Either write "controlfile; no" or "controlfile, and no".

My spell-checker complains that controfile, clusterwide, inprogress,
and endstate are not words. I think you should think about inserting
spaces or, in the case of cluster-wide, a dash, unless they are being
used as literals, in which case perhaps those instances should be
quoted. "havent" needs an apostrophe.

+ * DataChecksumsWorker will compile a list of databases which exists at the

which exist

+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum, this will generate

Comma splice. Split into two sentences.

+ * In case checksums have been enabled and later disabled, when re-enabling
+ * pg_class.relhaschecksums will be reset to false before entering inprogress
+ * mode to ensure that all relations are re-processed.

"If checksums are enabled, then disabled, and then re-enabled, every
relation's pg_class.relhaschecksums field will be reset to false
before entering the in-progress mode."

+ * Disabling checksums is done as an immediate operation as it only updates

s/done as //

+ * to pg_class.relhaschecksums is performed as it only tracks state during

is performed -> are necessary

+ * Access to other members can be done without a lock, as while they are
+ * in shared memory, they are never concurrently accessed. When a worker
+ * is running, the launcher is only waiting for that worker to finish.

The way this is written, it sounds like you're saying that concurrent
access might be possible when this structure isn't in shared memory.
But since it's called DatachecksumsWorkerShmemStruct that's not likely
a correct conclusion, so I think it needs rephrasing.

+ if (DatachecksumsWorkerShmem->launcher_started &&
!DatachecksumsWorkerShmem->abort)
+ started = true;

Why not started = a && b instead of started = false; if (a && b) started = true?

+ {
+ LWLockRelease(DatachecksumsWorkerLock);
+ ereport(ERROR,
+ (errmsg("data checksums worker has been aborted")));
+ }

Errors always release LWLocks, so this seems unnecessary. Also, the
error message looks confusing from a user perspective. What does it
mean if I ask you to make me a cheeseburger and you tell me the
cheeseburger has been eaten? I'm asking for a *new* cheeseburger (or
in this case, a new worker).

I wonder why this thing is inventing a brand new way of aborting a
worker, anyway. Why not just keep track of the PID and send it SIGINT
and have it use CHECK_FOR_INTERRUPTS()? That's already sprinkled all
over the code, so it's likely to work better than some brand-new
mechanism that will probably have checks in a lot fewer places.

+ vacuum_delay_point();

Huh? Why?

+ elog(DEBUG2,
+ "background worker \"datachecksumsworker\" starting to process relation %u",
+ relationId);

This and similar messages seem likely they refer needlessly to
internals, e.g. this could be "adding checksums to relation with OID
%u" without needing to reference background workers or
datachecksumworker. It would be even better if we could find a way to
report relation names.

+ * so when the cluster comes back up processing will habe to be resumed.

habe -> have

+ ereport(FATAL,
+ (errmsg("cannot enable checksums without the postmaster process"),
+ errhint("Restart the database and restart the checksumming process
by calling pg_enable_data_checksums().")));

I understand the motivation for this design and it may be the best we
can do, but honestly it kinda sucks. It would be nice if the system
itself figured out whether or not the worker should be running and, if
yes, ran it. Like, if we're in this state when we exit recovery (or
decide to skip recovery), just register the worker then automatically.
Now that could still fail for lack of slots, so I guess to make this
really robust we'd need a way for the registration to get retried,
e.g. autovacuum could try to reregister it periodically, and we could
just blow off the case where autovacuum=off. I don't know. I'd rather
avoid burdening users with an implementation detail if we can get
there, or at least minimize what they need to worry about.

+ snprintf(activity, sizeof(activity) - 1,
+ "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+ pgstat_report_activity(STATE_RUNNING, activity);

So we only know how to run one such worker at a time?

Maybe WaitForAllTransactionsToFinish should advertise something in
pg_stat_activity.

I think you should try to give all of the functions header comments,
or at least all the bigger ones.

+ else if (result == DATACHECKSUMSWORKER_ABORTED)
+ /* Abort flag set, so exit the whole process */
+ return false;

I'd put braces here. And also, why bail out like this instead of
retrying periodically until we succeed?

+ * True if all data pages of the relation have data checksums.

Not fully accurate, right?

+ /*
+ * Force a checkpoint to get everything out to disk. TODO: we probably
+ * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+ * for testing until the patch is fully baked, as it may otherwise make
+ * tests take a lot longer.
+ */
+ RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);

Do we need to verify that the checkpoint succeeded before we can
declare victory and officially change state?

+ PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+ PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+ PROCSIGNAL_BARRIER_CHECKSUM_ON

I don't think it's a good idea to have three separate types of barrier
here. I think you should just have a barrier indicating that the state
has changed, and then backends need to reread the state from shared
memory when they absorb the barrier.

But the bigger problem here, and the thing that makes me intensely
doubtful that the synchronization in this patch is actually correct,
is that I can't find any machinery in the patch guarding against
TOCTTOU issues, nor any comments explaining why I shouldn't be afraid
of them. Suppose you got rid of the barriers and just changed all the
places that check LocalDataChecksumVersion to read from a shared
memory value directly instead. Would that be equivalent to what you've
got here, or would it break something? If you can't clearly explain
why that would be broken as compared with what you have, then either
the barriers aren't really necessary (which I doubt) or the
synchronization isn't really right (which I suspect to be true).

In the case of the ALTER SYSTEM READ ONLY patch, this was by far the
hardest part to get right, and I'm still not positive that it's
completely correct, but the basic thing we figured out there is that
you are in big trouble if the system goes read-only AFTER you've
decided to write a WAL record. That is, this is bugged:

if (WALIsProhibited())
ereport(ERROR, errmsg("i'm sorry i can't do that"));
...
CHECK_FOR_INTERRUPTS();
...
START_CRIT_SECTION();
XLogBeginInsert();

If the CHECK_FOR_INTERRUPTS() absorbs a state change, then the
XLogBeginInsert() is going to hit an elog(ERROR) which, because we're
in a critical section, will be promoted to PANIC, which is bad. To
avoid that, the patch introduces a whole hairy system to make sure
that there can never be a CFI after we check whether it's OK to insert
WAL and before we actually do it. That stuff is designed in such a way
that it will make assertion fail even if you're not actually *trying*
to make the system read-only.

So the comparable problem here would be if we decide that we don't
need to set checksums on a page when modifying it, and then we absorb
a barrier that flips the state to in-progress, and then we actually
perform the page modification. Now you have a race condition: the page
was modified without checksums after we'd acknowledged to the process
pushing out the barrier that all of our future page modifications
would set checksums. So, maybe that's not possible here. For instance,
if we never examine the checksum-enabled state outside of a critical
section, then we're fine, because we can't absorb a barrier without
processing an interrupt, and we don't process interrupts in critical
sections. But if that's the case, then it seems to me that it would be
good to insert some cross-checks. Like, suppose we only ever access
the local variable that contains this state through a static inline
function that also asserts that InterruptHoldoffCount > 0 ||
CritSectionCount > 0. Then, if there is a place where we don't
actually follow that rule (only rely on that value within critical
sections) we're pretty likely to trip an assert just running the
regression tests. It's not foolproof, not only because the regression
tests are incomplete but because in theory we could fetch the value in
a crit section and then keep it around and rely on it some more after
we've processed interrupts again, but that seems like a less-likely
thing for somebody to do.

If, on the other hand, there are stretches of code that fetch this
value outside of a crit section and without interrupts held, then we
need some other kind of mechanism here to make it safe. We have to
make sure that not only does the present code not permit race
conditions of the type described above, but that future modifications
are quite unlikely to introduce any. I might be missing something, but
I don't see any kind of checks like this in the patch now. I think
there should be, and the rationale behind them should be written up,
too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#37Daniel Gustafsson
daniel@yesql.se
In reply to: Robert Haas (#36)
Re: Online checksums patch - once again

On 29 Jul 2020, at 19:58, Robert Haas <robertmhaas@gmail.com> wrote:

Here are a bunch of comments based on a partial read-through of this
patch.

Thanks a lot Robert and Justin for the reviews! With the commitfest wrap-up
imminent and being on vacation I will have a hard time responding properly
before the end of CF so I'm moving it to the next CF for now.

cheers ./daniel

#38Daniel Gustafsson
daniel@yesql.se
In reply to: Justin Pryzby (#35)
Re: Online checksums patch - once again

On 28 Jul 2020, at 04:33, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Thu, Jun 25, 2020 at 11:43:00AM +0200, Daniel Gustafsson wrote:

The attached v19 fixes a few doc issues I had missed.

+ They can also be enabled or disabled at a later timne, either as an offline
=> time

Fixed.

+ * awaiting shutdown, but we can continue turning off checksums anyway
=> a waiting

This was intentional, as it refers to the impending requested shutdown and not
one which is blocked waiting. I've reworded the comment to make this clearer.

+ * We are starting a checksumming process scratch, and need to start by
=> FROM scratch

Fixed.

+ * to inprogress new relations will set relhaschecksums in pg_class so it
=> inprogress COMMA

Fixed.

+ * Relation no longer exist. We don't consider this an error since
=> exists

Fixed.

+ * so when the cluster comes back up processing will habe to be resumed.
=> have

Fixed.

+ "completed one pass over all databases for checksum enabling, %i databases processed",
=> I think it will be confusing to hardcode "one" here. It'll say "one" over
and over.

Good point, I've reworded this based on the number of processed databases to
indicate what will happen next. This call should probably be removed if
we merge and ship this feature, but it's handy for this part of the patch
process.

+ * still exist.
=> exists

Fixed.

In many places, you refer to "datachecksumsworker" (sums) but in nine places
you refer to datachecksumworker (sum).

Good catch, they should all be using "sums". Fixed.

+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);

=> I think looping over numblocks is safe since new blocks are intended to be
written with checksum, right? Maybe it's good to say that here.

Fixed.

+ BlockNumber b;

blknum will be easier to grep for

Fixed.

+ (errmsg("background worker \"datachecksumsworker\" starting for database oid %d",
=> Should be %u or similar (several of these)

Fixed. As per Robert's review downthread, these will be reworded in a future
version.

It looks like you rewrite every page, even if it already has correct checksum,
to handle replicas. I wonder if it's possible/reasonable/good to skip pages
with correct checksum when wal_level=minimal ?

That would AFAICT be possible, but I'm not sure it's worth adding that before
the patch is deemed safe in its simpler form. I've added a comment to record
this as a potential future optimization.

It looks like it's not possible to change the checksum delay while a checksum
worker is already running. That may be important to allow: 1) decreased delay
during slow periods; 2) increased delay if the process is significantly done,
but needs to be throttled to avoid disrupting production environment.

Have you collaborated with Julien about this one? His patch adds new GUCs:
/messages/by-id/20200714090808.GA20780@nol
checksum_cost_delay
checksum_cost_page
checksum_cost_limit

I honestly hadn't thought about that, but I very much agree that any controls
introduced should work the same for both of these patches.

Maybe you'd say that Julien's pg_check_relation() should accept parameters
instead of adding GUCs. I think you should be in agreement on that. It'd be
silly if the verification function added three GUCs and allowed adjusting
throttle midcourse, but the checksum writer process didn't use them.

Agreed. I'm not a fan of using yet more GUCs for controlling this, but I don't
have a good argument against it. It's in line with the cost-based vacuum delays, so
I guess it's the most appropriate interface.

If you used something like that, I guess you'd also want to distinguish
checksum_cost_page_read vs write. Possibly, the GUCs part should be a
preliminary shared patch 0001 that you both used.

+1.

Thanks for the review! I will attach a v20 to Robert's email with these changes
included as well.

cheers ./daniel

#39Daniel Gustafsson
daniel@yesql.se
In reply to: Robert Haas (#36)
1 attachment(s)
Re: Online checksums patch - once again

On 29 Jul 2020, at 19:58, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jun 22, 2020 at 8:27 AM Daniel Gustafsson <daniel@yesql.se> wrote:

Attached is a new version of the online checksums patch which, I hope, addresses
most of the concerns raised in previous reviews. There has been a fair amount
of fiddling done, so below is a summary of what has been done.

Here are a bunch of comments based on a partial read-through of this
patch. The most serious concerns, around synchronization, are down
toward the bottom. Sorry this is a bit eclectic as a review, but I
wrote things down as I read through the patch more or less in the
order I ran across them.

No need to apologize, many thanks for the review! This is a partial response,
since I need to spend a bit more time to properly respond to the
synchronization questions, but I also wanted to submit a new version which
applies in the CF patch tester. Anything not addressed here will be in a
follow-up version.

The attached v20 contains fixes from this review as well as Justin's review
upthread.

Regarding disable_data_checksums(), I disagree with ereport(LOG, ...)
here. If you want to indicate to the caller whether or not a state
change occurred, you could consider returning a Boolean instead of
void. If you want to do it with log messages, I vote for NOTICE, not
LOG. Maybe NOTICE is better, because enable_data_checksums() seems to
want to convey more information than you can represent in a Boolean,
but then it should use NOTICE consistently, not a mix of NOTICE and
LOG.

I agree with this, I've moved to returning a bool rather than ereporting NOTICE
(or LOG).

Formatting needs work for project style: typically no braces around
single statements, "ereport(WHATEVER," should always have a line break
at that point.

I think I've fixed all these instances, and the attached patch has been run
through pgindent as well.

+ * cluster, which was not initialized with checksums, this worker will ensure

"which was not initialized with checksums" => "that does not running
with checksums enabled"?

Fixed, but it's "run" rather than "running", right?

+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and controlfile, no changes are performed
+ * on the data pages or in the catalog.

Comma splice. Either write "controlfile; no" or "controlfile, and no".

Fixed.

My spell-checker complains that controfile, clusterwide, inprogress,
and endstate are not words. I think you should think about inserting
spaces or, in the case of cluster-wide, a dash, unless they are being
used as literals, in which case perhaps those instances should be
quoted. "havent" needs an apostrophe.

This is me writing Swedish in English. Fixed.

+ * DataChecksumsWorker will compile a list of databases which exists at the

which exist

Fixed.

+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum, this will generate

Comma splice. Split into two sentences.

Fixed.

+ * In case checksums have been enabled and later disabled, when re-enabling
+ * pg_class.relhaschecksums will be reset to false before entering inprogress
+ * mode to ensure that all relations are re-processed.

"If checksums are enabled, then disabled, and then re-enabled, every
relation's pg_class.relhaschecksums field will be reset to false
before entering the in-progress mode."

Replaced with your version, thanks.

+ * Disabling checksums is done as an immediate operation as it only updates

s/done as //

Fixed.

+ * to pg_class.relhaschecksums is performed as it only tracks state during

is performed -> are necessary

Fixed.

+ * Access to other members can be done without a lock, as while they are
+ * in shared memory, they are never concurrently accessed. When a worker
+ * is running, the launcher is only waiting for that worker to finish.

The way this is written, it sounds like you're saying that concurrent
access might be possible when this structure isn't in shared memory.
But since it's called DatachecksumsWorkerShmemStruct that's not likely
a correct conclusion, so I think it needs rephrasing.

Right, that was a pretty poorly worded comment. Rewritten and expanded upon.

+ if (DatachecksumsWorkerShmem->launcher_started &&
!DatachecksumsWorkerShmem->abort)
+ started = true;

Why not started = a && b instead of started = false; if (a && b) started = true?

I don't have strong feelings about either format, so I changed it according to your suggestion.

+ {
+ LWLockRelease(DatachecksumsWorkerLock);
+ ereport(ERROR,
+ (errmsg("data checksums worker has been aborted")));
+ }

Errors always release LWLocks, so this seems unnecessary. Also, the
error message looks confusing from a user perspective. What does it
mean if I ask you to make me a cheeseburger and you tell me the
cheeseburger has been eaten? I'm asking for a *new* cheeseburger (or
in this case, a new worker).

Now you made me hungry for a green chili cheeseburger..

This case covers when the user disables a running datachecksumsworker and
then enables it again before the worker has finished the current page and thus
observed the abort request.

If the worker learns to distinguish between a user abort request and an
internal cancellation (due to WL_POSTMASTER_DEATH) this window could be handled
by clearing the user request and keep going. It would be the same thing as
killing the worker and restarting, except fewer moving parts. Either way, I
agree that it's a confusing error path, and one which should be addressed.

I wonder why this thing is inventing a brand new way of aborting a
worker, anyway. Why not just keep track of the PID and send it SIGINT
and have it use CHECK_FOR_INTERRUPTS()? That's already sprinkled all
over the code, so it's likely to work better than some brand-new
mechanism that will probably have checks in a lot fewer places.

I'm not convinced that the current coding is less responsive, and signalling
for launcher/worker isn't entirely straightforward, but I agree that it's
better to stick to established patterns. Will rewrite to use pqsignal/SIGINT
and will address the previous paragraph in that as well.

+ vacuum_delay_point();

Huh? Why?

The datachecksumsworker is using the same machinery for throttling as the
cost-based vacuum delay, hence this call. Do you object to using that same
machinery, or the implementation and/or documentation of it?

+ elog(DEBUG2,
+ "background worker \"datachecksumsworker\" starting to process relation %u",
+ relationId);

This and similar messages seem likely they refer needlessly to
internals, e.g. this could be "adding checksums to relation with OID
%u" without needing to reference background workers or
datachecksumworker.

I have reworded these as well as removed a few that seemed a bit uninteresting.

It would be even better if we could find a way to
report relation names.

True, but doesn't really seem worth the overhead for a debug log.

+ * so when the cluster comes back up processing will habe to be resumed.

habe -> have

Fixed (also noted by Justin upthread).

+ ereport(FATAL,
+ (errmsg("cannot enable checksums without the postmaster process"),
+ errhint("Restart the database and restart the checksumming process
by calling pg_enable_data_checksums().")));

I understand the motivation for this design and it may be the best we
can do, but honestly it kinda sucks. It would be nice if the system
itself figured out whether or not the worker should be running and, if
yes, ran it. Like, if we're in this state when we exit recovery (or
decide to skip recovery), just register the worker then automatically.
Now that could still fail for lack of slots, so I guess to make this
really robust we'd need a way for the registration to get retried,
e.g. autovacuum could try to reregister it periodically, and we could
just blow off the case where autovacuum=off. I don't know. I'd rather
avoid burdening users with an implementation detail if we can get
there, or at least minimize what they need to worry about.

I don't disagree with you, but I don't see how an automatic restart could be
made safe/good enough to be worth the complexity, since it's nigh impossible to
make it always Just Work. Exposing implementation details to users is clearly
not a good design choice, if it can be avoided.

+ snprintf(activity, sizeof(activity) - 1,
+ "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+ pgstat_report_activity(STATE_RUNNING, activity);

So we only know how to run one such worker at a time?

Correct, there is currently one worker at a time.

Maybe WaitForAllTransactionsToFinish should advertise something in
pg_stat_activity.

It currently does this:

snprintf(activity,
sizeof(activity),
"Waiting for current transactions to finish (waiting for %u)",
waitforxid);
pgstat_report_activity(STATE_RUNNING, activity);

Did you have anything else in mind?

I think you should try to give all of the functions header comments,
or at least all the bigger ones.

I've done a first pass over the patch.

+ else if (result == DATACHECKSUMSWORKER_ABORTED)
+ /* Abort flag set, so exit the whole process */
+ return false;

I'd put braces here.

Fixed.

And also, why bail out like this instead of
retrying periodically until we succeed?

Currently, the abort request can come from the user disabling data checksums
during processing, the postmaster dying, or SIGINT. None of these cases qualifies
for retrying in the current design.

+ * True if all data pages of the relation have data checksums.

Not fully accurate, right?

Ugh.. that's a leftover from previous hacking that I've since ripped out.
Sorry about that, it's been removed.

+ /*
+ * Force a checkpoint to get everything out to disk. TODO: we probably
+ * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+ * for testing until the patch is fully baked, as it may otherwise make
+ * tests take a lot longer.
+ */
+ RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);

Do we need to verify that the checkpoint succeeded before we can
declare victory and officially change state?

I don't think we need a verification step here. With CHECKPOINT_WAIT we are
blocking until the checkpoint has completed. If it fails we won't enable data
checksums until the process has been restarted and a subsequent checkpoint
succeeded.

+ PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+ PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+ PROCSIGNAL_BARRIER_CHECKSUM_ON

I don't think it's a good idea to have three separate types of barrier
here. I think you should just have a barrier indicating that the state
has changed, and then backends need to reread the state from shared
memory when they absorb the barrier.

I'm not sure I follow why; my understanding of the infrastructure was to make
it fine-grained like this. But you have clearly spent more time thinking about
this functionality, so I'm curious to learn. Can you elaborate on your
thinking?

But the bigger problem here, and the thing that makes me intensely
doubtful that the synchronization in this patch is actually correct,
is that I can't find any machinery in the patch guarding against
TOCTTOU issues, nor any comments explaining why I shouldn't be afraid
of them.

<snip>

I unfortunately haven't had time to read the READ ONLY patch so I can't comment
on how these two patches do things in relation to each other.

The main synchronization mechanisms are the use of the inprogress mode, where
data checksums are written but not verified, and waiting for all
pre-existing incompatible processes (transactions, temp tables) to disappear
before enabling.

That being handwavily said, I've started to write down a matrix with classes of
possible synchronization bugs and how the patch handles them in order to
properly respond.

cheers ./daniel

Attachments:

online_checksums20.patch (application/octet-stream; x-unix-mode=0644)
From 609b02f6f439e631ba5f3dffe0a574fc36d00ec6 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Tue, 1 Sep 2020 15:45:38 +0200
Subject: [PATCH] Support checksum enable/disable in running cluster v20

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A new value "inprogress" is added for data_checksums, during which
writes will set the checksum but reads won't enforce it. When all pages
have been checksummed the value will change to "on" which will enforce
the checksums on read. At this point, the cluster has the same state
as if checksums were enabled via initdb.

Checksums are added via a background worker, DatachecksumsWorker, which
will process all pages in all databases. Pages accessed via concurrent
write operations will be checksummed as part of the normal write path.

Daniel Gustafsson, Magnus Hagander
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   68 +
 doc/src/sgml/ref/initdb.sgml                 |    1 +
 doc/src/sgml/wal.sgml                        |   97 ++
 src/backend/access/rmgrdesc/xlogdesc.c       |   16 +
 src/backend/access/transam/xlog.c            |  173 ++-
 src/backend/access/transam/xlogfuncs.c       |   86 ++
 src/backend/catalog/heap.c                   |    1 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1266 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    2 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/ipc/ipci.c               |    2 +
 src/backend/storage/ipc/procsignal.c         |   37 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |    6 +-
 src/backend/utils/adt/pgstatfuncs.c          |    4 +-
 src/backend/utils/cache/relcache.c           |    4 +
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   36 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   15 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   42 +
 src/include/storage/bufpage.h                |    2 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |    9 +-
 src/test/Makefile                            |    3 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   86 ++
 src/test/checksum/t/002_restarts.pl          |   97 ++
 src/test/checksum/t/003_standby_checksum.pl  |   96 ++
 46 files changed, 2250 insertions(+), 48 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 1d1b8ce8fb..b02cb1ae9f 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2162,6 +2162,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
        True if the relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2efd80baa4..71d1a90fe1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25120,6 +25120,74 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
        Initiates enabling of data checksums for the cluster. This switches the
        data checksums mode to <literal>inprogress</literal> and starts a
        background worker that will process all data in the cluster and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>. Returns <literal>true</literal> if processing
+        was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster with immediate effect.
        Returns <literal>false</literal> if data checksums are already
        disabled.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d1c3893b14..f2de982626 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
   the page is read. Only data pages are protected by checksums; internal data
   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
   The current state of checksums in the cluster can be determined by viewing
   the value of the read-only configuration variable <xref
   linkend="guc-data-checksums" />, for example by issuing the command
   <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
    Enabling checksums will put the cluster into
    <literal>inprogress</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process, make sure that
+    <varname>max_worker_processes</varname> allows for at least one more
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as 
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..fcfad06900 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,18 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +198,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09c01ed4ae..b8e5e55c0b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -251,6 +252,11 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -892,6 +898,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1077,7 +1084,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4888,9 +4895,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4927,10 +4932,116 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+bool
+DataChecksumsNeedVerify(void)
+{
+	/*
+	 * Only verify checksums if they are fully enabled in the cluster. In
+	 * "inprogress" state they are only updated, not verified.
+	 */
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+void
+SetDataChecksumsOnInProgress(void)
+{
+	Assert(ControlFile != NULL);
+
+	if (LocalDataChecksumVersion > 0)
+		return;
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+void
+SetDataChecksumsOn(void)
 {
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress\" mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	ControlFile->data_checksum_version = 0;
+	XlogChecksums(0);
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress";
+	else
+		return "off";
 }
 
 /*
@@ -7916,6 +8027,18 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in progress state (either being
+	 * enabled or being disabled), we notify the user that they need to
+	 * manually restart the process to enable checksums. This is because we
+	 * cannot launch a dynamic background worker directly from here, it has to
+	 * be launched from a regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9773,6 +9896,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10228,6 +10369,26 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..dc88c6409e 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,88 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * If we don't need to write new checksums, then clearly they are already
+	 * disabled.
+	 */
+	if (!DataChecksumsNeedWrite())
+		PG_RETURN_BOOL(false);
+
+	/*
+	 * Shutting down a concurrently running datachecksumsworker does not block
+	 * waiting for the worker to exit, but we can continue turning off checksums
+	 * anyway since it will at most finish the block it had already started and
+	 * then abort.
+	 */
+	ShutdownDatachecksumsWorkerIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	if (DataChecksumsWorkerStarted())
+		PG_RETURN_BOOL(false);
+
+	/*
+	 * Data checksums on -> on is not a valid state transition as there is
+	 * nothing to do.
+	 */
+	if (DataChecksumsNeedVerify())
+		PG_RETURN_BOOL(false);
+
+	/*
+	 * If the state is set to "inprogress" but the worker isn't running, then
+	 * the data checksumming was prematurely terminated. Attempt to continue
+	 * processing data pages where we left off based on state stored in the
+	 * catalog.
+	 */
+	if (DataChecksumsOnInProgress())
+	{
+		ereport(NOTICE,
+				(errmsg("data checksums partly enabled, continuing processing")));
+
+		StartDatachecksumsWorkerLauncher(ENABLE_CHECKSUMS, cost_delay, cost_limit);
+	}
+
+	/*
+	 * We are starting a checksumming process from scratch, and need to start
+	 * by clearing the state in pg_class in case checksums have ever been
+	 * enabled before (either fully or partly). As soon as we set the checksum
+	 * state to "inprogress", new relations will set relhaschecksums in
+	 * pg_class so it must be done first.
+	 */
+	else
+		StartDatachecksumsWorkerLauncher(RESET_STATE, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index abd5bdb866..bd94d8cfc3 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -956,6 +956,7 @@ InsertPgClassTuple(Relation pg_class_desc,
 	values[Anum_pg_class_relispopulated - 1] = BoolGetDatum(rd_rel->relispopulated);
 	values[Anum_pg_class_relreplident - 1] = CharGetDatum(rd_rel->relreplident);
 	values[Anum_pg_class_relispartition - 1] = BoolGetDatum(rd_rel->relispartition);
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	values[Anum_pg_class_relrewrite - 1] = ObjectIdGetDatum(rd_rel->relrewrite);
 	values[Anum_pg_class_relfrozenxid - 1] = TransactionIdGetDatum(rd_rel->relfrozenxid);
 	values[Anum_pg_class_relminmxid - 1] = MultiXactIdGetDatum(rd_rel->relminmxid);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a2d61302f9..171ab83aeb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1225,6 +1225,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index beb5e85434..cf4a33eebe 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..73ce6c8b39
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1266 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be resumed
+ * from where it left off, given that pg_class.relhaschecksums tracks the
+ * state of processed relations and the in-progress state ensures that all
+ * new writes are performed with checksums. Each database will be
+ * reprocessed, but relations where pg_class.relhaschecksums is true are skipped.
+ *
+ * If checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * Disabling checksums is an immediate operation as it only updates the control
+ * file and accompanying local state in the backends. No changes to
+ * pg_class.relhaschecksums are necessary as it only tracks state during
+ * enabling.
+ *
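+ * As a compact sketch, the cluster-wide data_checksums transitions handled
+ * here are (illustrative only; the names of the enabling entry point and the
+ * worker completion step come from this file, while the disabling entry
+ * point is assumed to be the corresponding counterpart):
+ *
+ *     off ---------- pg_enable_data_checksums() ----------> inprogress
+ *     inprogress --- worker finishes all databases -------> on
+ *     on ----------- disabling entry point ---------------> off
+ *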
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	DataChecksumOperation operation;
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			DatachecksumsWorkerRelation;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase * db);
+static bool ProcessAllDatabases(bool already_connected);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ * 		Start the datachecksumsworker launcher in a dynamic background worker.
+ */
+void
+StartDatachecksumsWorkerLauncher(DataChecksumOperation op,
+								 int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * This can be hit during a short window while the worker is
+	 * shutting down. Once done, the worker will clear the abort flag and
+	 * re-processing can be performed.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->abort)
+	{
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("data checksums worker has been aborted")));
+	}
+
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		/* Somebody else has already started the launcher */
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(NOTICE,
+				(errmsg("data checksums worker is already running")));
+		return;
+	}
+
+	/* Whether to enable or disable data checksums */
+	DatachecksumsWorkerShmem->operation = op;
+
+	/* Backoff parameters to throttle the load during enabling */
+	DatachecksumsWorkerShmem->cost_delay = cost_delay;
+	DatachecksumsWorkerShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not stop processing immediately; it signals the checksum
+ * worker to exit once it is done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hint bits) on the primary happening while checksums
+		 * were off. This can happen only if there was a valid checksum on
+		 * the page at some point in the past, i.e. when checksums were first
+		 * on, then off, and then turned on again. If wal_level is set to
+		 * "minimal", this could be avoided if the checksum is verified to
+		 * already be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort;
+		 * the abort will bubble up from here. It's safe to check this
+		 * without a lock, because if we miss it being set, we will try again
+		 * soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need checksums, and thus return true.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	if (DatachecksumsWorkerShmem->operation == ENABLE_CHECKSUMS)
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerMain");
+	else if (DatachecksumsWorkerShmem->operation == RESET_STATE)
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ResetDataChecksumsStateInDatabase");
+	else
+		elog(ERROR, "invalid datachecksumsworker operation requested: %d",
+			 DatachecksumsWorkerShmem->operation);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed, we cannot end up with a processed database,
+	 * so we have no alternative other than exiting. When enabling checksums
+	 * we won't at this point have changed the pg_control version to enabled,
+	 * so when the cluster comes back up processing will have to be resumed.
+	 * When disabling, the pg_control version will have been set to off
+	 * before this, so when the cluster comes up checksums will be off as
+	 * expected. In the latter case we might have stale relhaschecksums flags
+	 * in pg_class which need to be handled in some way. TODO
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksums processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks until all currently running transactions have finished
+ *
+ * Returns when all transactions which were active when the function was
+ * called have ended, or if the postmaster dies while waiting. In the latter
+ * case the abort flag will be set to indicate that the caller shouldn't
+ * proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide, so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function has the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	if (DatachecksumsWorkerShmem->operation == RESET_STATE)
+	{
+		if (!ProcessAllDatabases(connected))
+		{
+			/*
+			 * Before we error out, make sure we clear the state since it may
+			 * otherwise leave the worker stuck without the possibility of a
+			 * restart.
+			 */
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+			DatachecksumsWorkerShmem->launcher_started = false;
+			DatachecksumsWorkerShmem->abort = false;
+			LWLockRelease(DatachecksumsWorkerLock);
+			ereport(ERROR,
+					(errmsg("unable to finish processing")));
+		}
+
+		connected = true;
+		SetDataChecksumsOnInProgress();
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->operation = ENABLE_CHECKSUMS;
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	/*
+	 * Prepare for datachecksumsworker shutdown: once we signal that checksums
+	 * are enabled we want the worker to be done and exited, to avoid races
+	 * with immediate disabling/enabling.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * If processing succeeds for ENABLE_CHECKSUMS, then everything has been
+	 * processed, so set checksums as enabled cluster-wide.
+	 */
+	if (ProcessAllDatabases(connected))
+	{
+		SetDataChecksumsOn();
+		ereport(LOG,
+				(errmsg("checksums enabled cluster-wide")));
+	}
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. Until no
+ * new databases are found, this will loop around computing a new list and
+ * comparing it to the already seen ones.
+ */
+static bool
+ProcessAllDatabases(bool already_connected)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set up so the first database processed also covers the shared
+	 * catalogs; they should not be processed once per database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we
+		 * wait for all pre-existing transactions to finish, we can be
+		 * certain that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%d databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "process will restart" : "process completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can be because
+	 * it actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists, but enabling checksums failed then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases for which the database
+		 * still exists.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = true;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+		/*
+		 * Even if this is a redundant assignment, we want to be explicit
+		 * about our intent for readability, since this state is queried when
+		 * determining whether processing can be restarted.
+		 */
+		DatachecksumsWorkerShmem->launcher_started = false;
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared and local relations are returned,
+ * otherwise only non-shared ones. Temp tables are not included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		DatachecksumsWorkerRelation *relentry;
+
+		if (!RELKIND_HAS_STORAGE(pgc->relkind) ||
+			pgc->relpersistence == RELPERSISTENCE_TEMP)
+			continue;
+
+		if (pgc->relhaschecksums)
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (DatachecksumsWorkerRelation *) palloc(sizeof(DatachecksumsWorkerRelation));
+
+		relentry->reloid = pgc->oid;
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in the current database
+ *
+ * Unlike BuildRelationList, this function only returns a list of OIDs,
+ * since the relkind is already known.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != RELPERSISTENCE_TEMP)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we are done, since we
+	 * cannot access and modify these relations.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		DatachecksumsWorkerRelation *rel = (DatachecksumsWorkerRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 8116b23614..cd76de125a 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3770,6 +3770,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 6064384e32..28e77eedea 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1595,7 +1595,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d5e1..f4dffad925 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -212,6 +212,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..b1713cf751 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -259,6 +260,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 4fa385b0ec..48f2352f03 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,10 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +499,12 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+		ProcessBarrierChecksumOnInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+		ProcessBarrierChecksumOn();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+		ProcessBarrierChecksumOff();
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +517,21 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index d708117a40..4c6deaae8b 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -94,7 +94,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1167,7 +1167,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1194,7 +1194,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..4f31c1dce5 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1565,7 +1565,7 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
@@ -1583,7 +1583,7 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..379c78d82d 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1876,6 +1876,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3484,6 +3486,8 @@ RelationBuildLocalRelation(const char *relname,
 	else
 		rel->rd_rel->relispopulated = true;
 
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+
 	/* set replica identity -- system catalogs and non-tables don't have one */
 	if (!IsCatalogNamespace(relnamespace) &&
 		(relkind == RELKIND_RELATION ||
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index cf8f9579c3..04bf0836b7 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -249,6 +249,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index d4ab4c7e23..e5674f4e4f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -617,6 +617,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index de87ad6ef7..68f6ab11b6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -498,6 +500,16 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -607,7 +619,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1898,17 +1910,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4784,6 +4785,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index ffdc23945c..6a5a596f46 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 00d71e3a8a..586bc70a70 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -657,6 +657,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish or turn off checksums
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 8b90cefbe0..a806cc6d0e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 221af87e71..c169ef90ee 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -199,7 +199,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +318,18 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..80a959bd7f 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 679eec3443..6ecec47f54 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..6bc802d8ba 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1dd325e0e6..4e525b2c8f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10890,6 +10890,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 72e3352398..c4893551a3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -323,6 +323,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 807a9c1edf..98d883f4f9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -852,6 +852,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..3ed9f193f1
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,42 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for the data checksums background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 0,
+	RESET_STATE
+}			DataChecksumOperation;
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(DataChecksumOperation op,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 51b8f994ac..828ba1a437 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,8 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 6e77744cbc..f6ae955f58 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 5cb39697f3..05f85861e3 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,9 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..558a8135f1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check")
+with multiple nodes, be they master or standby(s), as required by
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..9dbb660937
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class has checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we still can process data fine
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums when already disabled, which is also a no-op; we mainly
+# run this to make sure the backend doesn't crash or error out
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..d908b95561
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,97 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarts during processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksum processing to block on, in this case a
+# pre-existing temporary table which is kept open while processing is started.
+# We accomplish this by setting up an interactive psql session which keeps the
+# temporary table alive while we enable checksums from another psql session.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyway and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..a5ebe6cd04
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,96 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress" or "on".
+# Normally it would be "inprogress", but it is theoretically possible for the
+# primary to complete the checksum enabling *and* have the standby replay that
+# record before we reach the check below.
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+ok((grep { $result eq $_ } ('inprogress', 'on')),
+	'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure data checksums are disabled on the primary');
+
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.21.1 (Apple Git-122.3)

#40Michael Paquier
michael@paquier.xyz
In reply to: Daniel Gustafsson (#39)
Re: Online checksums patch - once again

On Wed, Sep 02, 2020 at 02:22:25PM +0200, Daniel Gustafsson wrote:

I unfortunately haven't had time to read the READ ONLY patch so I can't comment
on how these two patches do things in relation to each other.

The main synchronization mechanisms are the use of the inprogress mode where
data checksums are written but not verified, and by waiting for all
pre-existing non-compatible processes (transactions, temp tables) to disappear
before enabling.

The CF bot is complaining about this one with a TAP test failure:
https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/724717901

t/003_standby_checksum.pl .. 1/10
# Failed test 'ensure checksums are on or in progress on standby_1'
# at t/003_standby_checksum.pl line 59.
# 'off'
# ~~
# 'ARRAY(0x1d38c10)'
# Looks like you failed 1 test of 10.
t/003_standby_checksum.pl .. Dubious, test returned 1 (wstat 256,
0x100)
Failed 1/10 subtests

Daniel, could you look at that?
--
Michael

#41Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#39)
1 attachment(s)
Re: Online checksums patch - once again

On 2 Sep 2020, at 14:22, Daniel Gustafsson <daniel@yesql.se> wrote:

The main synchronization mechanisms are the use of the inprogress mode where
data checksums are written but not verified, and by waiting for all
pre-existing non-compatible processes (transactions, temp tables) to disappear
before enabling.

That being handwavily said, I've started to write down a matrix with classes of
possible synchronization bugs and how the patch handles them in order to
properly respond.

Having spent some more time on this, I believe I have a better answer (and
patch version) to give.

First, a thank you for asking insightful questions. While working through the
cases I realized that the previous version has a problematic window when
disabling checksums: backend A absorbs the data checksums "off" barrier (or
starts after the controlfile has been changed) and writes a page without a
checksum, while backend B, which has yet to absorb the barrier and is still
in "on", reads that very same page and reports a spurious checksum failure.
The solution IMO is to introduce an inprogress
state for disabling as well where all backends keep writing checksums but not
validating them until no backend is in "on" state anymore. Once all backends
are in "inprogress-off", they can stop writing checksums and transition to
"off". I even had this state in a previous unsubmitted version, embarrassingly
so to the point of the function prototype being there as a leftover in v19.

Now, synchronization happens on two levels in this patch for both the enable
and disable cases: (i) between backends when transitioning between states and
(ii) inside the worker when synchronizing the current backends.

For (i) it's using "inprogress" states to ensure a safe transition to "on" or
"off". The states are themselves transitioned to via procsignalbarriers. Both
enabling and disabling follow the same logic, with the only difference being
the order in which operations are switched on/off during inprogress. For (ii)
the workers are waiting for incompatible concurrent processing to end, such as
temporary tables (this only affects enabling data checksums).

I've tried to write down the synchronization steps in datachecksumsworker.c to
document the code, and for ease of discussion I've pasted that part of the diff
below as well:

* Synchronization and Correctness
* -------------------------------
* The processes involved in enabling, or disabling, data checksums in an
* online cluster must be properly synchronized with the normal backends
* serving concurrent queries to ensure correctness. Correctness is defined
* as the following:
*
* - Backends SHALL NOT violate local datachecksum state
* - Data checksums SHALL NOT be considered enabled cluster-wide until all
* currently connected backends have the local state "enabled"
*
* There are two levels of synchronization required for enabling data checksums
* in an online cluster: (i) changing state in the active backends ("on",
* "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
* incompatible objects and processes are left in a database when workers end.
* The former deals with cluster-wide agreement on data checksum state and the
* latter with ensuring that any concurrent activity cannot break the data
* checksum contract during processing.
*
* Synchronizing the state change is done with procsignal barriers, where the
* backend updating the global state in the controlfile will wait for all other
* backends to absorb the barrier before WAL logging. Barrier absorption will
* happen during interrupt processing, which means that connected backends will
* change state at different times.
*
* When Enabling Data Checksums
* ----------------------------
* A process which fails to observe data checksums being enabled can induce
* two types of errors: failing to write the checksum when modifying the page
* and failing to validate the data checksum on the page when reading it.
*
* When the DataChecksumsWorker has finished writing checksums on all pages
* and enables data checksums cluster-wide, there are four sets of backends:
*
* Bg: Backend updating the global state and emitting the procsignalbarrier
* Bd: Backends on "off" state
* Be: Backends in "on" state
* Bi: Backends in "inprogress-on" state
*
* Backends transition from the Bd state to Be like so: Bd -> Bi -> Be
*
* Backends in Bi and Be will write checksums when modifying a page, but only
* backends in Be will verify the checksum during reading. The Bg backend is
* blocked waiting for all backends in Bi to process interrupts and move to
* Be. Any backend starting will observe the global state being "on" and will
* thus automatically belong to Be. Checksums are enabled cluster-wide when
* Bi is an empty set. All sets are compatible while still operating based on
* their local state.
*
* When Disabling Data Checksums
* -----------------------------
* A process which fails to observe data checksums being disabled can induce
* two types of errors: writing the checksum when modifying the page and
* validating a data checksum which is no longer correct due to modifications
* to the page.
*
* Bg: Backend updating the global state and emitting the procsignalbarrier
* Bd: Backends in "off" state
* Be: Backends in "on" state
* Bi: Backends in "inprogress-off" state
*
* Backends transition from the Be state to Bd like so: Be -> Bi -> Bd
*
* The goal is to transition all backends to Bd, making the other sets empty.
* Backends in Bi write data checksums, but don't validate them, so that
* backends still in Be can continue to validate pages until they have
* absorbed the barrier and moved to Bi. Once all backends are in Bi, the
* barrier to transition to "off" can be raised and all backends can safely
* stop writing data checksums as no backend is enforcing data checksum
* validation.

I hope this clarifies the reasoning behind the implementation.

This has been implemented in the attached v21 patch. "inprogress" was in need
of a new name before, and with "inprogress-{on|off}" the need is even bigger.
Suggestions for better names are highly appreciated; I'm drawing blanks here.

There are some minor fixes and documentation touchups in this version as well,
but not the SIGINT handling since I wanted to focus on one thing at a time.

cheers ./daniel

Attachments:

online_checksums21.patch (application/octet-stream; x-unix-mode=0644)
From 087f887071ce49ba9e22ec286e0c65eabe2dc0d4 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 9 Sep 2020 16:01:58 +0200
Subject: [PATCH] Support checksum enable/disable in running cluster v21

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A new value "inprogress-on" is added for data_checksums during which
writes will set the checksum but reads won't enforce it. When all pages
have been checksummed the value will change to "on", which will enforce
the checksums on read. At this point, the cluster has the same state
as if checksums were enabled via initdb.

Checksums are added via a background worker, the DataChecksumsWorker,
which will process all pages in all databases. Pages modified by
concurrent write operations are checksummed as part of the normal
write path.

Daniel Gustafsson, Magnus Hagander
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   71 +
 doc/src/sgml/ref/initdb.sgml                 |    1 +
 doc/src/sgml/wal.sgml                        |   97 ++
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  273 +++-
 src/backend/access/transam/xlogfuncs.c       |   79 +
 src/backend/catalog/heap.c                   |    1 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1356 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    2 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/ipc/ipci.c               |    2 +
 src/backend/storage/ipc/procsignal.c         |   46 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |    6 +-
 src/backend/utils/adt/pgstatfuncs.c          |    4 +-
 src/backend/utils/cache/relcache.c           |    4 +
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   16 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   43 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    3 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   86 ++
 src/test/checksum/t/002_restarts.pl          |   97 ++
 src/test/checksum/t/003_standby_checksum.pl  |  102 ++
 46 files changed, 2457 insertions(+), 49 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 508bea3bc6..e30eda4e8e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if the relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e2e618791e..f49bde418a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25152,6 +25152,77 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Initiates enabling of data checksums for the cluster. This will
+        switch the data checksums mode to <literal>inprogress-on</literal> and
+        start a background worker that will process all data in the cluster
+        and enable checksums for it. When all data pages have had checksums
+        enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>. Returns <literal>true</literal> if processing
+        was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+        Returns <literal>false</literal> if data checksums are already
+        disabled.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d1c3893b14..a9d8bd631f 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums; internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process; make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as 
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4f61107a6a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09c01ed4ae..b8a801d155 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -49,6 +50,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -251,6 +253,15 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing. Thus, it can be read by backends without the need for a lock.
+ * Possible values are the checksum versions defined in storage/bufpage.h and
+ * zero for when checksums are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -892,6 +903,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1077,7 +1089,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4888,9 +4900,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4924,13 +4934,198 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Are checksums enabled, or in the process of being enabled, for data pages?
+ * While checksums are being enabled, we must write the checksum even though
+ * it's not verified during this stage.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ */
+bool
+DataChecksumsNeedVerify(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most callsites shouldn't need to worry about the "inprogress" states, since
+ * they should check the requirement for verification or writing. Some low-
+ * level callsites dealing with page writes do, however, need to know. It's also
+ * used to check for aborted checksum processing which needs to be restarted.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+bool
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+void
+SetDataChecksumsOnInProgress(void)
 {
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	if (LocalDataChecksumVersion > 0)
+		return;
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+		WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also stop
+		 * writing checksums.
+		 */
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	}
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	StartDatachecksumsWorkerLauncher(RESET_STATE, 0, 0);
+
+	XlogChecksums(0);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+}
+
+/*
+ * Barrier absorption functions for disabling data checksums
+ */
+void
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7916,6 +8111,30 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in an in-progress state (either being
+	 * enabled or being disabled), we notify the user that they need to
+	 * manually restart the process to enable checksums. This is because we
+	 * cannot launch a dynamic background worker directly from here, it has to
+	 * be launched from a regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shut down,
+	 * we know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9773,6 +9992,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10228,6 +10465,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..7ea91135ba 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,81 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * Shutting down a concurrently running datachecksumsworker does not wait
+	 * for the worker to shut down and exit, but we can continue turning off
+	 * checksums anyway since it will at most finish the block it had already
+	 * started and then abort.
+	 */
+	ShutdownDatachecksumsWorkerIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	if (DataChecksumsWorkerStarted())
+		PG_RETURN_BOOL(false);
+
+	/*
+	 * Data checksums on -> on is not a valid state transition as there is
+	 * nothing to do.
+	 */
+	if (DataChecksumsNeedVerify())
+		PG_RETURN_BOOL(false);
+
+	/*
+	 * If the state is set to "inprogress-on" but the worker isn't running,
+	 * then the data checksumming was prematurely terminated. Attempt to
+	 * continue processing data pages where we left off based on state stored
+	 * in the catalog.
+	 */
+	if (DataChecksumsOnInProgress())
+	{
+		ereport(NOTICE,
+				(errmsg("data checksums partly enabled, continuing processing")));
+
+		StartDatachecksumsWorkerLauncher(ENABLE_CHECKSUMS, cost_delay, cost_limit);
+	}
+
+	/*
+	 * We are starting a checksumming process from scratch, and need to start
+	 * by clearing the state in pg_class in case checksums have ever been
+	 * enabled before (either fully or partly). As soon as we set the checksum
+	 * state to "inprogress-on", new relations will set relhaschecksums in
+	 * pg_class so it must be done first.
+	 */
+	else
+		StartDatachecksumsWorkerLauncher(RESET_STATE_AND_ENABLE_CHECKSUMS, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 67144aa3c9..dcd95c8acc 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -950,6 +950,7 @@ InsertPgClassTuple(Relation pg_class_desc,
 	values[Anum_pg_class_relispopulated - 1] = BoolGetDatum(rd_rel->relispopulated);
 	values[Anum_pg_class_relreplident - 1] = CharGetDatum(rd_rel->relreplident);
 	values[Anum_pg_class_relispartition - 1] = BoolGetDatum(rd_rel->relispartition);
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	values[Anum_pg_class_relrewrite - 1] = ObjectIdGetDatum(rd_rel->relrewrite);
 	values[Anum_pg_class_relfrozenxid - 1] = TransactionIdGetDatum(rd_rel->relfrozenxid);
 	values[Anum_pg_class_relminmxid - 1] = MultiXactIdGetDatum(rd_rel->relminmxid);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ed4f3f142d..4dade0c116 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1225,6 +1225,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index d043ced686..9f63c10fca 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..fce2cebb3f
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1356 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all datapages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from where it left off, given that pg_class.relhaschecksums tracks the state
+ * of processed relations and the in-progress state ensures all new writes are
+ * performed with checksums. Each database will be reprocessed, but relations
+ * where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This ensures
+ * that backends which have yet to move from the "on" state will still be able
+ * to perform data checksum validation. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *		- Backends SHALL NOT violate local datachecksum state
+ *		- Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *		  currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * backend updating the global state in the controlfile will wait for all other
+ * backends to absorb the barrier before WAL logging. Barrier absorption will
+ * happen during interrupt processing, which means that connected backends will
+ * change state at different times.
+ *
+ *   When Enabling Data Checksums
+ *	 ----------------------------
+ *	 A process which fails to observe data checksums being enabled can induce
+ *	 two types of errors: failing to write the checksum when modifying the page
+ *	 and failing to validate the data checksum on the page when reading it.
+ *
+ *   When the DataChecksumsWorker has finished writing checksums on all pages
+ *   and enables data checksums cluster-wide, there are four sets of backends:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends on "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends transition from the Bd state to Be like so: Bd -> Bi -> Be
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting will observe the global state being "on" and will
+ *   thus automatically belong to Be.  Checksums are enabled cluster-wide when
+ *   Bi is an empty set. All sets are compatible while still operating based on
+ *   their local state.
+ *
+ *	 When Disabling Data Checksums
+ *	 -----------------------------
+ *	 A process which fails to observe data checksums being disabled can induce
+ *	 two types of errors: writing the checksum when modifying the page and
+ *	 validating a data checksum which is no longer correct due to modifications
+ *	 to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bi -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bi write data checksums, but don't validate them, such that
+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bi. Once all backends are in Bi, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation.
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	DataChecksumOperation operation;
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerRelation
+{
+	Oid			reloid;
+	char		relkind;
+}			DatachecksumsWorkerRelation;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool include_shared);
+static List *BuildTempTableList(void);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase * db);
+static bool ProcessAllDatabases(bool already_connected);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ * 		Main entry point for datachecksumsworker launcher process.
+ */
+void
+StartDatachecksumsWorkerLauncher(DataChecksumOperation op,
+								 int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * This can be hit during a short window while the worker is shutting
+	 * down. Once done, the worker will clear the abort flag and
+	 * re-processing can be performed.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->abort)
+	{
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("data checksums worker has been aborted")));
+	}
+
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		/* Somebody else has already started the launcher */
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(NOTICE,
+				(errmsg("data checksums worker is already running")));
+		return;
+	}
+
+	/* Whether to enable or disable data checksums */
+	DatachecksumsWorkerShmem->operation = op;
+
+	/* Backoff parameters to throttle the load during enabling */
+	DatachecksumsWorkerShmem->cost_delay = cost_delay;
+	DatachecksumsWorkerShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not turn off processing immediately; it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. Iff wal_level is set to "minimal",
+		 * this could be avoided iff the checksum is calculated to be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check whether we have been asked
+		 * to abort; the abort will bubble up from here.  It's safe to check
+		 * this without a lock, because if we miss it being set we will try
+		 * again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need checksums, and thus return true.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true.  The corresponding function for clearing the state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	if (DatachecksumsWorkerShmem->operation == ENABLE_CHECKSUMS)
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerMain");
+	else if (DatachecksumsWorkerShmem->operation == RESET_STATE ||
+			 DatachecksumsWorkerShmem->operation == RESET_STATE_AND_ENABLE_CHECKSUMS)
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ResetDataChecksumsStateInDatabase");
+	else
+		elog(ERROR, "invalid datachecksumsworker operation requested: %d",
+			 DatachecksumsWorkerShmem->operation);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed, the database cannot have been fully
+	 * processed, so we have no alternative but to exit.  When enabling
+	 * checksums we won't yet have changed the pg_control state to enabled,
+	 * so processing will have to be resumed when the cluster comes back up.
+	 * When disabling, the pg_control state will have been set to off before
+	 * this point, so when the cluster comes up checksums will be off as
+	 * expected.  In the latter case we might have stale relhaschecksums
+	 * flags in pg_class which need to be handled in some way. TODO
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksum processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks until all currently running transactions have finished
+ *
+ * Returns when all transactions which were active when this function was
+ * called have ended, or if the postmaster dies while waiting.  If the
+ * postmaster dies, the abort flag will be set to indicate that the caller
+ * shouldn't proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide, so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases.  This function handles the bgworker management,
+ * while ProcessAllDatabases is responsible for looping over the databases
+ * and initiating the processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	/*
+	 * Reset catalog state for checksum tracking, either to ensure that it's
+	 * cleared before enabling checksums or as part of disabling checksums.
+	 */
+	if (DatachecksumsWorkerShmem->operation == RESET_STATE ||
+		DatachecksumsWorkerShmem->operation == RESET_STATE_AND_ENABLE_CHECKSUMS)
+	{
+		if (!ProcessAllDatabases(connected))
+		{
+			/*
+			 * Before we error out make sure we clear state since this may
+			 * otherwise render the worker stuck without possibility of a
+			 * restart.
+			 */
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+			DatachecksumsWorkerShmem->launcher_started = false;
+			DatachecksumsWorkerShmem->abort = false;
+			LWLockRelease(DatachecksumsWorkerLock);
+			ereport(ERROR,
+					(errmsg("unable to finish processing")));
+		}
+
+		connected = true;
+
+		/*
+		 * If checksums should be enabled as the next step, transition to the
+		 * ENABLE_CHECKSUMS state to keep processing in the next stage.
+		 */
+		if (DatachecksumsWorkerShmem->operation == RESET_STATE_AND_ENABLE_CHECKSUMS)
+		{
+			SetDataChecksumsOnInProgress();
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+			DatachecksumsWorkerShmem->operation = ENABLE_CHECKSUMS;
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+	}
+
+	/*
+	 * Prepare for datachecksumsworker shutdown.  Once we signal that
+	 * checksums are enabled we want the worker to be done and exited, to
+	 * avoid races with immediately disabling/enabling again.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (DatachecksumsWorkerShmem->operation == ENABLE_CHECKSUMS)
+	{
+		/*
+		 * If processing succeeds for ENABLE_CHECKSUMS, then everything has
+		 * been processed, so mark checksums as enabled cluster-wide.
+		 */
+		if (ProcessAllDatabases(connected))
+		{
+			SetDataChecksumsOn();
+			ereport(LOG,
+					(errmsg("checksums enabled cluster-wide")));
+		}
+	}
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process, for either
+ * enabling checksums or resetting the checksum catalog tracking.  It loops,
+ * computing a new list and comparing it against the databases already seen,
+ * until no new databases are found.
+ */
+static bool
+ProcessAllDatabases(bool already_connected)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set up so that the first run processes the shared catalogs, rather
+	 * than reprocessing them once for every database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions finish, this way we can be
+		 * certain that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%d databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "restarting loop" : "processing complete"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing.  Failure to enable checksums for a database can be because
+	 * it actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case, where the
+	 * database was dropped before we had started processing it.  If a
+	 * database still exists but enabling checksums failed, then we fail the
+	 * entire checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = found;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+		/*
+		 * Even though this assignment is redundant after the MemSet, we want
+		 * to be explicit about our intent for readability, since this state
+		 * is queried when handling restarts.
+		 */
+		DatachecksumsWorkerShmem->launcher_started = false;
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of all relations in the database
+ *
+ * If include_shared is true, both shared and database-local relations are
+ * returned, otherwise only non-shared relations are.  Temp tables are never
+ * included.
+ */
+static List *
+BuildRelationList(bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+		DatachecksumsWorkerRelation *relentry;
+
+		if (!RELKIND_HAS_STORAGE(pgc->relkind) ||
+			pgc->relpersistence == RELPERSISTENCE_TEMP)
+			continue;
+
+		if (pgc->relhaschecksums)
+			continue;
+
+		if (pgc->relisshared && !include_shared)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		relentry = (DatachecksumsWorkerRelation *) palloc(sizeof(DatachecksumsWorkerRelation));
+
+		relentry->reloid = pgc->oid;
+		relentry->relkind = pgc->relkind;
+
+		RelationList = lappend(RelationList, relentry);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * BuildTempTableList
+ *		Compile a list of all temporary tables in the current database
+ *
+ * Unlike BuildRelationList, this function returns only a list of OIDs,
+ * since the relkind is already known.
+ */
+static List *
+BuildTempTableList(void)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		if (pgc->relpersistence != RELPERSISTENCE_TEMP)
+			continue;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database.  This must be done before adding checksums to a running
+ * cluster in order to correctly track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database.  This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher.  After
+ * enabling data checksums in each applicable relation in the database, it
+ * will wait for all temporary relations that were present when the function
+ * started to disappear before returning.  This is required since we cannot
+ * rewrite existing temporary relations to add data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start.
+	 * We need to wait until they are all gone before we can finish, since
+	 * we cannot access and modify these relations ourselves.
+	 */
+	InitialTempTableList = BuildTempTableList();
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		DatachecksumsWorkerRelation *rel = (DatachecksumsWorkerRelation *) lfirst(lc);
+
+		if (!ProcessSingleRelationByOid(rel->reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free_deep(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildTempTableList();
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 5f4b168fd1..9443a041e2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3770,6 +3770,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 6064384e32..28e77eedea 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1595,7 +1595,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d5e1..f4dffad925 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -212,6 +212,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..b1713cf751 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -259,6 +260,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index ffe67acea1..c5331a68ba 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,11 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOffInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +500,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+		ProcessBarrierChecksumOnInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+		ProcessBarrierChecksumOn();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF))
+		ProcessBarrierChecksumOffInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+		ProcessBarrierChecksumOff();
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +520,27 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
+}
+
+static void
+ProcessBarrierChecksumOffInProgress(void)
+{
+	AbsorbChecksumsOffInProgressBarrier();
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index d708117a40..4c6deaae8b 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -94,7 +94,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1167,7 +1167,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1194,7 +1194,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..4f31c1dce5 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1565,7 +1565,7 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
@@ -1583,7 +1583,7 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 96ecad02dd..379c78d82d 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1876,6 +1876,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3484,6 +3486,8 @@ RelationBuildLocalRelation(const char *relname,
 	else
 		rel->rd_rel->relispopulated = true;
 
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+
 	/* set replica identity -- system catalogs and non-tables don't have one */
 	if (!IsCatalogNamespace(relnamespace) &&
 		(relkind == RELKIND_RELATION ||
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index cf8f9579c3..04bf0836b7 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -249,6 +249,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index d4ab4c7e23..e5674f4e4f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -617,6 +617,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index de87ad6ef7..7f55bb71fd 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -498,6 +500,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -607,7 +620,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1898,17 +1911,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4784,6 +4786,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index ffdc23945c..6a5a596f46 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 00d71e3a8a..586bc70a70 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -657,6 +657,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish, or turn off checksums,
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 8b90cefbe0..a806cc6d0e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 221af87e71..bfe718195f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -199,7 +199,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +318,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..80a959bd7f 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 679eec3443..6ecec47f54 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..6bc802d8ba 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 687509ba92..022c6bb36a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10898,6 +10898,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 72e3352398..c4893551a3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -323,6 +323,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..1ecbe856e6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -852,6 +852,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..df153f6b01
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,43 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 0,
+	RESET_STATE_AND_ENABLE_CHECKSUMS,
+	RESET_STATE
+}			DataChecksumOperation;
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(DataChecksumOperation op,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 51b8f994ac..0469df495e 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 6e77744cbc..f6ae955f58 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 5cb39697f3..37cd0abbd6 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..558a8135f1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: "make check" creates a temporary installation, with multiple
+nodes (a primary and one or more standbys), for the purpose of
+running the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..9dbb660937
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class has checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we still can process data fine
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums when already disabled, which is also a no-op so we mainly
+# want to run this to make sure the backend isn't crashing or erroring out
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..d10bd5c5c5
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,97 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..eb2bd515b0
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,102 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary');
+
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.21.1 (Apple Git-122.3)

#42Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Paquier (#40)
Re: Online checksums patch - once again

On 7 Sep 2020, at 09:17, Michael Paquier <michael@paquier.xyz> wrote:

Daniel, could you look at that?

I believe this boils down to a timing issue, I've included a fix in the v21
patch attached to a previous mail upthread.

cheers ./daniel

#43Justin Pryzby
pryzby@telsasoft.com
In reply to: Daniel Gustafsson (#39)
Re: Online checksums patch - once again

+ * changed to "inprogress-off", the barrier for mvoving to "off" can be
moving

+ * When disabling checksums, data_checksums will be set of "inprogress-off"
set *to*

+ get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),

I think this palloc()s a new copy of the namespace every 100 blocks.
Better do it outside the loop.

+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},

enabling / disabling ?

+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;

Should this be a bitmask, maybe
DATA_CHECKSUMS_WRITE = 1
DATA_CHECKSUMS_VERIFY = 2,

It occurred to me that you could rephrase this patch as "Make checksum state
per-relation rather than cluster granularity". That's currently an
implementation detail, but could be exposed as a feature. That could be a
preliminary 0001 patch. Half the existing patch would be 0002 "Allow
online enabling checksums for a given relation/database/cluster". You might
save some of the existing effort of synchronizing the cluster-wide checksum
state, since it doesn't need to be synchronized. The "data_checksums" GUC
might be removed, or changed to an enum: on/off/per_relation. Moving from
"per_relation" to "on" would be an optional metadata-only change, allowed only
when all rels in all dbs are checksummed. I'm not sure if you'd even care about
temp tables, since "relhaschecksum" would be authoritative, rather than a
secondary bit only used during processing.

XLogHintBitIsNeeded() and DataChecksumsEnabled() would need to check
relhaschecksum, which tentatively seems possible.

I'm not sure if it's possible, but maybe pg_checksums would be able to skip
rels which had already been checksummed "online" (with an option to force
reprocessing).

Maybe some people would want (no) checksums on specific tables, and that could
eventually be implemented as 0003: "ALTER TABLE SET checksums=". I'm thinking
of that being used immediately after a CREATE, but I suppose ON would cause
the backend to rewrite the table with checksums synchronously (not in the BGW),
either with AEL or by calling ProcessSingleRelationByOid().

--
Justin

#44Daniel Gustafsson
daniel@yesql.se
In reply to: Justin Pryzby (#43)
1 attachment(s)
Re: Online checksums patch - once again

On 19 Sep 2020, at 04:18, Justin Pryzby <pryzby@telsasoft.com> wrote:

Thanks for reviewing!

+ * changed to "inprogress-off", the barrier for mvoving to "off" can be
moving

Fixed.

+ * When disabling checksums, data_checksums will be set of "inprogress-off"
set *to*

Fixed.

+ get_namespace_name(RelationGetNamespace(reln)), RelationGetRelationName(reln),

I think this palloc()s a new copy of the namespace every 100 blocks.
Better do it outside the loop.

Good point, fixed.

+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},

enabling / disabling ?

Perhaps, but it wouldn't match the grammatical tense of the others.

+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;

Should this be a bitmask, maybe
DATA_CHECKSUMS_WRITE = 1
DATA_CHECKSUMS_VERIFY = 2,

That's an option, not sure if it would improve readability though. Anyone else
have opinions on that?

It occured to me that you could rephrase this patch as "Make checksum state
per-relation rather than cluster granularity". That's currently an
implementation detail, but could be exposed as a feature.

That's not entirely correct. The patch tracks checksum status *during
inprogress-on* with per-relation granularity, but as it stands it doesn't
support per-relation during state "on" in any way.

A per-relation checksum mode where every relation at any point can enable or
disable checksums would require a very different synchronization mechanism from
the all-or-nothing one (which while simpler, IMO is complicated enough). My
hope is that this patch brings solid infrastructure for anyone interested in
pursuing per-relation checksums, but IMHO we should focus on getting
per-cluster rock-solid first.

That could be a
preliminary 0001 patch. Half the existing patch would be 0002 "Allow
online enabling checksums for a given relation/database/cluster". You might
save some of the existing effort of synchronize the cluster-wide checksum
state, since it doesn't need to be synchronized.

I don't follow; how would a finer-grained resolution remove the need for
synchronization?

The "data_checksums" GUC
might be removed, or changed to an enum: on/off/per_relation. Moving from
"per_relation" to "on" would be an optional metadata-only change, allowed only
when all rels in all dbs are checksummed.

How would you guarantee that such a state change isn't happening concurrently
with a user doing ALTER TABLE .. checksums=off;? It would still require
synchronization along the lines of what this patch does unless I'm missing
something.

I'm not sure if you'd even care about
temp tables, since "relhaschecksum" would be authoritative, rather than a
secondary bit only used during processing.

XLogHintBitIsNeeded() and DataChecksumsEnabled() would need to check
relhaschecksum, which tentatively seems possible.

While possible, that's a pretty hot codepath so any additional checking will
need proper benchmarking.

relhaschecksum isn't guaranteed to be correct at any point other than during
checksum enabling; it's only used for tracking progress in case of a cluster
restart during processing. To better convey this, I asked upthread for
suggestions for a better name, since relhaschecksum carries the risk of
overpromising and underdelivering. Perhaps relchecksumprocessed would be better?

A real per-relation relhaschecksum in pg_class would also need to solve how to
keep it accurate for offline enable/disable via pg_checksums.

I'm not sure if it's possible, but maybe pg_checksums would be able to skip
rels which had already been checksummed "online" (with an option to force
reprocessing).

pg_checksums can't read the catalog state of the relations, so it has no
knowledge of where online processing left off. That should probably be made
clearer in the docs though, so I've added a note on that.

Maybe some people would want (no) checksums on specific tables, and that could
eventually be implemented as 0003: "ALTER TABLE SET checksums=". I'm thinking
of that being used immediately after a CREATE, but I suppose ON would cause
the backend to rewrite the table with checksums synchronously (not in the BGW),
either with AEL or by calling ProcessSingleRelationByOid().

Holding an AEL while changing state would be easier as it skips the need for
the complex synchronization, but is that really what users would expect?
Especially if cluster-wide enable is done transparent to the user.

More importantly though, what is the use-case for per-relation that we'd be
looking at solving? Discussing the implementation without framing it in an
actual use-case runs the risk of performing successful surgery where the
patient still dies. Performing ETL ingestion without the overhead of writing
checksums? Ephemeral data? Maybe unlogged tables already cover some of the
situations where skipping checksums would be appealing?

While in the patch I realized that the relationlist saved the relkind but the
code wasn't actually using it, so I've gone ahead and removed it, with a lot
fewer palloc calls as a result. The attached v22 fixes that and the above.

cheers ./daniel

Attachments:

online_checksums22.patch (application/octet-stream)
From 0e4e54c0197657a26ad6b647f2956ab0a69e74de Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 23 Sep 2020 14:21:43 +0200
Subject: [PATCH] Support checksum enable/disable in running cluster v22

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A new value "inprogress" is added for data_checksums during which
writes will set the checksum but reads won't enforce it. When all pages
have been checksummed, the value will change to "on", which will enforce
the checksums on read. At this point, the cluster has the same state
as if checksums were enabled via initdb.

Checksums are being added via a background worker (DatachecksumsWorker) which
will process all pages in all databases. Pages accessed via concurrent
write operations will be checksummed with the normal process.

Daniel Gustafsson, Magnus Hagander
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   71 +
 doc/src/sgml/ref/initdb.sgml                 |    1 +
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   97 ++
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  273 +++-
 src/backend/access/transam/xlogfuncs.c       |   79 ++
 src/backend/catalog/heap.c                   |    1 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1326 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    2 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/ipc/ipci.c               |    2 +
 src/backend/storage/ipc/procsignal.c         |   46 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |    6 +-
 src/backend/utils/adt/pgstatfuncs.c          |    4 +-
 src/backend/utils/cache/relcache.c           |    4 +
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   16 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   43 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    3 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   86 ++
 src/test/checksum/t/002_restarts.pl          |   97 ++
 src/test/checksum/t/003_standby_checksum.pl  |  102 ++
 47 files changed, 2433 insertions(+), 49 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index de9bacd34f..e3008f2911 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 461b748d89..9d93161b5c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25131,6 +25131,77 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Initiates data checksums for the cluster. This will switch the data
+        checksums mode to <literal>inprogress-on</literal> as well as start a
+        background worker that will process all data in the database and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>. Returns <literal>true</literal> if processing
+        was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+        Returns <literal>false</literal> in case data checksums are disabled
+        already.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 1dd4e54ff1..0dd1c509eb 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations, regardless of any prior online processing.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d1c3893b14..a9d8bd631f 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums, internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process; make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as 
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4f61107a6a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61754312e2..cf4eebf05b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -49,6 +50,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -252,6 +254,15 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing. Thus, it can be read by backends without the need for a lock.
+ * Possible values are the checksum versions defined in storage/bufpage.h and
+ * zero for when checksums are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -893,6 +904,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1078,7 +1090,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4889,9 +4901,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4925,13 +4935,198 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Are checksums enabled, or in the process of being enabled, for data pages?
+ * In case checksums are being enabled we must write the checksum even though
+ * it's not verified during this stage.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ */
+bool
+DataChecksumsNeedVerify(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most callsites shouldn't need to worry about the "inprogress" states, since
+ * they should check the requirement for verification or writing. Some low-
+ * level callsites dealing with page writes do, however, need to know. It is
+ * also used to check for aborted checksum processing which needs to be restarted.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+bool
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+void
+SetDataChecksumsOnInProgress(void)
 {
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	if (LocalDataChecksumVersion > 0)
+		return;
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+		WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also stop
+		 * writing checksums.
+		 */
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	}
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	StartDatachecksumsWorkerLauncher(RESET_STATE, 0, 0);
+
+	XlogChecksums(0);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+}
+
+/*
+ * Barrier absorption functions for disabling data checksums
+ */
+void
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7921,6 +8116,30 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in an in-progress state (either
+	 * being enabled or being disabled), we notify the user that they need to
+	 * restart the processing manually. This is because we cannot launch a
+	 * dynamic background worker directly from here; it has to be launched
+	 * from a regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shut down,
+	 * we know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9778,6 +9997,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10233,6 +10470,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..7ea91135ba 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,81 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect: the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	/*
+	 * Shutting down a concurrently running datachecksumsworker does not wait
+	 * for the worker to exit, but we can continue turning off
+	 * checksums anyway since it will at most finish the block it had already
+	 * started and then abort.
+	 */
+	ShutdownDatachecksumsWorkerIfRunning();
+
+	SetDataChecksumsOff();
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	if (DataChecksumsWorkerStarted())
+		PG_RETURN_BOOL(false);
+
+	/*
+	 * Data checksums on -> on is not a valid state transition as there is
+	 * nothing to do.
+	 */
+	if (DataChecksumsNeedVerify())
+		PG_RETURN_BOOL(false);
+
+	/*
+	 * If the state is set to "inprogress-on" but the worker isn't running,
+	 * then the data checksumming was prematurely terminated. Attempt to
+	 * continue processing data pages where we left off based on state stored
+	 * in the catalog.
+	 */
+	if (DataChecksumsOnInProgress())
+	{
+		ereport(NOTICE,
+				(errmsg("data checksums partly enabled, continuing processing")));
+
+		StartDatachecksumsWorkerLauncher(ENABLE_CHECKSUMS, cost_delay, cost_limit);
+	}
+
+	/*
+	 * We are starting a checksumming process from scratch, and need to start
+	 * by clearing the state in pg_class in case checksums have ever been
+	 * enabled before (either fully or partly). As soon as we set the checksum
+	 * state to "inprogress-on", new relations will set relhaschecksums in
+	 * pg_class so it must be done first.
+	 */
+	else
+		StartDatachecksumsWorkerLauncher(RESET_STATE_AND_ENABLE_CHECKSUMS, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 67144aa3c9..dcd95c8acc 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -950,6 +950,7 @@ InsertPgClassTuple(Relation pg_class_desc,
 	values[Anum_pg_class_relispopulated - 1] = BoolGetDatum(rd_rel->relispopulated);
 	values[Anum_pg_class_relreplident - 1] = CharGetDatum(rd_rel->relreplident);
 	values[Anum_pg_class_relispartition - 1] = BoolGetDatum(rd_rel->relispartition);
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	values[Anum_pg_class_relrewrite - 1] = ObjectIdGetDatum(rd_rel->relrewrite);
 	values[Anum_pg_class_relfrozenxid - 1] = TransactionIdGetDatum(rd_rel->relfrozenxid);
 	values[Anum_pg_class_relminmxid - 1] = MultiXactIdGetDatum(rd_rel->relminmxid);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ed4f3f142d..4dade0c116 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1225,6 +1225,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 5a9a0e3435..aeb6d8c642 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..4d2050c381
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1326 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a cluster at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end states for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all datapages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from where it left off, given that pg_class.relhaschecksums tracks the state
+ * of processed relations and the in-progress state ensures that all new writes
+ * are performed with checksums. Each database will be reprocessed, but
+ * relations where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This ensures
+ * that backends which have yet to move from the "on" state will still be able
+ * to perform data checksum validation. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *		- Backends SHALL NOT violate their local data checksum state
+ *		- Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *		  currently connected backends have the local state "on"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * backend updating the global state in the controlfile will wait for all other
+ * backends to absorb the barrier before WAL logging. Barrier absorption will
+ * happen during interrupt processing, which means that connected backends will
+ * change state at different times.
+ *
+ *   When Enabling Data Checksums
+ *	 ----------------------------
+ *	 A process which fails to observe data checksums being enabled can induce
+ *	 two types of errors: failing to write the checksum when modifying the page
+ *	 and failing to validate the data checksum on the page when reading it.
+ *
+ *   When the DataChecksumsWorker has finished writing checksums on all pages
+ *   and enables data checksums cluster-wide, there is the backend driving the
+ *   state change plus three sets of backends:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends on "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends transition from the Bd state to Be like so: Bd -> Bi -> Be
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting will observe the global state being "on" and will
+ *   thus automatically belong to Be.  Checksums are enabled cluster-wide when
+ *   Bi is an empty set. All sets are compatible while still operating based on
+ *   their local state.
+ *
+ *	 When Disabling Data Checksums
+ *	 -----------------------------
+ *	 A process which fails to observe data checksums being disabled can induce
+ *	 two types of errors: writing the checksum when modifying the page and
+ *	 validating a data checksum which is no longer correct due to modifications
+ *	 to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bi -> Bd
+ *
+ *   The goal is to transition all backends to Bd, making the other sets empty.
+ *   Backends in Bi write data checksums but don't validate them, so that
+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed and they have moved to Bi. Once all backends are in Bi, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums, as no backend is enforcing data checksum
+ *   validation.
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	DataChecksumOperation operation;
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase * db);
+static bool ProcessAllDatabases(bool already_connected);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ * 		Register and launch the datachecksumsworker launcher process.
+ */
+void
+StartDatachecksumsWorkerLauncher(DataChecksumOperation op,
+								 int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * This can be hit during a short window while the worker is shutting
+	 * down. Once shutdown completes, the worker clears the abort flag and
+	 * processing can be requested again.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->abort)
+	{
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("data checksums worker has been aborted")));
+	}
+
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		/* Somebody else already started the launcher */
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(NOTICE,
+				(errmsg("data checksums worker is already running")));
+		return;
+	}
+
+	/* Whether to enable or disable data checksums */
+	DatachecksumsWorkerShmem->operation = op;
+
+	/* Backoff parameters to throttle the load during enabling */
+	DatachecksumsWorkerShmem->cost_delay = cost_delay;
+	DatachecksumsWorkerShmem->cost_limit = cost_limit;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not stop processing immediately; it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. Iff wal_level is set to "minimal",
+		 * off, and then turned on again. If wal_level is set to "minimal",
+		 * this could be avoided, provided the checksum is verified to already
+		 * be correct.
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	if (DatachecksumsWorkerShmem->operation == ENABLE_CHECKSUMS)
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerMain");
+	else if (DatachecksumsWorkerShmem->operation == RESET_STATE ||
+			 DatachecksumsWorkerShmem->operation == RESET_STATE_AND_ENABLE_CHECKSUMS)
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "ResetDataChecksumsStateInDatabase");
+	else
+		elog(ERROR, "invalid datachecksumsworker operation requested: %d",
+			 DatachecksumsWorkerShmem->operation);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed, we cannot end up with a processed database,
+	 * so we have no alternative but to exit. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * need to be handled in some way. TODO
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksums processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks until all currently active transactions have finished
+ *
+ * Returns when all transactions which were active when the function was
+ * called have ended, or if the postmaster dies while waiting. If the
+ * postmaster dies, the abort flag will be set to indicate that the caller
+ * of this shouldn't proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide, so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function has the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	/*
+	 * Reset catalog state for checksum tracking, either to ensure that it's
+	 * cleared before enabling checksums or as part of disabling checksums.
+	 */
+	if (DatachecksumsWorkerShmem->operation == RESET_STATE ||
+		DatachecksumsWorkerShmem->operation == RESET_STATE_AND_ENABLE_CHECKSUMS)
+	{
+		if (!ProcessAllDatabases(connected))
+		{
+			/*
+			 * Before we error out make sure we clear state since this may
+			 * otherwise render the worker stuck without possibility of a
+			 * restart.
+			 */
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+			DatachecksumsWorkerShmem->launcher_started = false;
+			DatachecksumsWorkerShmem->abort = false;
+			LWLockRelease(DatachecksumsWorkerLock);
+			ereport(ERROR,
+					(errmsg("unable to finish processing")));
+		}
+
+		connected = true;
+
+		/*
+		 * If checksums should be enabled as the next step, transition to the
+		 * ENABLE_CHECKSUMS state to keep processing in the next stage.
+		 */
+		if (DatachecksumsWorkerShmem->operation == RESET_STATE_AND_ENABLE_CHECKSUMS)
+		{
+			SetDataChecksumsOnInProgress();
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+			DatachecksumsWorkerShmem->operation = ENABLE_CHECKSUMS;
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+	}
+
+	/*
+	 * Prepare for datachecksumsworker shutdown; once we signal that checksums
+	 * are enabled we want the worker to have finished and exited, to avoid
+	 * races with immediate disabling/enabling.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (DatachecksumsWorkerShmem->operation == ENABLE_CHECKSUMS)
+	{
+		/*
+		 * If processing succeeds for ENABLE_CHECKSUMS, then everything has
+		 * been processed, so set checksums as enabled cluster-wide.
+		 */
+		if (ProcessAllDatabases(connected))
+		{
+			SetDataChecksumsOn();
+			ereport(LOG,
+					(errmsg("checksums enabled cluster-wide")));
+		}
+	}
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. Until no
+ * new databases are found, this will loop around computing a new list and
+ * comparing it to the already seen ones.
+ */
+static bool
+ProcessAllDatabases(bool already_connected)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain
+		 * that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%d databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "process with restart" : "process completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can be because
+	 * it actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists, but enabling checksums failed then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still
+		 * exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = found;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	if (!found)
+	{
+		MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+		/*
+		 * Even though this is a redundant assignment, we want to be explicit
+		 * about our intent for readability, since this state is queried to
+		 * support restartability.
+		 */
+		DatachecksumsWorkerShmem->launcher_started = false;
+	}
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which do not yet have data checksums
+ * are returned. If include_shared is true then shared relations are included
+ * as well in a non-temporary list. include_shared has no relevance when
+ * building a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we are done, since we
+	 * cannot access and modify these relations.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e6be2b7836..d92d17af0d 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3770,6 +3770,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b89df01fa7..e42dae956a 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1598,7 +1598,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums && DataChecksumsNeedVerify())
 	{
 		char	   *filename;
 
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d5e1..f4dffad925 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -212,6 +212,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..b1713cf751 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -259,6 +260,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index ffe67acea1..c5331a68ba 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,11 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOffInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +500,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+		ProcessBarrierChecksumOnInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+		ProcessBarrierChecksumOn();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF))
+		ProcessBarrierChecksumOffInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+		ProcessBarrierChecksumOff();
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +520,27 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
+}
+
+static void
+ProcessBarrierChecksumOffInProgress(void)
+{
+	AbsorbChecksumsOffInProgressBarrier();
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 4bc2bf955d..6e06d26d6a 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -94,7 +94,7 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1387,7 +1387,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	static char *pageCopy = NULL;
 
 	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return (char *) page;
 
 	/*
@@ -1414,7 +1414,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..4f31c1dce5 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1565,7 +1565,7 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
@@ -1583,7 +1583,7 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
+	if (!DataChecksumsNeedWrite())
 		PG_RETURN_NULL();
 
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 9061af81a3..cec6478f59 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -1883,6 +1883,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3491,6 +3493,8 @@ RelationBuildLocalRelation(const char *relname,
 	else
 		rel->rd_rel->relispopulated = true;
 
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+
 	/* set replica identity -- system catalogs and non-tables don't have one */
 	if (!IsCatalogNamespace(relnamespace) &&
 		(relkind == RELKIND_RELATION ||
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index ed2ab4b5b2..03b940dfd7 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index d4ab4c7e23..e5674f4e4f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -617,6 +617,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 596bcb7b84..51e6500f5f 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -498,6 +500,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -607,7 +620,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1898,17 +1911,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4784,6 +4786,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index ffdc23945c..6a5a596f46 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 00d71e3a8a..586bc70a70 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -657,6 +657,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish, or turn off checksums,
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 8b90cefbe0..a806cc6d0e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 221af87e71..bfe718195f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -199,7 +199,7 @@ extern PGDLLIMPORT int wal_level;
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +318,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..80a959bd7f 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 679eec3443..6ecec47f54 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..6bc802d8ba 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f48f5fb4d9..5409bf5c68 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10898,6 +10898,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 72e3352398..c4893551a3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -323,6 +323,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..1ecbe856e6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -852,6 +852,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..df153f6b01
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,43 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 0,
+	RESET_STATE_AND_ENABLE_CHECKSUMS,
+	RESET_STATE
+}			DataChecksumOperation;
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(DataChecksumOperation op,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 51b8f994ac..0469df495e 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 6e77744cbc..f6ae955f58 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 5cb39697f3..37cd0abbd6 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index efb206aa75..6469ac94a4 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,8 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = perl regress isolation modules authentication recovery subscription
+SUBDIRS = perl regress isolation modules authentication recovery subscription \
+			checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..558a8135f1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check")
+with multiple nodes, primary as well as standby(s), for the purpose
+of the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..9dbb660937
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,86 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class has checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we still can process data fine
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums when already disabled, which is also a no-op so we mainly
+# want to run this to make sure the backend isn't crashing or erroring out
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..d10bd5c5c5
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,97 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'doublecheck that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..eb2bd515b0
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,102 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary');
+
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.21.1 (Apple Git-122.3)

#45Michael Paquier
michael@paquier.xyz
In reply to: Daniel Gustafsson (#44)
Re: Online checksums patch - once again

On Wed, Sep 23, 2020 at 02:34:36PM +0200, Daniel Gustafsson wrote:

While in the patch I realized that the relationlist saved the relkind but the
code wasn't actually using it, so I've gone ahead and removed it, resulting in
a lot fewer palloc calls. The attached v22 fixes that and the above.

Some of the TAP tests are blowing up here, as the CF bot is telling:
t/003_standby_checksum.pl .. 1/11 # Looks like you planned 11 tests but ran 4.
# Looks like your test exited with 29 just after 4.
t/003_standby_checksum.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
--
Michael

#46Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Paquier (#45)
Re: Online checksums patch - once again

On 24 Sep 2020, at 06:27, Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Sep 23, 2020 at 02:34:36PM +0200, Daniel Gustafsson wrote:

While in the patch I realized that the relationlist saved the relkind but the
code wasn't actually using it, so I've gone ahead and removed it, resulting in
a lot fewer palloc calls. The attached v22 fixes that and the above.

Some of the TAP tests are blowing up here, as the CF bot is telling:
t/003_standby_checksum.pl .. 1/11 # Looks like you planned 11 tests but ran 4.
# Looks like your test exited with 29 just after 4.
t/003_standby_checksum.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)

Interesting; I've been unable to trigger a fault, and the latest Travis build
was green. I'll continue to try and see if I can shake something loose.

cheers ./daniel

#47Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Daniel Gustafsson (#44)
Re: Online checksums patch - once again

I looked at patch v22, and I can see two main issues:

1. The one that Robert talked about earlier: A backend checks the local
"checksums" state. If it's 'off', it writes a page without checksums.
How do you guarantee that the local state doesn't change in between? The
implicit assumption seems to be that there MUST NOT be any
CHECK_FOR_INTERRUPTS() calls between DataChecksumsNeedWrite() and the
write (or read and DataChecksumsNeedVerify()).

In most code, the DataChecksumsNeedWrite() call is very close to writing
out the page, often in the same critical section. But this is an
undocumented assumption.

The code in sendFile() in basebackup.c seems suspicious in that regard.
It calls DataChecksumsNeedVerify() once before starting to read the
file. Isn't it possible for the checksums flag to change while it's
reading the file and sending it to the client? I hope there are
CHECK_FOR_INTERRUPTS() calls buried somewhere in the loop, because it
could take minutes to send the whole file.

I would feel better if the state transition of the "checksums" flag
could only happen in a few safe places, or there were some other
safeguards for this. I think that's what Andres was trying to say
earlier in the thread on ProcSignalBarriers. I'm not sure what the
interface to that should be. It could be something like
HOLD/RESUME_INTERRUPTS(), where normally all procsignals are handled on
CHECK_FOR_INTERRUPTS(), but you could "hold off" some if needed. Or
something else. Or maybe we can just use HOLD/RESUME_INTERRUPTS() for
this. It's more coarse-grained than necessary, but probably doesn't
matter in practice.

At minimum, there needs to be comments in DataChecksumsNeedWrite() and
DataChecksumsNeedVerify(), instructing how to use them safely. Namely,
you must ensure that there are no interrupts between the
DataChecksumsNeedWrite() and writing out the page, or between reading
the page and the DataChecksumsNeedVerify() call. You can achieve that
with HOLD_INTERRUPTS() or a critical section, or simply ensuring that
there is no substantial code in between that could call
CHECK_FOR_INTERRUPTS(). And sendFile() in basebackup.c needs to be fixed.

Perhaps you could have "Assert(InterruptHoldoffCount > 0)" in
DataChecksumsNeedWrite() and DataChecksumsNeedVerify()? There could be
other ways that callers could avoid the TOCTOU issue, but it would
probably catch most of the unsafe call patterns, and you could always
wrap the DataChecksumsNeedWrite/Verify() call in a dummy
HOLD_INTERRUPTS() block to work around the assertion if you know what
you're doing.

2. The signaling between enable_data_checksums() and the launcher
process looks funny to me. The general idea seems to be that
enable_data_checksums() just starts the launcher process, and the
launcher process figures out what it needs to do and makes all the
changes to the global state. But then there's this violation of the
idea: enable_data_checksums() checks DataChecksumsOnInProgress(), and
tells the launcher process whether it should continue a previously
crashed operation or start from scratch. I think it would be much
cleaner if the launcher process figured that out itself, and
enable_data_checksums() would just tell the launcher what the target
state is.

enable_data_checksums() and disable_data_checksums() seem prone to race
conditions. If you call enable_data_checksums() in two backends
concurrently, depending on the timing, there are two possible outcomes:

a) One call returns true, and launches the background process. The other
call returns false.

b) Both calls return true, but one of them emits a "NOTICE: data
checksums worker is already running".

In disable_data_checksums(), imagine what happens if another backend calls
enable_data_checksums() in between the
ShutdownDatachecksumsWorkerIfRunning() and SetDataChecksumsOff() calls.
/*
* Mark the buffer as dirty and force a full page write. We have to
* re-write the page to WAL even if the checksum hasn't changed,
* because if there is a replica it might have a slightly different
* version of the page with an invalid checksum, caused by unlogged
* changes (e.g. hintbits) on the master happening while checksums
* were off. This can happen if there was a valid checksum on the page
* at one point in the past, so only when checksums are first on, then
* off, and then turned on again. Iff wal_level is set to "minimal",
* this could be avoided iff the checksum is calculated to be correct.
*/
START_CRIT_SECTION();
MarkBufferDirty(buf);
log_newpage_buffer(buf, false);
END_CRIT_SECTION();

It's really unfortunate that we have to dirty the page even if the
checksum already happens to match. Could we only do the
log_newpage_buffer() call and skip MarkBufferDirty() in that case?

Could we get away with a more lightweight WAL record that doesn't
contain the full-page image, but just the block number? On replay, the
redo routine would read the page from disk.

- Heikki

#48Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Daniel Gustafsson (#44)
Re: Online checksums patch - once again

Replying to an older message in this thread:

+ /*
+ * If we reach this point with checksums in inprogress state, we notify
+ * the user that they need to manually restart the process to enable
+ * checksums. This is because we cannot launch a dynamic background worker
+ * directly from here, it has to be launched from a regular backend.
+ */
+ if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+ ereport(WARNING,
+ (errmsg("checksum state is \"inprogress\" with no worker"),
+ errhint("Either disable or enable checksums by calling the
pg_disable_data_checksums() or pg_enable_data_checksums()
functions.")));

This seems pretty half-baked.

I don't disagree with that. However, given that enabling checksums is a pretty
intensive operation it seems somewhat unfriendly to automatically restart. As
a DBA I wouldn't want that to kick off without manual intervention, but there
is also the risk of this being missed due to assumptions that it would restart.
Any ideas on how to treat this?

If/when we can restart the processing where it left off, without the need to go
over all data again, things might be different wrt the default action.

The later patch versions do support restarting, so I think we should
revisit this issue. I would expect the checksums worker to be
automatically started at postmaster startup. Can we make that happen?

- Heikki

#49Álvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Heikki Linnakangas (#47)
Re: Online checksums patch - once again

On 2020-Oct-05, Heikki Linnakangas wrote:

The code in sendFile() in basebackup.c seems suspicious in that regard. It
calls DataChecksumsNeedVerify() once before starting to read the file. Isn't
it possible for the checksums flag to change while it's reading the file and
sending it to the client? I hope there are CHECK_FOR_INTERRUPTS() calls
buried somewhere in the loop, because it could take minutes to send the
whole file.

I would feel better if the state transition of the "checksums" flag could
only happen in a few safe places, or there were some other safeguards for
this. I think that's what Andres was trying to say earlier in the thread on
ProcSignalBarriers. I'm not sure what the interface to that should be. It
could be something like HOLD/RESUME_INTERRUPTS(), where normally all
procsignals are handled on CHECK_FOR_INTERRUPTS(), but you could "hold off"
some if needed. Or something else. Or maybe we can just use
HOLD/RESUME_INTERRUPTS() for this. It's more coarse-grained than necessary,
but probably doesn't matter in practice.

I hope you're not suggesting that interrupts would be held for the whole
transmission of a file, which you say could take minutes. If we do have
an interrupt holdoff, then it has to be pretty short; users (and
systemd) despair if service shutdown is delayed more than a few seconds.

#50Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Álvaro Herrera (#49)
Re: Online checksums patch - once again

On 05/10/2020 17:25, Álvaro Herrera wrote:

On 2020-Oct-05, Heikki Linnakangas wrote:

The code in sendFile() in basebackup.c seems suspicious in that regard. It
calls DataChecksumsNeedVerify() once before starting to read the file. Isn't
it possible for the checksums flag to change while it's reading the file and
sending it to the client? I hope there are CHECK_FOR_INTERRUPTS() calls
buried somewhere in the loop, because it could take minutes to send the
whole file.

I would feel better if the state transition of the "checksums" flag could
only happen in a few safe places, or there were some other safeguards for
this. I think that's what Andres was trying to say earlier in the thread on
ProcSignalBarriers. I'm not sure what the interface to that should be. It
could be something like HOLD/RESUME_INTERRUPTS(), where normally all
procsignals are handled on CHECK_FOR_INTERRUPTS(), but you could "hold off"
some if needed. Or something else. Or maybe we can just use
HOLD/RESUME_INTERRUPTS() for this. It's more coarse-grained than necessary,
but probably doesn't matter in practice.

I hope you're not suggesting that interrupts would be held for the whole
transmission of a file, which you say could take minutes. If we do have
an interrupt holdoff, then it has to be pretty short; users (and
systemd) despair if service shutdown is delayed more than a few seconds.

I'm not suggesting that, sorry I wasn't clear. That would indeed be
horrible.

sendFile() needs a different solution, but all the other places where
DataChecksumsNeedWrite/Verify() is called need to be inspected to make
sure that they hold interrupts, or ensure some other way that an
interrupt doesn't change the local checksums flag between the
DataChecksumsNeedWrite/Verify() call and the read/write.

I think sendFile() needs to re-check the local checksums state before
each read. It also needs to ensure that an interrupt doesn't occur and
change the local checksums state between read and the
DataChecksumsNeedVerify() calls, but that's a very short period if you
call DataChecksumsNeedVerify() again for each block.
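The per-block pattern described here can be sketched in a self-contained way; every name below is a stand-in for the real backend code (this is not sendFile() itself), with the holdoff rule enforced by an assertion in the stubbed flag check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLCKSZ 8192

/* Stand-ins for backend state; hypothetical names, not real symbols. */
static bool checksums_active = true;
static int interrupt_holdoff = 0;

static void hold_interrupts(void)   { interrupt_holdoff++; }
static void resume_interrupts(void) { interrupt_holdoff--; }

/*
 * Modeled on DataChecksumsNeedVerify(): only valid to call with
 * interrupts held, so the flag cannot change before the verify.
 */
static bool checksums_need_verify(void)
{
    assert(interrupt_holdoff > 0);
    return checksums_active;
}

static bool page_checksum_ok(const char *page)
{
    (void) page;
    return true;                /* pretend verification succeeded */
}

/*
 * The per-block pattern: re-check the flag for every block, holding
 * interrupts only across the short check + verify pair, never for
 * the whole file.
 */
static int scan_file(const char *buf, int nblocks)
{
    int failures = 0;

    for (int blkno = 0; blkno < nblocks; blkno++)
    {
        const char *page = buf + (size_t) blkno * BLCKSZ;

        hold_interrupts();
        if (checksums_need_verify() && !page_checksum_ok(page))
            failures++;
        resume_interrupts();

        /* CHECK_FOR_INTERRUPTS() would run here, outside the holdoff,
         * and may change checksums_active for the next iteration. */
    }
    return failures;
}
```

The holdoff window is one block's worth of work, so shutdown latency stays bounded even on files that take minutes to send.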

- Heikki

#51Daniel Gustafsson
daniel@yesql.se
In reply to: Heikki Linnakangas (#48)
Re: Online checksums patch - once again

On 5 Oct 2020, at 14:14, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Replying to an older message in this thread:

+ /*
+ * If we reach this point with checksums in inprogress state, we notify
+ * the user that they need to manually restart the process to enable
+ * checksums. This is because we cannot launch a dynamic background worker
+ * directly from here, it has to be launched from a regular backend.
+ */
+ if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+ ereport(WARNING,
+ (errmsg("checksum state is \"inprogress\" with no worker"),
+ errhint("Either disable or enable checksums by calling the
pg_disable_data_checksums() or pg_enable_data_checksums()
functions.")));
This seems pretty half-baked.

I don't disagree with that. However, given that enabling checksums is a pretty
intensive operation it seems somewhat unfriendly to automatically restart. As
a DBA I wouldn't want that to kick off without manual intervention, but there
is also the risk of this being missed due to assumptions that it would restart.
Any ideas on how to treat this?
If/when we can restart the processing where it left off, without the need to go
over all data again, things might be different wrt the default action.

The later patch versions do support restarting, so I think we should revisit this issue.

Agreed, now it makes sense to restart automatically.

I would expect the checksums worker to be automatically started at postmaster startup. Can we make that happen?

A dynamic background worker has to be registered from a regular backend, so
it's not entirely clear to me where in startup processing that would take
place. Do you have any good suggestions?

cheers ./daniel

#52Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Daniel Gustafsson (#51)
Re: Online checksums patch - once again

On 12/11/2020 15:17, Daniel Gustafsson wrote:

On 5 Oct 2020, at 14:14, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I would expect the checksums worker to be automatically started at postmaster startup. Can we make that happen?

A dynamic background worker has to be registered from a regular backend, so
it's not entirely clear to me where in startup processing that would take
place. Do you have any good suggestions?

Could you launch it from the startup process, in StartupXLOG()?
Does it have to be dynamic?

- Heikki

#53Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#52)
Re: Online checksums patch - once again

On Fri, Nov 13, 2020 at 12:22 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 12/11/2020 15:17, Daniel Gustafsson wrote:

On 5 Oct 2020, at 14:14, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
I would expect the checksums worker to be automatically started at

postmaster startup. Can we make that happen?

A dynamic background worker has to be registered from a regular backend,

so

it's not entirely clear to me where in startup processing that would take
place. Do you have any good suggestions?

Could you launch it from the startup process, in StartupXLOG()?
Does it have to be dynamic?

If it's not dynamic, you can't start it from a regular backend, can you? So
then you'd need a restart for it to happen?

As for launching it from the startup process I don't know, that might be a
viable path. The code specifically explains why it's not possible to launch
it from the postmaster, but I don't see anything that would make it
impossible from the startup process.
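The constraint being weighed here can be reduced to a toy model (this is not PostgreSQL's real registration API): the postmaster itself cannot perform dynamic registration, but any full child process, whether a regular backend or the startup process, plausibly could:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of who may register a dynamic background worker. */
typedef enum { FROM_POSTMASTER, FROM_BACKEND, FROM_STARTUP } launcher;

static bool can_register_dynamic_worker(launcher who)
{
    /*
     * The postmaster deliberately avoids the shared-memory work that
     * dynamic registration requires; child processes have no such
     * restriction in this model.
     */
    return who != FROM_POSTMASTER;
}
```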

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#54Daniel Gustafsson
daniel@yesql.se
In reply to: Heikki Linnakangas (#47)
1 attachment(s)
Re: Online checksums patch - once again

On 5 Oct 2020, at 13:36, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I looked at patch v22, and I can see two main issues:

Thanks for reviewing!

1. The one that Robert talked about earlier: A backend checks the local "checksums" state. If it's 'off', it writes a page without checksums. How do you guarantee that the local state doesn't change in between? The implicit assumption seems to be that there MUST NOT be any CHECK_FOR_INTERRUPTS() calls between DataChecksumsNeedWrite() and the write (or read and DataChecksumsNeedVerify()).

In most code, the DataChecksumsNeedWrite() call is very close to writing out the page, often in the same critical section. But this is an undocumented assumption.

I've extended the documentation on this.

The code in sendFile() in basebackup.c seems suspicious in that regard. It calls DataChecksumsNeedVerify() once before starting to read the file. Isn't it possible for the checksums flag to change while it's reading the file and sending it to the client? I hope there are CHECK_FOR_INTERRUPTS() calls buried somewhere in the loop, because it could take minutes to send the whole file.

Agreed, fixed.

I would feel better if the state transition of the "checksums" flag could only happen in a few safe places, or there were some other safeguards for this. I think that's what Andres was trying to say earlier in the thread on ProcSignalBarriers. I'm not sure what the interface to that should be. It could be something like HOLD/RESUME_INTERRUPTS(), where normally all procsignals are handled on CHECK_FOR_INTERRUPTS(), but you could "hold off" some if needed. Or something else. Or maybe we can just use HOLD/RESUME_INTERRUPTS() for this. It's more coarse-grained than necessary, but probably doesn't matter in practice.

At minimum, there needs to be comments in DataChecksumsNeedWrite() and DataChecksumsNeedVerify(), instructing how to use them safely. Namely, you must ensure that there are no interrupts between the DataChecksumsNeedWrite() and writing out the page, or between reading the page and the DataChecksumsNeedVerify() call. You can achieve that with HOLD_INTERRUPTS() or a critical section, or simply ensuring that there is no substantial code in between that could call CHECK_FOR_INTERRUPTS(). And sendFile() in basebackup.c needs to be fixed.

Perhaps you could have "Assert(InterruptHoldoffCount > 0)" in DataChecksumsNeedWrite() and DataChecksumsNeedVerify()? There could be other ways that callers could avoid the TOCTOU issue, but it would probably catch most of the unsafe call patterns, and you could always wrap the DataChecksumsNeedWrite/verify() call in a dummy HOLD_INTERRUPTS() block to work around the assertion if you know what you're doing.

The attached holds off interrupt processing for the NeedWrite and NeedVerify
cases, and holds them for what I hope is the right duration for the respective
callsites.

One thing I realized in doing this is that the pg_stat_database checksums
statistics are set to NULL when checksums are disabled. That makes perfect
sense when checksum state is static, but not when it can be turned on/off. For
now I've made it so that stats are set to zero instead, and will continue
showing stats even if checksums gets disabled. Not sure what the best option
would be here.

2. The signaling between enable_data_checksums() and the launcher process looks funny to me. The general idea seems to be that enable_data_checksums() just starts the launcher process, and the launcher process figures out what it need to do and makes all the changes to the global state. But then there's this violation of the idea: enable_data_checksums() checks DataChecksumsOnInProgress(), and tells the launcher process whether it should continue a previously crashed operation or start from scratch. I think it would be much cleaner if the launcher process figured that out itself, and enable_data_checksums() would just tell the launcher what the target state is.

enable_data_checksums() and disable_data_checksums() seem prone to race conditions. If you call enable_data_checksums() in two backends concurrently, depending on the timing, there are two possible outcomes:

a) One call returns true, and launches the background process. The other call returns false.

b) Both calls return true, but one of them emits a "NOTICE: data checksums worker is already running".

In disable_data_checksum() imagine what happens if another backend calls enable_data_checksums() in between the ShutdownDatachecksumsWorkerIfRunning() and SetDataChecksumsOff() calls.

I've reworked this in the attached such that the enable_ and disable_ functions
merely call into the launcher with the desired outcome, and the launcher is
responsible for figuring out the rest. The datachecksumworker is now the sole
place which initiates a state transition.
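That division of labor can be modeled as a pure function from (current state, requested target) to the launcher's next step; the sketch below illustrates the idea only and is not the patch's actual code (state names mirror the documented modes):

```c
#include <assert.h>

/* Hypothetical model of the launcher's decision logic. */
typedef enum
{
    CKSUM_OFF,
    CKSUM_INPROGRESS_OFF,
    CKSUM_INPROGRESS_ON,
    CKSUM_ON
} cksum_state;

static cksum_state launcher_next_state(cksum_state cur, cksum_state target)
{
    if (target == CKSUM_ON)
    {
        /*
         * Enabling: from "off", or resuming after a crash left us in
         * either inprogress mode, (re)start the worker in
         * inprogress-on; the worker flips to "on" when done.
         */
        if (cur != CKSUM_ON)
            return CKSUM_INPROGRESS_ON;
        return CKSUM_ON;
    }

    /*
     * Disabling: pass through inprogress-off until all backends have
     * ceased verifying, then settle on "off".
     */
    if (cur == CKSUM_ON || cur == CKSUM_INPROGRESS_ON)
        return CKSUM_INPROGRESS_OFF;
    return CKSUM_OFF;
}
```

Because the SQL-callable functions only communicate the target, a restart (or a concurrent second call) changes nothing: the launcher recomputes its step from whatever state it finds.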

/*
* Mark the buffer as dirty and force a full page write. We have to
* re-write the page to WAL even if the checksum hasn't changed,
* because if there is a replica it might have a slightly different
* version of the page with an invalid checksum, caused by unlogged
* changes (e.g. hintbits) on the master happening while checksums
* were off. This can happen if there was a valid checksum on the page
* at one point in the past, so only when checksums are first on, then
* off, and then turned on again. Iff wal_level is set to "minimal",
* this could be avoided iff the checksum is calculated to be correct.
*/
START_CRIT_SECTION();
MarkBufferDirty(buf);
log_newpage_buffer(buf, false);
END_CRIT_SECTION();

It's really unfortunate that we have to dirty the page even if the checksum already happens to match. Could we only do the log_newpage_buffer() call and skip MarkBufferDirty() in that case?

I think we can, but I've intentionally stayed away from such optimizations
until the basic version of the patch was deemed safe and approaching done.
It's complicated enough as it is IMO.

Could we get away with a more lightweight WAL record that doesn't contain the full-page image, but just the block number? On replay, the redo routine would read the page from disk.

Quite possibly, but I'm not sure how to reason about such a change to ensure
its safety. I would love any ideas you'd have.
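For illustration only, a lightweight record as floated above might carry nothing but the block's address; on replay the redo routine would read the block from disk, recompute the checksum, and write the page back. All field names here are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192

/*
 * Hypothetical layout of a "recompute checksum" WAL record: instead
 * of a full-page image (~BLCKSZ bytes), log only enough to find the
 * block again.
 */
typedef struct xl_checksum_page
{
    uint32_t spcNode;           /* tablespace */
    uint32_t dbNode;            /* database */
    uint32_t relNode;           /* relation */
    uint32_t forkNum;           /* fork */
    uint32_t blkno;             /* block number */
} xl_checksum_page;
```

The open safety question is whether replaying against an on-disk page that may differ from the primary's version (e.g. due to unlogged hint-bit changes) is sound, which is exactly why the full-page image is used today.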

The attached fixes the above plus a few other small things found while hacking
on this version.

cheers ./daniel

Attachments:

online_checksums23.patch (application/octet-stream; name=online_checksums23.patch; x-unix-mode=0644)
From f1e1229570e3a149126954acf4bfc2a9321f716d Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 23 Sep 2020 14:21:43 +0200
Subject: [PATCH] Support checksum enable/disable in running cluster v23

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

Further description of the process TBW once the dust settles around
this.

Daniel Gustafsson, Magnus Hagander
---
 doc/src/sgml/amcheck.sgml                    |    2 +-
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   71 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/initdb.sgml                 |    1 +
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   97 ++
 src/backend/access/heap/heapam.c             |    4 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  281 +++-
 src/backend/access/transam/xlogfuncs.c       |   39 +
 src/backend/catalog/heap.c                   |    3 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1500 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   46 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   36 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   90 ++
 src/test/checksum/t/002_restarts.pl          |   97 ++
 src/test/checksum/t/003_standby_checksum.pl  |  102 ++
 51 files changed, 2658 insertions(+), 73 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl

diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 99fad708bf..494cd1bd08 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -497,7 +497,7 @@ SET client_min_messages = DEBUG1;
   Structural corruption can happen due to faulty storage hardware, or
   relation files being overwritten or modified by unrelated software.
   This kind of corruption can also be detected with
-  <link linkend="app-initdb-data-checksums"><application>data page
+  <link linkend="checksums"><application>data page
   checksums</application></link>.
  </para>
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 569841398b..5584b50caa 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c7d177c02..636e855b93 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25135,6 +25135,77 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Initiates data checksums for the cluster. This will switch the data
+        checksums mode to <literal>inprogress-on</literal> as well as start a
+        background worker that will process all data in the database and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>. Returns <literal>true</literal> if processing
+        was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+        Returns <literal>false</literal> in case data checksums are disabled
+        already.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 98e1995453..e492809190 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3666,8 +3666,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3677,8 +3676,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 1dd4e54ff1..0dd1c509eb 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations regardless of the online processing.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d1c3893b14..a9d8bd631f 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums, internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process, make sure that
+    <varname>max_worker_processes</varname> allows for at least one more
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as 
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..81ab0785ef 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7258,7 +7258,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7279,11 +7279,13 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4f61107a6a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7d97b96e72..ac41ac525c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -49,6 +50,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -252,6 +254,15 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for ControlFile's data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing. Thus, it can be read by backends without the need for a lock.
+ * Possible values are the checksum versions defined in storage/bufpage.h and
+ * zero for when checksums are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -893,6 +904,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1079,7 +1091,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4891,9 +4903,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4927,13 +4937,206 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Are checksums enabled, or in the process of being enabled or disabled, for
+ * data pages? While checksums are being enabled or disabled we must still
+ * write the checksum even though it's not verified during this stage.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result. To avoid the risk of
+ * the checksum state changing between reading and performing the validation
+ * (or not), interrupts must be held off. This implies that the call to this
+ * function must be made as close to the validation as possible, in order to
+ * keep the critical section short and protect against TOCTOU situations
+ * around checksum validation.
+ */
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most callsites shouldn't need to worry about the "inprogress" states, since
+ * they should check the requirement for verification or writing. Some low-
+ * level callsites dealing with page writes do, however, need to know. It's
+ * also used to check for aborted checksum processing which needs a restart.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+bool
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+void
+SetDataChecksumsOnInProgress(void)
 {
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	if (LocalDataChecksumVersion > 0)
+		return;
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+void
+SetDataChecksumsOn(void)
+{
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+		WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	}
+
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+}
+
+/*
+ * Barrier absorption functions for disabling data checksums
+ */
+void
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7916,6 +8119,30 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in an in-progress state (either
+	 * being enabled or being disabled), we notify the user that they need to
+	 * manually restart the processing. This is because we cannot launch a
+	 * dynamic background worker directly from here; it has to be launched
+	 * from a regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shutdown, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9767,6 +9994,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10222,6 +10467,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..b754e007e0 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,41 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4cd7d76938..ea642fa0ff 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -965,10 +965,13 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2e4aa1c4b6..01c6d96823 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1243,6 +1243,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 5a9a0e3435..aeb6d8c642 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..24a0d91bc4
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1500 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a cluster at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, this worker will ensure that all pages are checksummed before
+ * verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all datapages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be resumed
+ * from where it left off, given that pg_class.relhaschecksums tracks the
+ * state of processed relations and the in-progress state ensures that all
+ * new writes are performed with checksums. Each database will be reprocessed,
+ * but relations where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state can still
+ * safely perform data checksum validation. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *		- Backends SHALL NOT violate local datachecksum state
+ *		- Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *		  currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * backend updating the global state in the controlfile will wait for all other
+ * backends to absorb the barrier before WAL logging. Barrier absorption will
+ * happen during interrupt processing, which means that connected backends will
+ * change state at different times.
+ *
+ *   When Enabling Data Checksums
+ *	 ----------------------------
+ *	 A process which fails to observe data checksums being enabled can induce
+ *	 two types of errors: failing to write the checksum when modifying the page
+ *	 and failing to validate the data checksum on the page when reading it.
+ *
+ *   When the DataChecksumsWorker has finished writing checksums on all pages
+ *   and enables data checksums cluster-wide, there are four sets of backends:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends transition from the Bd state to Be like so: Bd -> Bi -> Be
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting will observe the global state being "on" and will
+ *   thus automatically belong to Be.  Checksums are enabled cluster-wide when
+ *   Bi is an empty set. All sets are compatible while still operating based on
+ *   their local state.
+ *
+ *	 When Disabling Data Checksums
+ *	 -----------------------------
+ *	 A process which fails to observe data checksums being disabled can induce
+ *	 two types of errors: writing the checksum when modifying the page and
+ *	 validating a data checksum which is no longer correct due to modifications
+ *	 to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bi -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bi write data checksums but don't validate them, so that
+ *   backends still in Be can continue to validate pages until they have
+ *   absorbed the barrier and moved to Bi. Once all backends are in Bi, the
+ *   barrier to transition to "off" can be emitted and all backends can safely
+ *   stop writing data checksums, as no backend any longer enforces checksum
+ *   validation.
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		target;
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Main entry point for the datachecksumsworker launcher process
+ *
+ * Starts data checksum processing, for enabling as well as disabling
+ * checksums.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of the below distinct states:
+	 *
+	 * A: Started and performing an operation; B: Started and in the
+	 * process of aborting; C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
+	 * and the abort flag be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, the only operation we can perform
+	 * is to cancel it, and only if the user requested for checksums to be
+	 * disabled. That doesn't, however, mean that all other cases yield an
+	 * error, as some might be perfectly benign.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
+		 * If the launcher is started, data checksums cannot be fully on or
+		 * off, but may be in an inprogress state. Since the state transition
+		 * may not have happened yet (in case of rapidly initiated checksum
+		 * enable calls, for example) we inspect the target state of the
+		 * currently running launcher.
+		 */
+
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so exit.
+			 */
+			if (DatachecksumsWorkerShmem->target)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * Disabling checksums is likely to be a very quick operation in
+			 * many cases so trying to abort it to save the checksums would
+			 * run the risk of race conditions.
+			 */
+			else
+			{
+				ereport(NOTICE,
+						(errmsg("data checksums are concurrently being disabled, please retry")));
+
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/* This should be unreachable */
+			Assert(false);
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Data checksums are already being disabled, exit silently.
+			 */
+			if (DataChecksumsOffInProgress())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			DatachecksumsWorkerShmem->abort = true;
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+	}
+
+	/*
+	 * The launcher is currently not running, so we need to query the system
+	 * data checksum state to determine how to proceed based on the requested
+	 * target state.
+	 */
+	else
+	{
+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->target = enable_checksums;
+
+		/*
+		 * If the launcher isn't started and we're asked to enable checksums,
+		 * we need to check if processing was previously interrupted such that
+		 * we should resume rather than start from scratch.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums in a cluster which already
+			 * has checksums enabled, exit immediately as there is nothing
+			 * more to do.
+			 */
+			if (DataChecksumsNeedVerify())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-on" then we will
+			 * resume from where we left off based on the catalog state. This
+			 * will be safe since new relations created while the checksum
+			 * worker wasn't running will have checksums enabled.
+			 */
+			else if (DataChecksumsOnInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[1] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-off" then we
+			 * were interrupted while the catalog state was being cleared. In
+			 * this case we need to first reset state and then continue with
+			 * enabling checksums.
+			 */
+			else if (DataChecksumsOffInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * Data checksums are off in the cluster, so we can proceed with
+			 * enabling them. Just in case, we start by resetting the catalog
+			 * state, since we are doing this from scratch and don't want
+			 * leftover catalog state to cause us to miss a relation.
+			 */
+			else
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Regardless of the current state of the system, we go through
+			 * the motions when asked to disable checksums. The catalog state
+			 * is only defined to be relevant during the operation of enabling
+			 * checksums, and has no other use. That said, a user who sees
+			 * stale relhaschecksums entries in the catalog might run this
+			 * just in case.
+			 */
+			DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+			DatachecksumsWorkerShmem->operations[1] = DISABLE_CHECKSUMS;
+		}
+	}
+
+	/* Backoff parameters to throttle the load during enabling */
+	DatachecksumsWorkerShmem->cost_delay = cost_delay;
+	DatachecksumsWorkerShmem->cost_limit = cost_limit;
+
+	/*
+	 * Prepare the BackgroundWorker and launch it.
+	 */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not stop processing immediately; it signals the checksum
+ * worker to exit once it is done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums
+	 * already set due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hint bits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. When wal_level is set to "minimal",
+		 * this could be avoided if the checksum is verified to be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort, and
+		 * the abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it, that is not an
+	 * error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database, and quite likely fail with the same problem there.
+	 * TODO: Maybe we need a backoff to avoid running through all the
+	 * databases here in short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot end up with a processed database,
+	 * so we have no alternative other than exiting. When enabling checksums,
+	 * we will not yet have changed the pg_control version to "enabled", so
+	 * when the cluster comes back up, processing will have to be resumed.
+	 * When disabling, the pg_control version will be set to "off" before
+	 * this point, so when the cluster comes up checksums will be off as
+	 * expected. In the latter case we might have stale relhaschecksums flags
+	 * in pg_class which need to be handled in some way. TODO
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksums processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks until all currently running transactions have finished
+ *
+ * Returns when all transactions that were active when the function was called
+ * have ended, or when the postmaster dies while waiting. If the postmaster
+ * dies, the abort flag will be set to indicate that the caller shouldn't
+ * proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function handles the bgworker management,
+ * with ProcessAllDatabases being responsible for looping over the databases
+ * and initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = DatachecksumsWorkerShmem->operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This repeatedly generates a list of databases to process, for either
+ * enabling checksums or resetting the checksum catalog tracking. It loops,
+ * computing a new list and comparing it to the databases already seen, until
+ * no new databases are found.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set up so that the first run processes the shared catalogs, rather
+	 * than doing so once for every database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain
+		 * that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%i databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "process with restart" : "process completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can occur either
+	 * because processing actually failed for some reason, or because the
+	 * database was dropped between us getting the database list and trying
+	 * to process it. Get a fresh list of databases to detect the second case,
+	 * where the database was dropped before we had started processing it. If
+	 * a database still exists but enabling checksums failed, then we fail the
+	 * entire checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResult *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in the databases where the failed database
+		 * still exists.
+		 */
+		if (found && *entry == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = found;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->abort = false;
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even though this assignment is redundant, we want to be explicit about
+	 * our intent for readability, since this state is queried to support
+	 * restartability.
+	 */
+	DatachecksumsWorkerShmem->launcher_started = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which don't yet have data checksums are
+ * returned. If include_shared is true then shared relations are included as
+ * well in a non-temporary list. include_shared has no relevance when building
+ * a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait
+ * for all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we are done, since we
+	 * cannot access or modify these relations.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		SetDataChecksumsOff();
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e76e627c6b..8b9a1db75c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3916,6 +3916,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b89df01fa7..51065c717f 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1598,7 +1598,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1684,7 +1684,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee99b8..908edfb423 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -212,6 +212,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index ad0d1a9abc..8a14f29027 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2933,8 +2933,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..9a33560469 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index ffe67acea1..c5331a68ba 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,11 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOffInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +500,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+		ProcessBarrierChecksumOnInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+		ProcessBarrierChecksumOn();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF))
+		ProcessBarrierChecksumOffInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+		ProcessBarrierChecksumOff();
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +520,27 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
+}
+
+static void
+ProcessBarrierChecksumOffInProgress(void)
+{
+	AbsorbChecksumsOffInProgressBarrier();
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index ddf18079e2..3b74ddaa92 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change, which could cause a
+		 * false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index a210fc93b4..9e1dc45cd8 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1565,9 +1565,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1583,9 +1580,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 66393becfb..9a38499dcb 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1816,7 +1817,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1884,6 +1886,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3536,6 +3540,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the checksum state. Since the checksum state can change at any
+	 * time, the fetched value might be out of date by the time it is used.
+	 * DataChecksumsNeedWrite returns true when checksums are enabled, in
+	 * the process of being enabled ("inprogress-on"), or in the process of
+	 * being disabled ("inprogress-off"). Since relhaschecksums is only used
+	 * to track progress when checksums are being enabled, and going from
+	 * disabled to enabled will clear relhaschecksums before starting, it is
+	 * safe to use this value during a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false, and is concurrently changed
+	 * to true, then that implies that checksums are being enabled. Worst
+	 * case, this will lead to the relation being processed for checksums
+	 * even though each page written will already have them.
+	 *
+	 * Performing this last shortens the TOCTOU window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3802,6 +3827,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3826,16 +3852,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3864,6 +3898,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3884,15 +3919,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index ed2ab4b5b2..03b940dfd7 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index f2dd8e4914..bbe6663d2f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -616,6 +616,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bb34630e8e..14dfe6d5ba 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -498,6 +500,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -607,7 +620,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1898,17 +1911,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4784,6 +4786,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index ffdc23945c..6a5a596f46 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 39bcaa8fe1..32058ebf61 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -657,6 +657,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish, or turn off checksums,
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index ee70243c2e..bfa05eb1b0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 221af87e71..8dfd70fba6 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..80a959bd7f 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index bb5e72ca43..275eb0a1a6 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled? */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..6bc802d8ba 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c01da4bf01..4e135a8f87 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10931,6 +10931,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 72e3352398..c4893551a3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -323,6 +323,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 257e515bfe..f56ecee715 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -923,6 +923,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..3572ec80c5
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,36 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for the data checksums background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index d0a52f8e08..3bb7742642 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 6e77744cbc..f6ae955f58 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 5cb39697f3..37cd0abbd6 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index 14cde4f5ba..61d6b918b9 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..558a8135f1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: "make check" creates a temporary installation with multiple
+nodes (a primary and one or more standbys) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..68ff5f6a54
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,90 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class has checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we still can process data fine
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums when already disabled, which is also a no-op so we mainly
+# want to run this to make sure the backend isn't crashing or erroring out
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..d10bd5c5c5
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,97 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'doublecheck that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..eb2bd515b0
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,102 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary');
+
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.21.1 (Apple Git-122.3)

#55 Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Daniel Gustafsson (#54)
Re: Online checksums patch - once again

On 17/11/2020 10:56, Daniel Gustafsson wrote:

On 5 Oct 2020, at 13:36, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
2. The signaling between enable_data_checksums() and the launcher process looks funny to me. The general idea seems to be that enable_data_checksums() just starts the launcher process, and the launcher process figures out what it needs to do and makes all the changes to the global state. But then there's this violation of the idea: enable_data_checksums() checks DataChecksumsOnInProgress(), and tells the launcher process whether it should continue a previously crashed operation or start from scratch. I think it would be much cleaner if the launcher process figured that out itself, and enable_data_checksums() would just tell the launcher what the target state is.

enable_data_checksums() and disable_data_checksums() seem prone to race conditions. If you call enable_data_checksums() in two backends concurrently, depending on the timing, there are two possible outcomes:

a) One call returns true, and launches the background process. The other call returns false.

b) Both calls return true, but one of them emits a "NOTICE: data checksums worker is already running".

In disable_data_checksum() imagine what happens if another backend calls enable_data_checksums() in between the ShutdownDatachecksumsWorkerIfRunning() and SetDataChecksumsOff() calls.

I've reworked this in the attached such that the enable_ and disable_ functions
merely call into the launcher with the desired outcome, and the launcher is
responsible for figuring out the rest. The datachecksumworker is now the sole
place which initiates a state transfer.

Well, you still fill the DatachecksumsWorkerShmem->operations array in
the backend process that launches the datacheckumworker, not in the
worker process. I find that still a bit surprising, but I believe it works.

/*
* Mark the buffer as dirty and force a full page write. We have to
* re-write the page to WAL even if the checksum hasn't changed,
* because if there is a replica it might have a slightly different
* version of the page with an invalid checksum, caused by unlogged
* changes (e.g. hintbits) on the master happening while checksums
* were off. This can happen if there was a valid checksum on the page
* at one point in the past, so only when checksums are first on, then
* off, and then turned on again. Iff wal_level is set to "minimal",
* this could be avoided iff the checksum is calculated to be correct.
*/
START_CRIT_SECTION();
MarkBufferDirty(buf);
log_newpage_buffer(buf, false);
END_CRIT_SECTION();

It's really unfortunate that we have to dirty the page even if the checksum already happens to match. Could we only do the log_newpage_buffer() call and skip MarkBufferDirty() in that case?

I think we can, but I've intentionally stayed away from such optimizations
until the basic version of the patch was deemed safe and approaching done.
It's complicated enough as it is IMO.

Fair enough.
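
The hazard described in the quoted comment can be made concrete with a toy model (hypothetical Python standing in for the buffer-manager logic, not code from the patch): while checksums were off, an unlogged hint-bit change left the primary's and standby's copies of a page different, so checksumming the primary's copy without shipping a full page image leaves the standby with a page that fails verification.

```python
# Toy model of the hint-bit divergence hazard.  'checksum' is a stand-in
# for pg_checksum_page(); pages are just lists of ints here.

def checksum(page):
    return sum(page) % 251        # illustrative only, not the real algorithm

primary = [1, 0, 3]               # hint bit set (unlogged) while checksums off
standby = [0, 0, 3]               # standby never heard about the hint bit

# Without log_newpage_buffer(): the standby keeps its diverged page, which
# no longer matches the checksum computed on the primary's copy.
assert checksum(primary) != checksum(standby)   # standby page looks corrupt

# With log_newpage_buffer(): the full page image replaces the standby's
# copy, so page and checksum agree again.
standby = list(primary)
assert checksum(primary) == checksum(standby)
```

This is why the patch re-logs the page even when the locally computed checksum happens to match: only a full page image guarantees the standby's copy is byte-identical.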

The attached fixes the above plus a few other small things found while hacking
on this version.

I haven't read through the whole patch, but a few random comments below,
in no particular order:

pg_enable/disable_data_checksums() should perform a superuser-check. I
don't think we want to expose them to any users.

+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing. Thus, it can be read by backends without the need for a lock.
+ * Possible values are the checksum versions defined in storage/bufpage.h and
+ * zero for when checksums are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;

The comment is a bit confusing: "Thus, it can be read by backends
without the need for a lock". Since it's a variable in backend-private
memory, it can only be read by the same backend, not "backends". And
that's also why you don't need a lock, not because it's updated during
interrupt processing. I understand how this works, but maybe rephrase
the comment a bit.

+/*
+ * DataChecksumsOnInProgress
+ *             Returns whether data checksums are being enabled
+ *
+ * Most callsites shouldn't need to worry about the "inprogress" states, since
+ * they should check the requirement for verification or writing. Some low-
+ * level callsites dealing with page writes need however to know. It's also
+ * used to check for aborted checksum processing which need to be restarted.
*/
bool
-DataChecksumsEnabled(void)
+DataChecksumsOnInProgress(void)
+{
+       return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}

s/need/needs/. The whole paragraph feels a bit awkwardly worded in
general. I'd suggest something like: "Most operations don't need to
worry about the "inprogress" states, and should use
DataChecksumsNeedVerify() or DataChecksumsNeedWrite()".
DataChecksumsOffInProgress() is called from
StartDatachecksumsWorkerLauncher(), which I wouldn't call a "low-level
callsite".
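
The function names under discussion reflect a small state machine; a toy model (hypothetical Python, my reading of the thread rather than code lifted from the patch) of the four checksum states and the two predicates most callers should use:

```python
# Toy model of the data checksums state machine.  State names mirror the
# patch; the predicate semantics are an interpretation of the thread.

ENABLE_TRANSITIONS = {"off": "inprogress-on", "inprogress-on": "on"}
DISABLE_TRANSITIONS = {"on": "inprogress-off", "inprogress-off": "off"}

def checksums_need_write(state):
    # Pages are checksummed while enabling is in progress, and also while
    # disabling is in progress (some backends may still be verifying).
    return state in ("on", "inprogress-on", "inprogress-off")

def checksums_need_verify(state):
    # Verification is only safe once every page is known to carry a
    # checksum, i.e. in the final "on" state.
    return state == "on"

state = "off"
state = ENABLE_TRANSITIONS[state]      # pg_enable_data_checksums()
assert state == "inprogress-on"
assert checksums_need_write(state) and not checksums_need_verify(state)
state = ENABLE_TRANSITIONS[state]      # worker finishes all relations
assert checksums_need_verify(state)
```

In this reading, only the worker and the low-level write paths ever consult the "inprogress" states directly; everything else goes through the two predicates.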

@@ -1079,7 +1091,7 @@ XLogInsertRecord(XLogRecData *rdata,
Assert(RedoRecPtr < Insert->RedoRecPtr);
RedoRecPtr = Insert->RedoRecPtr;
}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());

if (doPageWrites &&
(!prevDoPageWrites ||

The comment above this deserves to be updated. Also, this is a very hot
codepath; will this slow down WAL-logging, when full-page writes are
disabled? Could we inline DataChecksumsOnInProgress() or set
Insert->forcePageWrites when checksums are being computed or something?

In StartupXLOG():

+	/*
+	 * If data checksums were being disabled when the cluster was shutdown, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+

Should this be WAL-logged, like in SetDataChecksumsOff()?

In SetDataChecksumsOff():

+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+}

What happens if you crash between UpdateControlFile() and XlogChecksum()?

- Heikki

#56 Daniel Gustafsson
daniel@yesql.se
In reply to: Heikki Linnakangas (#55)
1 attachment(s)
Re: Online checksums patch - once again

On 23 Nov 2020, at 18:36, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 17/11/2020 10:56, Daniel Gustafsson wrote:

On 5 Oct 2020, at 13:36, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
2. The signaling between enable_data_checksums() and the launcher process looks funny to me. The general idea seems to be that enable_data_checksums() just starts the launcher process, and the launcher process figures out what it needs to do and makes all the changes to the global state. But then there's this violation of the idea: enable_data_checksums() checks DataChecksumsOnInProgress(), and tells the launcher process whether it should continue a previously crashed operation or start from scratch. I think it would be much cleaner if the launcher process figured that out itself, and enable_data_checksums() would just tell the launcher what the target state is.

enable_data_checksums() and disable_data_checksums() seem prone to race conditions. If you call enable_data_checksums() in two backends concurrently, depending on the timing, there are two possible outcomes:

a) One call returns true, and launches the background process. The other call returns false.

b) Both calls return true, but one of them emits a "NOTICE: data checksums worker is already running".

In disable_data_checksum() imagine what happens if another backend calls enable_data_checksums() in between the ShutdownDatachecksumsWorkerIfRunning() and SetDataChecksumsOff() calls.

I've reworked this in the attached such that the enable_ and disable_ functions
merely call into the launcher with the desired outcome, and the launcher is
responsible for figuring out the rest. The datachecksumworker is now the sole
place which initiates a state transfer.

Well, you still fill the DatachecksumsWorkerShmem->operations array in the backend process that launches the datacheckumworker, not in the worker process. I find that still a bit surprising, but I believe it works.

I'm open to changing it in case there are strong opinions; it just seemed the
most natural to me.

pg_enable/disable_data_checksums() should perform a superuser-check. I don't think we want to expose them to any users.

Fixed.

+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing. Thus, it can be read by backends without the need for a lock.
+ * Possible values are the checksum versions defined in storage/bufpage.h and
+ * zero for when checksums are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;

The comment is a bit confusing: "Thus, it can be read by backends without the need for a lock". Since it's a variable in backend-private memory, it can only be read by the same backend, not "backends". And that's also why you don't need a lock, not because it's updated during interrupt processing. I understand how this works, but maybe rephrase the comment a bit.

Fixed.

+/*
+ * DataChecksumsOnInProgress
+ *             Returns whether data checksums are being enabled
+ *
+ * Most callsites shouldn't need to worry about the "inprogress" states, since
+ * they should check the requirement for verification or writing. Some low-
+ * level callsites dealing with page writes need however to know. It's also
+ * used to check for aborted checksum processing which need to be restarted.
*/
bool
-DataChecksumsEnabled(void)
+DataChecksumsOnInProgress(void)
+{
+       return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}

s/need/needs/. The whole paragraph feels a bit awkwardly worded in general. I'd suggest something like: "Most operations don't need to worry about the "inprogress" states, and should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite()". DataChecksumsOffInProgress() is called from StartDatachecksumsWorkerLauncher(), which I wouldn't call a "low-level callsite".

Fixed, and a few related surrounding comments expanded and tweaked.

@@ -1079,7 +1091,7 @@ XLogInsertRecord(XLogRecData *rdata,
Assert(RedoRecPtr < Insert->RedoRecPtr);
RedoRecPtr = Insert->RedoRecPtr;
}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
if (doPageWrites &&
(!prevDoPageWrites ||

The comment above this deserves to be updated.

Fixed.

Also, this is a very hot codepath; will this slow down WAL-logging, when full-page writes are disabled? Could we inline DataChecksumsOnInProgress() or set Insert->forcePageWrites when checksums are being computed or something?

Wouldn't setting forcePageWrites risk causing other side effects that we don't
want here? I've changed DataChecksumsOnInProgress to inline as a start.

In StartupXLOG():

+	/*
+	 * If data checksums were being disabled when the cluster was shutdown, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+

Should this be WAL-logged, like in SetDataChecksumsOff()?

My initial thinking was that we wouldn't need to, as all nodes would process
the controlfile equally. The more I think about it, the more I think we need to
have an XLOG record of it. Added.

It would be good to stress this in a TAP test, but I haven't been able to write
one yet.

In SetDataChecksumsOff():

+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	XlogChecksums(0);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+}

What happens if you crash between UpdateControlFile() and XlogChecksum()?

Good point, that would not get the cluster to a consistent state. The
XlogChecksums call should be performed before the controlfile is updated.
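
The ordering concern can be illustrated with a toy crash simulation (hypothetical Python standing in for XlogChecksums() and UpdateControlFile(), not the actual xlog.c code): if the controlfile is updated first and we crash before the WAL record is written, the primary says checksums are off while a standby replaying WAL never hears about it.

```python
# Toy model: 'wal' stands in for XlogChecksums(), 'controlfile' for
# UpdateControlFile().  A standby's view is driven purely by replayed WAL.

def disable_checksums(order, crash_after_first_step=False):
    wal = []                      # records shipped to standbys
    controlfile = "on"            # local persistent state
    for step in order:
        if step == "xlog":
            wal.append("checksums=off")
        else:                     # "controlfile"
            controlfile = "off"
        if crash_after_first_step:
            break                 # simulate a crash between the two steps
    standby = "off" if "checksums=off" in wal else "on"
    return controlfile, standby

# Buggy order: controlfile first.  A crash leaves primary and standby split.
primary, standby = disable_checksums(["controlfile", "xlog"], True)
assert (primary, standby) == ("off", "on")    # inconsistent!

# Fixed order: WAL first.  After a crash, crash recovery replays the record
# and brings the primary's controlfile up to date; the standby already agrees.
primary, standby = disable_checksums(["xlog", "controlfile"], True)
assert standby == "off"
```

The fixed ordering makes the WAL record the source of truth, which recovery can always reapply.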

The attached patch contains these fixes as well as a rebase on top of todays
Git HEAD.

cheers ./daniel

Attachments:

online_checksums24.patch (application/octet-stream; x-unix-mode=0644)
From 2d4a81c224e3f8bc022fd64728638b87b252bb8d Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 25 Nov 2020 14:12:12 +0100
Subject: [PATCH] Support checksum enable/disable in running cluster v24

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

Further description of the process TBW once the dust settles around
this.

Daniel Gustafsson, Magnus Hagander
---
 doc/src/sgml/amcheck.sgml                    |    2 +-
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   71 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/initdb.sgml                 |    1 +
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   97 ++
 src/backend/access/heap/heapam.c             |    4 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  307 +++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    3 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1500 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   46 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   36 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   90 ++
 src/test/checksum/t/002_restarts.pl          |   97 ++
 src/test/checksum/t/003_standby_checksum.pl  |  102 ++
 51 files changed, 2690 insertions(+), 75 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl

diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 99fad708bf..494cd1bd08 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -497,7 +497,7 @@ SET client_min_messages = DEBUG1;
   Structural corruption can happen due to faulty storage hardware, or
   relation files being overwritten or modified by unrelated software.
   This kind of corruption can also be detected with
-  <link linkend="app-initdb-data-checksums"><application>data page
+  <link linkend="checksums"><application>data page
   checksums</application></link>.
  </para>
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 569841398b..5584b50caa 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 507bc1a668..26652b2f27 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25102,6 +25102,77 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Initiates enabling of data checksums for the cluster. This will
+        switch the data checksum mode to <literal>inprogress-on</literal> and
+        start a background worker that processes all data in the cluster and
+        enables checksums for it. When all data pages have had checksums
+        enabled, the cluster will automatically switch the data checksum mode
+        to <literal>on</literal>. Returns <literal>true</literal> if
+        processing was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have stopped validating
+        data checksums, the data checksum mode will be changed to
+        <literal>off</literal>. Returns <literal>false</literal> if data
+        checksums are already disabled.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 98e1995453..e492809190 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3666,8 +3666,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database (or on a shared object).
       </para></entry>
      </row>
 
@@ -3677,8 +3676,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 1dd4e54ff1..0dd1c509eb 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   If checksums were in the process of being enabled online when the cluster
+   was shut down, <application>pg_checksums</application> will still process
+   all relations when enabling checksums, regardless of any progress made by
+   the online processing.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d1c3893b14..a9d8bd631f 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums; internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
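+
+  <para>
+   The reported mode is one of <literal>off</literal>, <literal>on</literal>,
+   <literal>inprogress-on</literal> or <literal>inprogress-off</literal>:
+<programlisting>
+SHOW data_checksums;
+</programlisting>
+  </para>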
+
+  <para>
+   When attempting to recover from corrupt data, it may be necessary to bypass
+   the checksum protection. To do this, temporarily set the configuration
+   parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will
+    be written but not verified. In addition, a background worker process is
+    started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker slot, so make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that no tables that would be invisible
+    to it have been created inside a transaction that has not yet committed.
+    It will also, for each database, wait for all pre-existing temporary
+    tables to be removed before it finishes. If long-lived temporary tables
+    are used in the application, it may be necessary to terminate these
+    application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped for any reason while in
+    <literal>inprogress-on</literal> mode, this process must be restarted
+    manually. To do this, re-execute the function
+    <function>pg_enable_data_checksums()</function> once the cluster has been
+    restarted. The background worker will attempt to resume the work from
+    where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O load on the system, as most
+     of the database pages will need to be rewritten, both in the data files
+     and in the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as
+    verify checksums, on an offline cluster.
+   </para>
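+
+   <para>
+    For example, to enable data checksums on a cleanly shut down cluster
+    (the data directory path shown is only an example):
+<programlisting>
+pg_checksums --enable -D /path/to/data
+</programlisting>
+   </para>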
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..81ab0785ef 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7258,7 +7258,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7279,11 +7279,13 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4f61107a6a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfoString(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfoString(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfoString(buf, "inprogress-on");
+		else
+			appendStringInfoString(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 13f1d8c3dc..83cafec2c8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -49,6 +50,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -252,6 +254,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -893,6 +905,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1065,8 +1078,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1079,7 +1092,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4891,9 +4904,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4927,13 +4938,225 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Are checksums enabled, or in the process of being enabled, for data pages?
+ * In case checksums are being enabled we must write the checksum even though
+ * it's not verified during this stage.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid the
+ * risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that the
+ * call to this function must be placed as close to the validation call as
+ * possible to keep the critical section short, in order to protect against
+ * TOCTOU situations around checksum validation.
+ */
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress" state for enabling checksums is used when the checksum worker
+ * is setting checksums on all pages; it can thus be used to check for aborted
+ * checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress" state for disabling checksums is used while the worker
+ * resets the catalog state. Operations should use DataChecksumsNeedVerify()
+ * or DataChecksumsNeedWrite() for deciding whether to read/write checksums.
+ */
+bool
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+void
+SetDataChecksumsOnInProgress(void)
+{
+	Assert(ControlFile != NULL);
+
+	if (LocalDataChecksumVersion > 0)
+		return;
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one
+ * sets the checksums state to "inprogress-on" and the second one to "on".
+ * During "inprogress-on", checksums are written but not verified. When all
+ * existing pages are guaranteed to have checksums, and all new pages will be
+ * initiated with checksums, the state can be changed to "on".
+ */
+void
+SetDataChecksumsOn(void)
 {
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+		WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+
+	XlogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+}
+
+/*
+ * Barrier absorption functions for disabling data checksums
+ */
+void
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* GUC show hook for data_checksums */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7917,6 +8140,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in the process of being enabled,
+	 * we notify the user that they need to manually restart the process to
+	 * enable checksums. This is because we cannot launch a dynamic background
+	 * worker directly from here; it has to be launched from a regular
+	 * backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shutdown, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XlogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9768,6 +10017,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10223,6 +10490,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..ab531484a7 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4cd7d76938..ea642fa0ff 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -965,10 +965,13 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2e4aa1c4b6..01c6d96823 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1243,6 +1243,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 5a9a0e3435..aeb6d8c642 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..24a0d91bc4
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1500 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a cluster at initdb time or with
+ * pg_checksums, no extra process is required, as each page is checksummed
+ * when written and verified when read.  When enabling checksums on an
+ * already running cluster which does not have checksums enabled, this worker
+ * will ensure that all pages are checksummed before verification of the
+ * checksums is turned on. In the case of disabling checksums, the state
+ * transition is recorded in the catalog and control file, but no changes are
+ * performed on the data pages.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to an incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all datapages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from where it left off given that pg_class.relhaschecksums track state of
+ * processed relations and the in-progress state will ensure all new writes
+ * performed with checksums. Each database will be reprocessed, but relations
+ * where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off",
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state will still
+ * be able to perform data checksum validation. During "inprogress-off", the
+ * catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *		- Backends SHALL NOT violate local datachecksum state
+ *		- Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *		  currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects or processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * backend updating the global state in the controlfile will wait for all other
+ * backends to absorb the barrier before WAL logging. Barrier absorption will
+ * happen during interrupt processing, which means that connected backends will
+ * change state at different times.
+ *
+ *   When Enabling Data Checksums
+ *	 ----------------------------
+ *	 A process which fails to observe data checksums being enabled can induce
+ *	 two types of errors: failing to write the checksum when modifying the page
+ *	 and failing to validate the data checksum on the page when reading it.
+ *
+ *   When the DataChecksumsWorker has finished writing checksums on all pages
+ *   and enables data checksums cluster-wide, there are four sets of backends:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends transition from the Bd state to Be like so: Bd -> Bi -> Be
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting will observe the global state being "on" and will
+ *   thus automatically belong to Be.  Checksums are enabled cluster-wide when
+ *   Bi is an empty set. All sets are compatible while still operating based on
+ *   their local state.
+ *
+ *	 When Disabling Data Checksums
+ *	 -----------------------------
+ *	 A process which fails to observe data checksums being disabled can induce
+ *	 two types of errors: writing the checksum when modifying the page and
+ *	 validating a data checksum which is no longer correct due to modifications
+ *	 to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bi -> Bd
+ *
+ *   The goal is to transition all backends to Bd, making the other sets empty.
+ *   Backends in Bi write data checksums but do not validate them, so that
+ *   backends still in Be can continue to validate pages until they absorb
+ *   the barrier and move to Bi. Once all backends are in Bi, the barrier to
+ *   transition to "off" can be raised and all backends can safely stop
+ *   writing data checksums, as no backend is enforcing data checksum
+ *   validation.
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		target;
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Main entry point for the datachecksumsworker launcher process
+ *
+ * Starts data checksum processing, for enabling as well as disabling
+ * checksums.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of the below distinct states:
+	 *
+	 * A: Started and performing an operation
+	 * B: Started and in the process of aborting
+	 * C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
+	 * and the abort flag be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, the only operation we can perform
+	 * is to cancel it, and only if the user requested that checksums be
+	 * disabled. That doesn't however mean that all other cases yield an
+	 * error, as some might be perfectly benign.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
+		 * If the launcher is started, data checksums cannot be on or off, but
+		 * they may be in an inprogress state. Since the state transition may
+		 * not have happened yet (in case of rapidly initiated checksum enable
+		 * calls for example) we inspect the target state of the currently
+		 * running launcher.
+		 */
+
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so exit.
+			 */
+			if (DatachecksumsWorkerShmem->target)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * Disabling checksums is likely to be a very quick operation in
+			 * many cases so trying to abort it to save the checksums would
+			 * run the risk of race conditions.
+			 */
+			else
+			{
+				ereport(NOTICE,
+						(errmsg("data checksums are concurrently being disabled, please retry")));
+
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/* This should be unreachable */
+			Assert(false);
+		}
+		else
+		{
+			/*
+			 * Data checksums are already being disabled, exit silently.
+			 */
+			if (DataChecksumsOffInProgress())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			DatachecksumsWorkerShmem->abort = true;
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+	}
+
+	/*
+	 * The launcher is currently not running, so we need to query the system
+	 * data checksum state to determine how to proceed based on the requested
+	 * target state.
+	 */
+	else
+	{
+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->target = enable_checksums;
+
+		/*
+		 * If the launcher isn't started and we're asked to enable checksums,
+		 * we need to check if processing was previously interrupted such that
+		 * we should resume rather than start from scratch.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums in a cluster which already
+			 * has checksums enabled, exit immediately as there is nothing
+			 * more to do.
+			 */
+			if (DataChecksumsNeedVerify())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-on" then we will
+			 * resume from where we left off based on the catalog state. This
+			 * is safe since new relations created while the worker was not
+			 * running will have been created with checksums enabled.
+			 */
+			else if (DataChecksumsOnInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[1] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-off" then we
+			 * were interrupted while the catalog state was being cleared. In
+			 * this case we need to first reset state and then continue with
+			 * enabling checksums.
+			 */
+			else if (DataChecksumsOffInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * Data checksums are off in the cluster, we can proceed with
+			 * enabling them. Just in case we will start by resetting the
+			 * catalog state since we are doing this from scratch and we don't
+			 * want leftover catalog state to cause us to miss a relation.
+			 */
+			else
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+		}
+		else
+		{
+			/*
+			 * Regardless of the current state of the system, we go through
+			 * the motions when asked to disable checksums. The catalog state
+			 * is only relevant during the operation of enabling checksums,
+			 * and has no use at any other time. That being said, a user who
+			 * sees stale relhaschecksums entries in the catalog might run
+			 * this just in case.
+			 */
+			DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+			DatachecksumsWorkerShmem->operations[1] = DISABLE_CHECKSUMS;
+		}
+	}
+
+	/* Backoff parameters to throttle the load during enabling */
+	DatachecksumsWorkerShmem->cost_delay = cost_delay;
+	DatachecksumsWorkerShmem->cost_limit = cost_limit;
+
+	/*
+	 * Prepare the BackgroundWorker and launch it.
+	 */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. Iff wal_level is set to "minimal",
+		 * this could be avoided iff the checksum is calculated to be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check whether we have been asked
+		 * to abort; the abort will bubble up from here. It's safe to check
+		 * this without a lock, because if we miss it being set, we will try
+		 * again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			pfree(relns);
+			return false;
+		}
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it, that is not an
+	 * error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed, we cannot end up with a processed database,
+	 * so we have no alternative but to exit. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * need to be handled in some way. TODO
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksum processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks waiting for all current transactions to finish
+ *
+ * Returns when all transactions which were active when the function was
+ * called have ended, or if the postmaster dies while waiting. If the
+ * postmaster dies, the abort flag will be set to indicate that the caller
+ * shouldn't proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function handles the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = DatachecksumsWorkerShmem->operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. Until no
+ * new databases are found, this will loop around computing a new list and
+ * comparing it to the already seen ones.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions finish, this way we can be
+		 * certain that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%i databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "process with restart" : "process completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can be because
+	 * it actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case, where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists but enabling checksums failed, then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResult *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist.
+		 */
+		if (found && *entry == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = found;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->abort = false;
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even if this is a redundant assignment, we want to be explicit about
+	 * our intent for readability, since we want to be able to query this
+	 * state when handling restarts.
+	 */
+	DatachecksumsWorkerShmem->launcher_started = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which do not yet have data checksums
+ * are returned. If include_shared is true then shared relations are included
+ * as well in a non-temporary list; include_shared has no relevance when
+ * building a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This must be performed before adding checksums to a
+ * running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait
+ * for all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we are done, since we
+	 * cannot access these relations to modify them.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		SetDataChecksumsOff();
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e76e627c6b..8b9a1db75c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3916,6 +3916,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b89df01fa7..51065c717f 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1598,7 +1598,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1684,7 +1684,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee99b8..908edfb423 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -212,6 +212,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index ad0d1a9abc..8a14f29027 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2933,8 +2933,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..9a33560469 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index ffe67acea1..c5331a68ba 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,11 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOffInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +500,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+		ProcessBarrierChecksumOnInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+		ProcessBarrierChecksumOn();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF))
+		ProcessBarrierChecksumOffInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+		ProcessBarrierChecksumOff();
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +520,27 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
+}
+
+static void
+ProcessBarrierChecksumOffInProgress(void)
+{
+	AbsorbChecksumsOffInProgressBarrier();
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index ddf18079e2..3b74ddaa92 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change mid-check, which could
+		 * otherwise cause a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index a210fc93b4..9e1dc45cd8 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1565,9 +1565,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1583,9 +1580,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 66393becfb..9a38499dcb 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1816,7 +1817,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1884,6 +1886,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3536,6 +3540,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the checksum state. Since the checksum state can change at any
+	 * time, the fetched value might be out of date by the time it is used.
+	 * DataChecksumsNeedWrite returns true when checksums are enabled, are in
+	 * the process of being enabled ("inprogress-on"), or are in the process
+	 * of being disabled ("inprogress-off"). Since relhaschecksums is only
+	 * used to track progress when checksums are being enabled, and going
+	 * from disabled to enabled will clear relhaschecksums before starting,
+	 * it is safe to use this value during a concurrent state transition to
+	 * off.
+	 *
+	 * If DataChecksumsNeedWrite returns false and is concurrently changed to
+	 * true, then that implies that checksums are being enabled. Worst case,
+	 * this will lead to the relation being processed for checksums even
+	 * though each page written will have them already.
+	 *
+	 * Performing this last shortens the TOCTOU window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3802,6 +3827,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3826,16 +3852,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3864,6 +3898,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3884,15 +3919,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index ed2ab4b5b2..03b940dfd7 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index f2dd8e4914..bbe6663d2f 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -616,6 +616,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bb34630e8e..14dfe6d5ba 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -498,6 +500,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -607,7 +620,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1898,17 +1911,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4784,6 +4786,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index ffdc23945c..6a5a596f46 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 39bcaa8fe1..32058ebf61 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -657,6 +657,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish, or turn off checksums,
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index ee70243c2e..bfa05eb1b0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 221af87e71..8dfd70fba6 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..80a959bd7f 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index bb5e72ca43..275eb0a1a6 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..6bc802d8ba 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e7fbda9f81..4dab5234ef 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10935,6 +10935,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 72e3352398..c4893551a3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -323,6 +323,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 257e515bfe..f56ecee715 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -923,6 +923,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..3572ec80c5
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,36 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index d0a52f8e08..3bb7742642 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 6e77744cbc..f6ae955f58 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 5cb39697f3..37cd0abbd6 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index 14cde4f5ba..61d6b918b9 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..558a8135f1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: In the case of "check", this creates a temporary installation
+with multiple nodes (a primary and one or more standbys) for the
+purpose of the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..68ff5f6a54
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,90 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class has checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we still can process data fine
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums when already disabled, which is also a no-op so we mainly
+# want to run this to make sure the backend isn't crashing or erroring out
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..d10bd5c5c5
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,97 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'doublecheck that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..eb2bd515b0
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,102 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 11;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on or in progress on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1,10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary');
+
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
-- 
2.21.1 (Apple Git-122.3)

#57 Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Daniel Gustafsson (#56)
Re: Online checksums patch - once again

On 25/11/2020 15:20, Daniel Gustafsson wrote:

On 23 Nov 2020, at 18:36, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
What happens if you crash between UpdateControlFile() and XlogChecksum()?

Good point, that would not get the cluster to a consistent state. The
XlogChecksum should be performed before the control file is updated.

+void
+SetDataChecksumsOff(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+		WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+
+	XlogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+}

The lwlocking doesn't look right here. If
ControlFile->data_checksum_version != PG_DATA_CHECKSUM_VERSION,
LWLockAcquire is called twice without a LWLockRelease in between.

What if a checkpoint and a crash happen just after the WAL record has
been written, but before the control file is updated? That's a
ridiculously tight window for a whole checkpoint cycle to happen, but in
principle I think that would spell trouble. I think you could set
delayChkpt to prevent the checkpoint from happening in that window,
similar to how we avoid this problem with the clog updates at commit.
Also, I think this should be in a critical section; we don't want the
process to error out in between for any reason, and if it does happen,
it's panic time.

- Heikki

#58 Daniel Gustafsson
daniel@yesql.se
In reply to: Heikki Linnakangas (#57)
1 attachment(s)
Re: Online checksums patch - once again

On 25 Nov 2020, at 14:33, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

The lwlocking doesn't look right here. If ControlFile->data_checksum_version != PG_DATA_CHECKSUM_VERSION, LWLockAcquire is called twice without a LWLockRelease in between.

Right, fixed.

What if a checkpoint, and a crash, happens just after the WAL record has been written, but before the control file is updated? That's a ridiculously tight window for a whole checkpoint cycle to happen, but in principle I think that would spell trouble. I think you could set delayChkpt to prevent the checkpoint from happening in that window, similar to how we avoid this problem with the clog updates at commit. Also, I think this should be in a critical section; we don't want the process to error out in between for any reason, and if it does happen, it's panic time.

Good points. The attached patch performs the state changes inside a critical
section with checkpoints delayed, and emits the barrier inside the critical
section while waiting for it outside, to keep the critical section as short
as possible.

I've also done some tweaks to the tests to make them more robust as well as
comment updates and general tidying up here and there.

cheers ./daniel

Attachments:

online_checksums25.patch (application/octet-stream)
From 03a741bd31efb99578a3a30e55a2c8fca9d95881 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 25 Nov 2020 14:12:12 +0100
Subject: [PATCH] Support checksum enable/disable in running cluster v24

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

Further description of the process TBW once the dust settles around
this.

Daniel Gustafsson, Magnus Hagander
---
 doc/src/sgml/amcheck.sgml                    |    2 +-
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   71 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/initdb.sgml                 |    1 +
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   97 ++
 src/backend/access/heap/heapam.c             |    4 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  381 ++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    3 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1527 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   46 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   36 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   89 +
 src/test/checksum/t/002_restarts.pl          |  108 ++
 src/test/checksum/t/003_standby_checksum.pl  |  116 ++
 51 files changed, 2815 insertions(+), 75 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl

diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 99fad708bf..494cd1bd08 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -497,7 +497,7 @@ SET client_min_messages = DEBUG1;
   Structural corruption can happen due to faulty storage hardware, or
   relation files being overwritten or modified by unrelated software.
   This kind of corruption can also be detected with
-  <link linkend="app-initdb-data-checksums"><application>data page
+  <link linkend="checksums"><application>data page
   checksums</application></link>.
  </para>
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 79069ddfab..9cf87c03f3 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if the relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df29af6371..07464a5590 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25095,6 +25095,77 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Initiates data checksums for the cluster. This will switch the data
+        checksums mode to <literal>inprogress-on</literal> as well as start a
+        background worker that will process all data in the database and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>. Returns <literal>true</literal> if processing
+        was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+        Returns <literal>false</literal> if data checksums are already
+        disabled.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 52a69a5366..4c770d6611 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3693,8 +3693,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3704,8 +3703,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 1dd4e54ff1..0dd1c509eb 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   If checksums were in the process of being enabled online when the cluster
+   was shut down, <application>pg_checksums</application> will still process
+   all relations when enabling checksums, regardless of any progress made by
+   the online processing.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index d1c3893b14..a9d8bd631f 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums; internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to
+   bypass the checksum protection. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process, make sure that
+    <varname>max_worker_processes</varname> allows for at least one more
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to be removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as 
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..81ab0785ef 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7258,7 +7258,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7279,11 +7279,13 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 3200f777f5..4f61107a6a 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 13f1d8c3dc..df0ef05ad9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -49,6 +50,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -252,6 +254,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -893,6 +905,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1065,8 +1078,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1079,7 +1092,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4891,9 +4904,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4927,13 +4938,299 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Are checksums enabled, or in an "inprogress" state, for data pages?
+ * While checksums are being enabled or disabled we must still write the
+ * checksum even though it is not verified during these stages.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid the
+ * risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that the
+ * call to this function must be made as close to the validation call as
+ * possible, to keep the critical section short and protect against TOCTOU
+ * situations around checksum validation.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedVerify(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress" state for enabling checksums is used when the checksum worker
+ * is setting checksums on all pages; it can thus be used to check for aborted
+ * checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress" state for disabling checksums is used for when the worker
+ * resets the catalog state. Operations should use DataChecksumsNeedVerify()
+ * or DataChecksumsNeedWrite() for deciding whether to read/write checksums.
+ */
+bool
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers: the first one
+ * sets the checksums state to "inprogress-on" and the second one to "on".
+ * During "inprogress-on", checksums are written but not verified. When all
+ * existing pages are guaranteed to have checksums, and all new pages will be
+ * initialized with checksums, the state can be changed to "on".
+ */
+void
+SetDataChecksumsOn(void)
 {
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * inprogress-off state during which backends continue to write checksums
+	 * without verifying them. When all backends are in "inprogress-off" the
+	 * next transition to "off" can be performed, after which all data checksum
+	 * processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * and we can transition directly to "off" from there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that no checkpoint occurs while we are disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * Barrier absorption functions for disabling data checksums
+ */
+void
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* GUC show hook for data_checksums */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7917,6 +8214,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in the "inprogress-on" state, we
+	 * notify the user that they need to manually restart the process to
+	 * enable checksums. This is because we cannot launch a dynamic background
+	 * worker directly from here; it has to be launched from a regular
+	 * backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shutdown, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XlogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9768,6 +10091,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10223,6 +10564,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 290658b22c..ab531484a7 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4cd7d76938..ea642fa0ff 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -965,10 +965,13 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b140c210bc..9ac784af9a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1246,6 +1246,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 5a9a0e3435..aeb6d8c642 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..5d94db95f9
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1527 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a cluster at initdb time or with
+ * pg_checksums, no extra process is required, as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all datapages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If processing is interrupted by a cluster restart, it will be resumed
+ * from where it left off, given that pg_class.relhaschecksums tracks the
+ * state of processed relations and the in-progress state ensures that all
+ * new writes are performed with checksums. Each database will be
+ * reprocessed, but relations
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off",
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state can still
+ * perform data checksum validation. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *		- Backends SHALL NOT violate local datachecksum state
+ *		- Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *		  currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * backend updating the global state in the controlfile will wait for all other
+ * backends to absorb the barrier before WAL logging. Barrier absorption will
+ * happen during interrupt processing, which means that connected backends will
+ * change state at different times.
+ *
+ *   When Enabling Data Checksums
+ *	 ----------------------------
+ *	 A process which fails to observe data checksums being enabled can induce
+ *	 two types of errors: failing to write the checksum when modifying the page
+ *	 and failing to validate the data checksum on the page when reading it.
+ *
+ *   When the DataChecksumsWorker has finished writing checksums on all pages
+ *   and enables data checksums cluster-wide, there are four sets of backends:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends transition from the Bd state to Be like so: Bd -> Bi -> Be
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting will observe the global state being "on" and will
+ *   thus automatically belong to Be.  Checksums are enabled cluster-wide when
+ *   Bi is an empty set. All sets are compatible while still operating based on
+ *   their local state.
+ *
+ *	 When Disabling Data Checksums
+ *	 -----------------------------
+ *	 A process which fails to observe data checksums being disabled can induce
+ *	 two types of errors: writing the checksum when modifying the page and
+ *	 validating a data checksum which is no longer correct due to modifications
+ *	 to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bi -> Bd
+ *
+ *   The goal is to transition all backends to Bd, making the other sets empty.
+ *   Backends in Bi write data checksums, but don't validate them, so that
+ *   backends still in Be can continue to validate pages until they have
+ *   absorbed the barrier and moved to Bi. Once all backends are in Bi, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation.
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		target;
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Launch the background worker for data checksum processing
+ *
+ * The main entry point for starting data checksum processing, for enabling
+ * as well as disabling.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of the below distinct states:
+	 *
	 * A: Started and performing an operation
	 * B: Started and in the process of aborting
	 * C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
	 * and the abort flag to be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, the only operation we can perform
	 * is to cancel it, and only if the user requested that checksums be
	 * disabled.  That doesn't however mean that all other cases yield an
	 * error, as some might be perfectly benign.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
		 * If the launcher is started, data checksums can be neither on nor
		 * off, but may be in an in-progress state. Since the state transition
		 * may
+		 * not have happened yet (in case of rapidly initiated checksum enable
+		 * calls for example) we inspect the target state of the currently
+		 * running launcher.
+		 */
+
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so exit.
+			 */
+			if (DatachecksumsWorkerShmem->target)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * Disabling checksums is likely to be a very quick operation in
+			 * many cases so trying to abort it to save the checksums would
+			 * run the risk of race conditions.
+			 */
+			else
+			{
+				ereport(NOTICE,
+						(errmsg("data checksums are concurrently being disabled, please retry")));
+
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/* This should be unreachable */
+			Assert(false);
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Data checksums are already being disabled, exit silently.
+			 */
+			if (DataChecksumsOffInProgress())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			DatachecksumsWorkerShmem->abort = true;
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+	}
+
+	/*
+	 * The launcher is currently not running, so we need to query the system
+	 * data checksum state to determine how to proceed based on the requested
+	 * target state.
+	 */
+	else
+	{
+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->target = enable_checksums;
+
+		/*
+		 * If the launcher isn't started and we're asked to enable checksums,
+		 * we need to check if processing was previously interrupted such that
+		 * we should resume rather than start from scratch.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums in a cluster which already
+			 * has checksums enabled, exit immediately as there is nothing
+			 * more to do.
+			 */
+			if (DataChecksumsNeedVerify())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-on" then we will
+			 * resume from where we left off based on the catalog state. This
+			 * will be safe since new relations created while the checksum-
+			 * worker was disabled will have checksums enabled.
+			 */
+			else if (DataChecksumsOnInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[1] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-off" then we
+			 * were interrupted while the catalog state was being cleared. In
+			 * this case we need to first reset state and then continue with
+			 * enabling checksums.
+			 */
+			else if (DataChecksumsOffInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * Data checksums are off in the cluster, we can proceed with
+			 * enabling them. Just in case we will start by resetting the
+			 * catalog state since we are doing this from scratch and we don't
+			 * want leftover catalog state to cause us to miss a relation.
+			 */
+			else
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Regardless of current state in the system, we go through the
+			 * motions when asked to disable checksums. The catalog state is
+			 * only defined to be relevant during the operation of enabling
			 * checksums, and has no use at any other point in time. That
+			 * being said, a user who sees stale relhaschecksums entries in the
+			 * catalog might run this just in case.
+			 *
+			 * Resetting state must be performed after setting data checksum
+			 * state to off, as there otherwise might (depending on system data
+			 * checksum state) be a window between catalog resetting and state
+			 * transition when new relations are created with the catalog state
+			 * set to true.
+			 */
+			DatachecksumsWorkerShmem->operations[0] = DISABLE_CHECKSUMS;
+			DatachecksumsWorkerShmem->operations[1] = RESET_STATE;
+		}
+	}
+
+	/*
+	 * Backoff parameters to throttle the load during enabling. As there is
	 * no real processing performed when disabling checksums, the backoff
+	 * parameters do not apply there.
+	 */
+	if (enable_checksums)
+	{
+		DatachecksumsWorkerShmem->cost_delay = cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = cost_limit;
+	}
+	else
+	{
+		DatachecksumsWorkerShmem->cost_delay = 0;
+		DatachecksumsWorkerShmem->cost_limit = 0;
+	}
+
+	/*
+	 * Prepare the BackgroundWorker and launch it.
+	 */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not turn off processing immediately; it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
		 * off, and then turned on again. If wal_level is set to "minimal",
		 * this could be avoided if the existing checksum verifies as correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
		 * This is the only place where we check if we are asked to abort; the
		 * abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
	 * If the relation has gone away since we checksummed it, then that's not
	 * an error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
	 * If the postmaster crashed, we cannot end up with a processed database,
	 * so we have no alternative other than exiting. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * need to be handled in some way. TODO
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksums processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks awaiting all current transactions to finish
+ *
+ * Returns when all transactions which were active at the time of the call
+ * have ended, or if the postmaster dies while waiting. If the postmaster
+ * dies, the abort flag will be set to indicate that the caller shouldn't
+ * proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function handles the bgworker management,
+ * while ProcessAllDatabases is responsible for looping over the databases
+ * and initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = DatachecksumsWorkerShmem->operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+	/*
+	 * Clean up after processing
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->launcher_started = false;
+	DatachecksumsWorkerShmem->abort = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. It loops,
+ * computing a fresh list and comparing it to the databases already seen,
+ * until no new databases are found.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Arrange for the first worker run to process shared catalogs; they are
+	 * skipped once one database has completed them, rather than being
+	 * processed once in every database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain that
+		 * there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%d databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "restarting loop to look for new databases" : "processing completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. A failure to enable checksums for a database can mean it
+	 * actually failed for some reason, or that the database was dropped
+	 * between us getting the database list and trying to process it. Get a
+	 * fresh list of databases to detect the second case, where the database
+	 * was dropped before we had started processing it. If a database still
+	 * exists but enabling checksums failed, then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = true;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->abort = false;
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even though this assignment is redundant after the MemSet above, be
+	 * explicit about the intent for readability, since this state is queried
+	 * when handling restarts.
+	 */
+	DatachecksumsWorkerShmem->launcher_started = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before scanning, wait for all pending transactions to finish. This
+	 * ensures there is no concurrently running CREATE DATABASE, which could
+	 * cause us to miss a database created as a copy without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which do not yet have data checksums are
+ * returned. If include_shared is true then shared relations are included as
+ * well in a non-temporary list. include_shared has no relevance when building
+ * a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we are done, since we
+	 * cannot access and modify these relations.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		SetDataChecksumsOff();
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 9bad14981b..60b1f6de60 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3932,6 +3932,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 22be7ca9d5..80fb7aeef9 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1606,7 +1606,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1692,7 +1692,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 3f84ee99b8..908edfb423 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -212,6 +212,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index ad0d1a9abc..8a14f29027 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2933,8 +2933,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..9a33560469 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index ffe67acea1..c5331a68ba 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,11 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOffInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +500,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+		ProcessBarrierChecksumOnInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+		ProcessBarrierChecksumOn();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF))
+		ProcessBarrierChecksumOffInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+		ProcessBarrierChecksumOff();
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +520,27 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
+}
+
+static void
+ProcessBarrierChecksumOffInProgress(void)
+{
+	AbsorbChecksumsOffInProgressBarrier();
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index ddf18079e2..3b74ddaa92 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change, which would risk a
+		 * false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 6afe1b6f56..e1c8bb640e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1565,9 +1565,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1583,9 +1580,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 66393becfb..9a38499dcb 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1816,7 +1817,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1884,6 +1886,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3536,6 +3540,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the checksum state. Since the checksum state can change at any
+	 * time, the fetched value might be out of date by the time it is used.
+	 * DataChecksumsNeedWrite returns true when checksums are enabled, in the
+	 * process of being enabled ("inprogress-on"), or in the process of being
+	 * disabled ("inprogress-off"). Since relhaschecksums is only used to
+	 * track progress when checksums are being enabled, and going from
+	 * disabled to enabled will clear relhaschecksums before starting, it is
+	 * safe to use this value for a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false, and is concurrently changed to
+	 * true then that implies that checksums are being enabled. Worst case,
+	 * this will lead to the relation being processed for checksums even
+	 * though each page written will have them already.
+	 *
+	 * Performing this last shortens the TOCTOU window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3802,6 +3827,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3826,16 +3852,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3864,6 +3898,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3884,15 +3919,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index ed2ab4b5b2..03b940dfd7 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 82d451569d..35782a93da 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -594,6 +594,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 02d2d267b5..b3f028fb86 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -498,6 +500,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -607,7 +620,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1898,17 +1911,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4784,6 +4786,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 28aba92a4c..0cf91f076c 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 39bcaa8fe1..32058ebf61 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -657,6 +657,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, disallow upgrading. The user
+	 * should either let the process finish or turn off checksums before
+	 * retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index ee70243c2e..bfa05eb1b0 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 221af87e71..8dfd70fba6 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 4146753d47..80a959bd7f 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index bb5e72ca43..275eb0a1a6 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 06bed90c5e..6bc802d8ba 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fc2202b843..70c1f26375 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10936,6 +10936,22 @@
   proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}',
   prosrc => 'pg_control_init' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 72e3352398..c4893551a3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -323,6 +323,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5954068dec..50848876cf 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -929,6 +929,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..3572ec80c5
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,36 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for the data checksums background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index d0a52f8e08..3bb7742642 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 6e77744cbc..f6ae955f58 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 5cb39697f3..37cd0abbd6 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index 14cde4f5ba..61d6b918b9 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..558a8135f1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: In the case of "check", this creates a temporary installation
+with multiple nodes (a primary and one or more standbys) as needed by
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..0f44512f83
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,89 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class have checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled. Disabling checksums clears the catalog
+# relhaschecksums state, so wait for that before considering it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..dc5bcb9629
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,108 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..99c283e0b1
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,116 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary 2');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
-- 
2.21.1 (Apple Git-122.3)

#59Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#58)
1 attachment(s)
Re: Online checksums patch - once again

On 3 Dec 2020, at 10:37, Daniel Gustafsson <daniel@yesql.se> wrote:

I've also done some tweaks to the tests to make them more robust as well as
comment updates and general tidying up here and there.

Attached is a rebase of the patch on top of current HEAD.

cheers ./daniel

Attachments:

online_checksums26.patchapplication/octet-stream; name=online_checksums26.patch; x-unix-mode=0644Download
From f36f0acf4733f47ada04e766e1c41d52a5ca8285 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 25 Nov 2020 14:12:12 +0100
Subject: [PATCH] Support checksum enable/disable in running cluster v24

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

Further description of the process TBW once the dust settles around
this.

Daniel Gustafsson, Magnus Hagander
---
 doc/src/sgml/amcheck.sgml                    |    2 +-
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   71 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/initdb.sgml                 |    1 +
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   97 ++
 src/backend/access/heap/heapam.c             |    4 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  381 ++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    3 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1527 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   46 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   36 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   89 +
 src/test/checksum/t/002_restarts.pl          |  108 ++
 src/test/checksum/t/003_standby_checksum.pl  |  116 ++
 51 files changed, 2815 insertions(+), 75 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl

diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 8dfb01a77b..5be0a0b9cf 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -497,7 +497,7 @@ SET client_min_messages = DEBUG1;
   Structural corruption can happen due to faulty storage hardware, or
   relation files being overwritten or modified by unrelated software.
   This kind of corruption can also be detected with
-  <link linkend="app-initdb-data-checksums"><application>data page
+  <link linkend="checksums"><application>data page
   checksums</application></link>.
  </para>
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..a81878369c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if the relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 02a37658ad..307e37acd1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25800,6 +25800,77 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Initiates enabling of data checksums for the cluster. This will switch
+        the data checksums mode to <literal>inprogress-on</literal> as well as
+        start a background worker that will process all data in the cluster
+        and enable checksums for it. When all data pages have had checksums
+        enabled, the cluster will automatically switch data checksums mode to
+        <literal>on</literal>. Returns <literal>true</literal> if processing
+        was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+        Returns <literal>false</literal> if data checksums are already
+        disabled.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3d6c901306..fe8d1b4fe0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3693,8 +3693,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3704,8 +3703,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b2..d879550e81 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   online when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations, regardless of any prior online progress.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc147b10..5dcfcdd2ff 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums; internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be inspected by viewing
+   the value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" />, for example by issuing the command
+   <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process, so make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to be removed before it finishes.
+    If long-lived temporary tables are used in the application, it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable, disable, or verify data checksums
+    on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..06c001f8ff 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7284,7 +7284,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7305,11 +7305,13 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ede93ad7fd..01eca900ac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -49,6 +50,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -252,6 +254,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -903,6 +915,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1075,8 +1088,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1089,7 +1102,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4902,9 +4915,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4938,13 +4949,299 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Are checksums enabled, or in the process of being enabled or disabled, for
+ * data pages?  While checksums are being enabled or disabled we must still
+ * write the checksum even though it's not verified during this stage.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid the
+ * the risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that calling
+ * this function must be performed as close to the validation call as possible
+ * to keep the critical section short. This is in order to protect against
+ * TOCTOU situations around checksum validation.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedVerify(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress" state for enabling checksums is used when the checksum worker
+ * is setting checksums on all pages; it can thus be used to check for aborted
+ * checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress" state for disabling checksums is used for when the worker
+ * resets the catalog state. Operations should use DataChecksumsNeedVerify()
+ * or DataChecksumsNeedWrite() for deciding whether to read/write checksums.
+ */
+bool
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one
+ * sets the checksums state to "inprogress-on" and the second one to "on".
+ * During "inprogress-on", checksums are written but not verified. When all
+ * existing pages are guaranteed to have checksums, and all new pages will be
+ * initialized with checksums, the state can be changed to "on".
+ */
+void
+SetDataChecksumsOn(void)
 {
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * inprogress-off state during which backends continue to write checksums
+	 * without verifying them. When all backends are in "inprogress-off" the
+	 * next transition to "off" can be performed, after which all data checksum
+	 * processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * and we can transition directly to "off" from there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that a checkpoint cannot occur while we are disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * Barrier absorption functions for disabling data checksums
+ */
+void
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7929,6 +8226,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in the "inprogress-on" state, we
+	 * notify the user that they need to manually restart the process to
+	 * enable checksums. This is because we cannot launch a dynamic background
+	 * worker directly from here; it has to be launched from a regular
+	 * backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shut down,
+	 * we know that all backends had stopped validating checksums, so we can
+	 * move straight to the "off" state.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XlogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9860,6 +10183,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10315,6 +10656,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab319d..cd4dc60800 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
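The argument checks in enable_data_checksums() above boil down to a simple predicate: a cost delay of zero (no throttling) is allowed but a negative one is not, and the cost limit must be strictly positive. A standalone sketch (the helper name here is illustrative, not part of the patch):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Validation rules enforced by enable_data_checksums(): cost_delay may
 * be zero but not negative, cost_limit must be strictly positive.
 */
static bool
checksum_cost_params_valid(int cost_delay, int cost_limit)
{
	return cost_delay >= 0 && cost_limit > 0;
}
```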
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 21f2240ade..a5e715f19c 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -965,10 +965,13 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ab4603c69b..e0c9351ce4 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1246,6 +1246,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..8afbf762af 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..5d94db95f9
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1527 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a cluster at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster which does not have checksums enabled, this worker will ensure

+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all datapages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If processing is interrupted by a cluster restart, it will resume from where
+ * it left off, given that pg_class.relhaschecksums tracks the state of
+ * processed relations and the in-progress state ensures that all new writes
+ * are performed with checksums. Each database will be reprocessed, but
+ * relations where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state can still
+ * validate data checksums successfully. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *		- Backends SHALL NOT violate local datachecksum state
+ *		- Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *		  currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * backend updating the global state in the controlfile will wait for all other
+ * backends to absorb the barrier before WAL logging. Barrier absorption will
+ * happen during interrupt processing, which means that connected backends will
+ * change state at different times.
+ *
+ *   When Enabling Data Checksums
+ *	 ----------------------------
+ *	 A process which fails to observe data checksums being enabled can induce
+ *	 two types of errors: failing to write the checksum when modifying the page
+ *	 and failing to validate the data checksum on the page when reading it.
+ *
+ *   When the DataChecksumsWorker has finished writing checksums on all pages
+ *   and enables data checksums cluster-wide, the backends fall into these sets:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends transition from the Bd state to Be like so: Bd -> Bi -> Be
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting will observe the global state being "on" and will
+ *   thus automatically belong to Be.  Checksums are enabled cluster-wide when
+ *   Bi is an empty set. All sets are compatible while still operating based on
+ *   their local state.
+ *
+ *	 When Disabling Data Checksums
+ *	 -----------------------------
+ *	 A process which fails to observe data checksums being disabled can induce
+ *	 two types of errors: writing the checksum when modifying the page and
+ *	 validating a data checksum which is no longer correct due to modifications
+ *	 to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bi -> Bd
+ *
+ *   The goal is to transition all backends to Bd, making the other sets empty.
+ *   Backends in Bi write data checksums but don't validate them, so that
+ *   backends still in Be can continue to validate pages until they have
+ *   absorbed the barrier and moved to Bi. Once all backends are in Bi, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation.
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
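The enabling transition described in the comment above (Bd -> Bi -> Be) can be sketched as a small state machine; the names here are illustrative, not identifiers from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Local backend data checksum state while checksums are being enabled. */
typedef enum
{
	LS_OFF,						/* member of Bd */
	LS_INPROGRESS_ON,			/* member of Bi */
	LS_ON						/* member of Be */
} LocalChecksumState;

/* Backends in "inprogress-on" and "on" write checksums when modifying... */
static bool
writes_checksums(LocalChecksumState s)
{
	return s != LS_OFF;
}

/* ...but only backends in "on" verify checksums when reading. */
static bool
verifies_checksums(LocalChecksumState s)
{
	return s == LS_ON;
}

/* Absorbing a procsignal barrier advances a backend one step. */
static LocalChecksumState
absorb_barrier(LocalChecksumState s)
{
	return (s == LS_OFF) ? LS_INPROGRESS_ON : LS_ON;
}
```

The invariant the comment describes falls out directly: every state reachable after the first barrier writes checksums, so no backend in Be can ever read a page that a concurrent writer left unchecksummed.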
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		target;
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Launch the datachecksumsworker launcher process
+ *
+ * The main entry point for starting data checksum processing, for enabling
+ * as well as disabling.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of the below distinct states:
+	 *
+	 * A: Started and performing an operation
+	 * B: Started and in the process of aborting
+	 * C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
+	 * and the abort flag be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, the only operation we can perform
+	 * is to cancel it, and only if the user requested that checksums be
+	 * disabled.  That doesn't, however, mean that all other cases yield an
+	 * error, as some might be perfectly benign.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
+		 * If the launcher is started, data checksums cannot be on or off, but
+		 * they may be in an in-progress state. Since the state transition may
+		 * not have happened yet (in case of rapidly initiated checksum enable
+		 * calls for example) we inspect the target state of the currently
+		 * running launcher.
+		 */
+
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so exit.
+			 */
+			if (DatachecksumsWorkerShmem->target)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * Disabling checksums is likely to be a very quick operation in
+			 * many cases so trying to abort it to save the checksums would
+			 * run the risk of race conditions.
+			 */
+			else
+			{
+				ereport(NOTICE,
+						(errmsg("data checksums are concurrently being disabled, please retry")));
+
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/* This should be unreachable */
+			Assert(false);
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Data checksums are already being disabled, exit silently.
+			 */
+			if (DataChecksumsOffInProgress())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			DatachecksumsWorkerShmem->abort = true;
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+	}
+
+	/*
+	 * The launcher is currently not running, so we need to query the system
+	 * data checksum state to determine how to proceed based on the requested
+	 * target state.
+	 */
+	else
+	{
+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->target = enable_checksums;
+
+		/*
+		 * If the launcher isn't started and we're asked to enable checksums,
+		 * we need to check if processing was previously interrupted such that
+		 * we should resume rather than start from scratch.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums in a cluster which already
+			 * has checksums enabled, exit immediately as there is nothing
+			 * more to do.
+			 */
+			if (DataChecksumsNeedVerify())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-on" then we will
+			 * resume from where we left off based on the catalog state. This
+			 * will be safe since new relations created while the worker was
+			 * not running will have checksums enabled.
+			 */
+			else if (DataChecksumsOnInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[1] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-off" then we
+			 * were interrupted while the catalog state was being cleared. In
+			 * this case we need to first reset state and then continue with
+			 * enabling checksums.
+			 */
+			else if (DataChecksumsOffInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * Data checksums are off in the cluster, so we can proceed with
+			 * enabling them. Just in case, we will start by resetting the
+			 * catalog state since we are doing this from scratch and we don't
+			 * want leftover catalog state to cause us to miss a relation.
+			 */
+			else
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Regardless of current state in the system, we go through the
+			 * motions when asked to disable checksums. The catalog state is
+			 * only defined to be relevant during the operation of enabling
+			 * checksums, and has no use at any other point in time. That
+			 * being said, a user who sees stale relhaschecksums entries in the
+			 * catalog might run this just in case.
+			 *
+			 * Resetting state must be performed after setting data checksum
+			 * state to off, as there otherwise might (depending on system data
+			 * checksum state) be a window between the catalog reset and the
+			 * state transition in which new relations are created with the
+			 * catalog state set to true.
+			 */
+			DatachecksumsWorkerShmem->operations[0] = DISABLE_CHECKSUMS;
+			DatachecksumsWorkerShmem->operations[1] = RESET_STATE;
+		}
+	}
+
+	/*
+	 * Backoff parameters to throttle the load during enabling. As there is
+	 * no real processing performed during disabling checksums the backoff
+	 * parameters do not apply there.
+	 */
+	if (enable_checksums)
+	{
+		DatachecksumsWorkerShmem->cost_delay = cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = cost_limit;
+	}
+	else
+	{
+		DatachecksumsWorkerShmem->cost_delay = 0;
+		DatachecksumsWorkerShmem->cost_limit = 0;
+	}
+
+	/*
+	 * Prepare the BackgroundWorker and launch it.
+	 */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. If wal_level is set to "minimal",
+		 * this could be avoided if the checksum is verified to already be
+		 * correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check whether we have been asked to
+		 * abort; the abort will bubble up from here. It's safe to check this
+		 * without a lock, because if we miss it being set, we will try again
+		 * soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
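The progress-reporting cadence in ProcessSingleRelationFork() above sends an activity report only for every 100th block; a standalone sketch of how many reports a fork of a given size yields (the helper name is illustrative):

```c
#include <assert.h>

/*
 * Same cadence as ProcessSingleRelationFork() above: report progress only
 * when (blknum % 100) == 0, to keep from overwhelming activity reporting.
 */
static int
reports_for_blocks(int numblocks)
{
	int			reports = 0;
	int			blknum;

	for (blknum = 0; blknum < numblocks; blknum++)
	{
		if ((blknum % 100) == 0)
			reports++;
	}
	return reports;
}
```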
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it, that's not an
+	 * error. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot end up with a processed database so
+	 * we have no alternative other than exiting. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * need to be handled in some way. TODO
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity),
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksums processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks waiting for all current transactions to finish
+ *
+ * Returns when all transactions which were active at the call of the
+ * function have ended, or when the postmaster dies while waiting. If the
+ * postmaster dies, the abort flag will be set to indicate that the caller
+ * shouldn't proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function has the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = DatachecksumsWorkerShmem->operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+	/*
+	 * Clean up after processing
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->launcher_started = false;
+	DatachecksumsWorkerShmem->abort = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. Until no
+ * new databases are found, this will loop around computing a new list and
+ * comparing it to the already seen ones.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set things up so that the shared catalogs are processed in the first
+	 * run only, rather than once in every database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we
+		 * wait for all pre-existing transactions to finish, we can be
+		 * certain that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%d databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "restarting loop" : "processing complete"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can mean that
+	 * it actually failed for some reason, or that the database was dropped
+	 * between us getting the database list and trying to process it. Get a
+	 * fresh list of databases to detect the second case, where the database
+	 * was dropped before we had started processing it. If a database still
+	 * exists but enabling checksums failed, then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = true;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->abort = false;
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even though this assignment is redundant with the MemSet above, we
+	 * want to be explicit about our intent for readability, since this
+	 * state is queried when resuming processing after a restart.
+	 */
+	DatachecksumsWorkerShmem->launcher_started = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before scanning pg_database, wait for all pending transactions to
+	 * finish. This ensures that there is no concurrently running CREATE
+	 * DATABASE, which could cause us to miss the creation of a database
+	 * that was copied without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which do not yet have data checksums
+ * are returned. If include_shared is true then shared relations are included
+ * as well in a non-temporary list. include_shared has no relevance when
+ * building a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start.
+	 * We need to wait until they are all gone before we are done, since we
+	 * cannot access these relations to modify them.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		SetDataChecksumsOff();
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
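As an aside on the TODO about retry backoff: the per-database retry cap (DATACHECKSUMSWORKER_MAX_DB_RETRIES) could be combined with a capped exponential delay between attempts. A minimal standalone sketch of such a backoff, entirely illustrative and not part of the patch:

```c
#include <assert.h>

/*
 * Illustrative sketch only: compute a capped exponential backoff in
 * milliseconds for the given retry attempt. Attempt 0 returns base_ms;
 * each further attempt doubles the delay until max_ms is reached.
 */
static int
retry_backoff_ms(int attempt, int base_ms, int max_ms)
{
	int			ms = base_ms;

	for (int i = 0; i < attempt && ms < max_ms; i++)
		ms *= 2;

	return (ms < max_ms) ? ms : max_ms;
}
```

The launcher could sleep for retry_backoff_ms(entry->retries, ...) before re-queueing a DATACHECKSUMSWORKER_RETRYDB database, avoiding a tight loop through all databases when no worker slots are free.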
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f24a33ef1..96c814a91c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3937,6 +3937,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..cc494b6f13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 23ab3cf605..ecfefd13ce 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8f2c482bc8..c14234f1a5 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2932,8 +2932,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..c7928f3495 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 583efaecff..c5d9d3d846 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,11 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOffInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +500,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+		ProcessBarrierChecksumOnInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+		ProcessBarrierChecksumOn();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF))
+		ProcessBarrierChecksumOffInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+		ProcessBarrierChecksumOff();
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +520,27 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
+}
+
+static void
+ProcessBarrierChecksumOffInProgress(void)
+{
+	AbsorbChecksumsOffInProgressBarrier();
 }
 
 /*
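The dispatch in ProcessProcSignalBarrier() above checks one bit per barrier kind in a flags word. A standalone sketch of that pattern (the macro and handler names here are stand-ins, not the real PROCSIGNAL_BARRIER_* definitions):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch of flag-bit barrier dispatch: each barrier kind
 * occupies one bit in a flags word, and a handler runs only when its
 * bit is set.
 */
#define BARRIER_CHECKSUM_INPROGRESS_ON	(1 << 0)
#define BARRIER_CHECKSUM_ON				(1 << 1)
#define BARRIER_SHOULD_CHECK(flags, type)	(((flags) & (type)) != 0)

static int	last_absorbed;		/* records which barrier kind was handled */

static void
process_barriers(uint32_t flags)
{
	if (BARRIER_SHOULD_CHECK(flags, BARRIER_CHECKSUM_INPROGRESS_ON))
		last_absorbed = BARRIER_CHECKSUM_INPROGRESS_ON;
	else if (BARRIER_SHOULD_CHECK(flags, BARRIER_CHECKSUM_ON))
		last_absorbed = BARRIER_CHECKSUM_ON;
}
```

Note that the else-if chain, mirroring the patch hunk above, absorbs only the first matching barrier per call; whether multiple checksum barrier flags can ever be pending at once determines if that is acceptable.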
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae..8fbebd9870 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change mid-check, which
+		 * could otherwise cause a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c9a1d4c56d..3b564ef9cf 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1565,9 +1565,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1583,9 +1580,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index afc3451a54..467cd6d2f3 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1815,7 +1816,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1883,6 +1885,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3535,6 +3539,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the checksum state. Since the checksum state can change at any
+	 * time, the fetched value might be out of date by the time it is used.
+	 * DataChecksumsNeedWrite returns true when checksums are enabled, in the
+	 * process of being enabled ("inprogress-on"), or in the process of being
+	 * disabled ("inprogress-off"). Since relhaschecksums is only used to
+	 * track progress when checksums are being enabled, and going from
+	 * disabled to enabled will clear relhaschecksums before starting, it is
+	 * safe to use this value during a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false and is concurrently changed to
+	 * true, that implies that checksums are being enabled. Worst case,
+	 * this will lead to the relation being processed for checksums even
+	 * though each page written will already have them.
+	 *
+	 * Performing this last shortens the TOCTOU window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3800,6 +3825,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3824,16 +3850,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3862,6 +3896,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3882,15 +3917,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc5..045da21904 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 59b3f4b135..ce933a9d08 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -605,6 +605,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2779da8a69..d583a052e2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -498,6 +500,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -607,7 +620,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;
 static bool integer_datetimes;
 static bool assert_enabled;
 static char *recovery_target_timeline_string;
@@ -1888,17 +1901,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4774,6 +4776,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums_tmp,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3f029f41e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..1298857458 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish or turn off checksums
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..b35cd4d503 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..28b22db7fb 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 9585ad17b3..356ecdab61 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index e8dcd15a55..bf296625e4 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..d8229422af 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d7b55f57ea..3c039274b3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11240,6 +11240,22 @@
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
 
+{ oid => '4142',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '4035',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 2c71db79c0..f31414c518 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -323,6 +323,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3a7e199750..95f5ad05da 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -929,6 +929,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..3572ec80c5
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,36 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f..c35b747520 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d2359192..f736b12f98 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..d865796d04 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index ab1ef9a475..9774816625 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..558a8135f1
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check")
+with multiple nodes, primary and standby(s), for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..0f44512f83
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,89 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class have checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again, which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled. Disabling checksums clears the catalog
+# relhaschecksums state, so wait for that before considering it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..dc5bcb9629
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,108 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyway and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..99c283e0b1
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,116 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary 2');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class on standby_1 have checksums recorded');
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
-- 
2.21.1 (Apple Git-122.3)

#60Michael Banck
michael.banck@credativ.de
In reply to: Daniel Gustafsson (#59)
Re: Online checksums patch - once again

Hi,

On Tue, Jan 05, 2021 at 12:18:07AM +0100, Daniel Gustafsson wrote:

On 3 Dec 2020, at 10:37, Daniel Gustafsson <daniel@yesql.se> wrote:
I've also done some tweaks to the tests to make them more robust as well as
comment updates and general tidying up here and there.

Attached is a rebase of the patch on top of current HEAD.

I only looked through the documentation this time, and have one
suggestion:

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
failures will be reported in the
<link linkend="monitoring-pg-stat-database-view">
<structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc147b10..5dcfcdd2ff 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
</para>
</sect1>
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums, internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>

I think the above is rather informative about checksums in general and
not specific to online activation of checksums, so could pretty much be
committed verbatim right now, except for the "either as an offline
operation or in a running cluster" bit which would have to be rewritten.

+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as 
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
<sect1 id="wal-intro">
<title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>

This as well.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

#61Michael Banck
michael.banck@credativ.de
In reply to: Daniel Gustafsson (#59)
Re: Online checksums patch - once again

Hi,

On Tue, Jan 05, 2021 at 12:18:07AM +0100, Daniel Gustafsson wrote:

Attached is a rebase of the patch on top of current HEAD.

cheers ./daniel

Some more comments/questions:

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..06c001f8ff 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7284,7 +7284,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* and dirtied.
*
* If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.

That sounds like it has nothing to do with online (de)activation of
checksums?

*/
XLogRecPtr
log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7305,11 +7305,13 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
XLogRegisterBuffer(0, vm_buffer, 0);

flags = REGBUF_STANDARD;
+ HOLD_INTERRUPTS();
if (!XLogHintBitIsNeeded())
flags |= REGBUF_NO_IMAGE;
XLogRegisterBuffer(1, heap_buffer, flags);

recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+ RESUME_INTERRUPTS();

This could maybe do with a comment on why the HOLD/RESUME_INTERRUPTS()
is required here, similar as is done in bufpage.c further down.

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
#include "access/xlog.h"
#include "access/xlog_internal.h"
#include "catalog/pg_control.h"
+#include "storage/bufpage.h"
#include "utils/guc.h"
#include "utils/timestamp.h"
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
timestamptz_to_str(xlrec.end_time));
}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");

We probably discussed this earlier, but what was the conclusion?
PG_DATA_CHECKSUM_VERSION = 1 sounds like somebody thought it a good idea
to not just have a bool here, but they probably rather thought about
different checksumming/hashing algorithms/implementations than about the
internal state of the checksumming machinery.

If we decide on v2 of data page checksums, how would that look like?

PG_DATA_CHECKSUM_VERSION_V2 = 4?

If we think we're done with versions, did we consider removing the
VERSION here, because it is really confusing in
PG_DATA_CHECKSUM_INPROGRESS_ON/OFF_VERSION, like
PG_DATA_CHECKSUM_STATE_ON/INPROGRESS_ON/OFF? Or would removing "version"
also from pg_controldata be a backwards-incompatible change we don't
want to do?

Sorry again, I think we discussed it earlier, but maybe at least some
comments about what VERSION is supposed to be/mean in bufpage.h would be
in order:

--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
*/
#define PG_PAGE_LAYOUT_VERSION		4
#define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
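To make the discussion concrete, here is a sketch (my reading of the states, not the patch's wording) of how the bufpage.h values could be commented so that "version" vs. "state" is less confusing:

```c
/*
 * Illustrative only: the comments below are my interpretation of the
 * values, not text from the patch.  Zero means checksums are off.
 */
#define PG_DATA_CHECKSUM_VERSION                1   /* fully enabled: write and verify */
#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION  2   /* being enabled: write, don't verify */
#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION 3   /* being disabled: write, don't verify */
```

Values 2 and 3 are really processing states, not page-format versions, which is why dropping "VERSION" from the in-progress names might be the smaller change.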
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ede93ad7fd..01eca900ac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1089,7 +1102,7 @@ XLogInsertRecord(XLogRecData *rdata,
Assert(RedoRecPtr < Insert->RedoRecPtr);
RedoRecPtr = Insert->RedoRecPtr;
}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());

Indent, but I guess this will be indented by pgindent after the fact
anyway? Or is this how it's supposed to look?

@@ -4938,13 +4949,299 @@ GetMockAuthenticationNonce(void)
}

/*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Are checksums enabled, or in the process of being enabled, for data pages?

The second "," looks odd and could be omitted.

+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid the
+ * the risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that calling
+ * this function must be performed as close to the validation call as possible
+ * to keep the critical section short. This is in order to protect against
+ * TOCTOU situations around checksum validation.

I had to google "TOCTOU" and this acronym doesn't appear elsewhere in
the source tree, so I suggest spelling it out at least here (there's one
more occurrence of it in this patch).

+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress" state for enabling checksums is used when the checksum worker
+ * is setting checksums on all pages, it can thus be used to check for aborted
+ * checksum processing which need to be restarted.

The first "inprogress" looks legit as it talks about both states, but
the second one should be "inprogress-on" I think and ...

+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress" state for disabling checksums is used for when the worker
+ * resets the catalog state. Operations should use DataChecksumsNeedVerify()
+ * or DataChecksumsNeedWrite() for deciding whether to read/write checksums.

... "inprogress-off" here.

+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+       Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+       LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}

Bikeshed alert: maybe those Absorb*Barrier functions could also be lumped
together after the SetDataChecksums*() functions. If not, a function
comment would be in order.

+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}

This function doesn't have the customary function header comment nor any
comments in the body and it looks like it's doing some pretty important
stuff, so I think some comments would be in order, e.g.
explaining that "data_checksum_version != 0" means we've already got
checksums enabled or are in the process of enabling/disabling them.

The corresponding SetDataChecksumsOff() function has comments that could
be duplicated here. If anything, it could be moved below
SetDataChecksumsOff() so that the reader potentially already went
through the comments in the other similar functions.

+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one
+ * sets the checksums state to "inprogress-on" and the second one to "on".
+ * During "inprogress-on", checksums are written but not verified. When all
+ * existing pages are guaranteed to have checksums, and all new pages will be
+ * initiated with checksums, the state can be changed to "on".

This should probably go above SetDataChecksumsOnInProgress() because
even though I've just reviewed this, I looked at the following function
body and wondered where the second barrier went...

+ */
+void
+SetDataChecksumsOn(void)
{
+	uint64		barrier;
+
Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * inprogress-off state during which backends continue to write checksums

"inprogress-off" is quoted everywhere else.

+	 * without verifying them. When all backends are in "inprogress-off" the
+	 * next transition to "off" can be performed, after which all data checksum
+	 * processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * and we can transition directly to "off" from there.

Can you explain that a bit more? Is "inprogress-on" a typo for
"inprogress-off", or do you really mean that we can just switch off
checksums during "inprogress-on"? If so, the rationale should be
explained a bit more.

@@ -7929,6 +8226,32 @@ StartupXLOG(void)
*/
CompleteCommitTsInitialization();

+	/*
+	 * If we reach this point with checksums in progress state (either being
+	 * enabled or being disabled), we notify the user that they need to
+	 * manually restart the process to enable checksums. This is because we

I think this could be rephrased to "If we reach this point with checksums
being enabled, we notify..." because the disable case is different and
handled in the following block.

+	 * cannot launch a dynamic background worker directly from here, it has to
+	 * be launched from a regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shutdown, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XlogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
/*
* All done with end-of-recovery actions.
*
index 21f2240ade..a5e715f19c 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -965,10 +965,13 @@ InsertPgClassTuple(Relation pg_class_desc,
/* relpartbound is set by updating this tuple, if necessary */
nulls[Anum_pg_class_relpartbound - 1] = true;
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);

/* finally insert the new tuple, update the indexes, and clean up */
CatalogTupleInsert(pg_class_desc, tup);
+ RESUME_INTERRUPTS();

heap_freetuple(tup);

Maybe add a comment here why we now HOLD/RESUME_INTERRUPTS.

diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..5d94db95f9
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1527 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all datapages

It's "data page" with a space in between everywhere else.

+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *		- Backends SHALL NOT violate local datachecksum state
+ *		- Data checksums SHALL NOT be considered enabled cluster-wide until all

Linewrap.

+ * currently connected backends have the local state "enabled"

+ * Synchronizing the state change is done with procsignal barriers, where the
+ * backend updating the global state in the controlfile will wait for all other
+ * backends to absorb the barrier before WAL logging. Barrier absorption will
+ * happen during interrupt processing, which means that connected backends will
+ * change state at different times.
+ *
+ *   When Enabling Data Checksums
+ *	 ----------------------------

There's something off with the indentation of either the title or the
line separator here.

+ *	 A process which fails to observe data checksums being enabled can induce
+ *	 two types of errors: failing to write the checksum when modifying the page
+ *	 and failing to validate the data checksum on the page when reading it.
+ *
+ *   When the DataChecksumsWorker has finished writing checksums on all pages
+ *   and enable data checksums cluster-wide, there are three sets of backends:

"enables"

+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends on "off" state

s/on/in/

Also, given that "When the DataChecksumsWorker has finished writing
checksums on all pages and enable[s] data checksums cluster-wide",
shouldn't that mean that all other backends are either in "on" or
"inprogress-on" state, because the Bd -> Bi transition happened during a
previous barrier? Maybe that should be explained first?

+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends transition from the Bd state to Be like so: Bd -> Bi -> Be
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting will observe the global state being "on" and will

"Any backend starting while Bg is waiting for the barrier" right?

+ *   All sets are compatible while still operating based on
+ *   their local state.

Whoa, you lost me there.

+ *	 When Disabling Data Checksums
+ *	 -----------------------------
+ *	 A process which fails to observe data checksums being disabled can induce
+ *	 two types of errors: writing the checksum when modifying the page and

Can you rephrase what you mean by "being disabled"? If you mean we're
in the "inprogress-off" state, then why is that an error? Do you mean
"writing *no* checksum" because AIUI we should still write checksums at
this point? Or are you talking about a different state?

+ *	 validating a data checksum which is no longer correct due to modifications
+ *	 to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backands in "off" state

s/Backands/Backends/

+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-off" state

I suggest using a different symbol here for "inprogress-off" in order
not to confuse the two (different right?) Bi.

+ *   Backends transition from the Be state to Bd like so: Be -> Bi -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bi writes data checksums, but don't validate them, such that

s/writes/write/

+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bi. Once all backends are in Bi, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation.

... "anymore" maybe.

+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group

Copyright year bump; there are two other occurrences.

+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		target;

This "target" bool isn't documented very well (at least here). AIUI,
it's true if we enable checksums and false if we disable them?

+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDataChecksumsWorkerLauncher
+ *		Main entry point for datachecksumsworker launcher process
+ *
+ * The main entrypoint for starting data checksums processing for enabling as
+ * well as disabling.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)

After reading through the function I find this 'bool enable_checksums'
a bit confusing; I think something like 'int operation', compared against
ENABLE or DISABLE or whatever, would make the code more readable, but
it's a minor nitpick.

+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of the below distinct states:
+	 *
+	 * A: Started and performing an operation; B: Started and in the
+	 * process of aborting; C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
+	 * and the abort flag be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, the only operation we can perform
+	 * is to cancel it iff the user requested for checksums to be disabled.
+	 * That doesn't however mean that all other cases yield an error, as some
+	 * might be perfectly benign.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
+		 * If the launcher is started data checksums cannot be on or off, but
+		 * it may be in an inprogress state. Since the state transition may
+		 * not have happened yet (in case of rapidly initiated checksum enable
+		 * calls for example) we inspect the target state of the currently
+		 * running launcher.
+		 */
+

extra newline

+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so exit.
+			 */
+			if (DatachecksumsWorkerShmem->target)

Would "if (DatachecksumsWorkerShmem->target == enable_checksums)" maybe
be better, or does that change the meaning?

+	}
+
+	/*
+	 * The launcher is currently not running, so we need to query the system
+	 * data checksum state to determine how to proceed based on the requested
+	 * target state.
+	 */
+	else
+	{

That's a slightly weird comment placement, maybe put it below the '{' so
that the '} else {' is kept intact?

+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->target = enable_checksums;

This one is especially confusing as it looks like we're unconditionally
enabling checksums but instead we're just setting the target based on
the bool (see above). It might be our standard notation though.

[...]

+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db, const char *bgw_func_name)
+	/*
+	 * If the postmaster crashed we cannot end up with a processed database so
+	 * we have no alternative other than exiting. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * need to be handled in some way. TODO

Any idea on how? Or is that for a future feature and can just be documented
for now? In any case, one can just run pg_disable_checksums() again as
mentioned elsewhere so maybe just rephrase the comment to say the admin
needs to do that?

(skipped the rest of this file for now)

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 59b3f4b135..ce933a9d08 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -605,6 +605,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
if (MyBackendId > MaxBackends || MyBackendId <= 0)
elog(FATAL, "bad backend ID: %d", MyBackendId);
+	/*
+	 * Set up local cache of Controldata values.
+	 */
+	InitLocalControldata();

This just sets LocalDataChecksumVersion for now, is it expected to cache
other ControlData values in the future? Maybe clarifying the current
state in the comment would be in order.

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 2779da8a69..d583a052e2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -498,6 +500,17 @@ static struct config_enum_entry shared_memory_options[] = {
{NULL, 0, false}
};
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
/*
* Options for enum values stored in other modules
*/
@@ -607,7 +620,7 @@ static int	max_identifier_length;
static int	block_size;
static int	segment_size;
static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums_tmp;

Why the _tmp, is that required for enum GUCs?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

#62Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Banck (#61)
Re: Online checksums patch - once again

On Tue, Jan 05, 2021 at 09:29:31PM +0100, Michael Banck wrote:

@@ -4938,13 +4949,299 @@ GetMockAuthenticationNonce(void)
/*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Are checksums enabled, or in the process of being enabled, for data pages?

The second "," looks odd and could be omitted?

Maybe write:

+ * Are checksums on data pages enabled, or in the process of being enabled?

#63Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Banck (#61)
1 attachment(s)
Re: Online checksums patch - once again

On 5 Jan 2021, at 21:29, Michael Banck <michael.banck@credativ.de> wrote:
On Tue, Jan 05, 2021 at 12:18:07AM +0100, Daniel Gustafsson wrote:

Attached is a rebase of the patch on top of current HEAD.

Some more comments/questions:

Thanks for reviewing, much appreciated!

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..06c001f8ff 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7284,7 +7284,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
* and dirtied.
*
* If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.

That sounds like it has nothing to do with online (de)activation of
checksums?

Right, it's a fix which is independent of this which could be broken out into a
separate docs patch along with the suggestions in your previous mail.

*/
XLogRecPtr
log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7305,11 +7305,13 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
XLogRegisterBuffer(0, vm_buffer, 0);

flags = REGBUF_STANDARD;
+ HOLD_INTERRUPTS();
if (!XLogHintBitIsNeeded())
flags |= REGBUF_NO_IMAGE;
XLogRegisterBuffer(1, heap_buffer, flags);

recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+ RESUME_INTERRUPTS();

This could maybe do with a comment on why the HOLD/RESUME_INTERRUPTS()
is required here, similar as is done in bufpage.c further down.

Fixed.

diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
#include "access/xlog.h"
#include "access/xlog_internal.h"
#include "catalog/pg_control.h"
+#include "storage/bufpage.h"
#include "utils/guc.h"
#include "utils/timestamp.h"
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
timestamptz_to_str(xlrec.end_time));
}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");

We probably discussed this earlier, but what was the conclusion?
PG_DATA_CHECKSUM_VERSION = 1 sounds like somebody thought it a good idea
to not just have a bool here but they probably rather thought about
different checksumming/hashing algorithms/implementations than about
internal state of the checksumming machinery.

Commit 443951748ce4c94b001877c7cf88b0ee969c79e7 explicitly moved from a bool to
be able to handle changes to the checksum field. I don't recall it being
discussed in the context of this patch.

If we decide on v2 of data page checksums, how would that look like?

PG_DATA_CHECKSUM_VERSION_V2 = 4?

Something like that yes.

If we think we're done with versions, did we consider removing the
VERSION here, because it is really confusing in
PG_DATA_CHECKSUM_INPROGRESS_ON/OFF_VERSION, like
PG_DATA_CHECKSUM_STATE_ON/INPROGRESS_ON/OFF? Or would removing "version"
also from pg_controldata be a backwards-incompatible change we don't
want to do?

We could rename the _INPROGRESS states to not have the _VERSION suffix, but
then we'd end up in a discussion around what a checksum version is, and we'd be
assigning something not named _VERSION to a version field. I think the
current state of the patch is the least surprising.

Sorry again, I think we discussed it earlier, but maybe at least some
comments about what VERSION is supposed to be/mean in bufpage.h would be
in order:

That I don't disagree with, but it can be done separately from this work. I
didn't chase down the thread which led to 443951748ce4c94b but I assume the
discussion there would be useful for documenting this.

-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());

Indent, but I guess this will be indented by pg_indent after the fact
anyway? Or is this how it's supposed to look?

The patch has been through pgindent so I think it's by design.

+ * Are checksums enabled, or in the process of being enabled, for data pages?

The second "," looks odd and could be omitted?

I think that was correct, but not being a native speaker I might be wrong.
Either way I've reworded the comment to make it clearer since it on the whole
seemed a bit cobbled together.

+ * to keep the critical section short. This is in order to protect against
+ * TOCTOU situations around checksum validation.

I had to google "TOCTOU" and this acronym doesn't appear elsewhere in
the source tree, so I suggest spelling it out at least here (there's one
more occurrence of it in this patch)

Fair enough, both places fixed. The comment in RelationBuildLocalRelation was
also reworded a bit to try and make a tad clearer.

+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress" state for enabling checksums is used when the checksum worker
+ * is setting checksums on all pages, it can thus be used to check for aborted
+ * checksum processing which needs to be restarted.

The first "inprogress" looks legit as it talks about both states, but
the second one should be "inprogress-on" I think and ...

The sentence was as intended, but I agree that spelling out the state name is
better. Fixed.

+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress" state for disabling checksums is used for when the worker
+ * resets the catalog state. Operations should use DataChecksumsNeedVerify()
+ * or DataChecksumsNeedWrite() for deciding whether to read/write checksums.

... "inprogress-off" here.

Same as the above, but also fixed (with some additional wordsmithing).

+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+       Assert(LocalDataChecksumVersion == 0 || LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+       LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}

Bikeshed alert: maybe alo those Absorb*Barrier functions could be lumped
together after the SetDataChecksums*() functions. If not, a function
comment would be in order.

I moved all the Absorb* functions into a single place since they very similar.

+void
+SetDataChecksumsOnInProgress(void)

This function doesn't have the customary function header comment nor any
comments in the body and it looks like it's doing some pretty important
stuff, so I think some comments would be in order, e.g.
explaining that "data_checksum_version != 0" means we've already got
checksums enabled or are in the process of enabling/disabling them.

That was indeed sloppy, fixed.

+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one
+ * sets the checksums state to "inprogress-on" and the second one to "on".
+ * During "inprogress-on", checksums are written but not verified. When all
+ * existing pages are guaranteed to have checksums, and all new pages will be
+ * initiated with checksums, the state can be changed to "on".

This should proabably go above SetDataChecksumsOnInProgress() because
even though I've just reviewed this, I looked at the following function
body and wondered where the second barrier went...

I kept the ordering but reworded to comment to make it clearer.

+ * inprogress-off state during which backends continue to write checksums

The inprogress-off is in " " everywhere else.

Fixed.

+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * and we can transition directly to "off" from there.

Can you explain that a bit more? Is "inprogress-on" a typo for
"inprogress-off", or do you really mean that we can just switch off
checksums during "inprogress-on"? If so, the rationale should be
explained a bit more.

The reason why we need "inprogress-off", where checksums are written but not
verified, is to ensure that backends still in the "on" state have checksums
updated to not incur verification failures on concurrent writes.

If the state is "inprogress-on" then checksums are being written but not
verified, so we can transition directly to "off" as there are no backends
verifying data checksums, so there is no need to keep writing them.

@@ -7929,6 +8226,32 @@ StartupXLOG(void)
*/
CompleteCommitTsInitialization();

+	/*
+	 * If we reach this point with checksums in progress state (either being
+	 * enabled or being disabled), we notify the user that they need to
+	 * manually restart the process to enable checksums. This is because we

I think this could rephrased to "If we reach this point with checksums
being enabled, we notify..." because the disable case is different and
handled in the following block.

Correct, that sentence was a leftover from a previous version of this codepath,
fixed.

@@ -965,10 +965,13 @@ InsertPgClassTuple(Relation pg_class_desc,
/* relpartbound is set by updating this tuple, if necessary */
nulls[Anum_pg_class_relpartbound - 1] = true;

+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);

/* finally insert the new tuple, update the indexes, and clean up */
CatalogTupleInsert(pg_class_desc, tup);
+ RESUME_INTERRUPTS();

heap_freetuple(tup);

Maybe add a comment here why we now HOLD/RESUME_INTERRUPTS.

Fixed.

+ * a lot of WAL as the entire database is read and written. Once all datapages

It's "data page" with a space in between everywhere else.

Fixed.

+ *		- Backends SHALL NOT violate local datachecksum state
+ *		- Data checksums SHALL NOT be considered enabled cluster-wide until all

Linewrap.

Not sure what you mean here, sentence is too long to fit on one line.

+ * When Enabling Data Checksums
+ * ----------------------------

There's something off with the indentation of either the title or the
line separator here.

There was a mix of tab and space, fixed to consistently use spaces for
indentation here.

+ *   When the DataChecksumsWorker has finished writing checksums on all pages
+ *   and enable data checksums cluster-wide, there are three sets of backends:

"enables"

Fixed.

+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends on "off" state

s/on/in/

Also, given that "When the DataChecksumsWorker has finished writing
checksums on all pages and enable[s] data checksums cluster-wide",
shouldn't that mean that all other backends are either in "on" or
"inprogress-on" state, because the Bd -> Bi transition happened during a
previous barrier? Maybe that should be first explained?

Right, I've reworded this and wordsmithed a bit to make it a bit less
convoluted.

+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends transition from the Bd state to Be like so: Bd -> Bi -> Be
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting will observe the global state being "on" and will

"Any backend starting while Bg is waiting for the barrier" right?

Correct, fixed.

+ *   All sets are compatible while still operating based on
+ *   their local state.

Whoa, you lost me there.

What I meant was that Bi and Be are compatible as they both satisfy each
other's requirements (both write checksums), so they can concurrently
exist in a cluster without false negatives occurring. Reworded.

+ *	 When Disabling Data Checksums
+ *	 -----------------------------
+ *	 A process which fails to observe data checksums being disabled can induce
+ *	 two types of errors: writing the checksum when modifying the page and

Can you rephrase what you mean with "being disabled"? If you mean we're
in the "inprogress-off" state, then why is that an error? Do you mean
"writing *no* checksum" because AIUI we should still write checksums at
this point? Or are you talking about a different state?

This was referring to the "off" state, but it wasn't terribly clear. I've
reworded that part.

+ *	 validating a data checksum which is no longer correct due to modifications
+ *	 to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backands in "off" state

s/Backands/Backends/

Fixed.

+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-off" state

I suggest using a different symbol here for "inprogress-off" in order
not to confuse the two (different right?) Bi.

Good point, fixed.

+ *   Backends transition from the Be state to Bd like so: Be -> Bi -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bi writes data checksums, but don't validate them, such that

s/writes/write/

Fixed.

+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bi. Once all backends are in Bi, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation.

... "anymore" maybe.

Makes sense, fixed.

+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group

Copyright year bump; there are two other occurrences.

Fixed.

+ bool target;

This "target" bool isn't documented very well (at least here). AIUI,
it's true if we enable checksums and false if we disable them?

Correct, and I agree that the name is quite poor. I renamed to
"enable_checksums" and added a comment.

+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)

After reading through the function I find this 'bool enable_checksums'
a bit confusing, I think something like 'int operation' and then
comparing it to either ENABLE or DISABLE or whatever would make the code
more readable, but it's a minor nitpick.

Maybe, I personally find enable_checksums to be self-explanatory but I'm
clearly biased. With the small change of renaming "target", do you still think
it should be changed?

+		 * running launcher.
+		 */
+

extra newline

Fixed.

+ if (DatachecksumsWorkerShmem->target)

Would "if (DatachecksumsWorkerShmem->target == enable_checksums)" maybe
be better, or does that change the meaning?

It would change its meaning as it would always be true. Since renaming target
to enable_checksums I think this condition is quite clear though.

+	/*
+	 * The launcher is currently not running, so we need to query the system
+	 * data checksum state to determine how to proceed based on the requested
+	 * target state.
+	 */
+	else
+	{

That's a slightly weird comment placement, maybe put it below the '{' so
that the '} else {' is kept intact?

This style is used in a number of places in the patch, and it's used around the
tree as well. If it's a common concern/complaint then I'm happy to change them
all.

+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->target = enable_checksums;

This one is especially confusing as it looks like we're unconditionally
enabling checksums but instead we're just setting the target based on
the bool (see above). It might be our standard notation though.

After the rename of the variable to enable_checksums it reads pretty clearly that
this operation is caching the passed in value. What do you think about the
codepath now?

+	/*
+	 * If the postmaster crashed we cannot end up with a processed database so
+	 * we have no alternative other than exiting. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * need to be handled in some way. TODO

Any idea on how? Or is that for a future feature and can just be documented
for now? In any case, one can just run pg_disable_checksums() again as
mentioned elsewhere so maybe just rephrase the comment to say the admin
needs to do that?

I've reworded the comment to be a nice-to-have rather than a need-to-have,
since stale flags have no practical implication. Regarding how to address it
I'm not really sure what the cleanest way would be, and I've deliberately held
off on it since it's mostly cosmetic and this patch is complicated enough as
it is.

+	/*
+	 * Set up local cache of Controldata values.
+	 */
+	InitLocalControldata();

This just sets LocalDataChecksumVersion for now, is it expected to cache
other ControlData values in the future? Maybe clarifying the current
state in the comment would be in order.

I'm a bit hesitant to write in the comment that it's only the data checksums
state right now, since such a comment is almost guaranteed to be missed and
become stale when/if InitLocalControldata gains more capabilities. Instead
I've added a function comment in xlog.c on InitLocalControldata.

-static bool data_checksums;
+static int	data_checksums_tmp;

Why the _tmp, is that required for enum GUCs?

That is a missed leftover from an earlier version, removed.

Attached is a rebase with the above fixes, thanks for review!

cheers ./daniel

Attachments:

online_checksums27.patchapplication/octet-stream; name=online_checksums27.patch; x-unix-mode=0644Download
From 855126940f45aaee0e8015f3a4861ec48ff668ce Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 25 Nov 2020 14:12:12 +0100
Subject: [PATCH] Support checksum enable/disable in running cluster v24

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

Further description of the process TBW once the dust settles around
this.

Daniel Gustafsson, Magnus Hagander
---
 doc/src/sgml/amcheck.sgml                    |    2 +-
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   71 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/initdb.sgml                 |    1 +
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   97 ++
 src/backend/access/heap/heapam.c             |    9 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  428 ++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    7 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1541 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   46 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   36 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   89 +
 src/test/checksum/t/002_restarts.pl          |  108 ++
 src/test/checksum/t/003_standby_checksum.pl  |  116 ++
 51 files changed, 2885 insertions(+), 75 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl

diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 8dfb01a77b..5be0a0b9cf 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -497,7 +497,7 @@ SET client_min_messages = DEBUG1;
   Structural corruption can happen due to faulty storage hardware, or
   relation files being overwritten or modified by unrelated software.
   This kind of corruption can also be detected with
-  <link linkend="app-initdb-data-checksums"><application>data page
+  <link linkend="checksums"><application>data page
   checksums</application></link>.
  </para>
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..a81878369c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 02a37658ad..307e37acd1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25800,6 +25800,77 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Initiates data checksum processing for the cluster. This will switch
+        the data checksums mode to <literal>inprogress-on</literal> as well as
+        start a background worker that will process all data in the cluster
+        and enable checksums for it. When all data pages have had checksums
+        enabled, the cluster will automatically switch data checksums mode to
+        <literal>on</literal>. Returns <literal>true</literal> if processing
+        was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+        Returns <literal>false</literal> if data checksums are already
+        disabled.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 43fe8ae383..2acedd14e8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3699,8 +3699,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3710,8 +3709,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b2..d879550e81 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations, regardless of any prior online progress.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc147b10..5dcfcdd2ff 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums; internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be inspected by viewing
+   the value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" />, for example by issuing the command
+   <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>On-line Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+    Disabling of checksums takes effect immediately when the function is called.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process, so make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped for any reason while in
+    <literal>inprogress-on</literal> mode, this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..4269200b0d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7284,7 +7284,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7305,11 +7305,18 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	/*
+	 * Hold interrupts for the duration of xlogging to avoid the state of data
+	 * checksums changing during the processing, which would invalidate the
+	 * premise for xlogging hint bits.
+	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ede93ad7fd..0b689c543e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -49,6 +50,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -252,6 +254,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -903,6 +915,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1075,8 +1088,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1089,7 +1102,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4902,9 +4915,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4938,13 +4949,346 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Returns true iff data checksums are enabled or are in the process of being
+ * enabled. In case data checksums are currently being enabled we must write
+ * the checksum even though it's not verified during this stage. Interrupts
+ * need to be held off by the caller to ensure that the returned state is
+ * valid for the duration of the intended processing.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid the
+ * risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that calling
+ * this function must be performed as close to the validation call as possible
+ * to keep the critical section short. This is in order to protect against
+ * time of check/time of use situations around data checksum validation.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedVerify(void)
 {
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress-on" state for enabling checksums is used when the checksum
+ * worker is setting checksums on all pages, it can thus be used to check for
+ * aborted checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress-off" state for disabling checksums is used for when the
+ * worker resets the catalog state.  DataChecksumsNeedVerify() or
+ * DataChecksumsNeedWrite() should be used for deciding whether to read/write
+ * checksums.
+ */
+bool
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * SetDataChecksumsOnInProgress
+ *		Sets the data checksum state to "inprogress-on" to enable checksums
+ *
+ * In order to start the process of enabling data checksums in a running
+ * cluster the data_checksum_version state must be changed to "inprogress-on".
+ * This state requires data checksums to be written but not verified. The state
+ * transition is performed in a critical section in order to provide crash
+ * safety, and checkpoints are held off. When the emitted procsignalbarrier
+ * has been absorbed by all backends we know that the cluster has started to
+ * enable data checksums.
+ */
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * Data checksum state can only be transitioned to "inprogress-on" from
+	 * "off", if data checksums are in any other state then exit.
+	 */
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state change in all backends to ensure that all backends are in
+	 * "inprogress-on". Once done we know that all backends are writing data
+	 * checksums.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one
+ * sets the checksums state to "inprogress-on" (which is performed by
+ * SetDataChecksumsOnInProgress()) and the second one to "on" (performed here).
+ * During "inprogress-on", checksums are written but not verified. When all
+ * existing pages are guaranteed to have checksums, and all new pages will be
+ * initiated with checksums, the state can be changed to "on".
+ */
+void
+SetDataChecksumsOn(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The only allowed state transition to "on" is from "inprogress-on" since
+	 * that state ensures that all pages will have data checksums written.
+	 */
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state transition of "on" in all backends. When done we know that
+	 * data checksums are enabled in all backends and data checksums are both
+	 * written and verified.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * "inprogress-off" state during which backends continue to write
+	 * checksums without verifying them. When all backends are in
+	 * "inprogress-off" the next transition to "off" can be performed, after
+	 * which all data checksum processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * or "inprogress-off" and we can transition directly to "off" from
+		 * there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that we don't incur a checkpoint during disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * ProcSignalBarrier absorption functions for enabling and disabling data
+ * checksums in a running cluster. The procsignalbarriers are emitted in the
+ * SetDataChecksums* functions.
+ */
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 ||
+		   LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+void
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+/*
+ * InitLocalControlData
+ *
+ * Set up backend local caches of controldata variables which may change at
+ * any point during runtime and thus require special cased locking. So far
+ * this only applies to data_checksum_version, but it's intended to be general
+ * purpose enough to handle future cases.
+ */
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7929,6 +8273,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums being enabled ("inprogress-on"
+	 * state), we notify the user that they need to manually restart the
+	 * process to enable checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here, it has to be launched from a
+	 * background worker directly from here; it has to be launched from a
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shut down, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XlogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9860,6 +10230,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10315,6 +10703,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab319d..cd4dc60800 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 21f2240ade..09936278ba 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -965,10 +965,17 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	/*
+	 * Hold off interrupts to ensure that the observed data checksum state
+	 * cannot change as we form and insert the tuple.
+	 */
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5d89e77dbe..cd49cb8403 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1257,6 +1257,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..8afbf762af 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..5c86ea5e4f
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1541 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a cluster at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be resumed
+ * from where it left off, given that pg_class.relhaschecksums tracks the
+ * state of processed relations and the in-progress state ensures that all
+ * new writes are performed with checksums. Each database will be
+ * reprocessed, but relations where pg_class.relhaschecksums is true are
+ * skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state can still
+ * validate data checksums successfully. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *    - Backends SHALL NOT violate local datachecksum state
+ *    - Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *      currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * WAL logging backend updating the global state in the controlfile will wait
+ * for all other backends to absorb the barrier. Barrier absorption will happen
+ * during interrupt processing, which means that connected backends will change
+ * state at different times. To prevent data checksum state changes when
+ * writing and verifying checksums, interrupts shall be held off before
+ * interrogating state and resumed when the IO operation has been performed.
+ *
+ *   When Enabling Data Checksums
+ *   ----------------------------
+ *   A process which fails to observe data checksums being enabled can induce
+ *   two types of errors: failing to write the checksum when modifying the page
+ *   and failing to validate the data checksum on the page when reading it.
+ *
+ *   When processing starts all backends belong to one of the below sets, with
+ *   one set being empty:
+ *
+ *   Bd: Backends in "off" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   If processing is started in an online cluster then all backends are in Bd.
+ *   If processing was halted by the cluster shutting down, the controlfile
+ *   state "inprogress-on" will be observed on system startup and all backends
+ *   will be in Bd. Backends transition Bd -> Bi via a procsignalbarrier.  When
+ *   the DataChecksumsWorker has finished writing checksums on all pages and
+ *   enables data checksums cluster-wide, there are four sets of backends where
+ *   Bd shall be an empty set:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting while Bg is waiting on the procsignalbarrier will
+ *   observe the global state being "on" and will thus automatically belong to
+ *   Be.  Checksums are enabled cluster-wide when Bi is an empty set. Bi and Be
+ *   are compatible sets while still operating based on their local state as
+ *   both write data checksums.
+ *
+ *   When Disabling Data Checksums
+ *   -----------------------------
+ *   A process which fails to observe that data checksums have been disabled
+ *   can induce two types of errors: writing the checksum when modifying the
+ *   page and validating a data checksum which is no longer correct due to
+ *   modifications to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bo: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bo -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bo write data checksums, but don't validate them, so that
+ *   backends still in Be can continue to validate pages until they too have
+ *   absorbed the barrier and moved to Bo. Once all backends are in Bo, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation any longer.
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		enable_checksums;	/* True if checksums are being enabled,
+									 * else false */
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Launch the datachecksumsworker launcher process
+ *
+ * The main entry point for initiating data checksum processing, for
+ * enabling as well as disabling checksums.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of three distinct states:
+	 *
+	 * A: Started and performing an operation
+	 * B: Started and in the process of aborting
+	 * C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
+	 * and the abort flag be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, the only operation we can perform
+	 * is to cancel it, and only if the user requested that checksums be
+	 * disabled.  That doesn't however mean that all other cases yield an
+	 * error, as some are perfectly benign.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
+		 * If the launcher is started, data checksums cannot be on or off,
+		 * but may be in an in-progress state. Since the state transition may
+		 * not have happened yet (in case of rapidly initiated checksum enable
+		 * calls for example) we inspect the target state of the currently
+		 * running launcher.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so exit.
+			 */
+			if (DatachecksumsWorkerShmem->enable_checksums)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * Disabling checksums is likely to be a very quick operation in
+			 * many cases so trying to abort it to save the checksums would
+			 * run the risk of race conditions.
+			 */
+			else
+			{
+				ereport(NOTICE,
+						(errmsg("data checksums are concurrently being disabled, please retry")));
+
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/* This should be unreachable */
+			Assert(false);
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Data checksums are already being disabled, exit silently.
+			 */
+			if (DataChecksumsOffInProgress())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			DatachecksumsWorkerShmem->abort = true;
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+	}
+
+	/*
+	 * The launcher is currently not running, so we need to query the system
+	 * data checksum state to determine how to proceed based on the requested
+	 * target state.
+	 */
+	else
+	{
+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->enable_checksums = enable_checksums;
+
+		/*
+		 * If the launcher isn't started and we're asked to enable checksums,
+		 * we need to check if processing was previously interrupted such that
+		 * we should resume rather than start from scratch.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums in a cluster which already
+			 * has checksums enabled, exit immediately as there is nothing
+			 * more to do.
+			 */
+			if (DataChecksumsNeedVerify())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-on" then we will
+			 * resume from where we left off based on the catalog state. This
+			 * will be safe since new relations created while the checksum-
+			 * worker was disabled will have checksums enabled.
+			 */
+			else if (DataChecksumsOnInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[1] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-off" then we
+			 * were interrupted while the catalog state was being cleared. In
+			 * this case we need to first reset state and then continue with
+			 * enabling checksums.
+			 */
+			else if (DataChecksumsOffInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * Data checksums are off in the cluster, so we can proceed with
+			 * enabling them. Just in case, we will start by resetting the
+			 * catalog state since we are doing this from scratch and we don't
+			 * want leftover catalog state to cause us to miss a relation.
+			 */
+			else
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Regardless of current state in the system, we go through the
+			 * motions when asked to disable checksums. The catalog state is
+			 * only defined to be relevant during the operation of enabling
+			 * checksums, and has no use at any other point in time. That
+			 * being said, a user who sees stale relhaschecksums entries in
+			 * the catalog might run this just in case.
+			 *
+			 * Resetting state must be performed after setting data checksum
+			 * state to off, as there otherwise might (depending on system
+			 * data checksum state) be a window between catalog resetting and
+			 * state transition when new relations are created with the
+			 * catalog state set to true.
+			 */
+			DatachecksumsWorkerShmem->operations[0] = DISABLE_CHECKSUMS;
+			DatachecksumsWorkerShmem->operations[1] = RESET_STATE;
+		}
+	}
+
+	/*
+	 * Backoff parameters to throttle the load during enabling. As there is no
+	 * real processing performed during disabling checksums the backoff
+	 * parameters do not apply there.
+	 */
+	if (enable_checksums)
+	{
+		DatachecksumsWorkerShmem->cost_delay = cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = cost_limit;
+	}
+	else
+	{
+		DatachecksumsWorkerShmem->cost_delay = 0;
+		DatachecksumsWorkerShmem->cost_limit = 0;
+	}
+
+	/*
+	 * Prepare the BackgroundWorker and launch it.
+	 */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not stop processing immediately; it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. If wal_level is set to "minimal",
+		 * this could be avoided, but only if the checksum is calculated and
+		 * found to already be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort;
+		 * the abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase which operate on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it then that's not
+	 * an error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot finish processing the database, so
+	 * we have no alternative but to exit. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * it would be nice to handle in some way. Enabling data checksums resets
+	 * the flags so any stale flags won't cause problems at that point, but
+	 * they may cause confusion with users reading pg_class. TODO.
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksums processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Wait for all currently running transactions to finish
+ *
+ * Returns when all transactions which were active when the function was
+ * called have ended, or if the postmaster dies while waiting. If the
+ * postmaster dies, the abort flag will be set to indicate that the caller
+ * shouldn't proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function handles the bgworker management,
+ * while ProcessAllDatabases is responsible for looping over the databases
+ * and initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = DatachecksumsWorkerShmem->operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+	/*
+	 * Clean up after processing
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->launcher_started = false;
+	DatachecksumsWorkerShmem->abort = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. Until no
+ * new databases are found, this will loop around computing a new list and
+ * comparing it to the already seen ones.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set up so the first run processes the shared catalogs, but subsequent
+	 * runs don't repeat them for every database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain
+		 * that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%d databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "restarting loop" : "loop complete"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can be because
+	 * they actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists but enabling checksums failed, then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist; a
+		 * database which was dropped after we listed it is not an error.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = true;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->abort = false;
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even though the MemSet above makes this assignment redundant, be
+	 * explicit about our intent for readability, since this state is
+	 * queried when resuming after a restart.
+	 */
+	DatachecksumsWorkerShmem->launcher_started = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned; if it is false then
+ * only non-temporary relations which do not yet have data checksums are
+ * returned. If include_shared is true then shared relations are included as
+ * well in a non-temporary list; include_shared has no relevance when building
+ * a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This must be performed before adding checksums to a
+ * running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we are done, since we
+	 * cannot access and modify these relations.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		SetDataChecksumsOff();
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died, we won't be able to enable checksums
+		 * cluster-wide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f24a33ef1..96c814a91c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3937,6 +3937,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..cc494b6f13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d9c482454f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8f2c482bc8..c14234f1a5 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2932,8 +2932,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..c7928f3495 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 583efaecff..c5d9d3d846 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,11 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOffInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +500,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+		ProcessBarrierChecksumOnInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+		ProcessBarrierChecksumOn();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF))
+		ProcessBarrierChecksumOffInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+		ProcessBarrierChecksumOff();
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +520,27 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
+}
+
+static void
+ProcessBarrierChecksumOffInProgress(void)
+{
+	AbsorbChecksumsOffInProgressBarrier();
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae..8fbebd9870 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change, which could otherwise
+		 * cause a false positive or negative result.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5c12a165a1..358cb9f1f8 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1567,9 +1567,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1585,9 +1582,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..17c4dc15e6 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1828,7 +1829,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1896,6 +1898,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3548,6 +3552,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the data checksum state. Since the data checksum state can change
+	 * at any time, the fetched value might be out of date by the time the
+	 * relation is built.  DataChecksumsNeedWrite returns true when data
+	 * checksums are enabled, in the process of being enabled (state:
+	 * "inprogress-on"), or in the process of being disabled (state:
+	 * "inprogress-off"). Since relhaschecksums is only used to track progress
+	 * when data checksums are being enabled, and going from disabled to
+	 * enabled will clear relhaschecksums before starting, it is safe to use
+	 * this value during a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false and the state concurrently
+	 * changes to true, that implies checksums are being enabled. In the
+	 * worst case, this leads to the relation being processed for checksums
+	 * even though every page written will already have them.  Performing
+	 * this check last shortens the window, but doesn't close it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3813,6 +3838,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3837,16 +3863,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3875,6 +3909,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3895,15 +3930,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc5..045da21904 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e5965bc517..92367ece4b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -606,6 +606,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up backend local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index daf9c127cd..5f378a1631 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -500,6 +502,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -609,7 +622,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums;
 static bool integer_datetimes;
 static bool assert_enabled;
 static bool in_hot_standby;
@@ -1902,17 +1915,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4822,6 +4824,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3f029f41e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..1298857458 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading.  The
+	 * user should either let the process finish, or turn off checksums,
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..b35cd4d503 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..28b22db7fb 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 9585ad17b3..356ecdab61 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index e8dcd15a55..bf296625e4 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..d8229422af 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d7b55f57ea..e5163b2f3e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11240,6 +11240,22 @@
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
 
+{ oid => '9258',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '9257',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bdc97e308..f013acba76 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -324,6 +324,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c38b689710..ad4df0028f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -929,6 +929,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..466fb41521
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,36 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f..c35b747520 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d2359192..f736b12f98 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..d865796d04 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index ab1ef9a475..9774816625 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..fd60f7e97f
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check"),
+with multiple nodes, be they master or standby(s) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..0f44512f83
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,89 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class have checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled.  Disabling checksums clears the catalog
+# relhaschecksums state, so wait for that before considering it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..dc5bcb9629
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,108 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..99c283e0b1
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,116 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary 2');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
-- 
2.21.1 (Apple Git-122.3)

#64Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Banck (#60)
Re: Online checksums patch - once again

On 5 Jan 2021, at 18:19, Michael Banck <michael.banck@credativ.de> wrote:

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
failures will be reported in the
<link linkend="monitoring-pg-stat-database-view">
<structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc147b10..5dcfcdd2ff 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
</para>
</sect1>
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
the page is read. Only data pages are protected by checksums; internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>

I think the above is rather informative about checksums in general and
not specific to online activation of checksums, so could pretty much be
committed verbatim right now, except for the "either as an offline
operation or in a running cluster" bit which would have to be rewritten.

That might indeed be useful regardless of the state of this patch. Robert and
Heikki: being committers who have both reviewed recent versions of the patch,
would you prefer these sections be broken out into a separate patch?

cheers ./daniel

#65Michael Banck
michael.banck@credativ.de
In reply to: Daniel Gustafsson (#64)
Re: Online checksums patch - once again

Hi,

On Thu, Jan 07, 2021 at 03:05:44PM +0100, Daniel Gustafsson wrote:

On 5 Jan 2021, at 18:19, Michael Banck <michael.banck@credativ.de> wrote:

+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster. 

This would also have to be rewritten to clarify that it "can optionally
be enabled during cluster creation or when the cluster is offline later
on" I'd say.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

#66Magnus Hagander
magnus@hagander.net
In reply to: Daniel Gustafsson (#64)
Re: Online checksums patch - once again

On Thu, Jan 7, 2021 at 3:05 PM Daniel Gustafsson <daniel@yesql.se> wrote:

On 5 Jan 2021, at 18:19, Michael Banck <michael.banck@credativ.de> wrote:

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
failures will be reported in the
<link linkend="monitoring-pg-stat-database-view">
<structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc147b10..5dcfcdd2ff 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,103 @@
</para>
</sect1>
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
the page is read. Only data pages are protected by checksums; internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time, either as an offline
+   operation or in a running cluster. In all cases, checksums are enabled or
+   disabled at the full cluster level, and cannot be specified individually for
+   databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>

I think the above is rather informative about checksums in general and
not specific to online activation of checksums, so could pretty much be
committed verbatim right now, except for the "either as an offline
operation or in a running cluster" bit which would have to be rewritten.

That might indeed be useful regardless of the state of this patch. Robert and
Heikki: being committers who have both reviewed recent versions of the patch,
would you prefer these sections be broken out into a separate patch?

I think it would be ;)

Obviously in that case adapted so that it has to be changed offline,
and the patch would have to change that to say it can be changed
online.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#67Daniel Gustafsson
daniel@yesql.se
In reply to: Magnus Hagander (#66)
2 attachment(s)
Re: Online checksums patch - once again

On 7 Jan 2021, at 16:25, Magnus Hagander <magnus@hagander.net> wrote:

I think it would be ;)

Obviously in that case adapted so that it has to be changed offline,
and the patch would have to change that to say it can be changed
online.

Attached is a v28 which has the docs portion separated out into 0001 with 0002
changing the docs in 0001 to mention the online operation.

cheers ./daniel

Attachments:

v28-0001-Add-documentation-about-data-page-checksums.patchapplication/octet-stream; name=v28-0001-Add-documentation-about-data-page-checksums.patch; x-unix-mode=0644Download
From 0929dc1c14e1c26ac1f0c75fd7ec97fc499488a6 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 11 Jan 2021 23:46:58 +0100
Subject: [PATCH v28 1/2] Add documentation about data page checksums

Data page checksums did not have a longer discussion in the docs;
this adds a sort of stub section with an overview which can be
expanded upon.
---
 doc/src/sgml/amcheck.sgml    |  2 +-
 doc/src/sgml/ref/initdb.sgml |  1 +
 doc/src/sgml/wal.sgml        | 47 ++++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 8dfb01a77b..5be0a0b9cf 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -497,7 +497,7 @@ SET client_min_messages = DEBUG1;
   Structural corruption can happen due to faulty storage hardware, or
   relation files being overwritten or modified by unrelated software.
   This kind of corruption can also be detected with
-  <link linkend="app-initdb-data-checksums"><application>data page
+  <link linkend="checksums"><application>data page
   checksums</application></link>.
  </para>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc147b10..c359194df7 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,53 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums; internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time as an offline
+   operation. Data page checksums are enabled or disabled at the full cluster
+   level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as 
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
-- 
2.21.1 (Apple Git-122.3)

v28-0002-Support-checksum-enable-disable-in-a-running-clu.patchapplication/octet-stream; name=v28-0002-Support-checksum-enable-disable-in-a-running-clu.patch; x-unix-mode=0644Download
From 7b4545329f01522f43d659c29f5a003dd44c025e Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 11 Jan 2021 23:49:54 +0100
Subject: [PATCH v28 2/2] Support checksum enable/disable in a running cluster
 v28

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A dynamic background worker is responsible for launching a per-database
worker which will mark all buffers dirty for all relations with storage
in order for them to have data checksums on write. A new in-progress
state is introduced which during processing ensures that data checksums
are written but not verified to avoid false negatives. State changes
across backends are synchronized using a procsignalbarrier.

Authors: Daniel Gustafsson, Magnus Hagander
Reviewed-by: Heikki Linnakangas, Robert Haas, Andres Freund, Tomas Vondra, Michael Banck, Andrey Borodin
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   71 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   57 +-
 src/backend/access/heap/heapam.c             |    9 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  428 ++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    7 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1541 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   46 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   36 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   89 +
 src/test/checksum/t/002_restarts.pl          |  108 ++
 src/test/checksum/t/003_standby_checksum.pl  |  116 ++
 49 files changed, 2839 insertions(+), 78 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..a81878369c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 02a37658ad..307e37acd1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25800,6 +25800,77 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Initiates data checksums for the cluster. This will switch the data
+        checksums mode to <literal>inprogress-on</literal> as well as start a
+        background worker that will process all data in the database and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>. Returns <literal>true</literal> if processing
+        was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+        Returns <literal>false</literal> in case data checksums are disabled
+        already.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3cdb1aff3c..66b092dcd4 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3699,8 +3699,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3710,8 +3709,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b2..d879550e81 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations regardless of the online processing.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index c359194df7..ac62f57517 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -247,9 +247,10 @@
   <para>
    Checksums are normally enabled when the cluster is initialized using <link
    linkend="app-initdb-data-checksums"><application>initdb</application></link>.
-   They can also be enabled or disabled at a later time as an offline
-   operation. Data page checksums are enabled or disabled at the full cluster
-   level, and cannot be specified individually for databases or tables.
+   They can also be enabled or disabled at a later time either as an offline
+   operation or online in a running cluster allowing concurrent access. Data
+   page checksums are enabled or disabled at the full cluster level, and
+   cannot be specified individually for databases or tables.
   </para>
 
   <para>
@@ -266,7 +267,7 @@
   </para>
 
   <sect2 id="checksums-offline-enable-disable">
-   <title>Off-line Enabling of Checksums</title>
+   <title>Offline Enabling of Checksums</title>
 
    <para>
     The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
@@ -275,6 +276,54 @@
    </para>
 
   </sect2>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>Online Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process; make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
  </sect1>
 
   <sect1 id="wal-intro">
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..4269200b0d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7284,7 +7284,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7305,11 +7305,18 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	/*
+	 * Hold interrupts for the duration of xlogging to avoid the state of data
+	 * checksums changing during the processing which would alter the premise
+	 * for xlogging hint bits.
+	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b18257c198..848bccfbf7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -50,6 +51,7 @@
 #include "port/atomics.h"
 #include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -253,6 +255,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -904,6 +916,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1076,8 +1089,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1090,7 +1103,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4918,9 +4931,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4954,13 +4965,346 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Returns true iff data checksums are enabled or are in the process of being
+ * enabled. In case data checksums are currently being enabled we must write
+ * the checksum even though it's not verified during this stage. Interrupts
+ * need to be held off by the caller to ensure that the returned state is
+ * valid for the duration of the intended processing.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid the
+ * risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that calling
+ * this function must be performed as close to the validation call as possible
+ * to keep the critical section short. This is in order to protect against
+ * time of check/time of use situations around data checksum validation.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedVerify(void)
 {
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress-on" state for enabling checksums is used when the checksum
+ * worker is setting checksums on all pages, it can thus be used to check for
+ * aborted checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress-off" state for disabling checksums is used while the
+ * worker resets the catalog state.  DataChecksumsNeedVerify() or
+ * DataChecksumsNeedWrite() should be used for deciding whether to read/write
+ * checksums.
+ */
+bool
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * SetDataChecksumsOnInProgress
+ *		Sets the data checksum state to "inprogress-on" to enable checksums
+ *
+ * In order to start the process of enabling data checksums in a running
+ * cluster the data_checksum_version state must be changed to "inprogress-on".
+ * This state requires data checksums to be written but not verified. The state
+ * transition is performed in a critical section in order to provide crash
+ * safety, and checkpoints are held off. When the emitted procsignalbarrier
+ * has been absorbed by all backends we know that the cluster has started to
+ * enable data checksums.
+ */
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * Data checksum state can only be transitioned to "inprogress-on" from
+	 * "off"; if data checksums are in any other state, exit.
+	 */
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state change in all backends to ensure that all backends are in
+	 * "inprogress-on". Once done we know that all backends are writing data
+	 * checksums.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one
+ * sets the checksums state to "inprogress-on" (which is performed by
+ * SetDataChecksumsOnInProgress()) and the second one to "on" (performed here).
+ * During "inprogress-on", checksums are written but not verified. When all
+ * existing pages are guaranteed to have checksums, and all new pages will be
+ * initialized with checksums, the state can be changed to "on".
+ */
+void
+SetDataChecksumsOn(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The only allowed state transition to "on" is from "inprogress-on" since
+	 * that state ensures that all pages will have data checksums written.
+	 */
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await the state transition to "on" in all backends. When done we know
+	 * that data checksums are enabled in all backends and that checksums are
+	 * both written and verified.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * "inprogress-off" state during which backends continue to write
+	 * checksums without verifying them. When all backends are in
+	 * "inprogress-off" the next transition to "off" can be performed, after
+	 * which all data checksum processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * or "inprogress-off" and we can transition directly to "off" from
+		 * there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that we don't incur a checkpoint while disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * ProcSignalBarrier absorption functions for enabling and disabling data
+ * checksums in a running cluster. The procsignalbarriers are emitted in the
+ * SetDataChecksums* functions.
+ */
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 ||
+		   LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+void
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+/*
+ * InitLocalControldata
+ *
+ * Set up backend-local caches of controldata variables which may change at
+ * any point during runtime and thus require special-cased locking. So far
+ * this only applies to data_checksum_version, but it's intended to be general
+ * purpose enough to handle future cases.
+ */
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7945,6 +8289,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums being enabled ("inprogress-on"
+	 * state), we notify the user that they need to manually restart the
+	 * process to enable checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here, it has to be launched from a
+	 * regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shutdown, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XlogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9876,6 +10246,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10331,6 +10719,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab319d..cd4dc60800 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 21f2240ade..09936278ba 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -965,10 +965,17 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	/*
+	 * Hold off interrupts to ensure that the observed data checksum state
+	 * cannot change as we form and insert the tuple.
+	 */
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5d89e77dbe..cd49cb8403 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1257,6 +1257,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..8afbf762af 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..5c86ea5e4f
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1541 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to an incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from where it left off, given that pg_class.relhaschecksums tracks the
+ * state of processed relations and the in-progress state ensures that all
+ * new writes are performed with checksums. Each database will be reprocessed,
+ * but relations where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state will still
+ * be able to validate data checksums. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *    - Backends SHALL NOT violate local datachecksum state
+ *    - Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *      currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * WAL logging backend updating the global state in the controlfile will wait
+ * for all other backends to absorb the barrier. Barrier absorption will happen
+ * during interrupt processing, which means that connected backends will change
+ * state at different times. To prevent data checksum state changes when
+ * writing and verifying checksums, interrupts shall be held off before
+ * interrogating state and resumed when the IO operation has been performed.
+ *
+ *   When Enabling Data Checksums
+ *   ----------------------------
+ *   A process which fails to observe data checksums being enabled can induce
+ *   two types of errors: failing to write the checksum when modifying the page
+ *   and failing to validate the data checksum on the page when reading it.
+ *
+ *   When processing starts all backends belong to one of the below sets, with
+ *   one set being empty:
+ *
+ *   Bd: Backends in "off" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   If processing is started in an online cluster then all backends are in Bd.
+ *   If processing was halted by the cluster shutting down, the controlfile
+ *   state "inprogress-on" will be observed on system startup and all backends
+ *   will be in Bd. Backends transition Bd -> Bi via a procsignalbarrier.  When
+ *   the DataChecksumsWorker has finished writing checksums on all pages and
+ *   enables data checksums cluster-wide, there are four sets of backends where
+ *   Bd shall be an empty set:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting while Bg is waiting on the procsignalbarrier will
+ *   observe the global state being "on" and will thus automatically belong to
+ *   Be.  Checksums are enabled cluster-wide when Bi is an empty set. Bi and Be
+ *   are compatible sets while still operating based on their local state as
+ *   both write data checksums.
+ *
+ *   When Disabling Data Checksums
+ *   -----------------------------
+ *   A process which fails to observe that data checksums have been disabled
+ *   can induce two types of errors: writing the checksum when modifying the
+ *   page and validating a data checksum which is no longer correct due to
+ *   modifications to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bo: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bo -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bo write data checksums, but don't validate them, such that
+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bo. Once all backends are in Bo, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation any longer.
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		enable_checksums;	/* True if checksums are being enabled,
+									 * else false */
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Main entry point for the datachecksumsworker launcher process
+ *
+ * The entry point for starting data checksum processing, for enabling as
+ * well as disabling.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of the below distinct states:
+	 * A: Started and performing an operation
+	 * B: Started and in the process of aborting
+	 * C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
+	 * and the abort flag be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, the only operation we can perform
+	 * is to cancel it iff the user requested for checksums to be disabled.
+	 * That doesn't however mean that all other cases yield an error, as some
+	 * might be perfectly benign.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
+		 * If the launcher is started, data checksums cannot be on or off, but
+		 * they may be in an inprogress state. Since the state transition may
+		 * not have happened yet (in case of rapidly initiated checksum enable
+		 * calls for example) we inspect the target state of the currently
+		 * running launcher.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so exit.
+			 */
+			if (DatachecksumsWorkerShmem->enable_checksums)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * Disabling checksums is likely to be a very quick operation in
+			 * many cases so trying to abort it to save the checksums would
+			 * run the risk of race conditions.
+			 */
+			else
+			{
+				ereport(NOTICE,
+						(errmsg("data checksums are concurrently being disabled, please retry")));
+
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/* This should be unreachable */
+			Assert(false);
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Data checksums are already being disabled, exit silently.
+			 */
+			if (DataChecksumsOffInProgress())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			DatachecksumsWorkerShmem->abort = true;
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+	}
+
+	/*
+	 * The launcher is currently not running, so we need to query the system
+	 * data checksum state to determine how to proceed based on the requested
+	 * target state.
+	 */
+	else
+	{
+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->enable_checksums = enable_checksums;
+
+		/*
+		 * If the launcher isn't started and we're asked to enable checksums,
+		 * we need to check if processing was previously interrupted such that
+		 * we should resume rather than start from scratch.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums in a cluster which already
+			 * has checksums enabled, exit immediately as there is nothing
+			 * more to do.
+			 */
+			if (DataChecksumsNeedVerify())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-on" then we will
+			 * resume from where we left off based on the catalog state. This
+			 * is safe since new relations created while the checksum worker
+			 * was not running will already have checksums enabled.
+			 */
+			else if (DataChecksumsOnInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[1] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-off" then we
+			 * were interrupted while the catalog state was being cleared. In
+			 * this case we need to first reset state and then continue with
+			 * enabling checksums.
+			 */
+			else if (DataChecksumsOffInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * Data checksums are off in the cluster, so we can proceed with
+			 * enabling them. Just in case, we start by resetting the catalog
+			 * state since we are doing this from scratch and don't want
+			 * leftover catalog state to cause us to miss a relation.
+			 */
+			else
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Regardless of the current state of the system, we go through
+			 * the motions when asked to disable checksums. The catalog state
+			 * is only defined to be relevant during the operation of enabling
+			 * checksums, and has no use at any other point in time. That
+			 * being said, a user who sees stale relhaschecksums entries in
+			 * the catalog might run this just in case.
+			 *
+			 * Resetting state must be performed after setting data checksum
+			 * state to off, as there might otherwise (depending on the system
+			 * data checksum state) be a window between the catalog reset and
+			 * the state transition in which new relations are created with
+			 * the catalog state set to true.
+			 */
+			DatachecksumsWorkerShmem->operations[0] = DISABLE_CHECKSUMS;
+			DatachecksumsWorkerShmem->operations[1] = RESET_STATE;
+		}
+	}
+
+	/*
+	 * Backoff parameters to throttle the load during enabling. Since no real
+	 * processing is performed when disabling checksums, the backoff
+	 * parameters do not apply there.
+	 */
+	if (enable_checksums)
+	{
+		DatachecksumsWorkerShmem->cost_delay = cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = cost_limit;
+	}
+	else
+	{
+		DatachecksumsWorkerShmem->cost_delay = 0;
+		DatachecksumsWorkerShmem->cost_limit = 0;
+	}
+
+	/*
+	 * Prepare the BackgroundWorker and launch it.
+	 */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not turn off processing immediately; it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums were first on,
+		 * then off, and then turned on again. If wal_level is set to
+		 * "minimal", this could be avoided if the checksum is calculated to
+		 * be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort;
+		 * the abort will bubble up from here. It's safe to check this
+		 * without a lock, because if we miss it being set, we will try again
+		 * soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it then that's not
+	 * an error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot end up with a processed database,
+	 * so we have no alternative other than exiting. When enabling checksums
+	 * we won't at this point have changed the pg_control version to enabled,
+	 * so when the cluster comes back up, processing will have to be resumed.
+	 * When disabling, the pg_control version will be set to off before this,
+	 * so when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class,
+	 * which it would be nice to handle in some way. Enabling data checksums
+	 * resets the flags, so any stale flags won't cause problems at that
+	 * point, but they may confuse users reading pg_class. TODO.
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksums processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks until all current transactions have finished
+ *
+ * Returns when all transactions which were active when the function was
+ * called have ended, or if the postmaster dies while waiting. If the
+ * postmaster dies, the abort flag will be set to indicate that the caller
+ * shouldn't proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function handles the bgworker management,
+ * with ProcessAllDatabases being responsible for looping over the databases
+ * and initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = DatachecksumsWorkerShmem->operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+	/*
+	 * Clean up after processing
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->launcher_started = false;
+	DatachecksumsWorkerShmem->abort = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This repeatedly generates a list of databases to process, for either
+ * enabling checksums or resetting the checksum catalog tracking. Until no
+ * new databases are found, it loops around, computing a new list and
+ * comparing it to the ones already seen.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set up so that the first run processes the shared catalogs; they are
+	 * not reprocessed in every database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain
+		 * that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%i databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "process with restart" : "process completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can mean that
+	 * it actually failed for some reason, or that the database was dropped
+	 * between us getting the database list and trying to process it. Get a
+	 * fresh list of databases to detect the second case, where the database
+	 * was dropped before we had started processing it. If a database still
+	 * exists but enabling checksums failed, then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = found;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->abort = false;
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even though this assignment is redundant, we want to be explicit about
+	 * our intent for readability, since this state is queried to support
+	 * restartability.
+	 */
+	DatachecksumsWorkerShmem->launcher_started = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which don't yet have data checksums are
+ * returned. If include_shared is true then shared relations are included as
+ * well in a non-temporary list. include_shared has no relevance when building
+ * a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we are done, since we
+	 * cannot access these relations to modify them.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		SetDataChecksumsOff();
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died, we won't be able to enable checksums
+		 * cluster-wide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f24a33ef1..96c814a91c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3937,6 +3937,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..cc494b6f13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d9c482454f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 71b5852224..d3d7e7d5d1 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2932,8 +2932,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..c7928f3495 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 583efaecff..c5d9d3d846 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,11 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOffInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +500,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+		ProcessBarrierChecksumOnInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+		ProcessBarrierChecksumOn();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF))
+		ProcessBarrierChecksumOffInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+		ProcessBarrierChecksumOff();
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +520,27 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
+}
+
+static void
+ProcessBarrierChecksumOffInProgress(void)
+{
+	AbsorbChecksumsOffInProgressBarrier();
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae..8fbebd9870 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change mid-check, which could
+		 * lead to a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5c12a165a1..358cb9f1f8 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1567,9 +1567,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1585,9 +1582,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..17c4dc15e6 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1828,7 +1829,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1896,6 +1898,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3548,6 +3552,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the data checksum state. Since the data checksum state can change at
+	 * any time, the fetched value might be out of date by the time the
+	 * relation is built.  DataChecksumsNeedWrite returns true when data
+	 * checksums are enabled, in the process of being enabled (state
+	 * "inprogress-on"), or in the process of being disabled (state
+	 * "inprogress-off"). Since relhaschecksums is only used to track progress
+	 * when data checksums are being enabled, and going from disabled to
+	 * enabled will clear relhaschecksums before starting, it is safe to use
+	 * this value for a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false, and is concurrently changed to
+	 * true then that implies that checksums are being enabled. Worst case,
+	 * this will lead to the relation being processed for checksums even though
+	 * each page written will have them already.  Performing this last shortens
+	 * the window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3813,6 +3838,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3837,16 +3863,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3875,6 +3909,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3895,15 +3930,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc5..045da21904 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e5965bc517..92367ece4b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -606,6 +606,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up the backend-local cache of ControlData values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..3b7207afb5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -500,6 +502,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -609,7 +622,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums;
 static bool integer_datetimes;
 static bool assert_enabled;
 static bool in_hot_standby;
@@ -1910,17 +1923,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4830,6 +4832,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3f029f41e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..1298857458 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish or turn off checksums
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..b35cd4d503 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..28b22db7fb 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 9585ad17b3..356ecdab61 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index e8dcd15a55..bf296625e4 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..d8229422af 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d7b55f57ea..e5163b2f3e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11240,6 +11240,22 @@
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
 
+{ oid => '9258',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '9257',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bdc97e308..f013acba76 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -324,6 +324,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c38b689710..ad4df0028f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -929,6 +929,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..466fb41521
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,36 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f..c35b747520 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d2359192..f736b12f98 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..d865796d04 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index ab1ef9a475..9774816625 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..fd60f7e97f
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check")
+with multiple nodes, either primary or standby, for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..0f44512f83
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,89 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class have checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled. Disabling checksums clears the catalog
+# relhaschecksums state, so wait for that before calling it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..dc5bcb9629
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,108 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyway and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..99c283e0b1
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,116 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary 2');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
-- 
2.21.1 (Apple Git-122.3)

#68Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#67)
2 attachment(s)
Re: Online checksums patch - once again

On 12 Jan 2021, at 00:07, Daniel Gustafsson <daniel@yesql.se> wrote:

On 7 Jan 2021, at 16:25, Magnus Hagander <magnus@hagander.net> wrote:

I think it would be ;)

Obviously in that case adapted so that it has to be changed offline,
and the patch would have to change that to say it can be changed
online.

Attached is a v28 which has the docs portion separated out into 0001 with 0002
changing the docs in 0001 to mention the online operation.

Attached is v29, which adds a small test for running pg_checksums on a
cluster turned off during a checksum enable operation (as well as some
minor wordsmithing in test comments).
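
For anyone wanting to try it out, the basic flow on a patched cluster is
sketched below (function and setting names as in the patch; the exact timing
of the state transitions depends on the amount of data to process):

```sql
-- Check the current state; 'off' means data checksums are disabled
SHOW data_checksums;

-- Start the background worker that checksums all existing data.
-- Optional cost_delay/cost_limit arguments throttle it using the
-- same principles as cost-based vacuum delay.
SELECT pg_enable_data_checksums();

-- The state reads 'inprogress-on' while the worker runs, and switches
-- to 'on' automatically once all databases have been processed
SHOW data_checksums;

-- Disabling goes through 'inprogress-off' before ending up at 'off'
SELECT pg_disable_data_checksums();
```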

cheers ./daniel

Attachments:

v29-0001-Add-documentation-about-data-page-checksums.patchapplication/octet-stream; name=v29-0001-Add-documentation-about-data-page-checksums.patch; x-unix-mode=0644Download
From f0c6806521a46aaaef91d1be466840a2c4582a33 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 11 Jan 2021 23:46:58 +0100
Subject: [PATCH v29 1/2] Add documentation about data page checksums

Data page checksums did not have a longer discussion in the docs,
this adds a sort of stub section with an overview which can be
expanded upon.
---
 doc/src/sgml/amcheck.sgml    |  2 +-
 doc/src/sgml/ref/initdb.sgml |  1 +
 doc/src/sgml/wal.sgml        | 47 ++++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 8dfb01a77b..5be0a0b9cf 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -497,7 +497,7 @@ SET client_min_messages = DEBUG1;
   Structural corruption can happen due to faulty storage hardware, or
   relation files being overwritten or modified by unrelated software.
   This kind of corruption can also be detected with
-  <link linkend="app-initdb-data-checksums"><application>data page
+  <link linkend="checksums"><application>data page
   checksums</application></link>.
  </para>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc147b10..c359194df7 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,53 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums; internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time as an offline
+   operation. Data page checksums are enabled or disabled at the full cluster
+   level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing
+   the value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" />, for example by issuing the command
+   <command>SHOW data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data, it may be necessary to
+   bypass the checksum protection. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
-- 
2.21.1 (Apple Git-122.3)

v29-0002-Support-checksum-enable-disable-in-a-running-clu.patchapplication/octet-stream; name=v29-0002-Support-checksum-enable-disable-in-a-running-clu.patch; x-unix-mode=0644Download
From 7c4eca28521e40bcd79a644876b90d5a28f1707a Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 11 Jan 2021 23:49:54 +0100
Subject: [PATCH v29 2/2] Support checksum enable/disable in a running cluster
 v29

This allows data checksums to be enabled or disabled in a running
cluster without restricting access to the cluster during processing.

A dynamic background worker is responsible for launching a per-database
worker which marks all buffers dirty for all relations with storage, so
that they gain data checksums on write. A new in-progress state is
introduced which, during processing, ensures that data checksums are
written but not verified, to avoid false negatives. State changes
across backends are synchronized using a ProcSignalBarrier.

Authors: Daniel Gustafsson, Magnus Hagander
Reviewed-by: Heikki Linnakangas, Robert Haas, Andres Freund, Tomas Vondra, Michael Banck, Andrey Borodin
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   71 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   57 +-
 src/backend/access/heap/heapam.c             |    9 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  428 ++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    7 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1541 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   46 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   36 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   89 +
 src/test/checksum/t/002_restarts.pl          |  108 ++
 src/test/checksum/t/003_standby_checksum.pl  |  116 ++
 src/test/checksum/t/004_offline.pl           |  100 ++
 src/test/perl/PostgresNode.pm                |   36 +
 51 files changed, 2975 insertions(+), 78 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl
 create mode 100644 src/test/checksum/t/004_offline.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..a81878369c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if the relation has data checksums on all pages. This state is
+        only used during checksum processing; this field should never be
+        consulted for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 02a37658ad..307e37acd1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25800,6 +25800,77 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Initiates data checksums for the cluster. This will switch the data
+        checksums mode to <literal>inprogress-on</literal> as well as start a
+        background worker that will process all data in the cluster and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>. Returns <literal>true</literal> if processing
+        was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+        Returns <literal>false</literal> if data checksums are already
+        disabled.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3cdb1aff3c..66b092dcd4 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3699,8 +3699,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3710,8 +3709,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b2..d879550e81 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will process all relations regardless of any prior online progress.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index c359194df7..ac62f57517 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -247,9 +247,10 @@
   <para>
    Checksums are normally enabled when the cluster is initialized using <link
    linkend="app-initdb-data-checksums"><application>initdb</application></link>.
-   They can also be enabled or disabled at a later time as an offline
-   operation. Data page checksums are enabled or disabled at the full cluster
-   level, and cannot be specified individually for databases or tables.
+   They can also be enabled or disabled at a later time either as an offline
+   operation or online in a running cluster, allowing concurrent access. Data
+   page checksums are enabled or disabled at the full cluster level, and
+   cannot be specified individually for databases or tables.
   </para>
 
   <para>
@@ -266,7 +267,7 @@
   </para>
 
   <sect2 id="checksums-offline-enable-disable">
-   <title>Off-line Enabling of Checksums</title>
+   <title>Offline Enabling of Checksums</title>
 
    <para>
     The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
@@ -275,6 +276,54 @@
    </para>
 
   </sect2>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>Online Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing consumes a background worker process, so make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to be removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
  </sect1>
 
   <sect1 id="wal-intro">
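To summarize the transitions described in the documentation above: data_checksums moves off -> inprogress-on -> on when enabling, on -> inprogress-off -> off when disabling, and either in-progress state may fall back directly to off. A minimal Python sketch of that state machine (the state names come from the patch; the transition table and function are illustrative only):

```python
# Illustrative model of the data_checksums state transitions described
# in the documentation above. State names match the patch; the
# transition table itself is editorial, not code from the patch.
TRANSITIONS = {
    "off": {"inprogress-on"},            # pg_enable_data_checksums()
    "inprogress-on": {"on", "off"},      # worker done, or disable/abort
    "on": {"inprogress-off"},            # pg_disable_data_checksums()
    "inprogress-off": {"off"},           # no backend verifies any longer
}

def transition(state, new_state):
    """Return new_state if the change is legal, else raise ValueError."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

This also mirrors the startup handling: a cluster shut down in inprogress-off is moved straight to off, while inprogress-on stays put until the worker is restarted.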
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..4269200b0d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7284,7 +7284,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7305,11 +7305,18 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	/*
+	 * Hold interrupts for the duration of xlogging to avoid the state of data
+	 * checksums changing during the processing, which would alter the premise
+	 * for xlogging hint bits.
+	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b18257c198..848bccfbf7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -50,6 +51,7 @@
 #include "port/atomics.h"
 #include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -253,6 +255,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -904,6 +916,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1076,8 +1089,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1090,7 +1103,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4918,9 +4931,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4954,13 +4965,346 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Returns true iff data checksums are enabled or are in the process of being
+ * enabled. In case data checksums are currently being enabled we must write
+ * the checksum even though it's not verified during this stage. Interrupts
+ * need to be held off by the caller to ensure that the returned state is
+ * valid for the duration of the intended processing.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid
+ * the risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that calling
+ * this function must be performed as close to the validation call as possible
+ * to keep the critical section short, in order to protect against
+ * time-of-check/time-of-use situations around data checksum validation.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedVerify(void)
 {
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress-on" state for enabling checksums is used when the checksum
+ * worker is setting checksums on all pages; it can thus be used to check for
+ * aborted checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress-off" state for disabling checksums is used for when the
+ * worker resets the catalog state.  DataChecksumsNeedVerify() or
+ * DataChecksumsNeedWrite() should be used for deciding whether to read/write
+ * checksums.
+ */
+bool
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * SetDataChecksumsOnInProgress
+ *		Sets the data checksum state to "inprogress-on" to enable checksums
+ *
+ * In order to start the process of enabling data checksums in a running
+ * cluster the data_checksum_version state must be changed to "inprogress-on".
+ * This state requires data checksums to be written but not verified. The state
+ * transition is performed in a critical section in order to provide crash
+ * safety, and checkpoints are held off. When the emitted procsignalbarrier
+ * has been absorbed by all backends we know that the cluster has started to
+ * enable data checksums.
+ */
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * Data checksum state can only be transitioned to "inprogress-on" from
+	 * "off", if data checksums are in any other state then exit.
+	 */
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state change in all backends to ensure that all backends are in
+	 * "inprogress-on". Once done we know that all backends are writing data
+	 * checksums.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one
+ * sets the checksums state to "inprogress-on" (which is performed by
+ * SetDataChecksumsOnInProgress()) and the second one to "on" (performed here).
+ * During "inprogress-on", checksums are written but not verified. When all
+ * existing pages are guaranteed to have checksums, and all new pages will be
+ * initialized with checksums, the state can be changed to "on".
+ */
+void
+SetDataChecksumsOn(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The only allowed state transition to "on" is from "inprogress-on" since
+	 * that state ensures that all pages will have data checksums written.
+	 */
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state transition of "on" in all backends. When done we know that
+	 * data checksums are enabled in all backends and data checksums are both
+	 * written and verified.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * "inprogress-off" state during which backends continue to write
+	 * checksums without verifying them. When all backends are in
+	 * "inprogress-off" the next transition to "off" can be performed, after
+	 * which all data checksum processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * or "inprogress-off" and we can transition directly to "off" from
+		 * there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that we don't incur a checkpoint while disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * ProcSignalBarrier absorption functions for enabling and disabling data
+ * checksums in a running cluster. The procsignalbarriers are emitted in the
+ * SetDataChecksums* functions.
+ */
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 ||
+		   LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+void
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+/*
+ * InitLocalControldata
+ *
+ * Set up backend local caches of controldata variables which may change at
+ * any point during runtime and thus require special cased locking. So far
+ * this only applies to data_checksum_version, but it's intended to be general
+ * purpose enough to handle future cases.
+ */
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7945,6 +8289,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums being enabled ("inprogress-on"
+	 * state), we notify the user that they need to manually restart the
+	 * process to enable checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here; it has to be launched from a
+	 * regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shut down, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XlogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9876,6 +10246,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10331,6 +10719,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
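The rules encoded by DataChecksumsNeedWrite() and DataChecksumsNeedVerify() above condense into two predicates plus a compatibility check, which is why backends may safely disagree on their local state while a barrier is in flight. A hedged Python sketch (function names are illustrative, states as in the patch):

```python
# Per-backend rules modelled on DataChecksumsNeedWrite() and
# DataChecksumsNeedVerify(): checksums are written in every state
# except "off", but verified only in "on".
def need_write(local_state):
    return local_state in ("on", "inprogress-on", "inprogress-off")

def need_verify(local_state):
    return local_state == "on"

def compatible(a, b):
    # Two local states can coexist unless one backend verifies
    # checksums that the other might not have written.
    return not ((need_verify(a) and not need_write(b)) or
                (need_verify(b) and not need_write(a)))
```

This is exactly why the intermediate states exist: "on"/"inprogress-on" and "inprogress-off"/"off" are compatible pairs, whereas "on"/"off" is not, so the cluster never passes through that combination.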
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab319d..cd4dc60800 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
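enable_data_checksums() forwards cost_delay and cost_limit to the worker for vacuum-like throttling. A rough Python sketch of how such cost-based throttling generally behaves (the per-page cost and all names here are illustrative; the actual accounting is done by the vacuum cost machinery):

```python
import time

def process_pages(n_pages, cost_delay_ms, cost_limit, page_cost=10):
    """Checksum n_pages, napping for cost_delay_ms each time the
    accumulated cost reaches cost_limit. Returns the number of naps."""
    balance = 0
    naps = 0
    for _ in range(n_pages):
        # ... read page, set checksum, mark buffer dirty ...
        balance += page_cost
        if balance >= cost_limit:
            time.sleep(cost_delay_ms / 1000.0)
            naps += 1
            balance = 0
    return naps
```

With the SQL-level defaults (cost_delay 0, cost_limit 100) the worker effectively runs unthrottled, since a zero delay makes each nap a no-op.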
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 21f2240ade..09936278ba 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -965,10 +965,17 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	/*
+	 * Hold off interrupts to ensure that the observed data checksum state
+	 * cannot change as we form and insert the tuple.
+	 */
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5d89e77dbe..cd49cb8403 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1257,6 +1257,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..8afbf762af 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..5c86ea5e4f
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1541 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from where it left off, given that pg_class.relhaschecksums tracks the state
+ * of processed relations and the in-progress state ensures that all new writes
+ * are performed with checksums. Each database will be reprocessed, but relations
+ * where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This ensures
+ * that backends which have yet to move from the "on" state will still be able
+ * to perform data checksum validation. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *    - Backends SHALL NOT violate local datachecksum state
+ *    - Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *      currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * WAL logging backend updating the global state in the controlfile will wait
+ * for all other backends to absorb the barrier. Barrier absorption will happen
+ * during interrupt processing, which means that connected backends will change
+ * state at different times. To prevent data checksum state changes when
+ * writing and verifying checksums, interrupts shall be held off before
+ * interrogating state and resumed when the IO operation has been performed.
+ *
+ *   When Enabling Data Checksums
+ *   ----------------------------
+ *   A process which fails to observe data checksums being enabled can induce
+ *   two types of errors: failing to write the checksum when modifying the page
+ *   and failing to validate the data checksum on the page when reading it.
+ *
+ *   When processing starts all backends belong to one of the below sets, with
+ *   one set being empty:
+ *
+ *   Bd: Backends in "off" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   If processing is started in an online cluster then all backends are in Bd.
+ *   If processing was halted by the cluster shutting down, the controlfile
+ *   state "inprogress-on" will be observed on system startup and all backends
+ *   will be in Bd. Backends transition Bd -> Bi via a procsignalbarrier.  When
+ *   the DataChecksumsWorker has finished writing checksums on all pages and
+ *   enables data checksums cluster-wide, there are four sets of backends where
+ *   Bd shall be an empty set:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting while Bg is waiting on the procsignalbarrier will
+ *   observe the global state being "on" and will thus automatically belong to
+ *   Be.  Checksums are enabled cluster-wide when Bi is an empty set. Bi and Be
+ *   are compatible sets while still operating based on their local state as
+ *   both write data checksums.
+ *
+ *   When Disabling Data Checksums
+ *   -----------------------------
+ *   A process which fails to observe that data checksums have been disabled
+ *   can induce two types of errors: writing the checksum when modifying the
+ *   page and validating a data checksum which is no longer correct due to
+ *   modifications to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bo: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bo -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bo write data checksums, but don't validate them, such that
+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bo. Once all backends are in Bo, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation any longer.
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		enable_checksums;	/* True if checksums are being enabled,
+									 * else false */
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Main entry point for the datachecksumsworker launcher process
+ *
+ * Starts data checksum processing for either enabling or disabling
+ * checksums, launching a background worker to perform the operation.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of the below distinct states:
+	 *
+	 * A: Started and performing an operation
+	 * B: Started and in the process of aborting
+	 * C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
+	 * and the abort flag be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, the only operation we can perform
+	 * is to cancel it, and only if the user requested that checksums be
+	 * disabled.  That doesn't mean all other cases yield an error, however,
+	 * as some are perfectly benign.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
+		 * If the launcher is started, data checksums cannot be fully on or
+		 * off, but the cluster may be in an in-progress state. Since the
+		 * state transition may not have happened yet (in case of rapidly
+		 * initiated checksum enable calls, for example) we inspect the
+		 * target state of the currently running launcher.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so exit.
+			 */
+			if (DatachecksumsWorkerShmem->enable_checksums)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * Disabling checksums is likely to be a very quick operation in
+			 * many cases so trying to abort it to save the checksums would
+			 * run the risk of race conditions.
+			 */
+			else
+			{
+				ereport(NOTICE,
+						(errmsg("data checksums are concurrently being disabled, please retry")));
+
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/* This should be unreachable */
+			Assert(false);
+		}
+		else
+		{
+			/*
+			 * Data checksums are already being disabled, exit silently.
+			 */
+			if (DataChecksumsOffInProgress())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			DatachecksumsWorkerShmem->abort = true;
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+	}
+
+	/*
+	 * The launcher is currently not running, so we need to query the system
+	 * data checksum state to determine how to proceed based on the requested
+	 * target state.
+	 */
+	else
+	{
+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->enable_checksums = enable_checksums;
+
+		/*
+		 * If the launcher isn't started and we're asked to enable checksums,
+		 * we need to check if processing was previously interrupted such that
+		 * we should resume rather than start from scratch.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums in a cluster which already
+			 * has checksums enabled, exit immediately as there is nothing
+			 * more to do.
+			 */
+			if (DataChecksumsNeedVerify())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-on" then we will
+			 * resume from where we left off based on the catalog state. This
+			 * is safe since new relations created while the worker was not
+			 * running will have checksums enabled.
+			 */
+			else if (DataChecksumsOnInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[1] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-off" then we
+			 * were interrupted while the catalog state was being cleared. In
+			 * this case we need to first reset state and then continue with
+			 * enabling checksums.
+			 */
+			else if (DataChecksumsOffInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * Data checksums are off in the cluster, we can proceed with
+			 * enabling them. Just in case we will start by resetting the
+			 * catalog state since we are doing this from scratch and we don't
+			 * want leftover catalog state to cause us to miss a relation.
+			 */
+			else
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+		}
+		else
+		{
+			/*
+			 * Regardless of the current state of the system, we go through
+			 * the motions when asked to disable checksums. The catalog state
+			 * is only defined to be relevant while checksums are being
+			 * enabled, and has no use at any other point in time. That being
+			 * said, a user who sees stale relhaschecksums entries in the
+			 * catalog might run this just in case.
+			 *
+			 * Resetting the catalog state must be performed after setting
+			 * the data checksum state to off, as there might otherwise
+			 * (depending on the system data checksum state) be a window
+			 * between the catalog reset and the state transition in which
+			 * new relations are created with the catalog flag set to true.
+			 */
+			DatachecksumsWorkerShmem->operations[0] = DISABLE_CHECKSUMS;
+			DatachecksumsWorkerShmem->operations[1] = RESET_STATE;
+		}
+	}
+
+	/*
+	 * Backoff parameters to throttle the load during enabling. As there is no
+	 * real processing performed during disabling checksums the backoff
+	 * parameters do not apply there.
+	 */
+	if (enable_checksums)
+	{
+		DatachecksumsWorkerShmem->cost_delay = cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = cost_limit;
+	}
+	else
+	{
+		DatachecksumsWorkerShmem->cost_delay = 0;
+		DatachecksumsWorkerShmem->cost_limit = 0;
+	}
+
+	/*
+	 * Prepare the BackgroundWorker and launch it.
+	 */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hint bits) on the primary happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. Iff wal_level is set to "minimal",
+		 * this could be avoided iff the checksum is calculated to be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it then that's not
+	 * an error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot end up with a processed database so
+	 * we have no alternative other than exiting. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * it would be nice to handle in some way. Enabling data checksums resets
+	 * the flags so any stale flags won't cause problems at that point, but
+	 * they may cause confusion with users reading pg_class. TODO.
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksums processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks waiting for all currently running transactions to finish
+ *
+ * Returns when all transactions which were active when this function was
+ * called have ended, or if the postmaster dies while waiting. If the
+ * postmaster dies, the abort flag will be set to indicate that the caller
+ * shouldn't proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function handles the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = DatachecksumsWorkerShmem->operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+	/*
+	 * Clean up after processing
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->launcher_started = false;
+	DatachecksumsWorkerShmem->abort = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. Until no
+ * new databases are found, this will loop around computing a new list and
+ * comparing it to the already seen ones.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain
+		 * that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%i databases processed for data checksum processing, %s",
+			 processed_databases,
+			 (processed_databases ? "restarting loop" : "processing completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can be because
+	 * it actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case, where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists but enabling checksums failed, then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+															   HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist;
+		 * databases dropped since the original list was built are expected.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = true;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->abort = false;
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even though this assignment is redundant after the MemSet above, we
+	 * want to be explicit about our intent for readability, since this state
+	 * is queried when handling restarts.
+	 */
+	DatachecksumsWorkerShmem->launcher_started = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE commands,
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned; otherwise, non-temporary
+ * relations which do not yet have data checksums are returned. If
+ * include_shared is true then shared relations are included as well in a
+ * non-temporary list; include_shared has no relevance when building a list of
+ * temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we can finish, since we
+	 * cannot access these relations in order to rewrite them.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		SetDataChecksumsOff();
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died, we won't be able to enable checksums
+		 * cluster-wide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f24a33ef1..96c814a91c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3937,6 +3937,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..cc494b6f13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d9c482454f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index c192c2e35b..f293635332 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2944,8 +2944,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..c7928f3495 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 583efaecff..c5d9d3d846 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,11 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
+
+static void ProcessBarrierChecksumOnInProgress(void);
+static void ProcessBarrierChecksumOffInProgress(void);
+static void ProcessBarrierChecksumOn(void);
+static void ProcessBarrierChecksumOff(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +500,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+		ProcessBarrierChecksumOnInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+		ProcessBarrierChecksumOn();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF))
+		ProcessBarrierChecksumOffInProgress();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+		ProcessBarrierChecksumOff();
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -509,16 +520,27 @@ ProcessProcSignalBarrier(void)
 }
 
 static void
-ProcessBarrierPlaceholder(void)
+ProcessBarrierChecksumOn(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
+	AbsorbChecksumsOnBarrier();
+}
+
+static void
+ProcessBarrierChecksumOff(void)
+{
+	AbsorbChecksumsOffBarrier();
+}
+
+static void
+ProcessBarrierChecksumOnInProgress(void)
+{
+	AbsorbChecksumsOnInProgressBarrier();
+}
+
+static void
+ProcessBarrierChecksumOffInProgress(void)
+{
+	AbsorbChecksumsOffInProgressBarrier();
 }
 
 /*
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae..8fbebd9870 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change, which could otherwise
+		 * risk a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5c12a165a1..358cb9f1f8 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1567,9 +1567,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1585,9 +1582,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..17c4dc15e6 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1828,7 +1829,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1896,6 +1898,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3548,6 +3552,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the data checksum state. Since the data checksum state can change at
+	 * any time, the fetched value might be out of date by the time the
+	 * relation is built.  DataChecksumsNeedWrite returns true when data
+	 * checksums are enabled, in the process of being enabled (state
+	 * "inprogress-on"), or in the process of being disabled (state
+	 * "inprogress-off"). Since relhaschecksums is only used to track progress
+	 * when data checksums are being enabled, and going from disabled to
+	 * enabled will clear relhaschecksums before starting, it is safe to use
+	 * this value for a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false, but is concurrently changed to
+	 * true, that implies that checksums are being enabled. In the worst case,
+	 * this will lead to the relation being processed for checksums even though
+	 * each page written will have them already.  Performing this last shortens
+	 * the window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3813,6 +3838,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3837,16 +3863,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3875,6 +3909,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3895,15 +3930,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc5..045da21904 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e5965bc517..92367ece4b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -606,6 +606,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up backend local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..3b7207afb5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -500,6 +502,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -609,7 +622,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums;
 static bool integer_datetimes;
 static bool assert_enabled;
 static bool in_hot_standby;
@@ -1910,17 +1923,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4830,6 +4832,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3f029f41e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..1298857458 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish, or turn off checksums,
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..b35cd4d503 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..28b22db7fb 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 9585ad17b3..356ecdab61 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index e8dcd15a55..bf296625e4 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..d8229422af 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d7b55f57ea..e5163b2f3e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11240,6 +11240,22 @@
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
 
+{ oid => '9258',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '9257',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bdc97e308..f013acba76 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -324,6 +324,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c38b689710..ad4df0028f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -929,6 +929,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..466fb41521
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,36 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+extern bool	DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+extern void	StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+extern void	ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+extern void	DatachecksumsWorkerLauncherMain(Datum arg);
+extern void	DatachecksumsWorkerMain(Datum arg);
+extern void	ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f..c35b747520 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d2359192..f736b12f98 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..d865796d04 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index ab1ef9a475..9774816625 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..fd60f7e97f
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: In the case of "check", this creates a temporary installation
+with multiple nodes (a primary and one or more standbys) for the
+purpose of the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..57384a452c
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,89 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class have checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled. Disabling checksums clears the catalog
+# relhaschecksums state, so wait for that before considering it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..dc5bcb9629
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,108 @@
+# Test suite for testing enabling data checksums in an online cluster,
+# with restarts during the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..99c283e0b1
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,116 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
diff --git a/src/test/checksum/t/004_offline.pl b/src/test/checksum/t/004_offline.pl
new file mode 100644
index 0000000000..28f6208a63
--- /dev/null
+++ b/src/test/checksum/t/004_offline.pl
@@ -0,0 +1,100 @@
+# Test suite for testing enabling data checksums offline from various states
+# of checksum processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Enable checksums offline using pg_checksums
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are enabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums offline again using pg_checksums
+$node->stop();
+$node->checksum_disable_offline();
+$node->start();
+
+# Ensure that checksums are disabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'inprogress-on');
+is($result, 1, 'ensure checksums are in the process of being enabled');
+
+# Turn the cluster off and enable checksums offline, then start back up
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are now enabled even though processing wasn't
+# restarted
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..61b4571e9d 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2221,6 +2221,42 @@ sub pg_recvlogical_upto
 	}
 }
 
+=item $node->checksum_enable_offline()
+
+Enable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_enable_offline
+{
+	my ($self) = @_;
+
+	print "# Enabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-e');
+	print "# Checksums enabled\n";
+	return;
+}
+
+=item $node->checksum_disable_offline()
+
+Disable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_disable_offline
+{
+	my ($self) = @_;
+
+	print "# Disabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-d');
+	print "# Checksums disabled\n";
+	return;
+}
+
 =pod
 
 =back
-- 
2.21.1 (Apple Git-122.3)

#69Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#68)
2 attachment(s)
Re: Online checksums patch - once again

The attached v30 adds the proposed optimizations in this thread as previously
asked for, as well as some small cleanups to the procsignal calling codepath
(replacing single call functions with just calling the function) and some
function comments which were missing.

cheers ./daniel

Attachments:

v30-0001-Add-documentation-about-data-page-checksums.patchapplication/octet-stream; name=v30-0001-Add-documentation-about-data-page-checksums.patch; x-unix-mode=0644Download
From 507d04c6cabcc129bbc197e31e0af565ba6a26c1 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 11 Jan 2021 23:46:58 +0100
Subject: [PATCH v30 1/2] Add documentation about data page checksums

Data page checksums did not have a longer discussion in the docs;
this adds a stub section with an overview which can be
expanded upon.
---
 doc/src/sgml/amcheck.sgml    |  2 +-
 doc/src/sgml/ref/initdb.sgml |  1 +
 doc/src/sgml/wal.sgml        | 47 ++++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 8dfb01a77b..5be0a0b9cf 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -497,7 +497,7 @@ SET client_min_messages = DEBUG1;
   Structural corruption can happen due to faulty storage hardware, or
   relation files being overwritten or modified by unrelated software.
   This kind of corruption can also be detected with
-  <link linkend="app-initdb-data-checksums"><application>data page
+  <link linkend="checksums"><application>data page
   checksums</application></link>.
  </para>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 385ac25150..e3b0048806 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -219,6 +219,7 @@ PostgreSQL documentation
         failures will be reported in the
         <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link> view.
+        See <xref linkend="checksums" /> for details.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index f4bc147b10..c359194df7 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -230,6 +230,53 @@
   </para>
  </sect1>
 
+ <sect1 id="checksums">
+  <title>Data Checksums</title>
+  <indexterm>
+   <primary>checksums</primary>
+  </indexterm>
+
+  <para>
+   Data pages are not checksum protected by default, but this can optionally be
+   enabled for a cluster.  When enabled, each data page will be assigned a
+   checksum that is updated when the page is written and verified every time
+   the page is read. Only data pages are protected by checksums, internal data
+   structures and temporary files are not.
+  </para>
+
+  <para>
+   Checksums are normally enabled when the cluster is initialized using <link
+   linkend="app-initdb-data-checksums"><application>initdb</application></link>.
+   They can also be enabled or disabled at a later time as an offline
+   operation. Data page checksums are enabled or disabled at the full cluster
+   level, and cannot be specified individually for databases or tables.
+  </para>
+
+  <para>
+   The current state of checksums in the cluster can be verified by viewing the
+   value of the read-only configuration variable <xref
+   linkend="guc-data-checksums" /> by issuing the command <command>SHOW
+   data_checksums</command>.
+  </para>
+
+  <para>
+   When attempting to recover from corrupt data it may be necessary to bypass
+   the checksum protection in order to recover data. To do this, temporarily
+   set the configuration parameter <xref linkend="guc-ignore-checksum-failure" />.
+  </para>
+
+  <sect2 id="checksums-offline-enable-disable">
+   <title>Off-line Enabling of Checksums</title>
+
+   <para>
+    The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
+    application can be used to enable or disable data checksums, as well as 
+    verify checksums, on an offline cluster.
+   </para>
+
+  </sect2>
+ </sect1>
+
   <sect1 id="wal-intro">
    <title>Write-Ahead Logging (<acronym>WAL</acronym>)</title>
 
-- 
2.21.1 (Apple Git-122.3)

v30-0002-Support-checksum-enable-disable-in-a-running-clu.patch (application/octet-stream)
From 01f474349705729dc835aae9725d610f37ae44d2 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Fri, 15 Jan 2021 11:24:14 +0100
Subject: [PATCH v30 2/2] Support checksum enable/disable in a running cluster
 v30

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A dynamic background worker is responsible for launching a per-database
worker which will mark all buffers dirty for all relations with storage
in order for them to have data checksums on write. A new in-progress
state is introduced which, during processing, ensures that data checksums
are written but not verified, to avoid spurious verification failures.
State changes across backends are synchronized using a procsignalbarrier.

Authors: Daniel Gustafsson, Magnus Hagander
Reviewed-by: Heikki Linnakangas, Robert Haas, Andres Freund, Tomas Vondra, Michael Banck, Andrey Borodin
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   71 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   57 +-
 src/backend/access/heap/heapam.c             |    9 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  428 ++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    7 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1580 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   25 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   36 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   89 +
 src/test/checksum/t/002_restarts.pl          |  108 ++
 src/test/checksum/t/003_standby_checksum.pl  |  116 ++
 src/test/checksum/t/004_offline.pl           |  100 ++
 src/test/perl/PostgresNode.pm                |   36 +
 51 files changed, 2989 insertions(+), 82 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl
 create mode 100644 src/test/checksum/t/004_offline.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1ad90..4a64d156ad 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if the relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fd0370a1b4..ed6b7050e8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25798,6 +25798,77 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Initiates enabling of data checksums for the cluster. This will switch
+        the data checksum mode to <literal>inprogress-on</literal> and start a
+        background worker that will process all data in the cluster and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch the data checksum mode to
+        <literal>on</literal>. Returns <literal>true</literal> if processing
+        was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+        Returns <literal>false</literal> if data checksums are already
+        disabled.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3cdb1aff3c..66b092dcd4 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3699,8 +3699,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3710,8 +3709,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b2..d879550e81 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations, regardless of any processing already done online.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index c359194df7..ac62f57517 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -247,9 +247,10 @@
   <para>
    Checksums are normally enabled when the cluster is initialized using <link
    linkend="app-initdb-data-checksums"><application>initdb</application></link>.
-   They can also be enabled or disabled at a later time as an offline
-   operation. Data page checksums are enabled or disabled at the full cluster
-   level, and cannot be specified individually for databases or tables.
+   They can also be enabled or disabled at a later time either as an offline
+   operation or online in a running cluster, allowing concurrent access. Data
+   page checksums are enabled or disabled at the full cluster level, and
+   cannot be specified individually for databases or tables.
   </para>
 
   <para>
@@ -266,7 +267,7 @@
   </para>
 
   <sect2 id="checksums-offline-enable-disable">
-   <title>Off-line Enabling of Checksums</title>
+   <title>Offline Enabling of Checksums</title>
 
    <para>
     The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
@@ -275,6 +276,54 @@
    </para>
 
   </sect2>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>Online Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process; make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to be removed before it finishes.
+    If long-lived temporary tables are used in the application, it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
  </sect1>
 
   <sect1 id="wal-intro">
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5b9cfb26cf..371e3e6c73 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7794,7 +7794,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7815,11 +7815,18 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	/*
+	 * Hold interrupts for the duration of xlogging to avoid the state of data
+	 * checksums changing during processing, which would invalidate the
+	 * premise for xlogging hint bits.
+	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 199d911be7..8fc20b24af 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -50,6 +51,7 @@
 #include "port/atomics.h"
 #include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -253,6 +255,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -904,6 +916,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1077,8 +1090,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1091,7 +1104,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4919,9 +4932,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4955,13 +4966,346 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Returns true iff data checksums are enabled or are in the process of being
+ * enabled. In case data checksums are currently being enabled, we must write
+ * the checksum even though it's not verified during this stage. Interrupts
+ * need to be held off by the caller to ensure that the returned state is
+ * valid for the duration of the intended processing.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result. To avoid the risk of
+ * the checksum state changing between reading and performing the validation
+ * (or not), interrupts must be held off. This implies that the call to this
+ * function must be made as close to the validation call as possible, to keep
+ * the critical section short. This is in order to protect against time of
+ * check/time of use situations around data checksum validation.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedVerify(void)
 {
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress-on" state for enabling checksums is used when the checksum
+ * worker is setting checksums on all pages; it can thus be used to check for
+ * aborted checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress-off" state for disabling checksums is used for when the
+ * worker resets the catalog state.  DataChecksumsNeedVerify() or
+ * DataChecksumsNeedWrite() should be used for deciding whether to read/write
+ * checksums.
+ */
+bool
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * SetDataChecksumsOnInProgress
+ *		Sets the data checksum state to "inprogress-on" to enable checksums
+ *
+ * In order to start the process of enabling data checksums in a running
+ * cluster the data_checksum_version state must be changed to "inprogress-on".
+ * This state requires data checksums to be written but not verified. The state
+ * transition is performed in a critical section in order to provide crash
+ * safety, and checkpoints are held off. When the emitted procsignalbarrier
+ * has been absorbed by all backends we know that the cluster has started to
+ * enable data checksums.
+ */
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * Data checksum state can only be transitioned to "inprogress-on" from
+	 * "off"; if data checksums are in any other state, exit.
+	 */
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state change in all backends to ensure that all backends are in
+	 * "inprogress-on". Once done we know that all backends are writing data
+	 * checksums.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one
+ * sets the checksums state to "inprogress-on" (which is performed by
+ * SetDataChecksumsOnInProgress()) and the second one to "on" (performed here).
+ * During "inprogress-on", checksums are written but not verified. When all
+ * existing pages are guaranteed to have checksums, and all new pages will be
+ * initiated with checksums, the state can be changed to "on".
+ */
+void
+SetDataChecksumsOn(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The only allowed state transition to "on" is from "inprogress-on" since
+	 * that state ensures that all pages will have data checksums written.
+	 */
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await the state transition to "on" in all backends. When done we know that
+	 * data checksums are enabled in all backends and data checksums are both
+	 * written and verified.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * "inprogress-off" state during which backends continue to write
+	 * checksums without verifying them. When all backends are in
+	 * "inprogress-off" the next transition to "off" can be performed, after
+	 * which all data checksum processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * or "inprogress-off" and we can transition directly to "off" from
+		 * there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that we don't incur a checkpoint while disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * ProcSignalBarrier absorption functions for enabling and disabling data
+ * checksums in a running cluster. The procsignalbarriers are emitted in the
+ * SetDataChecksums* functions.
+ */
+void
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 ||
+		   LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+}
+
+void
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+}
+
+void
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+}
+
+void
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+}
+
+/*
+ * InitLocalControldata
+ *
+ * Set up backend-local caches of controldata variables which may change at
+ * any point during runtime and thus require special-cased locking. So far
+ * this only applies to data_checksum_version, but it's intended to be
+ * general purpose enough to handle future cases.
+ */
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7946,6 +8290,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums being enabled ("inprogress-on"
+	 * state), we notify the user that they need to manually restart the
+	 * process to enable checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here; it has to be launched from a
+	 * regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shutdown, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XlogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9877,6 +10247,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10332,6 +10720,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab319d..cd4dc60800 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9abc4a1f55..87052b0693 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -974,10 +974,17 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	/*
+	 * Hold off interrupts to ensure that the observed data checksum state
+	 * cannot change as we form and insert the tuple.
+	 */
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5d89e77dbe..cd49cb8403 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1257,6 +1257,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..8afbf762af 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..4756ddeb9c
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1580 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the control file, and no changes are performed on the data
+ * pages themselves.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be resumed
+ * from where it left off, given that pg_class.relhaschecksums tracks the
+ * state of processed relations and the in-progress state ensures that all
+ * new writes are performed with checksums. Each database will be reprocessed,
+ * but relations where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state can
+ * still validate data checksums. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *    - Backends SHALL NOT violate local datachecksum state
+ *    - Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *      currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * WAL logging backend updating the global state in the controlfile will wait
+ * for all other backends to absorb the barrier. Barrier absorption will happen
+ * during interrupt processing, which means that connected backends will change
+ * state at different times. To prevent data checksum state changes when
+ * writing and verifying checksums, interrupts shall be held off before
+ * interrogating state and resumed when the IO operation has been performed.
+ *
+ *   When Enabling Data Checksums
+ *   ----------------------------
+ *   A process which fails to observe data checksums being enabled can induce
+ *   two types of errors: failing to write the checksum when modifying the page
+ *   and failing to validate the data checksum on the page when reading it.
+ *
+ *   When processing starts all backends belong to one of the below sets, with
+ *   one set being empty:
+ *
+ *   Bd: Backends in "off" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   If processing is started in an online cluster then all backends are in Bd.
+ *   If processing was halted by the cluster shutting down, the controlfile
+ *   state "inprogress-on" will be observed on system startup and all backends
+ *   will be in Bd. Backends transition Bd -> Bi via a procsignalbarrier.  When
+ *   the DataChecksumsWorker has finished writing checksums on all pages and
+ *   enables data checksums cluster-wide, there are four sets of backends where
+ *   Bd shall be an empty set:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting while Bg is waiting on the procsignalbarrier will
+ *   observe the global state being "on" and will thus automatically belong to
+ *   Be.  Checksums are enabled cluster-wide when Bi is an empty set. Bi and Be
+ *   are compatible sets while still operating based on their local state as
+ *   both write data checksums.
+ *
+ *   When Disabling Data Checksums
+ *   -----------------------------
+ *   A process which fails to observe that data checksums have been disabled
+ *   can induce two types of errors: writing the checksum when modifying the
+ *   page and validating a data checksum which is no longer correct due to
+ *   modifications to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bo: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bo -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bo write data checksums, but don't validate them, such that
+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bo. Once all backends are in Bo, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation any longer.
+ *
+ *
+ * Potential optimizations
+ * -----------------------
+ * Below are some potential optimizations and improvements which were brought
+ * up during reviews of this feature, but which weren't implemented in the
+ * initial version. These are ideas listed without any validation on their
+ * feasibility or potential payoff. More discussion on these can be found on
+ * the -hackers threads linked to in the commit message of this feature.
+ *
+ *   * Launching datachecksumsworker for resuming operation from the startup
+ *     process: Currently users have to restart processing manually after a
+ *     restart since dynamic background worker cannot be started from the
+ *     postmaster. Changing to the startup process could make resuming the
+ *     processing automatic.
+ *   * Avoid dirtying the page when checksums already match: If the checksum
+ *     on the page already happens to match, we still dirty the page. It should
+ *     be enough to only do the log_newpage_buffer() call in that case.
+ *   * Invent a lightweight WAL record that doesn't contain the full-page
+ *     image but just the block number: On replay, the redo routine would read
+ *     the page from disk.
+ *   * Teach pg_checksums to avoid checksummed pages when pg_checksums is used
+ *     to enable checksums on a cluster which is in inprogress-on state and
+ *     may have checksummed pages (make pg_checksums be able to resume an
+ *     online operation).
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		enable_checksums;	/* True if checksums are being enabled,
+									 * else false */
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Main entry point for the datachecksumsworker launcher process
+ *
+ * This initiates data checksums processing, for enabling as well as for
+ * disabling.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of the below distinct states:
+	 *
+	 *
+	 *   A: Started and performing an operation
+	 *   B: Started and in the process of aborting
+	 *   C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
+	 * and the abort flag be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, the only operation we can perform
+	 * is to cancel it if the user requested that checksums be disabled.
+	 * That doesn't, however, mean that all other cases yield an error, as
+	 * some might be perfectly benign.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
+		 * If the launcher is started data checksums cannot be on or off, but
+		 * it may be in an inprogress state. Since the state transition may
+		 * not have happened yet (in case of rapidly initiated checksum enable
+		 * calls for example) we inspect the target state of the currently
+		 * running launcher.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so exit.
+			 */
+			if (DatachecksumsWorkerShmem->enable_checksums)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * Disabling checksums is likely to be a very quick operation in
+			 * many cases so trying to abort it to save the checksums would
+			 * run the risk of race conditions.
+			 */
+			else
+			{
+				ereport(NOTICE,
+						(errmsg("data checksums are concurrently being disabled, please retry")));
+
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/* This should be unreachable */
+			Assert(false);
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Data checksums are already being disabled, exit silently.
+			 */
+			if (DataChecksumsOffInProgress())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			DatachecksumsWorkerShmem->abort = true;
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+	}
+
+	/*
+	 * The launcher is currently not running, so we need to query the system
+	 * data checksum state to determine how to proceed based on the requested
+	 * target state.
+	 */
+	else
+	{
+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->enable_checksums = enable_checksums;
+
+		/*
+		 * If the launcher isn't started and we're asked to enable checksums,
+		 * we need to check if processing was previously interrupted such that
+		 * we should resume rather than start from scratch.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums in a cluster which already
+			 * has checksums enabled, exit immediately as there is nothing
+			 * more to do.
+			 */
+			if (DataChecksumsNeedVerify())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-on" then we will
+			 * resume from where we left off based on the catalog state. This
+			 * will be safe since new relations created while the checksum-
+			 * worker was disabled will have checksums enabled.
+			 */
+			else if (DataChecksumsOnInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[1] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-off" then we
+			 * were interrupted while the catalog state was being cleared. In
+			 * this case we need to first reset state and then continue with
+			 * enabling checksums.
+			 */
+			else if (DataChecksumsOffInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * Data checksums are off in the cluster, we can proceed with
+			 * enabling them. Just in case we will start by resetting the
+			 * catalog state since we are doing this from scratch and we don't
+			 * want leftover catalog state to cause us to miss a relation.
+			 */
+			else
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Regardless of current state in the system, we go through the
+			 * motions when asked to disable checksums. The catalog state is
+			 * only defined to be relevant during the operation of enabling
+			 * checksums, and has no use at any other point in time. That
+			 * being said, a user who sees stale relhaschecksums entries in
+			 * the catalog might run this just in case.
+			 *
+			 * Resetting state must be performed after setting data checksum
+			 * state to off, as there otherwise might (depending on system
+			 * data checksum state) be a window between catalog resetting and
+			 * state transition when new relations are created with the
+			 * catalog state set to true.
+			 */
+			DatachecksumsWorkerShmem->operations[0] = DISABLE_CHECKSUMS;
+			DatachecksumsWorkerShmem->operations[1] = RESET_STATE;
+		}
+	}
+
+	/*
+	 * Backoff parameters to throttle the load during enabling. As there is no
+	 * real processing performed during disabling checksums the backoff
+	 * parameters do not apply there.
+	 */
+	if (enable_checksums)
+	{
+		DatachecksumsWorkerShmem->cost_delay = cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = cost_limit;
+	}
+	else
+	{
+		DatachecksumsWorkerShmem->cost_delay = 0;
+		DatachecksumsWorkerShmem->cost_limit = 0;
+	}
+
+	/*
+	 * Prepare the BackgroundWorker and launch it.
+	 */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not turn off processing immediately; it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. If wal_level is set to "minimal",
+		 * this could be avoided when the checksum is verified to already be
+		 * correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it, that is not an
+	 * error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed, we cannot end up with a processed database,
+	 * so we have no alternative other than exiting. When enabling checksums
+	 * we won't at this point have changed the pg_control version to enabled,
+	 * so when the cluster comes back up, processing will have to be resumed.
+	 * When disabling, the pg_control version will have been set to off
+	 * before this, so when the cluster comes up checksums will be off as
+	 * expected. In the latter case we might have stale relhaschecksums flags
+	 * in pg_class, which it would be nice to handle in some way. Enabling
+	 * data checksums resets the flags, so any stale flags won't cause
+	 * problems at that point, but they may confuse users reading pg_class.
+	 * TODO.
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksum processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+/*
+ * launcher_exit
+ *
+ * Internal routine for cleaning up state when the launcher process exits. We
+ * need to clean up the abort flag to ensure that processing can be restarted
+ * again after it was previously aborted.
+ */
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * launcher_cancel_handler
+ *
+ * Internal routine for reacting to SIGINT and flagging the worker to abort.
+ * The worker won't be interrupted immediately but will check for abort flag
+ * between each block in a relation.
+ */
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks awaiting all current transactions to finish
+ *
+ * Returns when all transactions which were active when this function was
+ * called have ended, or if the postmaster dies while waiting. If the
+ * postmaster dies, the abort flag will be set to indicate that the caller
+ * shouldn't proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function handles the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = DatachecksumsWorkerShmem->operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+	/*
+	 * Clean up after processing
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->launcher_started = false;
+	DatachecksumsWorkerShmem->abort = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process, for either
+ * enabling checksums or resetting the checksum catalog tracking. It loops
+ * around, computing a new list and comparing it to the databases already
+ * seen, until no new databases are found.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set up so first run processes shared catalogs, but not once in every
+	 * db.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Since we wait
+		 * for all pre-existing transactions finish, this way we can be
+		 * certain that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%d databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "restarting to check for new databases" : "processing completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can be because
+	 * it actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case, where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists but enabling checksums failed, then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failures for databases which still
+		 * exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = found;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->abort = false;
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("failed to enable data checksums in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even though this assignment is redundant, we want to be explicit about
+	 * our intent for readability, since this state will be queried when
+	 * resuming processing after a restart.
+	 */
+	DatachecksumsWorkerShmem->launcher_started = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before we do this, wait for all pending transactions to finish. This
+	 * will ensure there are no concurrently running CREATE DATABASE, which
+	 * could cause us to miss the creation of a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which do not yet have data checksums are
+ * returned. If include_shared is true then shared relations are included as
+ * well in a non-temporary list. include_shared has no relevance when building
+ * a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we are done, since we
+	 * cannot access these relations to modify them.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		SetDataChecksumsOff();
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f24a33ef1..96c814a91c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3937,6 +3937,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..cc494b6f13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d9c482454f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 561c212092..9362ec0018 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2944,8 +2944,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..c7928f3495 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 583efaecff..430cf61fa7 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "commands/async.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -92,7 +93,6 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
-static void ProcessBarrierPlaceholder(void);
 
 /*
  * ProcSignalShmemSize
@@ -495,8 +495,14 @@ ProcessProcSignalBarrier(void)
 	 * unconditionally, but it's more efficient to call only the ones that
 	 * might need us to do something based on the flags.
 	 */
-	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_PLACEHOLDER))
-		ProcessBarrierPlaceholder();
+	if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON))
+		AbsorbChecksumsOnInProgressBarrier();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_ON))
+		AbsorbChecksumsOnBarrier();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF))
+		AbsorbChecksumsOffInProgressBarrier();
+	else if (BARRIER_SHOULD_CHECK(flags, PROCSIGNAL_BARRIER_CHECKSUM_OFF))
+		AbsorbChecksumsOffBarrier();
 
 	/*
 	 * State changes related to all types of barriers that might have been
@@ -508,19 +514,6 @@ ProcessProcSignalBarrier(void)
 	pg_atomic_write_u64(&MyProcSignalSlot->pss_barrierGeneration, shared_gen);
 }
 
-static void
-ProcessBarrierPlaceholder(void)
-{
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 */
-}
-
 /*
  * CheckProcSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called after receiving
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae..8fbebd9870 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change underneath us, which
+		 * could otherwise cause a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5c12a165a1..358cb9f1f8 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1567,9 +1567,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1585,9 +1582,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..17c4dc15e6 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1828,7 +1829,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1896,6 +1898,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3548,6 +3552,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the data checksum state. Since the data checksum state can change at
+	 * any time, the fetched value might be out of date by the time the
+	 * relation is built.  DataChecksumsNeedWrite returns true when data
+	 * checksums are enabled, in the process of being enabled (state
+	 * "inprogress-on"), or in the process of being disabled (state
+	 * "inprogress-off"). Since relhaschecksums is only used to track progress
+	 * when data checksums are being enabled, and going from disabled to
+	 * enabled will clear relhaschecksums before starting, it is safe to use
+	 * this value for a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false and concurrently changes to
+	 * true, that implies that checksums are being enabled. Worst case,
+	 * this will lead to the relation being processed for checksums even though
+	 * each page written will have them already.  Performing this last shortens
+	 * the window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3813,6 +3838,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3837,16 +3863,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3875,6 +3909,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3895,15 +3930,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc5..045da21904 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e5965bc517..92367ece4b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -606,6 +606,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up backend local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..3b7207afb5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -500,6 +502,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -609,7 +622,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums;
 static bool integer_datetimes;
 static bool assert_enabled;
 static bool in_hot_standby;
@@ -1910,17 +1923,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4830,6 +4832,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3f029f41e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..1298857458 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish or turn off checksums
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..b35cd4d503 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..28b22db7fb 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern void AbsorbChecksumsOnInProgressBarrier(void);
+extern void AbsorbChecksumsOffInProgressBarrier(void);
+extern void AbsorbChecksumsOnBarrier(void);
+extern void AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 224cae0246..adbe81e890 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index e8dcd15a55..bf296625e4 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..d8229422af 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d27336adcd..82a0976d3b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11263,6 +11263,22 @@
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
 
+{ oid => '9258',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '9257',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bdc97e308..f013acba76 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -324,6 +324,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c38b689710..ad4df0028f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -929,6 +929,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..466fb41521
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,36 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for the data checksums background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+extern bool DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+extern void StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+extern void ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+extern void DatachecksumsWorkerLauncherMain(Datum arg);
+extern void DatachecksumsWorkerMain(Datum arg);
+extern void ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f..c35b747520 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d2359192..f736b12f98 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..d865796d04 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index ab1ef9a475..9774816625 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..fd60f7e97f
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: Running "make check" creates a temporary installation with
+multiple nodes, be they master or standby(s), for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..57384a452c
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,89 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entry in pg_class has checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled. Disabling checksums clears the catalog
+# relhaschecksums state, so wait for that before considering it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entry in pg_class has checksums recorded');
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..dc5bcb9629
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,108 @@
+# Test suite for testing enabling data checksums in an online cluster
+# while restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyway and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'doublecheck that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..99c283e0b1
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,116 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary 2');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
diff --git a/src/test/checksum/t/004_offline.pl b/src/test/checksum/t/004_offline.pl
new file mode 100644
index 0000000000..28f6208a63
--- /dev/null
+++ b/src/test/checksum/t/004_offline.pl
@@ -0,0 +1,100 @@
+# Test suite for testing enabling data checksums offline from various states
+# of checksum processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Enable checksums offline using pg_checksums
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are enabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums offline again using pg_checksums
+$node->stop();
+$node->checksum_disable_offline();
+$node->start();
+
+# Ensure that checksums are disabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyway and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'inprogress-on');
+is($result, 1, 'ensure checksums are in the process of being enabled');
+
+# Turn the cluster off and enable checksums offline, then start back up
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are now enabled even though processing wasn't
+# restarted
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..61b4571e9d 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2221,6 +2221,42 @@ sub pg_recvlogical_upto
 	}
 }
 
+=item $node->checksum_enable_offline()
+
+Enable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_enable_offline
+{
+	my ($self) = @_;
+
+	print "# Enabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-e');
+	print "# Checksums enabled\n";
+	return;
+}
+
+=item checksum_disable_offline
+
+Disable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_disable_offline
+{
+	my ($self) = @_;
+
+	print "# Disabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-d');
+	print "# Checksums disabled\n";
+	return;
+}
+
 =pod
 
 =back
-- 
2.21.1 (Apple Git-122.3)
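For readers skimming the TAP tests in the attachment above, the `data_checksums` state transitions they exercise can be modeled as a tiny state machine. This is an illustrative sketch only, not part of the patch: the state names match the `pg_settings` values the tests poll for, while the event names are invented here for clarity.

```python
# Illustrative sketch (not part of the patch): the data_checksums state
# transitions exercised by the TAP tests, as a minimal state machine.
# State names are real pg_settings values; event names are invented.
TRANSITIONS = {
    ("off", "pg_enable_data_checksums"): "inprogress-on",
    ("inprogress-on", "worker_finished"): "on",
    ("on", "pg_disable_data_checksums"): "inprogress-off",
    ("inprogress-off", "barrier_absorbed"): "off",
    # A restart during enabling leaves the cluster in inprogress-on;
    # pg_enable_data_checksums() must be called again (see 002_restarts.pl).
    ("inprogress-on", "pg_enable_data_checksums"): "inprogress-on",
}

def step(state, event):
    """Return the next checksum state, or raise on an invalid transition."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"invalid transition: {state!r} on {event!r}")
    return TRANSITIONS[key]

state = "off"
state = step(state, "pg_enable_data_checksums")
assert state == "inprogress-on"
# Restart mid-enable: still in progress, must be re-launched manually.
state = step(state, "pg_enable_data_checksums")
assert state == "inprogress-on"
state = step(state, "worker_finished")
assert state == "on"
state = step(state, "pg_disable_data_checksums")
assert state == "inprogress-off"
state = step(state, "barrier_absorbed")
assert state == "off"
```

This mirrors the polling loops in the tests: each `poll_query_until` waits for one of these transitions to complete before asserting on the resulting state.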

#70Magnus Hagander
magnus@hagander.net
In reply to: Daniel Gustafsson (#69)
Re: Online checksums patch - once again

On Fri, Jan 15, 2021 at 11:33 AM Daniel Gustafsson <daniel@yesql.se> wrote:

The attached v30 adds the proposed optimizations in this thread as previously
asked for, as well as some small cleanups to the procsignal calling codepath
(replacing single call functions with just calling the function) and some
function comments which were missing.

I've applied the docs patch.

I made a tiny change so the reference to "data page checksums" was
changed to "data checksums". Of course, after doing that I realize
that we use both terms in different places, but the docs side mostly
talked about "data checksums". I changed the one reference that was
also in wal.sgml, but left the rest alone.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#71Michael Banck
michael.banck@credativ.de
In reply to: Daniel Gustafsson (#69)
Re: Online checksums patch - once again

Hi,

On Fri, Jan 15, 2021 at 11:32:56AM +0100, Daniel Gustafsson wrote:

The attached v30 adds the proposed optimizations in this thread as previously
asked for, as well as some small cleanups to the procsignal calling codepath
(replacing single call functions with just calling the function) and some
function comments which were missing.

0002 (now 0001 I guess) needs a rebase due to a3ed4d1e.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz

#72Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Banck (#71)
1 attachment(s)
Re: Online checksums patch - once again

On 18 Jan 2021, at 19:14, Michael Banck <michael.banck@credativ.de> wrote:

Hi,

On Fri, Jan 15, 2021 at 11:32:56AM +0100, Daniel Gustafsson wrote:

The attached v30 adds the proposed optimizations in this thread as previously
asked for, as well as some small cleanups to the procsignal calling codepath
(replacing single call functions with just calling the function) and some
function comments which were missing.

0002 (now 0001 I guess) needs a rebase due to a3ed4d1e.

Correct, rebase attached.

cheers ./daniel

Attachments:

v31-0001-Support-checksum-enable-disable-in-a-running-clu.patchapplication/octet-stream; name=v31-0001-Support-checksum-enable-disable-in-a-running-clu.patch; x-unix-mode=0644Download
From addd027ba4c5c0502c285d247641b7f6bcb7686d Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Tue, 19 Jan 2021 11:36:51 +0100
Subject: [PATCH v31] Support checksum enable/disable in a running cluster v31

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A dynamic background worker is responsible for launching a per-database
worker which will mark all buffers dirty for all relations with storage
in order for them to have data checksums on write. A new in-progress
state is introduced which during processing ensures that data checksums
are written but not verified to avoid false negatives. State changes
across backends are synchronized using a procsignalbarrier.

Authors: Daniel Gustafsson, Magnus Hagander
Reviewed-by: Heikki Linnakangas, Robert Haas, Andres Freund, Tomas Vondra, Michael Banck, Andrey Borodin
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   71 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   57 +-
 src/backend/access/heap/heapam.c             |    9 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  432 ++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    7 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1580 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   33 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   36 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   89 +
 src/test/checksum/t/002_restarts.pl          |  108 ++
 src/test/checksum/t/003_standby_checksum.pl  |  116 ++
 src/test/checksum/t/004_offline.pl           |  100 ++
 src/test/perl/PostgresNode.pm                |   36 +
 51 files changed, 2996 insertions(+), 87 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl
 create mode 100644 src/test/checksum/t/004_offline.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1ad90..4a64d156ad 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if the relation has data checksums on all pages. This state is
+        only used during checksum processing; this field should never be
+        consulted for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index aa99665e2e..d5c253d515 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25839,6 +25839,77 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Initiates enabling of data checksums for the cluster. This will switch the data
+        checksums mode to <literal>inprogress-on</literal> as well as start a
+        background worker that will process all data in the database and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>. Returns <literal>true</literal> if processing
+        was started.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+        Returns <literal>false</literal> if data checksums are already
+        disabled.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index f05140dd42..cea28737c1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3699,8 +3699,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3710,8 +3709,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b2..d879550e81 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations, regardless of the progress made online.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f8..48890ccc9d 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -247,9 +247,10 @@
   <para>
    Checksums are normally enabled when the cluster is initialized using <link
    linkend="app-initdb-data-checksums"><application>initdb</application></link>.
-   They can also be enabled or disabled at a later time as an offline
-   operation. Data checksums are enabled or disabled at the full cluster
-   level, and cannot be specified individually for databases or tables.
+   They can also be enabled or disabled at a later time either as an offline
+   operation or online in a running cluster allowing concurrent access. Data
+   checksums are enabled or disabled at the full cluster level, and cannot be
+   specified individually for databases or tables.
   </para>
 
   <para>
@@ -266,7 +267,7 @@
   </para>
 
   <sect2 id="checksums-offline-enable-disable">
-   <title>Off-line Enabling of Checksums</title>
+   <title>Offline Enabling of Checksums</title>
 
    <para>
     The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
@@ -275,6 +276,54 @@
    </para>
 
   </sect2>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>Online Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process, make sure that
+    <varname>max_worker_processes</varname> allows for at least one more
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
  </sect1>
 
   <sect1 id="wal-intro">
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index faffbb1865..953901d473 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7847,7 +7847,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7868,11 +7868,18 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	/*
+	 * Hold interrupts for the duration of xlogging to avoid the state of data
+	 * checksums changing during the processing, which would invalidate the
+	 * premise for xlogging hint bits.
+	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 470e113b33..c61e989650 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -50,6 +51,7 @@
 #include "port/atomics.h"
 #include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -253,6 +255,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for the ControlFile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -904,6 +916,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1077,8 +1090,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1091,7 +1104,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4919,9 +4932,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4955,13 +4966,350 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Returns true iff data checksums are enabled or are in the process of being
+ * enabled. While data checksums are being enabled we must write the
+ * checksum even though it's not verified during this stage. Interrupts
+ * need to be held off by the caller to ensure that the returned state is
+ * valid for the duration of the intended processing.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified.
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid the
+ * the risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that calling
+ * this function must be performed as close to the validation call as possible
+ * to keep the critical section short. This is in order to protect against
+ * time of check/time of use situations around data checksum validation.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedVerify(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress-on" state for enabling checksums is used when the checksum
+ * worker is setting checksums on all pages; it can thus be used to check for
+ * aborted checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress-off" state for disabling checksums is used when the
+ * worker resets the catalog state.  DataChecksumsNeedVerify() or
+ * DataChecksumsNeedWrite() should be used for deciding whether to read/write
+ * checksums.
+ */
+bool
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * SetDataChecksumsOnInProgress
+ *		Sets the data checksum state to "inprogress-on" to enable checksums
+ *
+ * In order to start the process of enabling data checksums in a running
+ * cluster the data_checksum_version state must be changed to "inprogress-on".
+ * This state requires data checksums to be written but not verified. The state
+ * transition is performed in a critical section in order to provide crash
+ * safety, and checkpoints are held off. When the emitted procsignalbarrier
+ * has been absorbed by all backends we know that the cluster has started to
+ * enable data checksums.
+ */
+void
+SetDataChecksumsOnInProgress(void)
 {
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * Data checksum state can only be transitioned to "inprogress-on" from
+	 * "off"; if data checksums are in any other state, exit.
+	 */
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state change in all backends to ensure that all backends are in
+	 * "inprogress-on". Once done we know that all backends are writing data
+	 * checksums.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one
+ * sets the checksums state to "inprogress-on" (which is performed by
+ * SetDataChecksumsOnInProgress()) and the second one to "on" (performed here).
+ * During "inprogress-on", checksums are written but not verified. When all
+ * existing pages are guaranteed to have checksums, and all new pages will be
+ * initialized with checksums, the state can be changed to "on".
+ */
+void
+SetDataChecksumsOn(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The only allowed state transition to "on" is from "inprogress-on" since
+	 * that state ensures that all pages will have data checksums written.
+	 */
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await the state transition to "on" in all backends. When done we know that
+	 * data checksums are enabled in all backends and data checksums are both
+	 * written and verified.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be
+ * emitted.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * "inprogress-off" state during which backends continue to write
+	 * checksums without verifying them. When all backends are in
+	 * "inprogress-off" the next transition to "off" can be performed, after
+	 * which all data checksum processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * or "inprogress-off" and we can transition directly to "off" from
+		 * there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that we don't incur a checkpoint while disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XlogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * ProcSignalBarrier absorption functions for enabling and disabling data
+ * checksums in a running cluster. The procsignalbarriers are emitted in the
+ * SetDataChecksums* functions.
+ */
+bool
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 ||
+		   LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+	return true;
+}
+
+/*
+ * InitLocalControldata
+ *
+ * Set up backend local caches of controldata variables which may change at
+ * any point during runtime and thus require special cased locking. So far
+ * this only applies to data_checksum_version, but it's intended to be general
+ * purpose enough to handle future cases.
+ */
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7995,6 +8343,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums being enabled ("inprogress-on"
+	 * state), we notify the user that they need to manually restart the
+	 * process to enable checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here, it has to be launched from a
+	 * regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shutdown, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XlogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9926,6 +10300,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XlogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10381,6 +10773,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab319d..cd4dc60800 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_BOOL(true);
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_BOOL(true);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9abc4a1f55..87052b0693 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -974,10 +974,17 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	/*
+	 * Hold off interrupts to ensure that the observed data checksum state
+	 * cannot change as we form and insert the tuple.
+	 */
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..c6e81eb480 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1264,6 +1264,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..8afbf762af 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..4756ddeb9c
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1580 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed and
+ * verified when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the control file and the catalog state is reset; no changes
+ * are performed on the data pages.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to an incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from where it left off given that pg_class.relhaschecksums track state of
+ * processed relations and the in-progress state will ensure all new writes
+ * performed with checksums. Each database will be reprocessed, but relations
+ * where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state can still
+ * perform data checksum validation. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *    - Backends SHALL NOT violate local datachecksum state
+ *    - Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *      currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * WAL logging backend updating the global state in the controlfile will wait
+ * for all other backends to absorb the barrier. Barrier absorption will happen
+ * during interrupt processing, which means that connected backends will change
+ * state at different times. To prevent data checksum state changes when
+ * writing and verifying checksums, interrupts shall be held off before
+ * interrogating state and resumed when the IO operation has been performed.
+ *
+ *   When Enabling Data Checksums
+ *   ----------------------------
+ *   A process which fails to observe data checksums being enabled can induce
+ *   two types of errors: failing to write the checksum when modifying the page
+ *   and failing to validate the data checksum on the page when reading it.
+ *
+ *   When processing starts all backends belong to one of the below sets, with
+ *   one set being empty:
+ *
+ *   Bd: Backends in "off" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   If processing is started in an online cluster then all backends are in Bd.
+ *   If processing was halted by the cluster shutting down, the controlfile
+ *   state "inprogress-on" will be observed on system startup and all backends
+ *   will be in Bd. Backends transition Bd -> Bi via a procsignalbarrier.  When
+ *   the DataChecksumsWorker has finished writing checksums on all pages and
+ *   enables data checksums cluster-wide, there are four sets of backends, of
+ *   which Bd shall be empty:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting while Bg is waiting on the procsignalbarrier will
+ *   observe the global state being "on" and will thus automatically belong to
+ *   Be.  Checksums are enabled cluster-wide when Bi is an empty set. Bi and Be
+ *   are compatible sets while still operating based on their local state as
+ *   both write data checksums.
+ *
+ *   When Disabling Data Checksums
+ *   -----------------------------
+ *   A process which fails to observe that data checksums have been disabled
+ *   can induce two types of errors: writing the checksum when modifying the
+ *   page and validating a data checksum which is no longer correct due to
+ *   modifications to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bo: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bo -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bo write data checksums, but don't validate them, such that
+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bo. Once all backends are in Bo, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation any longer.
+ *
+ *
+ * Potential optimizations
+ * -----------------------
+ * Below are some potential optimizations and improvements which were brought
+ * up during reviews of this feature, but which weren't implemented in the
+ * initial version. These are ideas listed without any validation on their
+ * feasibility or potential payoff. More discussion on these can be found on
+ * the -hackers threads linked to in the commit message of this feature.
+ *
+ *   * Launching datachecksumsworker for resuming operation from the startup
+ *     process: Currently users have to restart processing manually after a
+ *     restart, since a dynamic background worker cannot be started from the
+ *     postmaster. Changing to the startup process could make resuming the
+ *     processing automatic.
+ *   * Avoid dirtying the page when checksums already match: If the checksum
+ *     on the page happens to already match, we still dirty the page. It should
+ *     be enough to only do the log_newpage_buffer() call in that case.
+ *   * Invent a lightweight WAL record that doesn't contain the full-page
+ *     image but just the block number: On replay, the redo routine would read
+ *     the page from disk.
+ *   * Teach pg_checksums to avoid checksummed pages when pg_checksums is used
+ *     to enable checksums on a cluster which is in inprogress-on state and
+ *     may have checksummed pages (make pg_checksums be able to resume an
+ *     online operation).
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		enable_checksums;	/* True if checksums are being enabled,
+									 * else false */
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct * DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Main entry point for the datachecksumsworker launcher process
+ *
+ * The main entry point for starting data checksum processing, for enabling
+ * as well as disabling data checksums.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of three distinct states:
+	 *
+	 * A: Started and performing an operation
+	 * B: Started and in the process of aborting
+	 * C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
+	 * and the abort flag be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, the only operation we can perform
+	 * is to cancel it, and then only if the user requested that checksums
+	 * be disabled.  That doesn't however mean that all other cases yield an
+	 * error, as some of them are perfectly benign.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
+		 * If the launcher is started, data checksums cannot be fully on or
+		 * off; the cluster must be in an inprogress state. Since the state
+		 * transition may not have happened yet (for example, in the case of
+		 * rapidly initiated checksum enable calls) we inspect the target
+		 * state of the currently running launcher.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so exit.
+			 */
+			if (DatachecksumsWorkerShmem->enable_checksums)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * Disabling checksums is likely to be a very quick operation in
+			 * many cases so trying to abort it to save the checksums would
+			 * run the risk of race conditions.
+			 */
+			else
+			{
+				ereport(NOTICE,
+						(errmsg("data checksums are concurrently being disabled, please retry")));
+
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/* This should be unreachable */
+			Assert(false);
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Data checksums are already being disabled, exit silently.
+			 */
+			if (DataChecksumsOffInProgress())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			DatachecksumsWorkerShmem->abort = true;
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+	}
+
+	/*
+	 * The launcher is currently not running, so we need to query the system
+	 * data checksum state to determine how to proceed based on the requested
+	 * target state.
+	 */
+	else
+	{
+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->enable_checksums = enable_checksums;
+
+		/*
+		 * If the launcher isn't started and we're asked to enable checksums,
+		 * we need to check if processing was previously interrupted such that
+		 * we should resume rather than start from scratch.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums in a cluster which already
+			 * has checksums enabled, exit immediately as there is nothing
+			 * more to do.
+			 */
+			if (DataChecksumsNeedVerify())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-on" then we will
+			 * resume from where we left off based on the catalog state. This
+			 * will be safe since new relations created while the
+			 * checksumsworker was disabled will have checksums enabled.
+			 */
+			else if (DataChecksumsOnInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[1] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-off" then we
+			 * were interrupted while the catalog state was being cleared. In
+			 * this case we need to first reset state and then continue with
+			 * enabling checksums.
+			 */
+			else if (DataChecksumsOffInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * Data checksums are off in the cluster, we can proceed with
+			 * enabling them. Just in case, we start by resetting the
+			 * catalog state since we are doing this from scratch and we don't
+			 * want leftover catalog state to cause us to miss a relation.
+			 */
+			else
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+		}
+		else if (!enable_checksums)
+		{
+			/*
+			 * Regardless of current state in the system, we go through the
+			 * motions when asked to disable checksums. The catalog state is
+			 * only defined to be relevant during the operation of enabling
+			 * checksums, and has no use at any other point in time. That
+			 * being said, a user who sees stale relhaschecksums entries in
+			 * the catalog might run this just in case.
+			 *
+			 * Resetting state must be performed after setting data checksum
+			 * state to off, as there otherwise might (depending on system
+			 * data checksum state) be a window between the catalog reset and
+			 * the state transition during which new relations are created
+			 * with the catalog state set to true.
+			 */
+			DatachecksumsWorkerShmem->operations[0] = DISABLE_CHECKSUMS;
+			DatachecksumsWorkerShmem->operations[1] = RESET_STATE;
+		}
+	}
+
+	/*
+	 * Backoff parameters to throttle the load during enabling. As there is no
+	 * real processing performed during disabling checksums the backoff
+	 * parameters do not apply there.
+	 */
+	if (enable_checksums)
+	{
+		DatachecksumsWorkerShmem->cost_delay = cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = cost_limit;
+	}
+	else
+	{
+		DatachecksumsWorkerShmem->cost_delay = 0;
+		DatachecksumsWorkerShmem->cost_limit = 0;
+	}
+
+	/*
+	 * Prepare the BackgroundWorker and launch it.
+	 */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *		Request shutdown of the datachecksumsworker
+ *
+ * This does not turn off processing immediately; it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/* If the launcher isn't started, there is nothing to shut down */
+	if (DatachecksumsWorkerShmem->launcher_started)
+		DatachecksumsWorkerShmem->abort = true;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. If wal_level is set to "minimal",
+		 * this could be avoided when the checksum is found to already be
+		 * correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort;
+		 * the abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it, that's not
+	 * an error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase * db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot end up with a processed database so
+	 * we have no alternative other than exiting. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * it would be nice to handle in some way. Enabling data checksums resets
+	 * the flags, so any stale flags won't cause problems at that point, but
+	 * they may cause confusion with users reading pg_class. TODO.
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksum processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+/*
+ * launcher_exit
+ *
+ * Internal routine for cleaning up state when the launcher process exits. We
+ * need to clear the abort flag to ensure that processing can be restarted
+ * again after it was previously aborted.
+ */
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * launcher_cancel_handler
+ *
+ * Internal routine for reacting to SIGINT and flagging the worker to abort.
+ * The worker won't be interrupted immediately but will check for abort flag
+ * between each block in a relation.
+ */
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks awaiting all current transactions to finish
+ *
+ * Returns when all transactions which were active when the function was called
+ * have ended, or if the postmaster dies while waiting. If the postmaster dies
+ * the abort flag will be set to indicate that the caller of this shouldn't
+ * proceed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				DatachecksumsWorkerShmem->abort = true;
+			aborted = DatachecksumsWorkerShmem->abort;
+
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function handles the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = DatachecksumsWorkerShmem->operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+	/*
+	 * Clean up after processing
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->launcher_started = false;
+	DatachecksumsWorkerShmem->abort = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. Until no
+ * new databases are found, this will loop around, computing a new list and
+ * comparing it to the databases already seen.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set up so that the first worker run processes the shared catalogs;
+	 * subsequent runs in other databases will skip them.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		/*
+		 * Get a list of all databases to process. This may include databases
+		 * that were created during our runtime.
+		 *
+		 * Since a database can be created as a copy of any other database
+		 * (which may not have existed in our last run), we have to repeat
+		 * this loop until no new databases show up in the list. Because we
+		 * wait for all pre-existing transactions to finish, we can be
+		 * certain that there are no databases left without checksums.
+		 */
+		DatabaseList = BuildDatabaseList();
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%i databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "restarting to check for new databases" : "processing completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can be because
+	 * they actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists but enabling checksums failed, then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = true;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->abort = false;
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even though this assignment is redundant, be explicit about the intent
+	 * for readability, since this state is queried when determining whether
+	 * processing needs to be restarted.
+	 */
+	DatachecksumsWorkerShmem->launcher_started = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to.
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+
+	/*
+	 * Before scanning pg_database, wait for all pending transactions to
+	 * finish. This ensures that there is no concurrently running CREATE
+	 * DATABASE, which could cause us to miss a database that was copied
+	 * without checksums.
+	 */
+	WaitForAllTransactionsToFinish();
+
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which do not yet have data checksums are
+ * returned. If include_shared is true then shared relations are included as
+ * well in a non-temporary list; include_shared has no effect when building a
+ * list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it waits for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we are done, since we
+	 * cannot access and modify these relations.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		SetDataChecksumsOff();
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died, we won't be able to enable checksums
+		 * cluster-wide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..0fef097eb8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4017,6 +4017,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..cc494b6f13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d9c482454f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 561c212092..9362ec0018 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2944,8 +2944,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..c7928f3495 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c43cdd685b..a3720617f9 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "port/pg_bitutils.h"
 #include "commands/async.h"
 #include "miscadmin.h"
@@ -98,7 +99,6 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 static void ResetProcSignalBarrierBits(uint32 flags);
-static bool ProcessBarrierPlaceholder(void);
 
 /*
  * ProcSignalShmemSize
@@ -538,8 +538,17 @@ ProcessProcSignalBarrier(void)
 				type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
 				switch (type)
 				{
-					case PROCSIGNAL_BARRIER_PLACEHOLDER:
-						processed = ProcessBarrierPlaceholder();
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON:
+						processed = AbsorbChecksumsOnInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_ON:
+						processed = AbsorbChecksumsOnBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF:
+						processed = AbsorbChecksumsOffInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_OFF:
+						processed = AbsorbChecksumsOffBarrier();
 						break;
 				}
 
@@ -604,24 +613,6 @@ ResetProcSignalBarrierBits(uint32 flags)
 	InterruptPending = true;
 }
 
-static bool
-ProcessBarrierPlaceholder(void)
-{
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 *
-	 * The return value should be 'true' if the barrier was successfully
-	 * absorbed and 'false' if not. Note that returning 'false' can lead to
-	 * very frequent retries, so try hard to make that an uncommon case.
-	 */
-	return true;
-}
-
 /*
  * CheckProcSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called after receiving
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 774292fd94..23eaf9e576 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae..8fbebd9870 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change mid-check, which could
+		 * otherwise cause a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 62bff52638..4ac396ccf1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1567,9 +1567,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1585,9 +1582,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..17c4dc15e6 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1828,7 +1829,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1896,6 +1898,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3548,6 +3552,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the data checksum state. Since the data checksum state can change
+	 * at any time, the fetched value might be out of date by the time the
+	 * relation is built.  DataChecksumsNeedWrite returns true when data
+	 * checksums are enabled, in the process of being enabled (state
+	 * "inprogress-on"), or in the process of being disabled (state
+	 * "inprogress-off"). Since relhaschecksums is only used to track progress
+	 * while data checksums are being enabled, and going from disabled to
+	 * enabled will clear relhaschecksums before starting, it is safe to use
+	 * this value during a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false and is concurrently changed to
+	 * true, that implies that checksums are being enabled. Worst case, this
+	 * will lead to the relation being processed for checksums even though
+	 * each page written already has them.  Performing this last shortens the
+	 * window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3813,6 +3838,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3837,16 +3863,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3875,6 +3909,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3895,15 +3930,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc5..045da21904 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e5965bc517..92367ece4b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -606,6 +606,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up backend local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..3b7207afb5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -500,6 +502,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -609,7 +622,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums;
 static bool integer_datetimes;
 static bool assert_enabled;
 static bool in_hot_standby;
@@ -1910,17 +1923,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4830,6 +4832,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3f029f41e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..1298857458 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish, or turn off checksums,
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..b35cd4d503 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..6947c09591 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern bool AbsorbChecksumsOnInProgressBarrier(void);
+extern bool AbsorbChecksumsOffInProgressBarrier(void);
+extern bool AbsorbChecksumsOnBarrier(void);
+extern bool AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 224cae0246..adbe81e890 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index e8dcd15a55..bf296625e4 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..d8229422af 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b5f52d4e4a..9243b3301c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11301,6 +11301,22 @@
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
 
+{ oid => '9258',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '9257',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'bool',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bdc97e308..f013acba76 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -324,6 +324,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..0974dfadfe 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -963,6 +963,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..466fb41521
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,36 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Shutdown the background processes, if any */
+void		ShutdownDatachecksumsWorkerIfRunning(void);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f..c35b747520 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d2359192..f736b12f98 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..d865796d04 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index ab1ef9a475..9774816625 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..fd60f7e97f
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: In the case of "check", this creates a temporary installation
+with multiple nodes (a primary and one or more standbys) as required
+by the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..57384a452c
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,89 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class have checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled. Disabling checksums clears the
+# relhaschecksums state in the catalog, so wait for that before calling it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..dc5bcb9629
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,108 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'doublecheck that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..99c283e0b1
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,116 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary 2');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
diff --git a/src/test/checksum/t/004_offline.pl b/src/test/checksum/t/004_offline.pl
new file mode 100644
index 0000000000..28f6208a63
--- /dev/null
+++ b/src/test/checksum/t/004_offline.pl
@@ -0,0 +1,100 @@
+# Test suite for testing enabling data checksums offline from various states
+# of checksum processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Enable checksums offline using pg_checksums
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are enabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums offline again using pg_checksums
+$node->stop();
+$node->checksum_disable_offline();
+$node->start();
+
+# Ensure that checksums are disabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'inprogress-on');
+is($result, 1, 'ensure checksums are in the process of being enabled');
+
+# Turn the cluster off and enable checksums offline, then start back up
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are now enabled even though processing wasn't
+# restarted
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..61b4571e9d 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2221,6 +2221,42 @@ sub pg_recvlogical_upto
 	}
 }
 
+=item $node->checksum_enable_offline()
+
+Enable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_enable_offline
+{
+	my ($self) = @_;
+
+	print "# Enabling checksums in \"$self->data_dir\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-e');
+	print "# Checksums enabled\n";
+	return;
+}
+
+=item checksum_disable_offline
+
+Disable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_disable_offline
+{
+	my ($self) = @_;
+
+	print "# Disabling checksums in \"$self->data_dir\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-d');
+	print "# Checksums disabled\n";
+	return;
+}
+
 =pod
 
 =back
-- 
2.21.1 (Apple Git-122.3)

#73Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Daniel Gustafsson (#72)
Re: Online checksums patch - once again

I read through the latest patch,
v31-0001-Support-checksum-enable-disable-in-a-running-clu.patch. Some
comments below:

On 19/01/2021 14:32, Daniel Gustafsson wrote:

+       /*
+        * Hold interrupts for the duration of xlogging to avoid the state of data
+        * checksums changing during the processing which would later the premise
+        * for xlogging hint bits.
+        */

Sentence sense does not make.

@@ -904,6 +916,7 @@ static void SetLatestXTime(TimestampTz xtime);
static void SetCurrentChunkStartTime(TimestampTz xtime);
static void CheckRequiredParameterValues(void);
static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
TimeLineID prevTLI);
static void LocalSetXLogInsertAllowed(void);

Spelling: make it "XLogChecksums" for consistency.

/*
* DataChecksumsNeedWrite
* Returns whether data checksums must be written or not
*
* Returns true iff data checksums are enabled or are in the process of being
* enabled. In case data checksums are currently being enabled we must write
* the checksum even though it's not verified during this stage. Interrupts
* need to be held off by the caller to ensure that the returned state is
* valid for the duration of the intended processing.
*/
bool
DataChecksumsNeedWrite(void)
{
Assert(InterruptHoldoffCount > 0);
return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
}

Can you be more precise on the "duration of the intended processing"? It
means, until you have actually written out the page, or something like
that, right? The similar explanation in DataChecksumsNeedVerify() is
easier to understand. The two functions are similar, it would be good to
phrase the comments similarly, so that you can quickly see the
difference between the two.

/*
* SetDataChecksumsOnInProgress
* Sets the data checksum state to "inprogress-on" to enable checksums
*
* In order to start the process of enabling data checksums in a running
* cluster the data_checksum_version state must be changed to "inprogress-on".
* This state requires data checksums to be written but not verified. The state
* transition is performed in a critical section in order to provide crash
* safety, and checkpoints are held off. When the emitted procsignalbarrier
* has been absorbed by all backends we know that the cluster has started to
* enable data checksums.
*/

The two "in order" are superfluous, it's more concise to say just "To
start the process ..." (I was made aware of that by J�rgen Purtz's
recent patches that removed a couple of "in order"s from the docs as
unnecessary).

It's a bit confusing to talk about the critical section and
procsignalbarrier here. Does the caller need to wait for the
procsignalbarrier? No, that's just explaining what the function does
internally. Maybe move that explanation inside the function, and say
here something like "This function blocks until all processes have
acknowledged the state change" or something like that.

/*
* SetDataChecksumsOn
* Enables data checksums cluster-wide
*
* Enabling data checksums is performed using two barriers, the first one
* sets the checksums state to "inprogress-on" (which is performed by
* SetDataChecksumsOnInProgress()) and the second one to "on" (performed here).
* During "inprogress-on", checksums are written but not verified. When all
* existing pages are guaranteed to have checksums, and all new pages will be
* initiated with checksums, the state can be changed to "on".
*/

Perhaps the explanation for how these SetDataChecksumsOn() and
SetDataChecksumsOnInProgress() functions work together should be moved
to one place. For example, explain both functions here at
SetDataChecksumsOn(), and just have a "see SetDataChecksumsOn()" in the
other function, or no comment at all if they're kept next to each other.
As it stands, you have to read both comments and piece together the the
big picture in your head. Maybe add a "see datachecksumsworker.c" here,
since there's a longer explanation of the overall mechanism there.

+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+       if (!superuser())
+               ereport(ERROR,
+                               (errmsg("must be superuser")));
+
+       StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+       PG_RETURN_BOOL(true);
+}

The documentation says "Returns <literal>false</literal> in case data
checksums are disabled already", but the function always returns true.
The enable_data_checksum() function also returns a constant 'true'; why
not make it void?

The "has immediate effect" comment seems wrong, given that it actually
launches a worker process.

+       /*
+        * If the launcher is already started, the only operation we can perform
+        * is to cancel it iff the user requested for checksums to be disabled.
+        * That doesn't however mean that all other cases yield an error, as some
+        * might be perfectly benevolent.
+        */

This comment is a bit hard to understand. Maybe something like

"If the launcher is already started, we cannot launch a new one. But if
the user requested for checksums to be disabled, we can cancel it."

+       if (DatachecksumsWorkerShmem->launcher_started)
+       {
+               if (DatachecksumsWorkerShmem->abort)
+               {
+                       ereport(NOTICE,
+                                       (errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+                       LWLockRelease(DatachecksumsWorkerLock);
+                       return;
+               }

If it's being aborted, and the user requested to disable checksum, can
we leave out the NOTICE? I guess not, because the worker process won't
check the 'abort' flag, if it's already finished processing all data pages.

+               /*
+                * If the launcher is started data checksums cannot be on or off, but
+                * it may be in an inprogress state. Since the state transition may
+                * not have happened yet (in case of rapidly initiated checksum enable
+                * calls for example) we inspect the target state of the currently
+                * running launcher.
+                */

This comment contradicts itself. If the state transition has not
happened yet, then contrary to the first sentence, data checksums *are*
currently on or off.

+               if (enable_checksums)
+               {
+                       /*
+                        * If we are asked to enable checksums when they are already being
+                        * enabled, there is nothing to do so exit.
+                        */
+                       if (DatachecksumsWorkerShmem->enable_checksums)
+                       {
+                               LWLockRelease(DatachecksumsWorkerLock);
+                               return;
+                       }
+
+                       /*
+                        * Disabling checksums is likely to be a very quick operation in
+                        * many cases so trying to abort it to save the checksums would
+                        * run the risk of race conditions.
+                        */
+                       else
+                       {
+                               ereport(NOTICE,
+                                               (errmsg("data checksums are concurrently being disabled, please retry")));
+
+                               LWLockRelease(DatachecksumsWorkerLock);
+                               return;
+                       }
+
+                       /* This should be unreachable */
+                       Assert(false);
+               }
+               else if (!enable_checksums)
+               {
+                       /*
+                        * Data checksums are already being disabled, exit silently.
+                        */
+                       if (DataChecksumsOffInProgress())
+                       {
+                               LWLockRelease(DatachecksumsWorkerLock);
+                               return;
+                       }
+
+                       DatachecksumsWorkerShmem->abort = true;
+                       LWLockRelease(DatachecksumsWorkerLock);
+                       return;
+               }

The Assert seems unnecessary. The "if (!enable_checksums)" also seems
unnecessary, could be just "else".

+       /*
+        * The launcher is currently not running, so we need to query the system
+        * data checksum state to determine how to proceed based on the requested
+        * target state.
+        */

"query the system" makes me think "checking the catalogs" or similar,
but this just looks at a few variables in shared memory.

+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *             Request shutdown of the datachecksumsworker
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+       LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+       /* If the launcher isn't started, there is nothing to shut down */
+       if (DatachecksumsWorkerShmem->launcher_started)
+               DatachecksumsWorkerShmem->abort = true;
+
+       LWLockRelease(DatachecksumsWorkerLock);
+}

This function is unused.

+/*
+ * launcher_cancel_handler
+ *
+ * Internal routine for reacting to SIGINT and flagging the worker to abort.
+ * The worker won't be interrupted immediately but will check for abort flag
+ * between each block in a relation.
+ */
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+       LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+       DatachecksumsWorkerShmem->abort = true;
+       LWLockRelease(DatachecksumsWorkerLock);
+}

Acquiring an lwlock in signal handler is not safe.

Since this is a signal handler, it might still get called after the
process has finished processing, and has already set
DatachecksumsWorkerShmem->launcher_started = false. That seems like an
unexpected state.

+/*
+ * WaitForAllTransactionsToFinish
+ *             Blocks awaiting all current transactions to finish
+ *
+ * Returns when all transactions which are active at the call of the function
+ * have ended, or if the postmaster dies while waiting. If the postmaster dies
+ * the abort flag will be set to indicate that the caller of this shouldn't
+ * proceed.
+ */

The caller of this function doesn't check the 'abort' flag AFAICS. I
think you could just use WL_EXIT_ON_PM_DEATH to error out on
postmaster death.

+       while (true)
+       {
+               int                     processed_databases = 0;
+
+               /*
+                * Get a list of all databases to process. This may include databases
+                * that were created during our runtime.
+                *
+                * Since a database can be created as a copy of any other database
+                * (which may not have existed in our last run), we have to repeat
+                * this loop until no new databases show up in the list. Since we wait
+                * for all pre-existing transactions finish, this way we can be
+                * certain that there are no databases left without checksums.
+                */
+               DatabaseList = BuildDatabaseList();

I think it's unnecessary to wait out the transactions on the first
iteration of this loop. There must be at least one database, so there
will always be at least two iterations.

I think it would be more clear to move the
WaitForAllTransactionsToFinish() from BuildDatabaseList() to the callers.

@@ -3567,6 +3571,27 @@ RelationBuildLocalRelation(const char *relname,
relkind == RELKIND_MATVIEW)
RelationInitTableAccessMethod(rel);

+       /*
+        * Set the data checksum state. Since the data checksum state can change at
+        * any time, the fetched value might be out of date by the time the
+        * relation is built.  DataChecksumsNeedWrite returns true when data
+        * checksums are: enabled; are in the process of being enabled (state:
+        * "inprogress-on"); are in the process of being disabled (state:
+        * "inprogress-off"). Since relhaschecksums is only used to track progress
+        * when data checksums are being enabled, and going from disabled to
+        * enabled will clear relhaschecksums before starting, it is safe to use
+        * this value for a concurrent state transition to off.
+        *
+        * If DataChecksumsNeedWrite returns false, and is concurrently changed to
+        * true then that implies that checksums are being enabled. Worst case,
+        * this will lead to the relation being processed for checksums even though
+        * each page written will have them already.  Performing this last shortens
+        * the window, but doesn't avoid it.
+        */
+       HOLD_INTERRUPTS();
+       rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+       RESUME_INTERRUPTS();
+
/*
* Okay to insert into the relcache hash table.
*

I grepped for relhaschecksums, and concluded that the value in the
relcache isn't actually used for anything. Not so! In
heap_create_with_catalog(), the actual pg_class row is constructed from
the relcache entry, so the value set in RelationBuildLocalRelation()
finds its way to pg_class. Perhaps it would be more clear to pass
relhaschecksums directly as an argument to AddNewRelationTuple(). That
way, the value in the relcache would be truly never used.

- Heikki

#74Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Heikki Linnakangas (#73)
Re: Online checksums patch - once again

On 22/01/2021 13:55, Heikki Linnakangas wrote:

I read through the latest patch,
v31-0001-Support-checksum-enable-disable-in-a-running-clu.patch. Some
comments below:

One more thing:

In SetRelationNumChecks(), you should use SearchSysCacheCopy1() to get a
modifiable copy of the tuple. Otherwise you modify the tuple in the
relcache as a side effect. Maybe that's harmless in this case, as the
'relhaschecksums' value in the relcache isn't used for anything, but
let's be tidy.

- Heikki

#75Daniel Gustafsson
daniel@yesql.se
In reply to: Heikki Linnakangas (#73)
1 attachment(s)
Re: Online checksums patch - once again

On 22 Jan 2021, at 12:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

I read through the latest patch, v31-0001-Support-checksum-enable-disable-in-a-running-clu.patch. Some comments below:

Thanks for reviewing! Attached is a v32 which address most of your comments,
see below.

On 19/01/2021 14:32, Daniel Gustafsson wrote:

+       /*
+        * Hold interrupts for the duration of xlogging to avoid the state of data
+        * checksums changing during the processing which would later the premise
+        * for xlogging hint bits.
+        */

Sentence sense does not make.

Indeed it doesn't, the "later" is a misspelled "alter". Essentially, the idea
with the comment is to explain that interrupts should be held during logging so
that the decision to log hintbits is valid for the duration of the logging.
Does changing to "alter" suffice? It's fixed like so in the attached, but
perhaps there are better wordings for this.

@@ -904,6 +916,7 @@ static void SetLatestXTime(TimestampTz xtime);
static void SetCurrentChunkStartTime(TimestampTz xtime);
static void CheckRequiredParameterValues(void);
static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
TimeLineID prevTLI);
static void LocalSetXLogInsertAllowed(void);

Spelling: make it "XLogChecksums" for consistency.

Good point, fixed.

/*
* DataChecksumsNeedWrite
* Returns whether data checksums must be written or not
*
* Returns true iff data checksums are enabled or are in the process of being
* enabled. In case data checksums are currently being enabled we must write
* the checksum even though it's not verified during this stage. Interrupts
* need to be held off by the caller to ensure that the returned state is
* valid for the duration of the intended processing.
*/
bool
DataChecksumsNeedWrite(void)
{
Assert(InterruptHoldoffCount > 0);
return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
}

Can you be more precise on the "duration of the intended processing"? It means, until you have actually written out the page, or something like that, right? The similar explanation in DataChecksumsNeedVerify() is easier to understand. The two functions are similar, it would be good to phrase the comments similarly, so that you can quickly see the difference between the two.

I've reworded this to (hopefully) be more clear, and to be more like the
comment for DataChecksumsNeedVerify.

/*
* SetDataChecksumsOnInProgress
* Sets the data checksum state to "inprogress-on" to enable checksums
*
* In order to start the process of enabling data checksums in a running
* cluster the data_checksum_version state must be changed to "inprogress-on".
* This state requires data checksums to be written but not verified. The state
* transition is performed in a critical section in order to provide crash
* safety, and checkpoints are held off. When the emitted procsignalbarrier
* has been absorbed by all backends we know that the cluster has started to
* enable data checksums.
*/

The two "in order" are superfluous, it's more concise to say just "To start the process ..." (I was made aware of that by Jürgen Purtz's recent patches that removed a couple of "in order"s from the docs as unnecessary).

Agreed, that's better.

It's a bit confusing to talk about the critical section and procsignalbarrier here. Does the caller need to wait for the procsignalbarrier? No, that's just explaining what the function does internally. Maybe move that explanation inside the function, and say here something like "This function blocks until all processes have acknowledged the state change" or something like that.

Fixed.

/*
* SetDataChecksumsOn
* Enables data checksums cluster-wide
*
* Enabling data checksums is performed using two barriers, the first one
* sets the checksums state to "inprogress-on" (which is performed by
* SetDataChecksumsOnInProgress()) and the second one to "on" (performed here).
* During "inprogress-on", checksums are written but not verified. When all
* existing pages are guaranteed to have checksums, and all new pages will be
* initiated with checksums, the state can be changed to "on".
*/

Perhaps the explanation for how these SetDataChecksumsOn() and SetDataChecksumsOnInProgress() functions work together should be moved to one place. For example, explain both functions here at SetDataChecksumsOn(), and just have a "see SetDataChecksumsOn()" in the other function, or no comment at all if they're kept next to each other. As it stands, you have to read both comments and piece together the big picture in your head. Maybe add a "see datachecksumsworker.c" here, since there's a longer explanation of the overall mechanism there.

Fixed. I did keep both comments since the functions are too long to fit the
comment and the code on screen, but refer to one from the other.

+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+       if (!superuser())
+               ereport(ERROR,
+                               (errmsg("must be superuser")));
+
+       StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+       PG_RETURN_BOOL(true);
+}

The documentation says "Returns <literal>false</literal> in case data checksums are disabled already", but the function always returns true.

Fixed.

The enable_data_checksum() function also returns a constant 'true'; why not make it void?

These functions were void until v20 of this patch when Robert correctly pointed
out that using ereport to communicate status was a poor choice (there were
NOTICEs IIRC). v23 however rolled back all status checking in response to a
review by you, leaving the functions to only call the worker. The return value
was however not changed back to void at that point which would've been the
reasonable choice. Fixed.

The "has immediate effect" comment seems wrong, given that it actually launches a worker process.

Fixed.

+       /*
+        * If the launcher is already started, the only operation we can perform
+        * is to cancel it iff the user requested for checksums to be disabled.
+        * That doesn't however mean that all other cases yield an error, as some
+        * might be perfectly benevolent.
+        */

This comment is a bit hard to understand. Maybe something like

"If the launcher is already started, we cannot launch a new one. But if the user requested for checksums to be disabled, we can cancel it."

Agreed, that's better. Fixed.

+       if (DatachecksumsWorkerShmem->launcher_started)
+       {
+               if (DatachecksumsWorkerShmem->abort)
+               {
+                       ereport(NOTICE,
+                                       (errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+                       LWLockRelease(DatachecksumsWorkerLock);
+                       return;
+               }

If it's being aborted, and the user requested to disable checksum, can we leave out the NOTICE? I guess not, because the worker process won't check the 'abort' flag, if it's already finished processing all data pages.

Correct, since we might otherwise silently not disable checksums on a request
to do so.

+               /*
+                * If the launcher is started data checksums cannot be on or off, but
+                * it may be in an inprogress state. Since the state transition may
+                * not have happened yet (in case of rapidly initiated checksum enable
+                * calls for example) we inspect the target state of the currently
+                * running launcher.
+                */

This comment contradicts itself. If the state transition has not happened yet, then contrary to the first sentence, data checksums *are* currently on or off.

Yes, indeed. I've reworded this comment but I'm not sure it's much of an
improvement, I'm afraid.

+               if (enable_checksums)
+               {
+                       /*
+                        * If we are asked to enable checksums when they are already being
+                        * enabled, there is nothing to do so exit.
+                        */
+                       if (DatachecksumsWorkerShmem->enable_checksums)
+                       {
+                               LWLockRelease(DatachecksumsWorkerLock);
+                               return;
+                       }
+
+                       /*
+                        * Disabling checksums is likely to be a very quick operation in
+                        * many cases so trying to abort it to save the checksums would
+                        * run the risk of race conditions.
+                        */
+                       else
+                       {
+                               ereport(NOTICE,
+                                               (errmsg("data checksums are concurrently being disabled, please retry")));
+
+                               LWLockRelease(DatachecksumsWorkerLock);
+                               return;
+                       }
+
+                       /* This should be unreachable */
+                       Assert(false);
+               }
+               else if (!enable_checksums)
+               {
+                       /*
+                        * Data checksums are already being disabled, exit silently.
+                        */
+                       if (DataChecksumsOffInProgress())
+                       {
+                               LWLockRelease(DatachecksumsWorkerLock);
+                               return;
+                       }
+
+                       DatachecksumsWorkerShmem->abort = true;
+                       LWLockRelease(DatachecksumsWorkerLock);
+                       return;
+               }

The Assert seems unnecessary. The "if (!enable_checksums)" also seems unnecessary, could be just "else".

Fixed.

+       /*
+        * The launcher is currently not running, so we need to query the system
+        * data checksum state to determine how to proceed based on the requested
+        * target state.
+        */

"query the system" makes me think "checking the catalogs" or similar, but this just looks at a few variables in shared memory.

Reworded.

+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *             Request shutdown of the datachecksumsworker
+ *

This function is unused.

Removed.

+/*
+ * launcher_cancel_handler
+ *
+ * Internal routine for reacting to SIGINT and flagging the worker to abort.
+ * The worker won't be interrupted immediately but will check for abort flag
+ * between each block in a relation.
+ */
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+       LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+       DatachecksumsWorkerShmem->abort = true;
+       LWLockRelease(DatachecksumsWorkerLock);
+}

Acquiring an lwlock in signal handler is not safe.

Since this is a signal handler, it might still get called after the process has finished processing, and has already set DatachecksumsWorkerShmem->launcher_started = false. That seems like an unexpected state.

Good point, I've fixed this to use a proper signal handler flag. While doing
so I realized there was a race window when checksums were disabled while the
worker was running, but after it had processed the final buffer. Since the
abort flag was only checked during buffer processing, it could be missed, leading
to the worker ending with checksums enabled when they should be disabled. This
is likely to be much more common in the TAP tests than in real production use,
which is good since it was easy to trigger. The attached fixes this by
re-checking the abort flag and I am no longer able to trigger the window during
tests.

+/*
+ * WaitForAllTransactionsToFinish
+ *             Blocks awaiting all current transactions to finish
+ *
+ * Returns when all transactions which are active at the call of the function
+ * have ended, or if the postmaster dies while waiting. If the postmaster dies
+ * the abort flag will be set to indicate that the caller of this shouldn't
+ * proceed.
+ */

The caller of this function doesn't check the 'abort' flag AFAICS. I think you could just use WL_EXIT_ON_PM_DEATH to error out on postmaster death.

Good point, that simplifies the code. It's erroring out in the same way as
other checks for postmaster death.

+       while (true)
+       {
+               int                     processed_databases = 0;
+
+               /*
+                * Get a list of all databases to process. This may include databases
+                * that were created during our runtime.
+                *
+                * Since a database can be created as a copy of any other database
+                * (which may not have existed in our last run), we have to repeat
+                * this loop until no new databases show up in the list. Since we wait
+                * for all pre-existing transactions finish, this way we can be
+                * certain that there are no databases left without checksums.
+                */
+               DatabaseList = BuildDatabaseList();

I think it's unnecessary to wait out the transactions on the first iteration of this loop. There must be at least one database, so there will always be at least two iterations.

I think it would be more clear to move the WaitForAllTransactionsToFinish() from BuildDatabaseList() to the callers.

I've tried this in the attached. It does increase the window between waiting
for transactions to finish and grabbing the list, but that might be negligible?

@@ -3567,6 +3571,27 @@ RelationBuildLocalRelation(const char *relname,
relkind == RELKIND_MATVIEW)
RelationInitTableAccessMethod(rel);
+       /*
+        * Set the data checksum state. Since the data checksum state can change at
+        * any time, the fetched value might be out of date by the time the
+        * relation is built.  DataChecksumsNeedWrite returns true when data
+        * checksums are: enabled; are in the process of being enabled (state:
+        * "inprogress-on"); are in the process of being disabled (state:
+        * "inprogress-off"). Since relhaschecksums is only used to track progress
+        * when data checksums are being enabled, and going from disabled to
+        * enabled will clear relhaschecksums before starting, it is safe to use
+        * this value for a concurrent state transition to off.
+        *
+        * If DataChecksumsNeedWrite returns false, and is concurrently changed to
+        * true then that implies that checksums are being enabled. Worst case,
+        * this will lead to the relation being processed for checksums even though
+        * each page written will have them already.  Performing this last shortens
+        * the window, but doesn't avoid it.
+        */
+       HOLD_INTERRUPTS();
+       rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+       RESUME_INTERRUPTS();
+
/*
* Okay to insert into the relcache hash table.
*

I grepped for relhaschecksums, and concluded that the value in the relcache isn't actually used for anything. Not so! In heap_create_with_catalog(), the actual pg_class row is constructed from the relcache entry, so the value set in RelationBuildLocalRelation() finds its way to pg_class. Perhaps it would be more clear to pass relhaschecksums directly as an argument to AddNewRelationTuple(). That way, the value in the relcache would be truly never used.

I might be thick (or undercaffeinated) but I'm not sure I follow.
AddNewRelationTuple calls InsertPgClassTuple which in turn avoids the relcache
entry.

--
Daniel Gustafsson https://vmware.com/

Attachments:

v32-0001-Support-checksum-enable-disable-in-a-running-clu.patchapplication/octet-stream; name=v32-0001-Support-checksum-enable-disable-in-a-running-clu.patch; x-unix-mode=0644Download
From 99df76db5a5fa0f333bc71db9f663198f70d4593 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Tue, 26 Jan 2021 15:27:22 +0100
Subject: [PATCH v32] Support checksum enable/disable in a running cluster v32

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A dynamic background worker is responsible for launching a per-database
worker which will mark all buffers dirty for all relation with storage
in order for them to have data checksums on write. A new in-progress
state is introduced which during processing ensures that data checksums
are written but not verified to avoid false negatives. State changes
across backends are synchronized using a procsignalbarrier.

Authors: Daniel Gustafsson, Magnus Hagander
Reviewed-by: Heikki Linnakangas, Robert Haas, Andres Freund, Tomas Vondra, Michael Banck, Andrey Borodin
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   68 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   57 +-
 src/backend/access/heap/heapam.c             |    9 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  452 ++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    7 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1600 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   33 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   33 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   89 +
 src/test/checksum/t/002_restarts.pl          |  108 ++
 src/test/checksum/t/003_standby_checksum.pl  |  116 ++
 src/test/checksum/t/004_offline.pl           |  100 ++
 src/test/perl/PostgresNode.pm                |   36 +
 51 files changed, 3030 insertions(+), 87 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl
 create mode 100644 src/test/checksum/t/004_offline.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..75cc1588a5 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index aa99665e2e..94182fb7b1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25839,6 +25839,74 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Initiates data checksums for the cluster. This will switch the data
+        checksums mode to <literal>inprogress-on</literal> as well as start a
+        background worker that will process all data in the database and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9496f76b1f..7e170ec429 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3695,8 +3695,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3706,8 +3705,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b2..d879550e81 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations, regardless of prior online progress.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f8..48890ccc9d 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -247,9 +247,10 @@
   <para>
    Checksums are normally enabled when the cluster is initialized using <link
    linkend="app-initdb-data-checksums"><application>initdb</application></link>.
-   They can also be enabled or disabled at a later time as an offline
-   operation. Data checksums are enabled or disabled at the full cluster
-   level, and cannot be specified individually for databases or tables.
+   They can also be enabled or disabled at a later time either as an offline
+   operation or online in a running cluster, allowing concurrent access. Data
+   checksums are enabled or disabled at the full cluster level, and cannot be
+   specified individually for databases or tables.
   </para>
 
   <para>
@@ -266,7 +267,7 @@
   </para>
 
   <sect2 id="checksums-offline-enable-disable">
-   <title>Off-line Enabling of Checksums</title>
+   <title>Offline Enabling of Checksums</title>
 
    <para>
     The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
@@ -275,6 +276,54 @@
    </para>
 
   </sect2>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>Online Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process; make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application, it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped for any reason while in <literal>inprogress-on</literal>
+    mode, this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
  </sect1>
 
   <sect1 id="wal-intro">
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd54..ffcd889908 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7927,7 +7927,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7948,11 +7948,18 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	/*
+	 * Hold interrupts for the duration of xlogging to avoid the state of data
	 * checksums changing during the processing, which would alter the premise
+	 * for xlogging hint bits.
+	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cc007b8963..c0e2bde5d5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -50,6 +51,7 @@
 #include "port/atomics.h"
 #include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -253,6 +255,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for the ControlFile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking when interrogating the checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -900,6 +912,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XLogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1073,8 +1086,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1087,7 +1100,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4915,9 +4928,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4951,13 +4962,370 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Returns true iff data checksums are enabled or are in the process of being
+ * enabled.  During "inprogress-on" and "inprogress-off" states checksums must
+ * be written even though they are not verified (see datachecksumsworker.c for
+ * a longer discussion).
+ *
+ * This function is intended for callsites which are about to write a data page
+ * to storage and need to know whether to re-calculate the checksum for the
+ * page header. Interrupts must be held off from before this is called until
+ * the write operation has finished, to avoid the risk of the checksum state
+ * changing. This implies that this function should be called as close to the
+ * write operation as possible to keep the critical section short.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified (see datachecksumsworker.c for a longer discussion).
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid the
+ * risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that this
+ * function should be called as close to the validation call as possible
+ * to keep the critical section short. This is in order to protect against
+ * time-of-check/time-of-use situations around data checksum validation.
+ */
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress-on" state for enabling checksums is used when the checksum
+ * worker is setting checksums on all pages, it can thus be used to check for
+ * aborted checksum processing which need to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress-off" state for disabling checksums is used for when the
+ * worker resets the catalog state.  DataChecksumsNeedVerify() or
+ * DataChecksumsNeedWrite() should be used for deciding whether to read/write
+ * checksums.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * SetDataChecksumsOnInProgress
+ *		Sets the data checksum state to "inprogress-on" to enable checksums
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on". See
+ * SetDataChecksumsOn below for a description of how this state change works.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * Data checksum state can only be transitioned to "inprogress-on" from
+	 * "off"; if data checksums are in any other state, exit.
+	 */
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	/*
+	 * The state transition is performed in a critical section with checkpoints
+	 * held off to provide crash safety.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state change in all backends to ensure that all backends are in
+	 * "inprogress-on". Once done we know that all backends are writing data
+	 * checksums.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one to
+ * set the checksums state to "inprogress-on" (which is performed by
+ * SetDataChecksumsOnInProgress()) and the second one to set the state to "on"
+ * (performed here).
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on".  This state
+ * requires data checksums to be written but not verified. This ensures that
+ * all data pages can be checksummed without the risk of false negatives in
+ * validation during the process.  When all existing pages are guaranteed to
+ * have checksums, and all new pages will be initialized with checksums, the
+ * state can be changed to "on". Once the state is "on", checksums will be both
+ * written and verified. See datachecksumsworker.c for a longer discussion on
+ * how data checksums can be enabled in a running cluster.
+ *
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOn(void)
 {
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The only allowed state transition to "on" is from "inprogress-on" since
+	 * that state ensures that all pages will have data checksums written.
+	 */
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state transition of "on" in all backends. When done we know that
+	 * data checksums are enabled in all backends and data checksums are both
+	 * written and verified.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be emitted.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * "inprogress-off" state during which backends continue to write
+	 * checksums without verifying them. When all backends are in
+	 * "inprogress-off" the next transition to "off" can be performed, after
+	 * which all data checksum processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * or "inprogress-off" and we can transition directly to "off" from
+		 * there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that we don't incur a checkpoint while disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * ProcSignalBarrier absorption functions for enabling and disabling data
+ * checksums in a running cluster. The procsignalbarriers are emitted in the
+ * SetDataChecksums* functions.
+ */
+bool
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 ||
+		   LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+	return true;
+}
+
+/*
+ * InitLocalControldata
+ *
+ * Set up backend local caches of controldata variables which may change at
+ * any point during runtime and thus require special-cased locking. So far
+ * this only applies to data_checksum_version, but it's intended to be general
+ * purpose enough to handle future cases.
+ */
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7991,6 +8359,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums being enabled ("inprogress-on"
+	 * state), we notify the user that they need to manually restart the
+	 * process to enable checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here, it has to be launched from a
+	 * regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shut down, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XLogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9900,6 +10294,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XLogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10355,6 +10767,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab319d..5d77be8a2d 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Starts a background worker that turns off data checksums.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9abc4a1f55..87052b0693 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -974,10 +974,17 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	/*
+	 * Hold off interrupts to ensure that the observed data checksum state
+	 * cannot change as we form and insert the tuple.
+	 */
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..516ae666b7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1264,6 +1264,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..8afbf762af 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..1994712367
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1600 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a cluster at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the control file and the catalog state is cleared, but no
+ * changes are performed on the data pages.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to an incorrect checksum.
+ * The DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of those which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed, as well as have
+ * its state recorded in the catalog to avoid the datachecksumsworker having
+ * to process it again when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from where it left off, given that pg_class.relhaschecksums tracks the
+ * state of processed relations and the in-progress state ensures that all
+ * new writes are performed with checksums. Each database will be reprocessed, but relations
+ * where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This ensures
+ * that backends which have yet to move from the "on" state will still be able
+ * to process data checksum validation. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *    - Backends SHALL NOT violate their local data checksum state
+ *    - Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *      currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * WAL logging backend updating the global state in the controlfile will wait
+ * for all other backends to absorb the barrier. Barrier absorption will happen
+ * during interrupt processing, which means that connected backends will change
+ * state at different times. To prevent data checksum state changes when
+ * writing and verifying checksums, interrupts shall be held off before
+ * interrogating state and resumed when the IO operation has been performed.
+ *
+ *   When Enabling Data Checksums
+ *   ----------------------------
+ *   A process which fails to observe data checksums being enabled can induce
+ *   two types of errors: failing to write the checksum when modifying the page
+ *   and failing to validate the data checksum on the page when reading it.
+ *
+ *   When processing starts all backends belong to one of the below sets, with
+ *   one set being empty:
+ *
+ *   Bd: Backends in "off" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   If processing is started in an online cluster then all backends are in Bd.
+ *   If processing was halted by the cluster shutting down, the controlfile
+ *   state "inprogress-on" will be observed on system startup and all backends
+ *   will be in Bd. Backends transition Bd -> Bi via a procsignalbarrier.  When
+ *   the DataChecksumsWorker has finished writing checksums on all pages and
+ *   enables data checksums cluster-wide, there are four sets of backends where
+ *   Bd shall be an empty set:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting while Bg is waiting on the procsignalbarrier will
+ *   observe the global state being "on" and will thus automatically belong to
+ *   Be.  Checksums are enabled cluster-wide when Bi is an empty set. Bi and Be
+ *   are compatible sets while still operating based on their local state as
+ *   both write data checksums.
+ *
+ *   When Disabling Data Checksums
+ *   -----------------------------
+ *   A process which fails to observe that data checksums have been disabled
+ *   can induce two types of errors: writing the checksum when modifying the
+ *   page and validating a data checksum which is no longer correct due to
+ *   modifications to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bo: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bo -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bo write data checksums, but don't validate them, such that
+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bo. Once all backends are in Bo, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation any longer.
+ *
+ *
+ * Potential optimizations
+ * -----------------------
+ * Below are some potential optimizations and improvements which were brought
+ * up during reviews of this feature, but which weren't implemented in the
+ * initial version. These are ideas listed without any validation on their
+ * feasibility or potential payoff. More discussion on these can be found on
+ * the -hackers threads linked to in the commit message of this feature.
+ *
+ *   * Launching datachecksumsworker for resuming operation from the startup
+ *     process: Currently users have to restart processing manually after a
+ *     restart since dynamic background worker cannot be started from the
+ *     postmaster. Changing to the startup process could make resuming the
+ *     processing automatic.
+ *   * Avoid dirtying the page when checksums already match: If the checksum
+ *     on the page already happens to match, we still dirty the page. It should
+ *     be enough to only do the log_newpage_buffer() call in that case.
+ *   * Invent a lightweight WAL record that doesn't contain the full-page
+ *     image but just the block number: On replay, the redo routine would read
+ *     the page from disk.
+ *   * Teach pg_checksums to avoid checksummed pages when pg_checksums is used
+ *     to enable checksums on a cluster which is in inprogress-on state and
+ *     may have checksummed pages (make pg_checksums be able to resume an
+ *     online operation).
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		enable_checksums;	/* True if checksums are being enabled,
+									 * else false */
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct *DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/*
+ * Flag set by the interrupt handler
+ */
+static volatile sig_atomic_t abort_requested = false;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+static void AbortProcessing(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Main entry point for the datachecksumsworker launcher process
+ *
+ * The main entry point for starting data checksum processing, for enabling
+ * as well as disabling.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of the below distinct states:
+	 *
+	 * A: Started and performing an operation
+	 * B: Started and in the process of aborting
+	 * C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
+	 * and the abort flag to be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, we cannot launch a new one. But if
+	 * the user requested for checksums to be disabled, we can cancel it.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
+		 * If the launcher is started, then the data checksums state will
+		 * transition to an inprogress state. Since the state transition may
+		 * not have happened yet (in case of rapidly initiated checksum enable
+		 * calls for example) we inspect the target state of the currently
+		 * running launcher.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so silently exit.
+			 */
+			if (DatachecksumsWorkerShmem->enable_checksums)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * Disabling checksums is likely to be a very quick operation in
+			 * many cases so trying to abort it to save the checksums would
+			 * run the risk of race conditions.
+			 */
+			else
+			{
+				ereport(NOTICE,
+						(errmsg("data checksums are concurrently being disabled, please retry")));
+
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+		}
+		else
+		{
+			/*
+			 * Data checksums are already being disabled, exit silently.
+			 */
+			if (!DatachecksumsWorkerShmem->enable_checksums)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			DatachecksumsWorkerShmem->abort = true;
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+	}
+
+	/*
+	 * The launcher is currently not running, so we need to inspect the data
+	 * checksum state in the cluster to determine how to proceed based on
+	 * the requested target state.
+	 */
+	else
+	{
+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->enable_checksums = enable_checksums;
+
+		/*
+		 * If the launcher isn't started and we're asked to enable checksums,
+		 * we need to check if processing was previously interrupted such that
+		 * we should resume rather than start from scratch.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums in a cluster which already
+			 * has checksums enabled, exit immediately as there is nothing
+			 * more to do.
+			 */
+			if (DataChecksumsNeedVerify())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-on" then we will
+			 * resume from where we left off based on the catalog state. This
+	 * will be safe since new relations created while the
+	 * datachecksumsworker was not running will have checksums enabled.
+			 */
+			else if (DataChecksumsOnInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[1] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-off" then we
+			 * were interrupted while the catalog state was being cleared. In
+			 * this case we need to first reset state and then continue with
+			 * enabling checksums.
+			 */
+			else if (DataChecksumsOffInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * Data checksums are off in the cluster, we can proceed with
+			 * enabling them. Just in case we will start by resetting the
+			 * catalog state since we are doing this from scratch and we don't
+			 * want leftover catalog state to cause us to miss a relation.
+			 */
+			else
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+		}
+		else
+		{
+			/*
+			 * Regardless of current state in the system, we go through the
+			 * motions when asked to disable checksums. The catalog state is
+			 * only defined to be relevant during the operation of enabling
+	 * checksums, and has no use at any other point in time. That
+			 * being said, a user who sees stale relhaschecksums entries in
+			 * the catalog might run this just in case.
+			 *
+	 * Resetting the catalog state must be performed after setting the data
+	 * checksum state to off, as there might otherwise (depending on the
+	 * system's data checksum state) be a window between the catalog reset
+	 * and the state transition in which new relations are created with the
+	 * catalog state set to true.
+			 */
+			DatachecksumsWorkerShmem->operations[0] = DISABLE_CHECKSUMS;
+			DatachecksumsWorkerShmem->operations[1] = RESET_STATE;
+		}
+	}
+
+	/*
+	 * Backoff parameters to throttle the load during enabling. As there is no
+	 * real processing performed during disabling checksums the backoff
+	 * parameters do not apply there.
+	 */
+	if (enable_checksums)
+	{
+		DatachecksumsWorkerShmem->cost_delay = cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = cost_limit;
+	}
+	else
+	{
+		DatachecksumsWorkerShmem->cost_delay = 0;
+		DatachecksumsWorkerShmem->cost_limit = 0;
+	}
+
+	/*
+	 * Prepare the BackgroundWorker and launch it.
+	 */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. If wal_level is set to "minimal",
+		 * this could be avoided when the checksum is calculated to already
+		 * be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort;
+		 * the abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort || abort_requested)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it then that's not
+	 * an error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	ReleaseSysCache(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot end up with a processed database so
+	 * we have no alternative other than exiting. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * it would be nice to handle in some way. Enabling data checksums resets
+	 * the flags so any stale flags won't cause problems at that point, but
+	 * they may cause confusion for users reading pg_class. TODO.
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksums processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+/*
+ * launcher_exit
+ *
+ * Internal routine for cleaning up state when the launcher process exits. We
+ * need to reset the abort flag to ensure that processing can be restarted
+ * after having previously been aborted.
+ */
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	abort_requested = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * launcher_cancel_handler
+ *
+ * Internal routine for reacting to SIGINT and flagging the worker to abort.
+ * The worker won't be interrupted immediately but will check the abort flag
+ * between each block in a relation.
+ */
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	int		save_errno = errno;
+
+	abort_requested = true;
+	/*
+	 * There is no sleeping in the main loop; the flag will be checked
+	 * periodically in ProcessSingleRelationFork. The worker does, however,
+	 * sleep when waiting for concurrent transactions to end, so we still
+	 * need to set the latch.
+	 */
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Block until all currently running transactions have finished
+ *
+ * Returns when all transactions which were active at the time of the call
+ * have ended, or when the abort flag has been set. If the postmaster dies
+ * while waiting, the process exits with FATAL before processing has
+ * completed.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				ereport(FATAL,
+						(errmsg("postmaster exited during data checksum processing"),
+						 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+			aborted = DatachecksumsWorkerShmem->abort || abort_requested;
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+static void
+AbortProcessing(void)
+{
+	bool		connected = false;
+
+	SetDataChecksumsOff();
+	ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function has the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = DatachecksumsWorkerShmem->operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					abort_requested = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					abort_requested = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+	/*
+	 * If the user called pg_disable_data_checksums() while the worker was
+	 * running, but after ProcessSingleRelationFork had finished with all
+	 * blocks, then there is a window in which the requested abort could
+	 * still end with checksums enabled. Re-check the cancellation request
+	 * before exiting to ensure that doesn't happen.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->abort)
+	{
+		LWLockRelease(DatachecksumsWorkerLock);
+		AbortProcessing();
+	}
+	else
+		LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * Clean up after processing
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->launcher_started = false;
+	DatachecksumsWorkerShmem->abort = false;
+	abort_requested = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. It loops,
+ * computing a new list and comparing it to the databases already seen,
+ * until no new databases are found.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Arrange for the first database to also process the shared catalogs,
+	 * rather than processing them once in every database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	/*
+	 * Get a list of all databases to process. This may include databases that
+	 * were created during our runtime.  Since a database can be created as a
+	 * copy of any other database (which may not have existed in our last run),
+	 * we have to repeat this loop until no new databases show up in the list.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%d databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "restarting loop" : "processing completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+
+		/*
+		 * Re-generate the list of databases for another pass. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain
+		 * that there are no databases left without checksums.
+		 */
+		WaitForAllTransactionsToFinish();
+		DatabaseList = BuildDatabaseList();
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Enabling checksums can fail for a database either because
+	 * processing actually failed for some reason, or because the database
+	 * was dropped between us getting the database list and trying to
+	 * process it. Get a fresh list of databases to detect the second case,
+	 * where the database was dropped before we had started processing it.
+	 * If a database still exists but enabling checksums failed, we fail the
+	 * entire checksumming process and exit with an error.
+	 */
+	WaitForAllTransactionsToFinish();
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = true;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->abort = false;
+		DatachecksumsWorkerShmem->launcher_started = false;
+		abort_requested = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("failed to enable data checksums in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * This assignment is redundant after the MemSet above, but we are
+	 * explicit about the intent for readability, since this state is
+	 * queried when resuming after a restart.
+	 */
+	DatachecksumsWorkerShmem->launcher_started = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to. If the caller wants to ensure that no concurrently
+ * running CREATE DATABASE calls exist, this needs to be preceded by a call
+ * to WaitForAllTransactionsToFinish().
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which do not yet have data checksums
+ * are returned. If include_shared is true then shared relations are included
+ * as well in a non-temporary list. include_shared has no relevance when
+ * building a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as bgw_function_name in the dynamic background worker
+ * initiated for each database by the launcher. After enabling data
+ * checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start.
+	 * We need to wait until they are all gone before we are done, since we
+	 * cannot access and modify these relations from this worker.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		AbortProcessing();
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..0fef097eb8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4017,6 +4017,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..cc494b6f13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d9c482454f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 561c212092..9362ec0018 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2944,8 +2944,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..c7928f3495 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c43cdd685b..a3720617f9 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "port/pg_bitutils.h"
 #include "commands/async.h"
 #include "miscadmin.h"
@@ -98,7 +99,6 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 static void ResetProcSignalBarrierBits(uint32 flags);
-static bool ProcessBarrierPlaceholder(void);
 
 /*
  * ProcSignalShmemSize
@@ -538,8 +538,17 @@ ProcessProcSignalBarrier(void)
 				type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
 				switch (type)
 				{
-					case PROCSIGNAL_BARRIER_PLACEHOLDER:
-						processed = ProcessBarrierPlaceholder();
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON:
+						processed = AbsorbChecksumsOnInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_ON:
+						processed = AbsorbChecksumsOnBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF:
+						processed = AbsorbChecksumsOffInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_OFF:
+						processed = AbsorbChecksumsOffBarrier();
 						break;
 				}
 
@@ -604,24 +613,6 @@ ResetProcSignalBarrierBits(uint32 flags)
 	InterruptPending = true;
 }
 
-static bool
-ProcessBarrierPlaceholder(void)
-{
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 *
-	 * The return value should be 'true' if the barrier was successfully
-	 * absorbed and 'false' if not. Note that returning 'false' can lead to
-	 * very frequent retries, so try hard to make that an uncommon case.
-	 */
-	return true;
-}
-
 /*
  * CheckProcSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called after receiving
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..5b083749d5 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae..8fbebd9870 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change, which could otherwise
+		 * risk a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 62bff52638..4ac396ccf1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1567,9 +1567,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1585,9 +1582,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..17c4dc15e6 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1828,7 +1829,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1896,6 +1898,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3548,6 +3552,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the data checksum state. Since the data checksum state can change
+	 * at any time, the fetched value might be out of date by the time the
+	 * relation is built.  DataChecksumsNeedWrite returns true when data
+	 * checksums are enabled, in the process of being enabled (state
+	 * "inprogress-on"), or in the process of being disabled (state
+	 * "inprogress-off").  Since relhaschecksums is only used to track
+	 * progress when data checksums are being enabled, and going from
+	 * disabled to enabled will clear relhaschecksums before starting, it is
+	 * safe to use this value during a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false and is concurrently changed
+	 * to true, that implies that checksums are being enabled.  Worst case,
+	 * this will lead to the relation being processed for checksums even
+	 * though each page written will already have them.  Performing this
+	 * last shortens the window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3813,6 +3838,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3837,16 +3863,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3875,6 +3909,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3895,15 +3930,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc5..045da21904 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e5965bc517..92367ece4b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -606,6 +606,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up backend local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..3b7207afb5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -500,6 +502,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -609,7 +622,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums;
 static bool integer_datetimes;
 static bool assert_enabled;
 static bool in_hot_standby;
@@ -1910,17 +1923,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4830,6 +4832,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3f029f41e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..1298857458 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading.  The
+	 * user should either let the process finish or turn off checksums
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..b35cd4d503 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..6947c09591 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern bool AbsorbChecksumsOnInProgressBarrier(void);
+extern bool AbsorbChecksumsOffInProgressBarrier(void);
+extern bool AbsorbChecksumsOnBarrier(void);
+extern bool AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 224cae0246..adbe81e890 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when the data checksum state is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index e8dcd15a55..bf296625e4 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..d8229422af 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b5f52d4e4a..f050f15a58 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11301,6 +11301,22 @@
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
 
+{ oid => '9258',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '9257',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bdc97e308..f013acba76 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -324,6 +324,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..0974dfadfe 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -963,6 +963,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..809de73dc6
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,33 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for the data checksums background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+bool		DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f..c35b747520 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d2359192..f736b12f98 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..d865796d04 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
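[Reviewer's note, not part of the patch: the four barriers above mirror the ChecksumType states. A small standalone sketch of the transition graph as I read the enable/disable flows — the state names are from the patch, but the allowed-transition set is my assumption, not a copy of the launcher's actual logic:]

```c
/*
 * Sketch of the data checksum state machine implied by the patch.  The
 * transition set below is an assumption based on the documented flows
 * (off -> inprogress-on -> on when enabling, on -> inprogress-off -> off
 * when disabling, plus aborting an in-progress enable), not verified
 * against the launcher code.
 */
#include <stdbool.h>

typedef enum
{
	DATA_CHECKSUMS_OFF = 0,
	DATA_CHECKSUMS_ON,
	DATA_CHECKSUMS_INPROGRESS_ON,
	DATA_CHECKSUMS_INPROGRESS_OFF
} ChecksumType;

static bool
checksum_transition_allowed(ChecksumType from, ChecksumType to)
{
	switch (from)
	{
		case DATA_CHECKSUMS_OFF:
			/* enabling starts by marking the cluster in progress */
			return to == DATA_CHECKSUMS_INPROGRESS_ON;
		case DATA_CHECKSUMS_INPROGRESS_ON:
			/* the worker finishes all relations, or the user disables */
			return to == DATA_CHECKSUMS_ON ||
				to == DATA_CHECKSUMS_INPROGRESS_OFF;
		case DATA_CHECKSUMS_ON:
			/* disabling also passes through an in-progress state */
			return to == DATA_CHECKSUMS_INPROGRESS_OFF;
		case DATA_CHECKSUMS_INPROGRESS_OFF:
			return to == DATA_CHECKSUMS_OFF;
	}
	return false;
}
```

Each edge here corresponds to one of the four PROCSIGNAL_BARRIER_CHECKSUM_* barriers being emitted so every backend absorbs the new state before pages are written under the old assumption.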
diff --git a/src/test/Makefile b/src/test/Makefile
index ab1ef9a475..9774816625 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..fd60f7e97f
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: In the case of "check", this creates a temporary installation
+with multiple nodes (a primary and one or more standbys) for the
+purpose of the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..57384a452c
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,89 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class have checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled. Disabling checksums clears the catalog
+# relhaschecksums state, so wait for that before calling it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..dc5bcb9629
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,108 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table, but start
+# processing anyway and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';");
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1', 'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums " .
+	"AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..99c283e0b1
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,116 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+cmp_ok($result, '~~', ["inprogress-on", "on"], 'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary 2');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;",
+	'0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is ($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
diff --git a/src/test/checksum/t/004_offline.pl b/src/test/checksum/t/004_offline.pl
new file mode 100644
index 0000000000..28f6208a63
--- /dev/null
+++ b/src/test/checksum/t/004_offline.pl
@@ -0,0 +1,100 @@
+# Test suite for testing enabling data checksums offline from various states
+# of checksum processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Enable checksums offline using pg_checksums
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are enabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums offline again using pg_checksums
+$node->stop();
+$node->checksum_disable_offline();
+$node->start();
+
+# Ensure that checksums are disabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in = '';
+my $out = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'inprogress-on');
+is($result, 1, 'ensure checksums are in the process of being enabled');
+
+# Turn the cluster off and enable checksums offline, then start back up
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are now enabled even though processing wasn't
+# restarted
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';");
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..61b4571e9d 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2221,6 +2221,42 @@ sub pg_recvlogical_upto
 	}
 }
 
+=item $node->checksum_enable_offline()
+
+Enable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_enable_offline
+{
+	my ($self) = @_;
+
+	print "# Enabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-e');
+	print "# Checksums enabled\n";
+	return;
+}
+
+=item checksum_disable_offline
+
+Disable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_disable_offline
+{
+	my ($self) = @_;
+
+	print "# Disabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-d');
+	print "# Checksums disabled\n";
+	return;
+}
+
 =pod
 
 =back
-- 
2.21.1 (Apple Git-122.3)

#76Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Heikki Linnakangas (#74)
Re: Online checksums patch - once again

On 22/01/2021 14:21, Heikki Linnakangas wrote:

On 22/01/2021 13:55, Heikki Linnakangas wrote:

I read through the latest patch,
v31-0001-Support-checksum-enable-disable-in-a-running-clu.patch. Some
comments below:

One more thing:

In SetRelationNumChecks(), you should use SearchSysCacheCopy1() to get a
modifiable copy of the tuple. Otherwise you modify the tuple in the
relcache as a side effect. Maybe that's harmless in this case, as the
'relhaschecksums' value in the relcache isn't used for anything, but
let's be tidy.

Sorry, I meant SetRelHasChecksums. There is no SetRelationNumChecks
function, I don't know where I got that from.

- Heikki

#77Daniel Gustafsson
daniel@yesql.se
In reply to: Heikki Linnakangas (#76)
1 attachment(s)
Re: Online checksums patch - once again

On 26 Jan 2021, at 23:37, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 22/01/2021 14:21, Heikki Linnakangas wrote:

On 22/01/2021 13:55, Heikki Linnakangas wrote:

I read through the latest patch,
v31-0001-Support-checksum-enable-disable-in-a-running-clu.patch. Some
comments below:

One more thing:
In SetRelationNumChecks(), you should use SearchSysCacheCopy1() to get a
modifiable copy of the tuple. Otherwise you modify the tuple in the
relcache as a side effect. Maybe that's harmless in this case, as the
'relhaschecksums' value in the relcache isn't used for anything, but
let's be tidy.

Sorry, I meant SetRelHasChecksums. There is no SetRelationNumChecks function, I don't know where I got that from.

Ah, that makes more sense, you had me confused there for a bit =) Fixed in
attached v33 which has also been through another pgindent and pgperltidy run.

--
Daniel Gustafsson https://vmware.com/

Attachments:

v33-0001-Support-checksum-enable-disable-in-a-running-clu.patchapplication/octet-stream; name=v33-0001-Support-checksum-enable-disable-in-a-running-clu.patch; x-unix-mode=0644Download
From 80ce4e22fd128a7290ae168ed078b03d3ff1f09c Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Wed, 27 Jan 2021 11:38:33 +0100
Subject: [PATCH v33] Support checksum enable/disable in a running cluster v33

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A dynamic background worker is responsible for launching a per-database
worker which will mark all buffers dirty for all relations with storage
in order for them to have data checksums on write. A new in-progress
state is introduced which during processing ensures that data checksums
are written but not verified to avoid false negatives. State changes
across backends are synchronized using a procsignalbarrier.

Authors: Daniel Gustafsson, Magnus Hagander
Reviewed-by: Heikki Linnakangas, Robert Haas, Andres Freund, Tomas Vondra, Michael Banck, Andrey Borodin
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   68 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   57 +-
 src/backend/access/heap/heapam.c             |    9 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  452 ++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    7 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1602 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   33 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   33 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   92 +
 src/test/checksum/t/002_restarts.pl          |  117 ++
 src/test/checksum/t/003_standby_checksum.pl  |  127 ++
 src/test/checksum/t/004_offline.pl           |  105 ++
 src/test/perl/PostgresNode.pm                |   36 +
 51 files changed, 3060 insertions(+), 87 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl
 create mode 100644 src/test/checksum/t/004_offline.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..75cc1588a5 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index aa99665e2e..94182fb7b1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25839,6 +25839,74 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Initiates enabling of data checksums for the cluster. This will switch the data
+        checksums mode to <literal>inprogress-on</literal> as well as start a
+        background worker that will process all data in the database and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9496f76b1f..7e170ec429 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3695,8 +3695,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3706,8 +3705,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b2..d879550e81 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will process all relations again, regardless of any progress made online.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f8..48890ccc9d 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -247,9 +247,10 @@
   <para>
    Checksums are normally enabled when the cluster is initialized using <link
    linkend="app-initdb-data-checksums"><application>initdb</application></link>.
-   They can also be enabled or disabled at a later time as an offline
-   operation. Data checksums are enabled or disabled at the full cluster
-   level, and cannot be specified individually for databases or tables.
+   They can also be enabled or disabled at a later time either as an offline
+   operation or online in a running cluster allowing concurrent access. Data
+   checksums are enabled or disabled at the full cluster level, and cannot be
+   specified individually for databases or tables.
   </para>
 
   <para>
@@ -266,7 +267,7 @@
   </para>
 
   <sect2 id="checksums-offline-enable-disable">
-   <title>Off-line Enabling of Checksums</title>
+   <title>Offline Enabling of Checksums</title>
 
    <para>
     The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
@@ -275,6 +276,54 @@
    </para>
 
   </sect2>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>Online Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process; make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
  </sect1>
 
   <sect1 id="wal-intro">
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd54..ffcd889908 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7927,7 +7927,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7948,11 +7948,18 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	/*
+	 * Hold interrupts for the duration of xlogging to avoid the state of data
+	 * checksums changing during the processing which would alter the premise
+	 * for xlogging hint bits.
+	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cc007b8963..8531def93c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -50,6 +51,7 @@
 #include "port/atomics.h"
 #include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -253,6 +255,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -900,6 +912,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XLogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1073,8 +1086,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1087,7 +1100,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4915,9 +4928,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4951,13 +4962,370 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Returns true iff data checksums are either enabled or in the process of
+ * being enabled or disabled.  During the "inprogress-on" and "inprogress-off"
+ * states checksums must be written even though they are not verified (see
+ * datachecksumsworker.c for a longer discussion).
+ *
+ * This function is intended for callsites which are about to write a data page
+ * to storage, and need to know whether to re-calculate the checksum for the
+ * page header. Interrupts must be held off while calling this and until the
+ * write operation has finished to avoid the risk of the checksum state
+ * changing. This implies that this function must be called as close to the
+ * write operation as possible to keep the critical section short.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified (see datachecksumsworker.c for a longer discussion).
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid the
+ * risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that calling
+ * this function must be performed as close to the validation call as possible
+ * to keep the critical section short. This is in order to protect against
+ * time of check/time of use situations around data checksum validation.
+ */
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress-on" state for enabling checksums is used while the checksum
+ * worker is setting checksums on all pages; it can thus be used to check for
+ * aborted checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress-off" state for disabling checksums is used while the
+ * worker resets the catalog state.  DataChecksumsNeedVerify() or
+ * DataChecksumsNeedWrite() should be used for deciding whether to read/write
+ * checksums.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * SetDataChecksumsOnInProgress
+ *		Sets the data checksum state to "inprogress-on" to enable checksums
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on". See
+ * SetDataChecksumsOn below for a description of how this state change works.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The data checksum state can only be transitioned to "inprogress-on"
+	 * from "off"; if data checksums are in any other state, exit.
+	 */
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	/*
+	 * The state transition is performed in a critical section with
+	 * checkpoints held off to provide crash safety.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await the state change in all backends to ensure that every backend is
+	 * in "inprogress-on". Once done, we know that all backends are writing
+	 * data checksums.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one to
+ * set the checksums state to "inprogress-on" (which is performed by
+ * SetDataChecksumsOnInProgress()) and the second one to set the state to "on"
+ * (performed here).
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on".  This state
+ * requires data checksums to be written but not verified. This ensures that
+ * all data pages can be checksummed without the risk of false negatives in
+ * validation during the process.  When all existing pages are guaranteed to
+ * have checksums, and all new pages will be initialized with checksums, the
+ * state can be changed to "on". Once the state is "on" checksums will be both
+ * written and verified. See datachecksumsworker.c for a longer discussion on
+ * how data checksums can be enabled in a running cluster.
+ *
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOn(void)
 {
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The only allowed state transition to "on" is from "inprogress-on" since
+	 * that state ensures that all pages will have data checksums written.
+	 */
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await the transition to "on" in all backends. When done, we know that
+	 * data checksums are enabled in all backends, and that checksums are
+	 * both written and verified.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be emitted.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * "inprogress-off" state during which backends continue to write
+	 * checksums without verifying them. When all backends are in
+	 * "inprogress-off" the next transition to "off" can be performed, after
+	 * which all data checksum processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksum state is "inprogress-on"
+		 * or "inprogress-off" and we can transition directly to "off" from
+		 * there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that a checkpoint cannot occur while we are disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * ProcSignalBarrier absorption functions for enabling and disabling data
+ * checksums in a running cluster. The barriers are emitted by the
+ * SetDataChecksums* functions.
+ */
+bool
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 ||
+		   LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+	return true;
+}
+
+/*
+ * InitLocalControldata
+ *
+ * Set up backend-local caches of controldata variables which may change at
+ * any point during runtime and thus require special-cased locking. So far
+ * this only applies to data_checksum_version, but it's intended to be
+ * general-purpose enough to handle future cases.
+ */
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* GUC show hook for the data_checksums parameter */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7991,6 +8359,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums being enabled ("inprogress-on"
+	 * state), we notify the user that they need to manually restart the
+	 * process to enable checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here; it has to be launched from a
+	 * regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shut down,
+	 * we know that all backends have stopped validating checksums, so we can
+	 * move directly to "off".
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XLogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9900,6 +10294,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XLogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10355,6 +10767,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab319d..5d77be8a2d 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Starts a background worker that turns off data checksums.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9abc4a1f55..87052b0693 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -974,10 +974,17 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	/*
+	 * Hold off interrupts to ensure that the observed data checksum state
+	 * cannot change as we form and insert the tuple.
+	 */
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..516ae666b7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1264,6 +1264,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..8afbf762af 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..5e2e57c843
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1602 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a cluster at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed,
+ * and verified, when accessed.  When enabling checksums on an already
+ * running cluster which does not have checksums enabled, this worker
+ * ensures that all pages are checksummed before verification of the
+ * checksums is turned on.  When disabling checksums, the state transition
+ * is recorded in the control file and the catalog state is reset, but no
+ * changes are performed on the data pages.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will resume from
+ * where it left off: pg_class.relhaschecksums tracks the state of processed
+ * relations, and the in-progress state ensures that all new writes are
+ * performed with checksums. Each database will be reprocessed, but relations
+ * where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off",
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state can still
+ * safely validate data checksums. During "inprogress-off", the catalog state
+ * pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *    - Backends SHALL NOT violate local datachecksum state
+ *    - Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *      currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring that no
+ * incompatible objects or processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * WAL logging backend updating the global state in the controlfile will wait
+ * for all other backends to absorb the barrier. Barrier absorption will happen
+ * during interrupt processing, which means that connected backends will change
+ * state at different times. To prevent data checksum state changes when
+ * writing and verifying checksums, interrupts shall be held off before
+ * interrogating state and resumed when the IO operation has been performed.
+ *
+ *   When Enabling Data Checksums
+ *   ----------------------------
+ *   A process which fails to observe data checksums being enabled can induce
+ *   two types of errors: failing to write the checksum when modifying the page
+ *   and failing to validate the data checksum on the page when reading it.
+ *
+ *   When processing starts all backends belong to one of the below sets, with
+ *   one set being empty:
+ *
+ *   Bd: Backends in "off" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   If processing is started in an online cluster then all backends are in Bd.
+ *   If processing was halted by the cluster shutting down, the controlfile
+ *   state "inprogress-on" will be observed on system startup and all backends
+ *   will be in Bd. Backends transition Bd -> Bi via a procsignalbarrier.  When
+ *   the DataChecksumsWorker has finished writing checksums on all pages and
+ *   enables data checksums cluster-wide, there are four sets of backends, of
+ *   which Bd shall be an empty set:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting while Bg is waiting on the procsignalbarrier will
+ *   observe the global state being "on" and will thus automatically belong to
+ *   Be.  Checksums are enabled cluster-wide when Bi is an empty set. Bi and Be
+ *   are compatible sets while still operating based on their local state as
+ *   both write data checksums.
+ *
+ *   When Disabling Data Checksums
+ *   -----------------------------
+ *   A process which fails to observe that data checksums have been disabled
+ *   can induce two types of errors: writing the checksum when modifying the
+ *   page and validating a data checksum which is no longer correct due to
+ *   modifications to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bo: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bo -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bo write data checksums, but don't validate them, such that
+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bo. Once all backends are in Bo, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation any longer.
+ *
+ *
+ * Potential optimizations
+ * -----------------------
+ * Below are some potential optimizations and improvements which were brought
+ * up during reviews of this feature, but which weren't implemented in the
+ * initial version. These are ideas listed without any validation of their
+ * feasibility or potential payoff. More discussion on these can be found on
+ * the -hackers threads linked to in the commit message of this feature.
+ *
+ *   * Launching datachecksumsworker for resuming operation from the startup
+ *     process: Currently users have to restart processing manually after a
+ *     restart, since dynamic background workers cannot be started from the
+ *     postmaster. Changing to the startup process could make resuming the
+ *     processing automatic.
+ *   * Avoid dirtying the page when the checksum already matches: even if the
+ *     checksum already happens to match, we still dirty the page. It should
+ *     be enough to only do the log_newpage_buffer() call in that case.
+ *   * Invent a lightweight WAL record that doesn't contain the full-page
+ *     image but just the block number: On replay, the redo routine would read
+ *     the page from disk.
+ *   * Teach pg_checksums to avoid checksummed pages when pg_checksums is used
+ *     to enable checksums on a cluster which is in inprogress-on state and
+ *     may have checksummed pages (make pg_checksums be able to resume an
+ *     online operation).
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * Access to launcher_started and abort must be protected by
+	 * DatachecksumsWorkerLock.
+	 */
+	bool		launcher_started;
+	bool		abort;
+
+	/*
+	 * Variables for the worker to signal the launcher, or subsequent workers
+	 * in other databases. As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	DatachecksumsWorkerResult success;
+	bool		process_shared_catalogs;
+
+	/*
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	int			cost_delay;
+	int			cost_limit;
+	int			operations[MAX_OPS];
+	bool		enable_checksums;	/* True if checksums are being enabled,
+									 * else false */
+}			DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct *DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/*
+ * Flag set by the interrupt handler
+ */
+static volatile sig_atomic_t abort_requested = false;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+static void AbortProcessing(void);
+
+/*
+ * DataChecksumsWorkerStarted
+ *			Informational function to query the state of the worker
+ */
+bool
+DataChecksumsWorkerStarted(void)
+{
+	bool		started;
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	started = DatachecksumsWorkerShmem->launcher_started && !DatachecksumsWorkerShmem->abort;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	return started;
+}
+
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Main entry point for the datachecksumsworker launcher process
+ *
+ * Starts data checksum processing, for enabling checksums as well as
+ * disabling them.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+
+	/*
+	 * Given that any backend can initiate a data checksum operation, the
+	 * launcher can at this point be in one of the below distinct states:
+	 *
+	 * A: Started and performing an operation
+	 * B: Started and in the process of aborting
+	 * C: Not started
+	 *
+	 * If the launcher is in state A, and the requested target state is equal
+	 * to the currently performed operation then we can return immediately.
+	 * This can happen if two users enable checksums simultaneously.  If the
+	 * requested target is to disable checksums while they are being enabled,
+	 * we must abort the current processing.  This can happen if a user
+	 * enables data checksums and then, before checksumming is done, disables
+	 * data checksums again.
+	 *
+	 * If the launcher is in state B, we need to wait for processing to end
+	 * and the abort flag be cleared before we can restart with the requested
+	 * operation.  Here we will exit immediately and leave it to the user to
+	 * restart processing at a later time.
+	 *
+	 * If the launcher is in state C we can start performing the requested
+	 * operation immediately.
+	 */
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	/*
+	 * If the launcher is already started, we cannot launch a new one. But if
+	 * the user requested for checksums to be disabled, we can cancel it.
+	 */
+	if (DatachecksumsWorkerShmem->launcher_started)
+	{
+		if (DatachecksumsWorkerShmem->abort)
+		{
+			ereport(NOTICE,
+					(errmsg("data checksum processing is concurrently being aborted, please retry")));
+
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+
+		/*
+		 * If the launcher is started, then the data checksum state will
+		 * transition to an inprogress state. Since the state transition may
+		 * not have happened yet (for example, with rapidly repeated calls to
+		 * enable checksums) we inspect the target state of the currently
+		 * running launcher.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums when they are already being
+			 * enabled, there is nothing to do so silently exit.
+			 */
+			if (DatachecksumsWorkerShmem->enable_checksums)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * Disabling checksums is likely to be a very quick operation in
+			 * many cases so trying to abort it to save the checksums would
+			 * run the risk of race conditions.
+			 */
+			else
+			{
+				ereport(NOTICE,
+						(errmsg("data checksums are concurrently being disabled, please retry")));
+
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+		}
+		else
+		{
+			/*
+			 * Data checksums are already being disabled, exit silently.
+			 */
+			if (!DatachecksumsWorkerShmem->enable_checksums)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			DatachecksumsWorkerShmem->abort = true;
+			LWLockRelease(DatachecksumsWorkerLock);
+			return;
+		}
+	}
+
+	/*
+	 * The launcher is currently not running, so we need to inspect the data
+	 * checksum state in the cluster to determine how to proceed based on
+	 * the requested target state.
+	 */
+	else
+	{
+		memset(DatachecksumsWorkerShmem->operations, 0, sizeof(DatachecksumsWorkerShmem->operations));
+		DatachecksumsWorkerShmem->enable_checksums = enable_checksums;
+
+		/*
+		 * If the launcher isn't started and we're asked to enable checksums,
+		 * we need to check if processing was previously interrupted such that
+		 * we should resume rather than start from scratch.
+		 */
+		if (enable_checksums)
+		{
+			/*
+			 * If we are asked to enable checksums in a cluster which already
+			 * has checksums enabled, exit immediately as there is nothing
+			 * more to do.
+			 */
+			if (DataChecksumsNeedVerify())
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				return;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-on" then we will
+			 * resume from where we left off based on the catalog state. This
+			 * will be safe since new relations created while the checksums
+			 * worker wasn't running will have checksums enabled.
+			 */
+			else if (DataChecksumsOnInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[1] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * If the controlfile state is set to "inprogress-off" then we
+			 * were interrupted while the catalog state was being cleared. In
+			 * this case we need to first reset state and then continue with
+			 * enabling checksums.
+			 */
+			else if (DataChecksumsOffInProgress())
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+
+			/*
+			 * Data checksums are off in the cluster, so we can proceed with
+			 * enabling them. Just in case, we start by resetting the
+			 * catalog state since we are doing this from scratch and we don't
+			 * want leftover catalog state to cause us to miss a relation.
+			 */
+			else
+			{
+				DatachecksumsWorkerShmem->operations[0] = RESET_STATE;
+				DatachecksumsWorkerShmem->operations[1] = SET_INPROGRESS_ON;
+				DatachecksumsWorkerShmem->operations[2] = ENABLE_CHECKSUMS;
+				DatachecksumsWorkerShmem->operations[3] = SET_CHECKSUMS_ON;
+			}
+		}
+		else
+		{
+			/*
+			 * Regardless of current state in the system, we go through the
+			 * motions when asked to disable checksums. The catalog state is
+			 * only defined to be relevant during the operation of enabling
+			 * checksums, and has no use at any other point in time. That
+			 * being said, a user who sees stale relhaschecksums entries in
+			 * the catalog might run this just in case.
+			 *
+			 * Resetting state must be performed after setting data checksum
+			 * state to off, as there might otherwise (depending on system
+			 * data checksum state) be a window between resetting the catalog
+			 * and the state transition in which new relations are created
+			 * with the catalog state set to true.
+			 */
+			DatachecksumsWorkerShmem->operations[0] = DISABLE_CHECKSUMS;
+			DatachecksumsWorkerShmem->operations[1] = RESET_STATE;
+		}
+	}
+
+	/*
+	 * Backoff parameters to throttle the load during enabling. As there is
+	 * no real processing performed when disabling checksums, the backoff
+	 * parameters do not apply there.
+	 */
+	if (enable_checksums)
+	{
+		DatachecksumsWorkerShmem->cost_delay = cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = cost_limit;
+	}
+	else
+	{
+		DatachecksumsWorkerShmem->cost_delay = 0;
+		DatachecksumsWorkerShmem->cost_limit = 0;
+	}
+
+	/*
+	 * Prepare the BackgroundWorker and launch it.
+	 */
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = (Datum) 0;
+
+	DatachecksumsWorkerShmem->launcher_started = true;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->launcher_started = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+		ereport(ERROR,
+				(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. If wal_level is set to "minimal",
+		 * this could be avoided if the checksum is calculated to be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort, and
+		 * the abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		if (DatachecksumsWorkerShmem->abort || abort_requested)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it then that's not
+	 * an error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	heap_freetuple(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot end up with a processed database so
+	 * we have no alternative other than exiting. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * it would be nice to handle in some way. Enabling data checksums resets
+	 * the flags so any stale flags won't cause problems at that point, but
+	 * they may cause confusion for users reading pg_class. TODO.
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksum processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+/*
+ * launcher_exit
+ *
+ * Internal routine for cleaning up state when the launcher process exits. We
+ * need to clean up the abort flag to ensure that processing can be restarted
+ * again after it was previously aborted.
+ */
+static void
+launcher_exit(int code, Datum arg)
+{
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->abort = false;
+	DatachecksumsWorkerShmem->launcher_started = false;
+	abort_requested = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * launcher_cancel_handler
+ *
+ * Internal routine for reacting to SIGINT and flagging the worker to abort.
+ * The worker won't be interrupted immediately but will check for abort flag
+ * between each block in a relation.
+ */
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	abort_requested = true;
+
+	/*
+	 * There is no sleeping in the main loop, the flag will be checked
+	 * periodically in ProcessSingleRelationFork. The worker does however
+	 * sleep when waiting for concurrent transactions to end so we still need
+	 * to set the latch.
+	 */
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks until all current transactions have finished
+ *
+ * Returns when all transactions which were active at the call of the
+ * function have ended, or when an abort of checksum processing has been
+ * requested while waiting. If the postmaster dies while waiting, processing
+ * exits with a FATAL error rather than returning.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+	bool		aborted = false;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (!aborted)
+	{
+		TransactionId oldestxid = GetOldestActiveTransactionId();
+
+		if (TransactionIdPrecedes(oldestxid, waitforxid))
+		{
+			char		activity[64];
+			int			rc;
+
+			/* Oldest running xid is older than us, so wait */
+			snprintf(activity,
+					 sizeof(activity),
+					 "Waiting for current transactions to finish (waiting for %u)",
+					 waitforxid);
+			pgstat_report_activity(STATE_RUNNING, activity);
+
+			/* Retry every 5 seconds */
+			ResetLatch(MyLatch);
+			rc = WaitLatch(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+						   5000,
+						   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+			/*
+			 * If the postmaster died we won't be able to enable checksums
+			 * cluster-wide so abort and hope to continue when restarted.
+			 */
+			if (rc & WL_POSTMASTER_DEATH)
+				ereport(FATAL,
+						(errmsg("postmaster exited during data checksum processing"),
+						 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+			aborted = DatachecksumsWorkerShmem->abort || abort_requested;
+			LWLockRelease(DatachecksumsWorkerLock);
+		}
+		else
+		{
+			pgstat_report_activity(STATE_IDLE, NULL);
+			return;
+		}
+	}
+}
+
+static void
+AbortProcessing(void)
+{
+	bool		connected = false;
+
+	SetDataChecksumsOff();
+	ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function handles the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = DatachecksumsWorkerShmem->operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					abort_requested = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					DatachecksumsWorkerShmem->launcher_started = false;
+					DatachecksumsWorkerShmem->abort = false;
+					abort_requested = false;
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+	/*
+	 * If the user called pg_disable_data_checksums while the worker was
+	 * running, but after ProcessSingleRelationFork had finished with all
+	 * blocks, then there is a window in which the requested abort could end
+	 * with checksums still enabled. Re-check the cancellation request before
+	 * exiting to ensure that doesn't happen.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->abort)
+	{
+		LWLockRelease(DatachecksumsWorkerLock);
+		AbortProcessing();
+	}
+	else
+		LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * Clean up after processing
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	DatachecksumsWorkerShmem->launcher_started = false;
+	DatachecksumsWorkerShmem->abort = false;
+	abort_requested = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process, for either
+ * enabling checksums or resetting the checksum catalog tracking. It loops,
+ * computing a new list and comparing it to the databases already seen,
+ * until no new databases are found.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set up so that the first run processes the shared catalogs, but so
+	 * that they aren't repeated for every database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	/*
+	 * Get a list of all databases to process. This may include databases that
+	 * were created during our runtime.  Since a database can be created as a
+	 * copy of any other database (which may not have existed in our last
+	 * run), we have to repeat this loop until no new databases show up in the
+	 * list.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%i databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "restarting loop" : "processing completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+
+		/*
+		 * Re-generate the list of databases for another pass. Since we wait
+		 * for all pre-existing transactions to finish, we can be
+		 * certain that there are no databases left without checksums.
+		 */
+		WaitForAllTransactionsToFinish();
+		DatabaseList = BuildDatabaseList();
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can be because
+	 * processing actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case, where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists but enabling checksums failed, then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	WaitForAllTransactionsToFinish();
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = true;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		DatachecksumsWorkerShmem->abort = false;
+		DatachecksumsWorkerShmem->launcher_started = false;
+		abort_requested = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even though this is a redundant assignment, we want to be explicit
+	 * about our intent for readability, since we want to be able to query
+	 * this state when processing is resumed after a restart.
+	 */
+	DatachecksumsWorkerShmem->launcher_started = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to. If the caller wants to ensure that no concurrently
+ * running CREATE DATABASE calls exist, this needs to be preceded by a call
+ * to WaitForAllTransactionsToFinish().
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which do not yet have data checksums are
+ * returned. If include_shared is true then shared relations are included as
+ * well in a non-temporary list. include_shared has no relevance when building
+ * a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we finish, since we cannot
+	 * access and modify these relations.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		AbortProcessing();
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database with OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	while (!aborted)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+		int			rc;
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide, so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			DatachecksumsWorkerShmem->abort = true;
+		aborted = DatachecksumsWorkerShmem->abort;
+
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
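The wait loop's "numleft" computation is just the size of the intersection between the initial and current temp-table OID sets. A standalone sketch, with Oid simplified to unsigned int and list membership done by linear scan (the real code uses list_member_oid()):

```c
#include <stddef.h>

typedef unsigned int Oid;       /* simplified stand-in for PostgreSQL's Oid */

/* Linear membership test, mirroring list_member_oid() */
static int
oid_member(const Oid *list, size_t len, Oid oid)
{
    for (size_t i = 0; i < len; i++)
        if (list[i] == oid)
            return 1;
    return 0;
}

/*
 * Count how many of the temp tables seen at worker start are still
 * present, i.e. the "numleft" computed in the wait loop above.
 */
static size_t
count_remaining(const Oid *initial, size_t n_initial,
                const Oid *current, size_t n_current)
{
    size_t numleft = 0;

    for (size_t i = 0; i < n_initial; i++)
        if (oid_member(current, n_current, initial[i]))
            numleft++;
    return numleft;
}
```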
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..0fef097eb8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4017,6 +4017,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..cc494b6f13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d9c482454f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 561c212092..9362ec0018 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2944,8 +2944,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..c7928f3495 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c43cdd685b..a3720617f9 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "port/pg_bitutils.h"
 #include "commands/async.h"
 #include "miscadmin.h"
@@ -98,7 +99,6 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 static void ResetProcSignalBarrierBits(uint32 flags);
-static bool ProcessBarrierPlaceholder(void);
 
 /*
  * ProcSignalShmemSize
@@ -538,8 +538,17 @@ ProcessProcSignalBarrier(void)
 				type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
 				switch (type)
 				{
-					case PROCSIGNAL_BARRIER_PLACEHOLDER:
-						processed = ProcessBarrierPlaceholder();
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON:
+						processed = AbsorbChecksumsOnInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_ON:
+						processed = AbsorbChecksumsOnBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF:
+						processed = AbsorbChecksumsOffInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_OFF:
+						processed = AbsorbChecksumsOffBarrier();
 						break;
 				}
 
@@ -604,24 +613,6 @@ ResetProcSignalBarrierBits(uint32 flags)
 	InterruptPending = true;
 }
 
-static bool
-ProcessBarrierPlaceholder(void)
-{
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 *
-	 * The return value should be 'true' if the barrier was successfully
-	 * absorbed and 'false' if not. Note that returning 'false' can lead to
-	 * very frequent retries, so try hard to make that an uncommon case.
-	 */
-	return true;
-}
-
 /*
  * CheckProcSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called after receiving
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..5b083749d5 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae..8fbebd9870 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change mid-check, which could
+		 * otherwise cause a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
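PageSetChecksumCopy() stamps the checksum into a private copy because hint-bit updates may concurrently modify the shared buffer, and checksumming a torn page would be wrong. A minimal standalone illustration; FakePage, toy_checksum, and the field layout are all invented for the sketch, loosely modelled on PageHeaderData and pg_checksum_page():

```c
#include <string.h>
#include <stdint.h>

#define BLCKSZ 8192

/* Hypothetical page layout, loosely modelled on PageHeaderData */
typedef struct
{
    uint64_t pd_lsn;
    uint16_t pd_checksum;
    char     filler[BLCKSZ - sizeof(uint64_t) - sizeof(uint16_t)];
} FakePage;

/* Toy checksum standing in for pg_checksum_page() */
static uint16_t
toy_checksum(const char *page, uint32_t blkno)
{
    uint32_t sum = blkno;

    for (int i = 0; i < BLCKSZ; i++)
        sum = sum * 31 + (unsigned char) page[i];
    return (uint16_t) sum;
}

/*
 * Like PageSetChecksumCopy(): compute the checksum over a private copy so
 * concurrent hint-bit updates to the shared buffer cannot tear the bytes
 * being checksummed.  Returns the copy; the original page is untouched.
 */
static char *
checksum_copy(const FakePage *page, uint32_t blkno, FakePage *copy)
{
    memcpy(copy, page, BLCKSZ);
    copy->pd_checksum = 0;      /* checksum field excluded from the sum */
    copy->pd_checksum = toy_checksum((const char *) copy, blkno);
    return (char *) copy;
}
```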
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 62bff52638..4ac396ccf1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1567,9 +1567,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1585,9 +1582,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..633821bae5 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1828,7 +1829,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1896,6 +1898,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3548,6 +3552,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the data checksum state. Since the data checksum state can change
+	 * at any time, the fetched value might be out of date by the time the
+	 * relation is built.  DataChecksumsNeedWrite returns true when data
+	 * checksums are enabled, in the process of being enabled (state
+	 * "inprogress-on"), or in the process of being disabled (state
+	 * "inprogress-off"). Since relhaschecksums is only used to track progress
+	 * when data checksums are being enabled, and going from disabled to
+	 * enabled will clear relhaschecksums before starting, it is safe to use
+	 * this value for a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false and the state is concurrently
+	 * changed to true, then checksums are being enabled. Worst case,
+	 * this will lead to the relation being processed for checksums even
+	 * though each page written will have them already.  Performing this last
+	 * shortens the window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3813,6 +3838,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3837,16 +3863,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3875,6 +3909,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3895,15 +3930,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc5..045da21904 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e5965bc517..92367ece4b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -606,6 +606,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up backend local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca..3b7207afb5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -500,6 +502,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -609,7 +622,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums;
 static bool integer_datetimes;
 static bool assert_enabled;
 static bool in_hot_standby;
@@ -1910,17 +1923,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4830,6 +4832,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3f029f41e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..1298857458 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish, or turn off checksums,
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..b35cd4d503 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..6947c09591 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern bool AbsorbChecksumsOnInProgressBarrier(void);
+extern bool AbsorbChecksumsOffInProgressBarrier(void);
+extern bool AbsorbChecksumsOnBarrier(void);
+extern bool AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 224cae0246..adbe81e890 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index e8dcd15a55..bf296625e4 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have data checksums? */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..d8229422af 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b5f52d4e4a..f050f15a58 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11301,6 +11301,22 @@
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
 
+{ oid => '9258',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '9257',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bdc97e308..f013acba76 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -324,6 +324,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..0974dfadfe 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -963,6 +963,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..809de73dc6
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,33 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for the data checksums background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Status functions */
+extern bool DataChecksumsWorkerStarted(void);
+
+/* Start the background processes for enabling checksums */
+extern void StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Background worker entrypoints */
+extern void DatachecksumsWorkerLauncherMain(Datum arg);
+extern void DatachecksumsWorkerMain(Datum arg);
+extern void ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f..c35b747520 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d2359192..f736b12f98 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..d865796d04 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index ab1ef9a475..9774816625 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..fd60f7e97f
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: "make check" creates a temporary installation with multiple
+nodes (a primary and standbys) for the purpose of running the
+tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..3b229de915
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,92 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class have checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled. Disabling checksums clears the catalog
+# relhaschecksums state, so wait for that before calling it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..41a4d64037
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,117 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in    = '';
+my $out   = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';"
+);
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1',
+	'doublecheck that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..1555a1694b
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,127 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+cmp_ok(
+	$result, '~~',
+	[ "inprogress-on", "on" ],
+	'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary 2');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
diff --git a/src/test/checksum/t/004_offline.pl b/src/test/checksum/t/004_offline.pl
new file mode 100644
index 0000000000..2dfca4df23
--- /dev/null
+++ b/src/test/checksum/t/004_offline.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums offline from various states
+# of checksum processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Enable checksums offline using pg_checksums
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are enabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums offline again using pg_checksums
+$node->stop();
+$node->checksum_disable_offline();
+$node->start();
+
+# Ensure that checksums are disabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in    = '';
+my $out   = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'inprogress-on');
+is($result, 1, 'ensure checksums are in the process of being enabled');
+
+# Turn the cluster off and enable checksums offline, then start back up
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are now enabled even though processing wasn't
+# restarted
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..b7431a7600 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2221,6 +2221,42 @@ sub pg_recvlogical_upto
 	}
 }
 
+=item $node->checksum_enable_offline()
+
+Enable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_enable_offline
+{
+	my ($self) = @_;
+
+	print "# Enabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-e');
+	print "# Checksums enabled\n";
+	return;
+}
+
+=item checksum_disable_offline
+
+Disable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_disable_offline
+{
+	my ($self) = @_;
+
+	print "# Disabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-d');
+	print "# Checksums disabled\n";
+	return;
+}
+
 =pod
 
 =back
-- 
2.21.1 (Apple Git-122.3)

#78Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Daniel Gustafsson (#75)
1 attachment(s)
Re: Online checksums patch - once again

Revisiting an issue we discussed earlier:

On 25/11/2020 15:20, Daniel Gustafsson wrote:

On 23 Nov 2020, at 18:36, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

On 17/11/2020 10:56, Daniel Gustafsson wrote:

I've reworked this in the attached such that the enable_ and
disable_ functions merely call into the launcher with the desired
outcome, and the launcher is responsible for figuring out the
rest. The datachecksumworker is now the sole place which
initiates a state transfer.

Well, you still fill the DatachecksumsWorkerShmem->operations array
in the backend process that launches the datachecksumsworker, not in
the worker process. I find that still a bit surprising, but I
believe it works.

I'm open to changing it in case there are strong opinions, it just
seemed the most natural to me.

This kept bothering me, so I spent a while hacking this to my liking.
The attached patch moves the code to fill in 'operations' from the
backend to the launcher, so that the pg_enable/disable_checksums() call
now truly just stores the desired state, checksum on or off, in shared
memory, and launches the launcher process. The launcher process figures
out how to get to the desired state. This removes the couple of corner
cases that previously emitted a NOTICE about the processing being
concurrently disabled or aborted. What do you think? I haven't done much
testing, so if you adopt this approach, please check if I broke
something in the process.

This changes the way the abort works. If
pg_enable/disable_checksums() is called while the launcher is already
busy changing the state, the call will just set the
new desired state in shared memory anyway. The launcher process will
notice that the target state changed some time later, and restart from
scratch.
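The desired-state reconciliation described above can be sketched roughly as follows. This is a hedged, self-contained illustration only: the names below are hypothetical stand-ins, while in the actual patch the state lives in shared memory, the launcher is a background worker, and processing is done per database.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states; the real patch also has in-progress states. */
typedef enum
{
	CHECKSUMS_OFF,
	CHECKSUMS_ON
} ChecksumState;

static ChecksumState desired_state = CHECKSUMS_OFF;	/* written by backends */
static ChecksumState actual_state = CHECKSUMS_OFF;	/* owned by the launcher */
static ChecksumState pass_target;	/* target of the in-flight pass */
static bool pass_running = false;
static int	restarts = 0;		/* passes abandoned due to a changed target */

/* Backend side: analogue of pg_enable/disable_data_checksums(). */
static void
request_checksum_state(ChecksumState s)
{
	desired_state = s;
}

/*
 * Launcher side: one iteration of the control loop.  The "finish"
 * parameter stands in for the per-database processing completing.
 */
static void
launcher_step(bool finish)
{
	if (pass_running && desired_state != pass_target)
	{
		/* Target changed under us: abandon this pass and start over. */
		pass_running = false;
		restarts++;
	}
	if (!pass_running && actual_state != desired_state)
	{
		pass_target = desired_state;
		pass_running = true;
	}
	if (pass_running && finish)
	{
		actual_state = pass_target;
		pass_running = false;
	}
}
```

The point of the design is that the SQL-callable functions never steer the processing directly; they only record the end state, so concurrent enable/disable calls cannot race with the launcher, and the NOTICE-emitting corner cases disappear.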

A couple of other issues came up while doing that:

- AbortProcessing() has two callers, one in code that runs in the
launcher process, and another one in code that runs in the worker
process. Is it really safe to use from the worker process? Calling
ProcessAllDatabases() in the worker seems sketchy. (This is moot in this
new patch version, as I removed AbortProcessing() altogether)

- Is it possible that the worker keeps running after the launcher has
already exited, e.g. because of an ERROR or SIGTERM? If you then quickly
call pg_enable_checksums() again, can you end up with two workers
running at the same time? Is that bad?

On 26/01/2021 23:00, Daniel Gustafsson wrote:

On 22 Jan 2021, at 12:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

@@ -3567,6 +3571,27 @@ RelationBuildLocalRelation(const char *relname,
relkind == RELKIND_MATVIEW)
RelationInitTableAccessMethod(rel);
+       /*
+        * Set the data checksum state. Since the data checksum state can change at
+        * any time, the fetched value might be out of date by the time the
+        * relation is built.  DataChecksumsNeedWrite returns true when data
+        * checksums are: enabled; are in the process of being enabled (state:
+        * "inprogress-on"); are in the process of being disabled (state:
+        * "inprogress-off"). Since relhaschecksums is only used to track progress
+        * when data checksums are being enabled, and going from disabled to
+        * enabled will clear relhaschecksums before starting, it is safe to use
+        * this value for a concurrent state transition to off.
+        *
+        * If DataChecksumsNeedWrite returns false, and is concurrently changed to
+        * true then that implies that checksums are being enabled. Worst case,
+        * this will lead to the relation being processed for checksums even though
+        * each page written will have them already.  Performing this last shortens
+        * the window, but doesn't avoid it.
+        */
+       HOLD_INTERRUPTS();
+       rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+       RESUME_INTERRUPTS();
+
/*
* Okay to insert into the relcache hash table.
*

I grepped for relhaschecksums, and concluded that the value in the
relcache isn't actually used for anything. Not so! In
heap_create_with_catalog(), the actual pg_class row is constructed
from the relcache entry, so the value set in
RelationBuildLocalRelation() finds its way to pg_class. Perhaps it
would be more clear to pass relhachecksums directly as an argument
to AddNewRelationTuple(). That way, the value in the relcache would
be truly never used.

I might be thick (or undercaffeinated) but I'm not sure I follow.
AddNewRelationTuple calls InsertPgClassTuple which in turn avoids the
relcache entry.

Ah, you're right, I misread AddNewRelationTuple. That means that the
relhaschecksums field in the relcache is never used? That's a clear
rule. The changes to formrdesc() and RelationBuildLocalRelation() seem
unnecessary then, we can always initialize relhaschecksums to false in
the relcache.

- Heikki

Attachments:

v34-0001-Support-checksum-enable-disable-in-a-running-clu.patchtext/x-patch; charset=UTF-8; name=v34-0001-Support-checksum-enable-disable-in-a-running-clu.patchDownload
From 3ccb06a1e39b456d26a5e2f89c9b634f04b34307 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 27 Jan 2021 17:20:44 +0200
Subject: [PATCH v34 1/1] Support checksum enable/disable in a running cluster
 v33

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A dynamic background worker is responsible for launching a per-database
worker which will mark all buffers dirty for all relations with storage
in order for them to have data checksums on write. A new in-progress
state is introduced which, during processing, ensures that data checksums
are written but not verified to avoid false negatives. State changes
across backends are synchronized using a ProcSignalBarrier.

Authors: Daniel Gustafsson, Magnus Hagander
Reviewed-by: Heikki Linnakangas, Robert Haas, Andres Freund, Tomas Vondra, Michael Banck, Andrey Borodin
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   68 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   57 +-
 src/backend/access/heap/heapam.c             |    9 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  452 +++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    7 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1530 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   33 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   30 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   92 ++
 src/test/checksum/t/002_restarts.pl          |  117 ++
 src/test/checksum/t/003_standby_checksum.pl  |  127 ++
 src/test/checksum/t/004_offline.pl           |  105 ++
 src/test/perl/PostgresNode.pm                |   36 +
 51 files changed, 2985 insertions(+), 87 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl
 create mode 100644 src/test/checksum/t/004_offline.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0b..75cc1588a5c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if the relation has data checksums on all pages. This state is
+        only used during checksum processing; this field should never be
+        consulted for cluster-wide checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index aa99665e2eb..94182fb7b1e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25839,6 +25839,74 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Initiates data checksums for the cluster. This will switch the data
+        checksums mode to <literal>inprogress-on</literal> as well as start a
+        background worker that will process all data in the database and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9496f76b1fb..7e170ec4299 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3695,8 +3695,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3706,8 +3705,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b23..d879550e81c 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations, regardless of any progress made by the
+   online processing.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f81..48890ccc9d3 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -247,9 +247,10 @@
   <para>
    Checksums are normally enabled when the cluster is initialized using <link
    linkend="app-initdb-data-checksums"><application>initdb</application></link>.
-   They can also be enabled or disabled at a later time as an offline
-   operation. Data checksums are enabled or disabled at the full cluster
-   level, and cannot be specified individually for databases or tables.
+   They can also be enabled or disabled at a later time either as an offline
+   operation or online in a running cluster allowing concurrent access. Data
+   checksums are enabled or disabled at the full cluster level, and cannot be
+   specified individually for databases or tables.
   </para>
 
   <para>
@@ -266,7 +267,7 @@
   </para>
 
   <sect2 id="checksums-offline-enable-disable">
-   <title>Off-line Enabling of Checksums</title>
+   <title>Offline Enabling of Checksums</title>
 
    <para>
     The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
@@ -275,6 +276,54 @@
    </para>
 
   </sect2>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>Online Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+   </para>
+
+   <para>
+    Enabling checksums will set the cluster checksum mode to
+    <literal>inprogress-on</literal>.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process, make sure that
+    <varname>max_worker_processes</varname> allows for at least one more
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to be removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
  </sect1>
 
   <sect1 id="wal-intro">
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd546..ffcd8899082 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7927,7 +7927,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7948,11 +7948,18 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	/*
+	 * Hold interrupts for the duration of xlogging to avoid the state of data
+	 * checksums changing during the processing, which would alter the premise
+	 * for xlogging hint bits.
+	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea0735..fa074c6046f 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfoString(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfoString(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfoString(buf, "inprogress-on");
+		else
+			appendStringInfoString(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cc007b8963e..8531def93c1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -50,6 +51,7 @@
 #include "port/atomics.h"
 #include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -253,6 +255,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -900,6 +912,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XLogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1073,8 +1086,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1087,7 +1100,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4915,9 +4928,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4951,13 +4962,370 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Returns true iff data checksums are enabled or are in the process of being
+ * enabled.  During the "inprogress-on" and "inprogress-off" states checksums
+ * must be written even though they are not verified (see datachecksumsworker.c
+ * for a longer discussion).
+ *
+ * This function is intended for callsites which are about to write a data page
+ * to storage, and need to know whether to re-calculate the checksum for the
+ * page header. Interrupts must be held off while calling this and until the
+ * write operation has finished, to avoid the risk of the checksum state
+ * changing. This implies that the call must be made as close to the write
+ * operation as possible, to keep the critical section short.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified (see datachecksumsworker.c for a longer discussion).
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid the
+ * risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that calling
+ * this function must be performed as close to the validation call as possible
+ * to keep the critical section short. This is in order to protect against
+ * time of check/time of use situations around data checksum validation.
+ */
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress-on" state for enabling checksums is used when the checksum
+ * worker is setting checksums on all pages; it can thus be used to check for
+ * aborted checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress-off" state for disabling checksums is used when the
+ * worker resets the catalog state.  DataChecksumsNeedVerify() or
+ * DataChecksumsNeedWrite() should be used for deciding whether to read/write
+ * checksums.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * SetDataChecksumsOnInProgress
+ *		Sets the data checksum state to "inprogress-on" to enable checksums
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on". See
+ * SetDataChecksumsOn below for a description on how this state change works.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * Data checksum state can only be transitioned to "inprogress-on" from
+	 * "off"; if data checksums are in any other state, exit.
+	 */
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	/*
+	 * The state transition is performed in a critical section with
+	 * checkpoints held off to provide crash safety.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state change in all backends to ensure that all backends are in
+	 * "inprogress-on". Once done we know that all backends are writing data
+	 * checksums.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one to
+ * set the checksums state to "inprogress-on" (which is performed by
+ * SetDataChecksumsOnInProgress()) and the second one to set the state to "on"
+ * (performed here).
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on".  This state
+ * requires data checksums to be written but not verified. This ensures that
+ * all data pages can be checksummed without the risk of false negatives in
+ * validation during the process.  When all existing pages are guaranteed to
+ * have checksums, and all new pages will be initiated with checksums, the
+ * state can be changed to "on". Once the state is "on" checksums will be both
+ * written and verified. See datachecksumsworker.c for a longer discussion on
+ * how data checksums can be enabled in a running cluster.
+ *
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOn(void)
 {
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The only allowed state transition to "on" is from "inprogress-on" since
+	 * that state ensures that all pages will have data checksums written.
+	 */
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await the state transition to "on" in all backends. When done we know
+	 * data checksums are enabled in all backends and data checksums are both
+	 * written and verified.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be emitted.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * "inprogress-off" state during which backends continue to write
+	 * checksums without verifying them. When all backends are in
+	 * "inprogress-off" the next transition to "off" can be performed, after
+	 * which all data checksum processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * or "inprogress-off" and we can transition directly to "off" from
+		 * there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that we don't incur a checkpoint while disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * ProcSignalBarrier absorption functions for enabling and disabling data
+ * checksums in a running cluster. The procsignalbarriers are emitted in the
+ * SetDataChecksums* functions.
+ */
+bool
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 ||
+		   LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+	return true;
+}
+
+/*
+ * InitLocalControldata
+ *
+ * Set up backend-local caches of controldata variables which may change at
+ * any point during runtime and thus require special-cased locking. So far
+ * this only applies to data_checksum_version, but it's intended to be
+ * general-purpose enough to handle future cases.
+ */
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7991,6 +8359,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums being enabled ("inprogress-on"
+	 * state), we notify the user that they need to manually restart the
+	 * process to enable checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here; it has to be launched from a
+	 * regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shut down, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XLogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9900,6 +10294,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XLogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10355,6 +10767,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab319dd..5d77be8a2dc 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Starts a background worker that turns off data checksums.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9abc4a1f556..87052b06930 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -974,10 +974,17 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	/*
+	 * Hold off interrupts to ensure that the observed data checksum state
+	 * cannot change as we form and insert the tuple.
+	 */
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d78..516ae666b7a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1264,6 +1264,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833db..59b82ee9ce5 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de35..8afbf762afc 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 00000000000..b26c31e8924
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1530 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a cluster at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster which doesn't have checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to an incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from where it left off, given that pg_class.relhaschecksums tracks the
+ * state of processed relations and the in-progress state ensures that all
+ * new writes are performed with checksums. Each database will be reprocessed,
+ * but relations where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off",
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state will still
+ * be able to validate data checksums. During "inprogress-off", the catalog
+ * state pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *    - Backends SHALL NOT violate local datachecksum state
+ *    - Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *      currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * WAL logging backend updating the global state in the controlfile will wait
+ * for all other backends to absorb the barrier. Barrier absorption will happen
+ * during interrupt processing, which means that connected backends will change
+ * state at different times. To prevent data checksum state changes when
+ * writing and verifying checksums, interrupts shall be held off before
+ * interrogating state and resumed when the IO operation has been performed.
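+ *
+ * As a sketch, the intended pattern around a page write is roughly the
+ * following (illustrative only; the actual callers live in the buffer
+ * manager and smgr layers):
+ *
+ *     HOLD_INTERRUPTS();
+ *     if (DataChecksumsNeedWrite())
+ *         PageSetChecksumInplace(page, blocknum);
+ *     smgrwrite(reln, forknum, blocknum, page, false);
+ *     RESUME_INTERRUPTS();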
+ *
+ *   When Enabling Data Checksums
+ *   ----------------------------
+ *   A process which fails to observe data checksums being enabled can induce
+ *   two types of errors: failing to write the checksum when modifying the page
+ *   and failing to validate the data checksum on the page when reading it.
+ *
+ *   When processing starts all backends belong to one of the below sets, with
+ *   one set being empty:
+ *
+ *   Bd: Backends in "off" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   If processing is started in an online cluster then all backends are in Bd.
+ *   If processing was halted by the cluster shutting down, the controlfile
+ *   state "inprogress-on" will be observed on system startup and all backends
+ *   will be in Bd. Backends transition Bd -> Bi via a procsignalbarrier.  When
+ *   the DataChecksumsWorker has finished writing checksums on all pages and
+ *   enables data checksums cluster-wide, there are four sets of backends where
+ *   Bd shall be an empty set:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting while Bg is waiting on the procsignalbarrier will
+ *   observe the global state being "on" and will thus automatically belong to
+ *   Be.  Checksums are enabled cluster-wide when Bi is an empty set. Bi and Be
+ *   are compatible sets while still operating based on their local state as
+ *   both write data checksums.
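+ *
+ *   In pseudocode, the Bg backend does roughly the following (the barrier
+ *   type name here is illustrative, not the one used by this patch):
+ *
+ *       barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+ *       WaitForProcSignalBarrier(barrier);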
+ *
+ *   When Disabling Data Checksums
+ *   -----------------------------
+ *   A process which fails to observe that data checksums have been disabled
+ *   can induce two types of errors: writing the checksum when modifying the
+ *   page and validating a data checksum which is no longer correct due to
+ *   modifications to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bo: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bo -> Bd
+ *
+ *   The goal is to transition all backends to Bd, making the other sets empty.
+ *   Backends in Bo write data checksums, but don't validate them, so that
+ *   backends still in Be can continue to validate pages until they have
+ *   absorbed the barrier and moved to Bo. Once all backends are in Bo, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation any longer.
+ *
+ *
+ * Potential optimizations
+ * -----------------------
+ * Below are some potential optimizations and improvements which were brought
+ * up during reviews of this feature, but which weren't implemented in the
+ * initial version. These are ideas listed without any validation on their
+ * feasability or potential payoff. More discussion on these can be found on
+ * the -hackers threads linked to in the commit message of this feature.
+ *
+ *   * Launching datachecksumsworker for resuming operation from the startup
+ *     process: Currently users have to restart processing manually after a
+ *     restart, since dynamic background workers cannot be started from the
+ *     postmaster. Changing to the startup process could make resuming the
+ *     processing automatic.
+ *   * Avoid dirtying the page when checksums already match: Even if the
+ *     checksum on the page already happens to match, we still dirty the page.
+ *     It should be enough to only do the log_newpage_buffer() call in that
+ *     case.
+ *   * Invent a lightweight WAL record that doesn't contain the full-page
+ *     image but just the block number: On replay, the redo routine would read
+ *     the page from disk.
+ *   * Teach pg_checksums to avoid checksummed pages when pg_checksums is used
+ *     to enable checksums on a cluster which is in inprogress-on state and
+ *     may have checksummed pages (make pg_checksums be able to resume an
+ *     online operation).
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+/*
+ * Signaling between backends calling pg_enable/disable_checksums, the
+ * checksums launcher process, and the checksums worker process.
+ *
+ * This struct is protected by DatachecksumsWorkerLock
+ */
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * These are set by pg_enable/disable_checksums, to tell the launcher what
+	 * the target state is.
+	 */
+	bool		launch_enable_checksums;	/* True if checksums are being
+											 * enabled, else false */
+	int			launch_cost_delay;
+	int			launch_cost_limit;
+
+	/*
+	 * Is a launcher process currently running?
+	 *
+	 * This is set by the launcher process, after it has read the above launch_*
+	 * parameters.
+	 */
+	bool		launcher_running;
+
+	/*
+	 * These fields indicate the target state that the launcher is currently
+	 * working towards. They can be different from the corresponding launch_*
+	 * fields, if a new pg_enable/disable_checksums() call was made while the
+	 * launcher/worker was already running.
+	 *
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	bool		enabling_checksums;	/* True if checksums are being enabled,
+									 * else false */
+	int			cost_delay;
+	int			cost_limit;
+
+	/*
+	 * Signaling between the launcher and the worker process.
+	 *
+	 * As there is only a single worker, and the launcher won't read these
+	 * until the worker exits, they can be accessed without the need for a
+	 * lock. If multiple workers are supported then this will have to be
+	 * revisited.
+	 */
+	/* result, set by worker before exiting */
+	DatachecksumsWorkerResult success;
+
+	/* tells the worker process whether it should also process the shared catalogs. */
+	bool		process_shared_catalogs;
+} DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct *DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/*
+ * Flag set by the interrupt handler
+ */
+static volatile sig_atomic_t abort_requested = false;
+
+/*
+ * Have we set the DatachecksumsWorkerShmemStruct->launcher_running flag?
+ * If we have, we need to clear it before exiting!
+ */
+static volatile sig_atomic_t launcher_running = false;
+
+/*
+ * Are we enabling checksums, or disabling them?
+ */
+static bool enabling_checksums;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Request the start of the datachecksumsworker launcher process
+ *
+ * The main entry point for initiating data checksums processing, for
+ * enabling as well as disabling.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	bool		launcher_running;
+
+	/* the cost delay settings have no effect when disabling */
+	Assert(enable_checksums || cost_delay == 0);
+	Assert(enable_checksums || cost_limit == 0);
+
+	/*
+	 * Store the desired state in shared memory.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	DatachecksumsWorkerShmem->launch_enable_checksums = enable_checksums;
+	DatachecksumsWorkerShmem->launch_cost_delay = cost_delay;
+	DatachecksumsWorkerShmem->launch_cost_limit = cost_limit;
+
+	/* is the launcher already running? */
+	launcher_running = DatachecksumsWorkerShmem->launcher_running;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * Launch a new launcher process, if it's not running already.
+	 *
+	 * If the launcher is currently busy enabling the checksums, and we want
+	 * them disabled (or vice versa), the launcher will notice that at the
+	 * latest when it's about to exit, and will loop back to process the new
+	 * request.
+	 * So if the launcher is already running, we don't need to do anything
+	 * more here to abort it.
+	 *
+	 * If you call pg_enable/disable_checksums() twice in a row, before the
+	 * launcher has had a chance to start up, we still end up launching it
+	 * twice.  That's OK, the second invocation will see that a launcher is
+	 * already running and exit quickly.
+	 *
+	 * TODO: We could optimize here and skip launching the launcher, if we are
+	 * already in the desired state, i.e. if the checksums are already enabled
+	 * and you call pg_enable_checksums().
+	 */
+	if (!launcher_running)
+	{
+		/*
+		 * Prepare the BackgroundWorker and launch it.
+		 */
+		memset(&bgw, 0, sizeof(bgw));
+		bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+		bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+		snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+		snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+		snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+		bgw.bgw_restart_time = BGW_NEVER_RESTART;
+		bgw.bgw_notify_pid = MyProcPid;
+		bgw.bgw_main_arg = (Datum) 0;
+
+		if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+			ereport(ERROR,
+					(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity), "processing: %s.%s (%s block %u/%u)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hint bits) on the primary happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. If wal_level is set to "minimal",
+		 * this could be avoided if the checksum is calculated to be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		Assert(enabling_checksums);
+		if (!DatachecksumsWorkerShmem->launch_enable_checksums)
+			abort_requested = true;
+		if (abort_requested)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing the state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it then that's not
+	 * an error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	heap_freetuple(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot end up with a processed database so
+	 * we have no alternative other than exiting. When enabling checksums we
+	 * won't at this time have changed the pg_control version to enabled so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control version will be set to off before this so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * it would be nice to handle in some way. Enabling data checksums resets
+	 * the flags so any stale flags won't cause problems at that point, but
+	 * they may cause confusion for users reading pg_class. TODO.
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksum processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+/*
+ * launcher_exit
+ *
+ * Internal routine for cleaning up state when the launcher process exits. We
+ * need to clean up the abort flag to ensure that processing can be restarted
+ * again after it was previously aborted.
+ */
+static void
+launcher_exit(int code, Datum arg)
+{
+	if (launcher_running)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		launcher_running = false;
+		DatachecksumsWorkerShmem->launcher_running = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+}
+
+/*
+ * launcher_cancel_handler
+ *
+ * Internal routine for reacting to SIGINT and flagging the worker to abort.
+ * The worker won't be interrupted immediately but will check for abort flag
+ * between each block in a relation.
+ */
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	abort_requested = true;
+
+	/*
+	 * There is no sleeping in the main loop, the flag will be checked
+	 * periodically in ProcessSingleRelationFork. The worker does however
+	 * sleep when waiting for concurrent transactions to end so we still need
+	 * to set the latch.
+	 */
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks awaiting all current transactions to finish
+ *
+ * Returns when all transactions which were active at the call of the function
+ * have ended. If the postmaster dies while waiting, the process exits with
+ * FATAL, since processing cannot complete without the postmaster.
+ *
+ * NB: this will return early, if aborted by SIGINT or if the target state
+ * is changed while we're running.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (TransactionIdPrecedes(GetOldestActiveTransactionId(), waitforxid))
+	{
+		char		activity[64];
+		int			rc;
+
+		/* Oldest running xid is older than us, so wait */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for current transactions to finish (waiting for %u)",
+				 waitforxid);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+					(errmsg("postmaster exited during data checksum processing"),
+					 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_SHARED);
+		if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+			abort_requested = true;
+		LWLockRelease(DatachecksumsWorkerLock);
+		if (abort_requested)
+			break;
+	}
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+	return;
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function has the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+	int			operations[MAX_OPS];
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	if (DatachecksumsWorkerShmem->launcher_running)
+	{
+		/* Launcher was already running. Let it finish. */
+		LWLockRelease(DatachecksumsWorkerLock);
+		return;
+	}
+
+	launcher_running = true;
+
+	enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
+	DatachecksumsWorkerShmem->launcher_running = true;
+	DatachecksumsWorkerShmem->enabling_checksums = enabling_checksums;
+	DatachecksumsWorkerShmem->cost_delay = DatachecksumsWorkerShmem->launch_cost_delay;
+	DatachecksumsWorkerShmem->cost_limit = DatachecksumsWorkerShmem->launch_cost_limit;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * The target state can change while we are busy enabling/disabling checksums,
+	 * if the user calls pg_enable_data_checksums()/pg_disable_data_checksums()
+	 * before we are finished with the previous request. In that case, we will
+	 * loop back here to process the new request.
+	 */
+again:
+
+	memset(operations, 0, sizeof(operations));
+
+	/*
+	 * If we're asked to enable checksums, we need to check if processing was
+	 * previously interrupted such that we should resume rather than start
+	 * from scratch.
+	 */
+	if (enabling_checksums)
+	{
+		/*
+		 * If we are asked to enable checksums in a cluster which already
+		 * has checksums enabled, exit immediately as there is nothing
+		 * more to do.
+		 */
+		if (DataChecksumsNeedVerify())
+			goto done;
+
+		/*
+		 * If the controlfile state is set to "inprogress-on" then we will
+		 * resume from where we left off based on the catalog state. This is
+		 * safe since new relations created while the checksum worker was
+		 * disabled will have checksums enabled.
+		 */
+		else if (DataChecksumsOnInProgress())
+		{
+			operations[0] = ENABLE_CHECKSUMS;
+			operations[1] = SET_CHECKSUMS_ON;
+		}
+
+		/*
+		 * If the controlfile state is set to "inprogress-off" then we
+		 * were interrupted while the catalog state was being cleared. In
+		 * this case we need to first reset state and then continue with
+		 * enabling checksums.
+		 */
+		else if (DataChecksumsOffInProgress())
+		{
+			operations[0] = RESET_STATE;
+			operations[1] = SET_INPROGRESS_ON;
+			operations[2] = ENABLE_CHECKSUMS;
+			operations[3] = SET_CHECKSUMS_ON;
+		}
+
+		/*
+		 * Data checksums are off in the cluster, we can proceed with
+		 * enabling them. Just in case we will start by resetting the
+		 * catalog state since we are doing this from scratch and we don't
+		 * want leftover catalog state to cause us to miss a relation.
+		 */
+		else
+		{
+			operations[0] = RESET_STATE;
+			operations[1] = SET_INPROGRESS_ON;
+			operations[2] = ENABLE_CHECKSUMS;
+			operations[3] = SET_CHECKSUMS_ON;
+		}
+	}
+	else
+	{
+		/*
+		 * Regardless of current state in the system, we go through the
+		 * motions when asked to disable checksums. The catalog state is
+		 * checksums, and has no use at any other point in time. That
+		 * being said, a user who sees stale relhaschecksums entries in
+		 * being said, a user who sees stale relhaschecksums entries in
+		 * the catalog might run this just in case.
+		 *
+		 * Resetting state must be performed after setting data checksum
+		 * state to off, as there otherwise might (depending on system
+		 * data checksum state) be a window between catalog resetting and
+		 * state transition when new relations are created with the
+		 * catalog state set to true.
+		 */
+		operations[0] = DISABLE_CHECKSUMS;
+		operations[1] = RESET_STATE;
+	}
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+done:
+	/*
+	 * All done. But before we exit, check if the target state was changed while
+	 * we were running. In that case we will have to start all over again.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+	{
+		enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
+		DatachecksumsWorkerShmem->enabling_checksums = enabling_checksums;
+		DatachecksumsWorkerShmem->cost_delay = DatachecksumsWorkerShmem->launch_cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = DatachecksumsWorkerShmem->launch_cost_limit;
+		LWLockRelease(DatachecksumsWorkerLock);
+		goto again;
+	}
+
+	launcher_running = false;
+	DatachecksumsWorkerShmem->launcher_running = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. Until no
+ * new databases are found, this will loop around, computing a new list and
+ * comparing it to the ones already seen.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set things up so that the first run processes the shared catalogs,
+	 * which only needs to happen once rather than once per database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	/*
+	 * Get a list of all databases to process. This may include databases that
+	 * were created during our runtime.  Since a database can be created as a
+	 * copy of any other database (which may not have existed in our last
+	 * run), we have to repeat this loop until no new databases show up in the
+	 * list.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%d databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "restarting with a new pass" : "processing completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+
+		/*
+		 * Re-generate the list of databases for another pass. Since we wait
+	 * for all pre-existing transactions to finish, we can be certain
+	 * that there are no databases left without checksums.
+		 */
+		WaitForAllTransactionsToFinish();
+		DatabaseList = BuildDatabaseList();
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. A database can be marked as failed either because enabling
+	 * checksums actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the latter case, where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists but enabling checksums failed, then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	WaitForAllTransactionsToFinish();
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = true;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("failed to enable data checksums in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even though these assignments are redundant after the MemSet above,
+	 * we want to be explicit about our intent for readability, since this
+	 * state is queried when determining restartability.
+	 */
+	DatachecksumsWorkerShmem->launch_enable_checksums = false;
+	DatachecksumsWorkerShmem->launcher_running = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to. If the caller wants to ensure that no concurrently
+ * running CREATE DATABASE calls exist, this needs to be preceded by a call
+ * to WaitForAllTransactionsToFinish().
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which do not yet have data checksums are
+ * returned. If include_shared is true then shared relations are included as
+ * well in a non-temporary list. include_shared has no relevance when building
+ * a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	enabling_checksums = true;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we are done, since we
+	 * cannot access these relations to modify them.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	Assert(DatachecksumsWorkerShmem->enabling_checksums);
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	for (;;)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 5000,
+						 WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		aborted = DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		if (aborted || abort_requested)
+		{
+			DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+			ereport(DEBUG1,
+					(errmsg("data checksum processing aborted in database OID %u",
+							dboid)));
+			return;
+		}
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719dd..0fef097eb80 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4017,6 +4017,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550b..cc494b6f13d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0e..d9c482454ff 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 561c212092f..9362ec00184 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2944,8 +2944,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b507..c7928f34957 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c43cdd685b4..a3720617f94 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "port/pg_bitutils.h"
 #include "commands/async.h"
 #include "miscadmin.h"
@@ -98,7 +99,6 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 static void ResetProcSignalBarrierBits(uint32 flags);
-static bool ProcessBarrierPlaceholder(void);
 
 /*
  * ProcSignalShmemSize
@@ -538,8 +538,17 @@ ProcessProcSignalBarrier(void)
 				type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
 				switch (type)
 				{
-					case PROCSIGNAL_BARRIER_PLACEHOLDER:
-						processed = ProcessBarrierPlaceholder();
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON:
+						processed = AbsorbChecksumsOnInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_ON:
+						processed = AbsorbChecksumsOnBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF:
+						processed = AbsorbChecksumsOffInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_OFF:
+						processed = AbsorbChecksumsOffBarrier();
 						break;
 				}
 
@@ -604,24 +613,6 @@ ResetProcSignalBarrierBits(uint32 flags)
 	InterruptPending = true;
 }
 
-static bool
-ProcessBarrierPlaceholder(void)
-{
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 *
-	 * The return value should be 'true' if the barrier was successfully
-	 * absorbed and 'false' if not. Note that returning 'false' can lead to
-	 * very frequent retries, so try hard to make that an uncommon case.
-	 */
-	return true;
-}
-
 /*
  * CheckProcSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called after receiving
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c2956..5b083749d55 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59ad..78edf57adc8 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae0..8fbebd98700 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change, which could otherwise
+		 * cause a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 62bff52638d..4ac396ccf1e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1567,9 +1567,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1585,9 +1582,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01b..633821bae59 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1828,7 +1829,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1896,6 +1898,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3548,6 +3552,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the data checksum state. Since the data checksum state can change
+	 * at any time, the fetched value might be out of date by the time the
+	 * relation is built.  DataChecksumsNeedWrite returns true when data
+	 * checksums are enabled, in the process of being enabled (state
+	 * "inprogress-on"), or in the process of being disabled (state
+	 * "inprogress-off"). Since relhaschecksums is only used to track progress
+	 * when data checksums are being enabled, and going from disabled to
+	 * enabled will clear relhaschecksums before starting, it is safe to use
+	 * this value for a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false but is concurrently changed to
+	 * true, that implies that checksums are being enabled. In the worst case,
+	 * this will lead to the relation being processed for checksums even
+	 * though each page written will have them already.  Performing this last
+	 * shortens the window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3813,6 +3838,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3837,16 +3863,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3875,6 +3909,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3895,15 +3930,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc55..045da219044 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e5965bc517d..92367ece4b8 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -606,6 +606,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up backend local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 17579eeaca9..3b7207afb54 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -500,6 +502,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -609,7 +622,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums;
 static bool integer_datetimes;
 static bool assert_enabled;
 static bool in_hot_standby;
@@ -1910,17 +1923,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4830,6 +4832,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee44082..f3f029f41e5 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf334..12988574583 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish or turn off checksums
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd0..b35cd4d503a 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd0..6947c095914 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern bool AbsorbChecksumsOnInProgressBarrier(void);
+extern bool AbsorbChecksumsOffInProgressBarrier(void);
+extern bool AbsorbChecksumsOnBarrier(void);
+extern bool AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 224cae0246f..adbe81e890b 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index e8dcd15a55f..bf296625e42 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce7..d8229422afc 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b5f52d4e4a3..f050f15a58c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11301,6 +11301,22 @@
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
 
+{ oid => '9258',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '9257',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bdc97e3082..f013acba76a 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -324,6 +324,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87e..0974dfadfe4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -963,6 +963,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 00000000000..845f6bceaae
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Start the background processes for enabling or disabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f4..c35b747520a 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d23591921..f736b12f986 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8e..d865796d048 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index ab1ef9a4753..9774816625b 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 00000000000..871e943d50e
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 00000000000..fd60f7e97f3
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 00000000000..0f0317060b3
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check")
+with multiple nodes, primary as well as standby(s), for the purpose
+of the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 00000000000..3b229de915a
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,92 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class have checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again, which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled. Disabling checksums clears the catalog
+# relhaschecksums state, so wait for that before calling it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 00000000000..41a4d640375
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,117 @@
+# Test suite for testing enabling data checksums in an online cluster
+# while restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in    = '';
+my $out   = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyway and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';"
+);
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1',
+	'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 00000000000..1555a1694be
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,127 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+cmp_ok(
+	$result, '~~',
+	[ "inprogress-on", "on" ],
+	'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary 2');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
diff --git a/src/test/checksum/t/004_offline.pl b/src/test/checksum/t/004_offline.pl
new file mode 100644
index 00000000000..2dfca4df235
--- /dev/null
+++ b/src/test/checksum/t/004_offline.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums offline from various states
+# of checksum processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Enable checksums offline using pg_checksums
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are enabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums offline again using pg_checksums
+$node->stop();
+$node->checksum_disable_offline();
+$node->start();
+
+# Ensure that checksums are disabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in    = '';
+my $out   = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyway and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'inprogress-on');
+is($result, 1, 'ensure checksums are in the process of being enabled');
+
+# Turn the cluster off and enable checksums offline, then start back up
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are now enabled even though processing wasn't
+# restarted
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667ec..b7431a76005 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2221,6 +2221,42 @@ sub pg_recvlogical_upto
 	}
 }
 
+=item $node->checksum_enable_offline()
+
+Enable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_enable_offline
+{
+	my ($self) = @_;
+
+	print "# Enabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-e');
+	print "# Checksums enabled\n";
+	return;
+}
+
+=item $node->checksum_disable_offline()
+
+Disable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_disable_offline
+{
+	my ($self) = @_;
+
+	print "# Disabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-d');
+	print "# Checksums disabled\n";
+	return;
+}
+
 =pod
 
 =back
-- 
2.29.2

#79Daniel Gustafsson
daniel@yesql.se
In reply to: Heikki Linnakangas (#78)
2 attachment(s)
Re: Online checksums patch - once again

On 27 Jan 2021, at 16:37, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Revisiting an issue we discussed earlier:

On 25/11/2020 15:20, Daniel Gustafsson wrote:

On 23 Nov 2020, at 18:36, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

On 17/11/2020 10:56, Daniel Gustafsson wrote:

I've reworked this in the attached such that the enable_ and
disable_ functions merely call into the launcher with the desired
outcome, and the launcher is responsible for figuring out the
rest. The datachecksumworker is now the sole place which
initiates a state transfer.

Well, you still fill the DatachecksumsWorkerShmem->operations array
in the backend process that launches the datacheckumworker, not in
the worker process. I find that still a bit surprising, but I
believe it works.

I'm open to changing it in case there are strong opinions, it just
seemed the most natural to me.

This kept bothering me, so I spent a while hacking this to my liking. The attached patch moves the code to fill in 'operations' from the backend to the launcher, so that the pg_enable/disable_checksums() call now truly just stores the desired state, checksum on or off, in shared memory, and launches the launcher process. The launcher process figures out how to get to the desired state. This removes the couple of corner cases that previously emitted a NOTICE about the processing being concurrently disabled or aborted. What do you think?

I like it: it avoids a few edge cases, while some are moved around (more on
that below), yielding easier-to-read code. This wasn't really how I thought you
wanted it when we talked about this, so I'm very glad to see it in code since I
now get what you were saying. Thanks!
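The division of labour described above, where the SQL-callable functions only record the desired end state and the launcher derives the work from it, could be sketched roughly as follows. This is an illustration only: plan_operations() and the step names are hypothetical, not identifiers from the patch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Derive the launcher's work plan from the current data checksum state and
 * the desired end state.  Returns the number of steps written to "steps".
 */
static int
plan_operations(const char *current, bool enable, const char *steps[3])
{
	int			n = 0;

	if (enable)
	{
		if (strcmp(current, "on") == 0)
			return 0;			/* already at the target, nothing to do */
		if (strcmp(current, "inprogress-on") != 0)
			steps[n++] = "set inprogress-on";
		steps[n++] = "process all databases";
		steps[n++] = "set on";
	}
	else
	{
		if (strcmp(current, "off") == 0)
			return 0;			/* already at the target, nothing to do */
		steps[n++] = "set inprogress-off";
		steps[n++] = "wait for all backends to stop verifying";
		steps[n++] = "set off";
	}
	return n;
}
```

The point of this shape is that a restart from "inprogress-on" simply yields a shorter plan, rather than being a special case.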

I haven't done much testing, so if you adopt this approach, please check if I broke something in the process.

I think you broke restarts: the enable_checksums variable wasn't populated, so
the restart failed to set the right operations. Also, a false return can now
mean an actual failure as well as the abort case for a restart. I've made a
quick change to look for an updated state, but that doesn't cover the case
where processing fails *and* a new state has been set. Maybe the best solution
is to change the return type of the processing to an int or enum, such that
the three cases (failure, abort, success) can be distinguished?
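A three-valued result could look something like this minimal sketch; the enum and function names are hypothetical, not from the patch:

```c
#include <stdbool.h>

/*
 * A three-valued result for the processing step, replacing the ambiguous
 * boolean return.
 */
typedef enum
{
	DATACHECKSUMS_FAILED,		/* the pass itself hit an error */
	DATACHECKSUMS_ABORTED,		/* a new target state was set concurrently */
	DATACHECKSUMS_SUCCESS		/* all databases processed */
} DatachecksumsResult;

/*
 * Classify one processing pass.  Failure takes precedence over abort, so
 * that "processing fails *and* a new state has been set" is still reported
 * as a failure rather than silently treated as a restart.
 */
static DatachecksumsResult
classify_outcome(bool failed, bool target_changed)
{
	if (failed)
		return DATACHECKSUMS_FAILED;
	if (target_changed)
		return DATACHECKSUMS_ABORTED;
	return DATACHECKSUMS_SUCCESS;
}
```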

Also, this fails an assertion for me when checking the current state, since
interrupts aren't held off. I have a feeling it should have done that in my
version too, now that I look at it. The attached holds off interrupts to pass
that (we clearly don't want a new state while we decide on operations anyway).

The tests in src/test/checksum were reliably failing on these for me but pass
with the attached 0002 applied.
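The pattern in question, holding off interrupts so that a procsignal barrier cannot be absorbed and change the locally cached checksum state mid-decision, can be illustrated with simplified stand-ins. The macros below mimic PostgreSQL's HOLD_INTERRUPTS()/RESUME_INTERRUPTS() counter; read_desired_state() is a hypothetical name.

```c
#include <stdbool.h>

/* Simplified stand-in for the real interrupt holdoff counter. */
static int	InterruptHoldoffCount = 0;

#define HOLD_INTERRUPTS()	(InterruptHoldoffCount++)
#define RESUME_INTERRUPTS()	(InterruptHoldoffCount--)

/*
 * Read the desired checksum state from shared memory with interrupts held
 * off, so a barrier cannot update the local state while we are deciding
 * which operations to run.
 */
static bool
read_desired_state(const bool *shmem_target)
{
	bool		target;

	HOLD_INTERRUPTS();			/* barriers are only absorbed when not held */
	target = *shmem_target;		/* consistent snapshot of the desired state */
	RESUME_INTERRUPTS();

	return target;
}
```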

This changes the way the abort works. If the pg_enable/disable_checksums() is called, while the launcher is already busy changing the state, pg_enable/disable_checksums() will just set the new desired state in shared memory anyway. The launcher process will notice that the target state changed some time later, and restart from scratch.

The "notice period" is the same as before though, unless I'm missing something.

A couple of other issues came up while doing that:

- AbortProcessing() has two callers, one in code that runs in the launcher process, and another one in code that runs in the worker process. Is it really safe to use from the worker process? Calling ProcessAllDatabases() in the worker seems sketchy. (This is moot in this new patch version, as I removed AbortProcessing() altogether)

I think it was safe, but I agree that this version is a lot cleaner, so it is,
as you say, moot.

- Is it possible that the worker keeps running after the launcher has already exited, e.g. because of an ERROR or SIGTERM? If you then quickly call pg_enable_checksums() again, can you end up with two workers running at the same time? Is that bad?

True, I think there is a window in which that could happen, but the worst case
should be that a database is checksummed twice?

On 26/01/2021 23:00, Daniel Gustafsson wrote:

On 22 Jan 2021, at 12:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

@@ -3567,6 +3571,27 @@ RelationBuildLocalRelation(const char *relname,
relkind == RELKIND_MATVIEW)
RelationInitTableAccessMethod(rel);
+       /*
+        * Set the data checksum state. Since the data checksum state can change at
+        * any time, the fetched value might be out of date by the time the
+        * relation is built.  DataChecksumsNeedWrite returns true when data
+        * checksums are: enabled; are in the process of being enabled (state:
+        * "inprogress-on"); are in the process of being disabled (state:
+        * "inprogress-off"). Since relhaschecksums is only used to track progress
+        * when data checksums are being enabled, and going from disabled to
+        * enabled will clear relhaschecksums before starting, it is safe to use
+        * this value for a concurrent state transition to off.
+        *
+        * If DataChecksumsNeedWrite returns false, and is concurrently changed to
+        * true then that implies that checksums are being enabled. Worst case,
+        * this will lead to the relation being processed for checksums even though
+        * each page written will have them already.  Performing this last shortens
+        * the window, but doesn't avoid it.
+        */
+       HOLD_INTERRUPTS();
+       rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+       RESUME_INTERRUPTS();
+
/*
* Okay to insert into the relcache hash table.
*

I grepped for relhaschecksums, and concluded that the value in the
relcache isn't actually used for anything. Not so! In
heap_create_with_catalog(), the actual pg_class row is constructed
from the relcache entry, so the value set in
RelationBuildLocalRelation() finds its way to pg_class. Perhaps it
would be more clear to pass relhachecksums directly as an argument
to AddNewRelationTuple(). That way, the value in the relcache would
be truly never used.

I might be thick (or undercaffeinated) but I'm not sure I follow. AddNewRelationTuple calls InsertPgClassTuple which in turn avoids the
relcache entry.

Ah, you're right, I misread AddNewRelationTuple. That means that the relhaschecksums field in the relcache is never used? That's a clear rule. The changes to formrdesc() and RelationBuildLocalRelation() seem unnecessary then, we can always initialize relhaschecksums to false in the relcache.

They probably are, but should they be kept as a just-in-case for future hackery
which may assume the relcache is at all points correct?

I've attached my changes as 0002 here on top of your patch; what do you think
about those? There are some comment fixups as well, some stemming from your
patch and some from much earlier versions that I just hadn't seen until now.

--
Daniel Gustafsson https://vmware.com/

Attachments:

v35-0001-Support-checksum-enable-disable-in-a-running-clu.patchapplication/octet-stream; name=v35-0001-Support-checksum-enable-disable-in-a-running-clu.patch; x-unix-mode=0644Download
From 30450b76ff2ed0a983baa1c045448c44504536c9 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 27 Jan 2021 17:20:44 +0200
Subject: [PATCH v35 1/2] Support checksum enable/disable in a running cluster
 v35

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

A dynamic background worker is responsible for launching a per-database
worker which will mark all buffers dirty for all relations with storage
in order for them to have data checksums on write. A new in-progress
state is introduced which, during processing, ensures that data checksums
are written but not verified, to avoid false negatives. State changes
across backends are synchronized using a procsignal barrier.

Authors: Daniel Gustafsson, Magnus Hagander
Reviewed-by: Heikki Linnakangas, Robert Haas, Andres Freund, Tomas Vondra, Michael Banck, Andrey Borodin
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   68 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   57 +-
 src/backend/access/heap/heapam.c             |    9 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  452 +++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    7 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1530 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   33 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   30 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   92 ++
 src/test/checksum/t/002_restarts.pl          |  117 ++
 src/test/checksum/t/003_standby_checksum.pl  |  127 ++
 src/test/checksum/t/004_offline.pl           |  105 ++
 src/test/perl/PostgresNode.pm                |   36 +
 51 files changed, 2985 insertions(+), 87 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl
 create mode 100644 src/test/checksum/t/004_offline.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..75cc1588a5 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2166,6 +2166,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if the relation has data checksums on all pages. This state is
+        only used during checksum processing; this field should never be
+        consulted for the cluster-wide checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4342c8e74f..1f78ce1f92 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25850,6 +25850,74 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Initiates enabling of data checksums for the cluster. This will switch
+        the data checksum mode to <literal>inprogress-on</literal> and start a
+        background worker that processes all data in the cluster and enables
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch the data checksum mode to
+        <literal>on</literal>.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9496f76b1f..7e170ec429 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3695,8 +3695,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3706,8 +3705,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b2..d879550e81 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations, regardless of the progress made online.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f8..48890ccc9d 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -247,9 +247,10 @@
   <para>
    Checksums are normally enabled when the cluster is initialized using <link
    linkend="app-initdb-data-checksums"><application>initdb</application></link>.
-   They can also be enabled or disabled at a later time as an offline
-   operation. Data checksums are enabled or disabled at the full cluster
-   level, and cannot be specified individually for databases or tables.
+   They can also be enabled or disabled at a later time either as an offline
+   operation or online in a running cluster allowing concurrent access. Data
+   checksums are enabled or disabled at the full cluster level, and cannot be
+   specified individually for databases or tables.
   </para>
 
   <para>
@@ -266,7 +267,7 @@
   </para>
 
   <sect2 id="checksums-offline-enable-disable">
-   <title>Off-line Enabling of Checksums</title>
+   <title>Offline Enabling of Checksums</title>
 
    <para>
     The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
@@ -275,6 +276,54 @@
    </para>
 
   </sect2>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>Online Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process; make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
  </sect1>
 
   <sect1 id="wal-intro">
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd54..ffcd889908 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7927,7 +7927,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7948,11 +7948,18 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	/*
+	 * Hold interrupts for the duration of xlogging to avoid the state of data
+	 * checksums changing during processing, which would alter the premise
+	 * for xlogging hint bits.
+	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 236a66f638..e4a4ca01e2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -50,6 +51,7 @@
 #include "port/atomics.h"
 #include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -253,6 +255,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h, and zero when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -900,6 +912,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XLogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1073,8 +1086,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1087,7 +1100,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4915,9 +4928,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4951,13 +4962,370 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Returns true iff data checksums are enabled or are in the process of being
+ * enabled.  During the "inprogress-on" and "inprogress-off" states checksums
+ * must be written even though they are not verified (see
+ * datachecksumsworker.c for a longer discussion).
+ *
+ * This function is intended for callsites which are about to write a data
+ * page to storage and need to know whether to re-calculate the checksum in
+ * the page header. Interrupts must be held off from before calling this until
+ * the write operation has finished, to avoid the risk of the checksum state
+ * changing in between. This implies that the call should be made as close to
+ * the write operation as possible to keep the critical section short.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified (see datachecksumsworker.c for a longer discussion).
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result. To avoid the risk of
+ * the checksum state changing between reading and performing the validation
+ * (or not), interrupts must be held off. This implies that the call should
+ * be made as close to the validation as possible to keep the critical
+ * section short, in order to protect against time-of-check/time-of-use
+ * situations around data checksum validation.
+ */
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress-on" state for enabling checksums is used while the checksum
+ * worker is setting checksums on all pages; it can thus be used to check for
+ * aborted checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress-off" state for disabling checksums is used for when the
+ * worker resets the catalog state.  DataChecksumsNeedVerify() or
+ * DataChecksumsNeedWrite() should be used for deciding whether to read/write
+ * checksums.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * SetDataChecksumsOnInProgress
+ *		Sets the data checksum state to "inprogress-on" to enable checksums
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on". See
+ * SetDataChecksumsOn below for a description on how this state change works.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The data checksum state can only be transitioned to "inprogress-on"
+	 * from "off"; if data checksums are in any other state, exit early.
+	 */
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	/*
+	 * The state transition is performed in a critical section with
+	 * checkpoints held off to provide crash safety.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state change in all backends to ensure that all backends are in
+	 * "inprogress-on". Once done we know that all backends are writing data
+	 * checksums.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one to
+ * set the checksums state to "inprogress-on" (which is performed by
+ * SetDataChecksumsOnInProgress()) and the second one to set the state to "on"
+ * (performed here).
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on".  This state
+ * requires data checksums to be written but not verified. This ensures that
+ * all data pages can be checksummed without the risk of false negatives in
+ * validation during the process.  When all existing pages are guaranteed to
+ * have checksums, and all new pages will be initiated with checksums, the
+ * state can be changed to "on". Once the state is "on" checksums will be both
+ * written and verified. See datachecksumsworker.c for a longer discussion on
+ * how data checksums can be enabled in a running cluster.
+ *
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOn(void)
 {
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The only allowed state transition to "on" is from "inprogress-on" since
+	 * that state ensures that all pages will have data checksums written.
+	 */
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await the state transition to "on" in all backends. When done we know
+	 * that data checksums are enabled in all backends and that checksums are
+	 * both written and verified.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be emitted.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * "inprogress-off" state during which backends continue to write
+	 * checksums without verifying them. When all backends are in
+	 * "inprogress-off" the next transition to "off" can be performed, after
+	 * which all data checksum processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * or "inprogress-off" and we can transition directly to "off" from
+		 * there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that we don't incur a checkpoint while disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * ProcSignalBarrier absorption functions for enabling and disabling data
+ * checksums in a running cluster. The procsignalbarriers are emitted in the
+ * SetDataChecksums* functions.
+ */
+bool
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 ||
+		   LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+	return true;
+}
+
+/*
+ * InitLocalControldata
+ *
+ * Set up backend-local caches of controldata variables which may change at
+ * any point during runtime and thus require special-cased locking. So far
+ * this only applies to data_checksum_version, but it's intended to be
+ * general purpose enough to handle future cases.
+ */
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7994,6 +8362,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums being enabled ("inprogress-on"
+	 * state), we notify the user that they need to manually restart the
+	 * process to enable checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here, it has to be launched from a
+	 * regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shutdown, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XLogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9903,6 +10297,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XLogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10358,6 +10770,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab319d..5d77be8a2d 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Starts a background worker that turns off data checksums.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9abc4a1f55..87052b0693 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -974,10 +974,17 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	/*
+	 * Hold off interrupts to ensure that the observed data checksum state
+	 * cannot change as we form and insert the tuple.
+	 */
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..516ae666b7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1264,6 +1264,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..8afbf762af 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..b26c31e892
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1530 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, the state transition is
+ * recorded in the catalog and control file, and no changes are performed
+ * on the data pages or in the catalog.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from where it left off, given that pg_class.relhaschecksums tracks the
+ * state of processed relations and the in-progress state ensures that all new
+ * writes are performed with checksums. Each database will be reprocessed, but
+ * relations where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off",
+ * which signals that checksums are still written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state can still
+ * validate data checksums safely. During "inprogress-off", the catalog state
+ * pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *    - Backends SHALL NOT violate local datachecksum state
+ *    - Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *      currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * WAL logging backend updating the global state in the controlfile will wait
+ * for all other backends to absorb the barrier. Barrier absorption will happen
+ * during interrupt processing, which means that connected backends will change
+ * state at different times. To prevent data checksum state changes when
+ * writing and verifying checksums, interrupts shall be held off before
+ * interrogating state and resumed when the IO operation has been performed.
+ *
+ *   When Enabling Data Checksums
+ *   ----------------------------
+ *   A process which fails to observe data checksums being enabled can induce
+ *   two types of errors: failing to write the checksum when modifying the page
+ *   and failing to validate the data checksum on the page when reading it.
+ *
+ *   When processing starts all backends belong to one of the below sets, with
+ *   one set being empty:
+ *
+ *   Bd: Backends in "off" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   If processing is started in an online cluster then all backends are in Bd.
+ *   If processing was halted by the cluster shutting down, the controlfile
+ *   state "inprogress-on" will be observed on system startup and all backends
+ *   will be in Bd. Backends transition Bd -> Bi via a procsignalbarrier.  When
+ *   the DataChecksumsWorker has finished writing checksums on all pages and
+ *   enables data checksums cluster-wide, there are four sets of backends where
+ *   Bd shall be an empty set:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting while Bg is waiting on the procsignalbarrier will
+ *   observe the global state being "on" and will thus automatically belong to
+ *   Be.  Checksums are enabled cluster-wide when Bi is an empty set. Bi and Be
+ *   are compatible sets while still operating based on their local state as
+ *   both write data checksums.
+ *
+ *   When Disabling Data Checksums
+ *   -----------------------------
+ *   A process which fails to observe that data checksums have been disabled
+ *   can induce two types of errors: writing the checksum when modifying the
+ *   page and validating a data checksum which is no longer correct due to
+ *   modifications to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bo: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bo -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bo write data checksums, but don't validate them, such that
+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bo. Once all backends are in Bo, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation any longer.
+ *
+ *
+ * Potential optimizations
+ * -----------------------
+ * Below are some potential optimizations and improvements which were brought
+ * up during reviews of this feature, but which weren't implemented in the
+ * initial version. These are ideas listed without any validation of their
+ * feasibility or potential payoff. More discussion on these can be found on
+ * the -hackers threads linked to in the commit message of this feature.
+ *
+ *   * Launching datachecksumsworker for resuming operation from the startup
+ *     process: Currently users have to restart processing manually after a
+ *     restart since dynamic background worker cannot be started from the
+ *     postmaster. Changing to the startup process could make resuming the
+ *     processing automatic.
+ *   * Avoid dirtying the page when checksums already match: if the checksum
+ *     on the page happens to already match, we currently still dirty the
+ *     page. It should be enough to only do the log_newpage_buffer() call in
+ *     that case.
+ *   * Invent a lightweight WAL record that doesn't contain the full-page
+ *     image but just the block number: On replay, the redo routine would read
+ *     the page from disk.
+ *   * Teach pg_checksums to avoid checksummed pages when pg_checksums is used
+ *     to enable checksums on a cluster which is in inprogress-on state and
+ *     may have checksummed pages (make pg_checksums be able to resume an
+ *     online operation).
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+/*
+ * Signaling between backends calling pg_enable/disable_checksums, the
+ * checksums launcher process, and the checksums worker process.
+ *
+ * This struct is protected by DatachecksumsWorkerLock
+ */
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * These are set by pg_enable/disable_checksums, to tell the launcher what
+	 * the target state is.
+	 */
+	bool		launch_enable_checksums;	/* True if checksums are being
+											 * enabled, else false */
+	int			launch_cost_delay;
+	int			launch_cost_limit;
+
+	/*
+	 * Is a launcher process currently running?
+	 *
+	 * This is set by the launcher process, after it has read the above
+	 * launch_* parameters.
+	 */
+	bool		launcher_running;
+
+	/*
+	 * These fields indicate the target state that the launcher is currently
+	 * working towards. They can be different from the corresponding launch_*
+	 * fields, if a new pg_enable/disable_checksums() call was made while the
+	 * launcher/worker was already running.
+	 *
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	bool		enabling_checksums;	/* True if checksums are being enabled,
+									 * else false */
+	int			cost_delay;
+	int			cost_limit;
+
+	/*
+	 * Signaling between the launcher and the worker process.
+	 *
+	 * As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+	/* result, set by worker before exiting */
+	DatachecksumsWorkerResult success;
+
+	/*
+	 * Tells the worker process whether it should also process the shared
+	 * catalogs.
+	 */
+	bool		process_shared_catalogs;
+} DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct *DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/*
+ * Flag set by the interrupt handler
+ */
+static volatile sig_atomic_t abort_requested = false;
+
+/*
+ * Have we set the DatachecksumsWorkerShmemStruct->launcher_running flag?
+ * If we have, we need to clear it before exiting!
+ */
+static volatile sig_atomic_t launcher_running = false;
+
+/*
+ * Are we enabling checksums, or disabling them?
+ */
+static bool enabling_checksums;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Main entry point for the datachecksumsworker launcher process
+ *
+ * Starts data checksum processing, for enabling as well as disabling
+ * checksums.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	bool		launcher_running;
+
+	/* the cost delay settings have no effect when disabling */
+	Assert(enable_checksums || cost_delay == 0);
+	Assert(enable_checksums || cost_limit == 0);
+
+	/*
+	 * Store the desired state in shared memory.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	DatachecksumsWorkerShmem->launch_enable_checksums = enable_checksums;
+	DatachecksumsWorkerShmem->launch_cost_delay = cost_delay;
+	DatachecksumsWorkerShmem->launch_cost_limit = cost_limit;
+
+	/* is the launcher already running? */
+	launcher_running = DatachecksumsWorkerShmem->launcher_running;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * Launch a new launcher process, if it's not running already.
+	 *
+	 * If the launcher is currently busy enabling the checksums, and we want
+	 * them disabled (or vice versa), the launcher will notice that at the
+	 * latest when it's about to exit, and will loop back to process the new
+	 * request. So if the launcher is already running, we don't need to do
+	 * anything more here to abort it.
+	 *
+	 * If you call pg_enable/disable_checksums() twice in a row, before the
+	 * launcher has had a chance to start up, we still end up launching it
+	 * twice.  That's OK, the second invocation will see that a launcher is
+	 * already running and exit quickly.
+	 *
+	 * TODO: We could optimize here and skip launching the launcher, if we are
+	 * already in the desired state, i.e. if the checksums are already enabled
+	 * and you call pg_enable_checksums().
+	 */
+	if (!launcher_running)
+	{
+		/*
+		 * Prepare the BackgroundWorker and launch it.
+		 */
+		memset(&bgw, 0, sizeof(bgw));
+		bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+		bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+		snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+		snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+		snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+		bgw.bgw_restart_time = BGW_NEVER_RESTART;
+		bgw.bgw_notify_pid = MyProcPid;
+		bgw.bgw_main_arg = (Datum) 0;
+
+		if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+			ereport(ERROR,
+					(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %u/%u)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. If wal_level is set to "minimal",
+		 * this could be avoided if the checksum is calculated to be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check whether we have been asked to
+		 * abort; the abort will bubble up from here. It's safe to check this
+		 * without a lock, because if we miss it being set, we will try again
+		 * soon.
+		 */
+		Assert(enabling_checksums);
+		if (!DatachecksumsWorkerShmem->launch_enable_checksums)
+			abort_requested = true;
+		if (abort_requested)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it then that's not
+	 * an error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	heap_freetuple(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot end up with a processed database,
+	 * so we have no alternative other than exiting. When enabling checksums
+	 * we won't at this point have changed the pg_control version to enabled,
+	 * so when the cluster comes back up processing will have to be resumed.
+	 * When disabling, the pg_control version will be set to off before this,
+	 * so when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * it would be nice to handle in some way. Enabling data checksums resets
+	 * the flags, so any stale flags won't cause problems at that point, but
+	 * they may cause confusion for users reading pg_class. TODO.
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksum processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+/*
+ * launcher_exit
+ *
+ * Internal routine for cleaning up state when the launcher process exits. We
+ * need to clear the launcher_running flag in shared memory to ensure that
+ * processing can be started again later.
+ */
+static void
+launcher_exit(int code, Datum arg)
+{
+	if (launcher_running)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		launcher_running = false;
+		DatachecksumsWorkerShmem->launcher_running = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+}
+
+/*
+ * launcher_cancel_handler
+ *
+ * Internal routine for reacting to SIGINT and flagging the worker to abort.
+ * The worker won't be interrupted immediately but will check for abort flag
+ * between each block in a relation.
+ */
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	abort_requested = true;
+
+	/*
+	 * There is no sleeping in the main loop, the flag will be checked
+	 * periodically in ProcessSingleRelationFork. The worker does however
+	 * sleep when waiting for concurrent transactions to end so we still need
+	 * to set the latch.
+	 */
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks until all transactions which were active when the function
+ *		was called have finished
+ *
+ * Returns once all such transactions have ended. If the postmaster dies
+ * while waiting, the process exits with FATAL since processing cannot
+ * continue cluster-wide.
+ *
+ * NB: this will return early, if aborted by SIGINT or if the target state
+ * is changed while we're running.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (TransactionIdPrecedes(GetOldestActiveTransactionId(), waitforxid))
+	{
+		char		activity[64];
+		int			rc;
+
+		/* Oldest running xid is older than us, so wait */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for current transactions to finish (waiting for %u)",
+				 waitforxid);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+					(errmsg("postmaster exited during data checksum processing"),
+					 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_SHARED);
+		if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+			abort_requested = true;
+		LWLockRelease(DatachecksumsWorkerLock);
+		if (abort_requested)
+			break;
+	}
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+	return;
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function has the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+	int			operations[MAX_OPS];
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	if (DatachecksumsWorkerShmem->launcher_running)
+	{
+		/* Launcher was already running. Let it finish. */
+		LWLockRelease(DatachecksumsWorkerLock);
+		return;
+	}
+
+	launcher_running = true;
+
+	enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
+	DatachecksumsWorkerShmem->launcher_running = true;
+	DatachecksumsWorkerShmem->enabling_checksums = enabling_checksums;
+	DatachecksumsWorkerShmem->cost_delay = DatachecksumsWorkerShmem->launch_cost_delay;
+	DatachecksumsWorkerShmem->cost_limit = DatachecksumsWorkerShmem->launch_cost_limit;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * The target state can change while we are busy enabling/disabling checksums,
+	 * if the user calls pg_disable/enable_checksums() before we are finished with
+	 * the previous request. In that case, we will loop back here, to process the
+	 * new request.
+	 */
+again:
+
+	memset(operations, 0, sizeof(operations));
+
+	/*
+	 * If we're asked to enable checksums, we need to check if processing was
+	 * previously interrupted such that we should resume rather than start
+	 * from scratch.
+	 */
+	if (enabling_checksums)
+	{
+		/*
+		 * If we are asked to enable checksums in a cluster which already
+		 * has checksums enabled, exit immediately as there is nothing
+		 * more to do.
+		 */
+		if (DataChecksumsNeedVerify())
+			goto done;
+
+		/*
+		 * If the controlfile state is set to "inprogress-on" then we will
+		 * resume from where we left off based on the catalog state. This
+		 * will be safe since new relations created while the checksums
+		 * worker was not running will have checksums enabled.
+		 */
+		else if (DataChecksumsOnInProgress())
+		{
+			operations[0] = ENABLE_CHECKSUMS;
+			operations[1] = SET_CHECKSUMS_ON;
+		}
+
+		/*
+		 * If the controlfile state is set to "inprogress-off" then we
+		 * were interrupted while the catalog state was being cleared. In
+		 * this case we need to first reset state and then continue with
+		 * enabling checksums.
+		 */
+		else if (DataChecksumsOffInProgress())
+		{
+			operations[0] = RESET_STATE;
+			operations[1] = SET_INPROGRESS_ON;
+			operations[2] = ENABLE_CHECKSUMS;
+			operations[3] = SET_CHECKSUMS_ON;
+		}
+
+		/*
+		 * Data checksums are off in the cluster, we can proceed with
+		 * enabling them. Just in case we will start by resetting the
+		 * catalog state since we are doing this from scratch and we don't
+		 * want leftover catalog state to cause us to miss a relation.
+		 */
+		else
+		{
+			operations[0] = RESET_STATE;
+			operations[1] = SET_INPROGRESS_ON;
+			operations[2] = ENABLE_CHECKSUMS;
+			operations[3] = SET_CHECKSUMS_ON;
+		}
+	}
+	else
+	{
+		/*
+		 * Regardless of the current state of the system, we go through the
+		 * motions when asked to disable checksums. The catalog state is
+		 * only defined to be relevant during the operation of enabling
+		 * checksums, and has no use at any other point in time. That
+		 * being said, a user who sees stale relhaschecksums entries in
+		 * the catalog might run this just in case.
+		 *
+		 * Resetting state must be performed after setting the data checksum
+		 * state to off, as there otherwise might (depending on the system
+		 * data checksum state) be a window between the catalog reset and
+		 * the state transition in which new relations are created with the
+		 * catalog state set to true.
+		 */
+		operations[0] = DISABLE_CHECKSUMS;
+		operations[1] = RESET_STATE;
+	}
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+done:
+	/*
+	 * All done. But before we exit, check if the target state was changed while
+	 * we were running. In that case we will have to start all over again.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+	{
+		DatachecksumsWorkerShmem->enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
+		DatachecksumsWorkerShmem->cost_delay = DatachecksumsWorkerShmem->launch_cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = DatachecksumsWorkerShmem->launch_cost_limit;
+		LWLockRelease(DatachecksumsWorkerLock);
+		goto again;
+	}
+
+	launcher_running = false;
+	DatachecksumsWorkerShmem->launcher_running = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. Until no
+ * new databases are found, this will loop around computing a new list and
+ * comparing it to the already seen ones.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set up so that the first database processed also processes the shared
+	 * catalogs, so they are not processed once per database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	/*
+	 * Get a list of all databases to process. This may include databases that
+	 * were created during our runtime.  Since a database can be created as a
+	 * copy of any other database (which may not have existed in our last
+	 * run), we have to repeat this loop until no new databases show up in the
+	 * list.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%d databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "restarting loop" : "process completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+
+		/*
+		 * Re-generate the list of databases for another pass. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain
+		 * that there are no databases left without checksums.
+		 */
+		WaitForAllTransactionsToFinish();
+		DatabaseList = BuildDatabaseList();
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can be because
+	 * it actually failed for some reason, or because the database was
+	 * dropped between us getting the database list and trying to process it.
+	 * Get a fresh list of databases to detect the second case, where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists but enabling checksums failed, then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	WaitForAllTransactionsToFinish();
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = true;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even if this is a redundant assignment, we want to be explicit about
+	 * our intent for readability, since we want to be able to query this
+	 * state when resuming after a restart.
+	 */
+	DatachecksumsWorkerShmem->launch_enable_checksums = false;
+	DatachecksumsWorkerShmem->launcher_running = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to. If the caller wants to ensure that no concurrently
+ * running CREATE DATABASE calls exist, this needs to be preceded by a call
+ * to WaitForAllTransactionsToFinish().
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which do not yet have data checksums
+ * are returned. If include_shared is true then shared relations are included
+ * as well in a non-temporary list. include_shared has no relevance when
+ * building a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	enabling_checksums = true;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present as we start in this database. We
+	 * need to wait until they are all gone before we can finish, since we
+	 * cannot access these relations to modify them.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	Assert(DatachecksumsWorkerShmem->enabling_checksums);
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	for (;;)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 5000,
+						 WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		aborted = DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		if (aborted || abort_requested)
+		{
+			DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+			ereport(DEBUG1,
+					(errmsg("data checksum processing aborted in database OID %u",
+							dboid)));
+			return;
+		}
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..0fef097eb8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4017,6 +4017,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..cc494b6f13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d9c482454f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 561c212092..9362ec0018 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2944,8 +2944,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..c7928f3495 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c43cdd685b..a3720617f9 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "port/pg_bitutils.h"
 #include "commands/async.h"
 #include "miscadmin.h"
@@ -98,7 +99,6 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 static void ResetProcSignalBarrierBits(uint32 flags);
-static bool ProcessBarrierPlaceholder(void);
 
 /*
  * ProcSignalShmemSize
@@ -538,8 +538,17 @@ ProcessProcSignalBarrier(void)
 				type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
 				switch (type)
 				{
-					case PROCSIGNAL_BARRIER_PLACEHOLDER:
-						processed = ProcessBarrierPlaceholder();
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON:
+						processed = AbsorbChecksumsOnInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_ON:
+						processed = AbsorbChecksumsOnBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF:
+						processed = AbsorbChecksumsOffInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_OFF:
+						processed = AbsorbChecksumsOffBarrier();
 						break;
 				}
 
@@ -604,24 +613,6 @@ ResetProcSignalBarrierBits(uint32 flags)
 	InterruptPending = true;
 }
 
-static bool
-ProcessBarrierPlaceholder(void)
-{
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 *
-	 * The return value should be 'true' if the barrier was successfully
-	 * absorbed and 'false' if not. Note that returning 'false' can lead to
-	 * very frequent retries, so try hard to make that an uncommon case.
-	 */
-	return true;
-}
-
 /*
  * CheckProcSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called after receiving
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..5b083749d5 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae..8fbebd9870 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change underneath us, which
+		 * could otherwise cause a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 62bff52638..4ac396ccf1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1567,9 +1567,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1585,9 +1582,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..633821bae5 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1828,7 +1829,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1896,6 +1898,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3548,6 +3552,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the data checksum state. Since the data checksum state can change
+	 * at any time, the fetched value might be out of date by the time the
+	 * relation is built.  DataChecksumsNeedWrite returns true when data
+	 * checksums are enabled, in the process of being enabled (state
+	 * "inprogress-on"), or in the process of being disabled (state
+	 * "inprogress-off"). Since relhaschecksums is only used to track progress
+	 * when data checksums are being enabled, and going from disabled to
+	 * enabled will clear relhaschecksums before starting, it is safe to use
+	 * this value for a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false and is concurrently changed to
+	 * true, that implies that checksums are being enabled. Worst case,
+	 * this will lead to the relation being processed for checksums even
+	 * though each page written will have them already.  Performing this last
+	 * shortens the window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3813,6 +3838,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3837,16 +3863,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3875,6 +3909,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3895,15 +3930,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc5..045da21904 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e5965bc517..92367ece4b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -606,6 +606,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up backend local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eafdb1118e..3d108a2348 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -500,6 +502,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -609,7 +622,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums;
 static bool integer_datetimes;
 static bool assert_enabled;
 static bool in_hot_standby;
@@ -1910,17 +1923,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4830,6 +4832,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3f029f41e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..1298857458 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish, or turn off checksums,
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..b35cd4d503 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..6947c09591 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern bool AbsorbChecksumsOnInProgressBarrier(void);
+extern bool AbsorbChecksumsOffInProgressBarrier(void);
+extern bool AbsorbChecksumsOnBarrier(void);
+extern bool AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 224cae0246..adbe81e890 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index e8dcd15a55..bf296625e4 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* heap for rewrite during DDL, link to original rel */
 	Oid			relrewrite BKI_DEFAULT(0);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..d8229422af 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b5f52d4e4a..f050f15a58 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11301,6 +11301,22 @@
   proname => 'raw_array_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' },
 
+{ oid => '9258',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '9257',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bdc97e308..f013acba76 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -324,6 +324,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..0974dfadfe 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -963,6 +963,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..845f6bceaa
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for checksum helper background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Start the background processes for enabling or disabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f..c35b747520 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d2359192..f736b12f98 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..d865796d04 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index ab1ef9a475..9774816625 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..fd60f7e97f
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: This creates a temporary installation (in the case of "check")
+with multiple nodes, primary and standby(s), for the purpose of the
+tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..3b229de915
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,92 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class have checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again, which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled. Disabling checksums clears the catalog
+# relhaschecksums state, so await that before calling it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..41a4d64037
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,117 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in    = '';
+my $out   = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyway and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';"
+);
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1',
+	'double-check that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..1555a1694b
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,127 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+cmp_ok(
+	$result, '~~',
+	[ "inprogress-on", "on" ],
+	'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary 2');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
diff --git a/src/test/checksum/t/004_offline.pl b/src/test/checksum/t/004_offline.pl
new file mode 100644
index 0000000000..2dfca4df23
--- /dev/null
+++ b/src/test/checksum/t/004_offline.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums offline from various states
+# of checksum processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Enable checksums offline using pg_checksums
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are enabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums offline again using pg_checksums
+$node->stop();
+$node->checksum_disable_offline();
+$node->start();
+
+# Ensure that checksums are disabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in    = '';
+my $out   = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyway and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'inprogress-on');
+is($result, 1, 'ensure checksums are in the process of being enabled');
+
+# Turn the cluster off and enable checksums offline, then start back up
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are now enabled even though processing wasn't
+# restarted
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..b7431a7600 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2221,6 +2221,42 @@ sub pg_recvlogical_upto
 	}
 }
 
+=item $node->checksum_enable_offline()
+
+Enable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_enable_offline
+{
+	my ($self) = @_;
+
+	print "# Enabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-e');
+	print "# Checksums enabled\n";
+	return;
+}
+
+=item $node->checksum_disable_offline()
+
+Disable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_disable_offline
+{
+	my ($self) = @_;
+
+	print "# Disabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-d');
+	print "# Checksums disabled\n";
+	return;
+}
+
 =pod
 
 =back
-- 
2.21.1 (Apple Git-122.3)

v35-0002-Fix-controlflow-around-restarts-v35.patch (application/octet-stream)
From a14cee24e5fab59cdd70217a4c1b47caf61c06ad Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Thu, 28 Jan 2021 23:50:56 +0100
Subject: [PATCH v35 2/2] Fix controlflow around restarts v35

The enable_checksums variable must be updated to the new target state
when processing restarts, and interrupts must be held off while setting
the local state to avoid concurrent state changes. The failure case must
also be checked in order to differentiate restarts from actual failures.

Also tweak and polish some of the comments.
---
 src/backend/postmaster/datachecksumsworker.c | 66 +++++++++++++++-----
 1 file changed, 49 insertions(+), 17 deletions(-)

diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
index b26c31e892..cf6b876a57 100644
--- a/src/backend/postmaster/datachecksumsworker.c
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -224,7 +224,7 @@ typedef enum
 }			DatachecksumsWorkerResult;
 
 /*
- * Signaling between backends calling pg_enable/disable_checkums, the
+ * Signaling between backends calling pg_enable/disable_data_checksums, the
  * checksums launcher process, and the checksums worker process.
  *
  * This struct is protected by DatachecksumsWorkerLock
@@ -232,8 +232,8 @@ typedef enum
 typedef struct DatachecksumsWorkerShmemStruct
 {
 	/*
-	 * These are set by pg_enable/disable_checkums, to tell the launcher what
-	 * the target state is.
+	 * These are set by pg_enable/disable_data_checksums, to tell the launcher
+	 * what the target state is.
 	 */
 	bool		launch_enable_checksums;	/* True if checksums are being
 											 * enabled, else false */
@@ -251,9 +251,9 @@ typedef struct DatachecksumsWorkerShmemStruct
 	/*
 	 * These fields indicate the target state that the launcher is currently
 	 * working towards. They can be different from the corresponding launch_*
-	 * fields, if a new pg_enable_disable_checksums() call was made while the
-	 * launcher/worker was already running.
-
+	 * fields, if a new pg_enable/disable_data_checksums() call was made while
+	 * the launcher/worker was already running.
+	 *
 	 * The below members are set when the launcher starts, and are only
 	 * accessed read-only by the single worker. Thus, we can access these
 	 * without a lock. If multiple workers, or dynamic cost parameters, are
@@ -272,10 +272,11 @@ typedef struct DatachecksumsWorkerShmemStruct
 	 * the need for a lock. If multiple workers are supported then this will
 	 * have to be revisited.
 	 */
+
 	/* result, set by worker before exiting */
 	DatachecksumsWorkerResult success;
 
-	/* tells the worker process whether it should also process the shared catalogs. */
+	/* tells the worker process whether it should also process the shared catalogs */
 	bool		process_shared_catalogs;
 } DatachecksumsWorkerShmemStruct;
 
@@ -309,7 +310,7 @@ static volatile sig_atomic_t abort_requested = false;
 static volatile sig_atomic_t launcher_running = false;
 
 /*
- * Are we enabling checkums, or disabling them?
+ * Are we enabling data checksums, or disabling them?
  */
 static bool enabling_checksums;
 
@@ -358,20 +359,20 @@ StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost
 	/*
 	 * Launch a new launcher process, if it's not running already.
 	 *
-	 * If the launcher is currently busy enabling the checkums, and we want
+	 * If the launcher is currently busy enabling the checksums, and we want
 	 * them disabled (or vice versa), the launcher will notice that at latest
 	 * when it's about to exit, and will loop back process the new request.
 	 * So if the launcher is already running, we don't need to do anything
 	 * more here to abort it.
 	 *
-	 * If you call pg_enable/disable_checksums() twice in a row, before the
-	 * launcher has had a chance to start up, we still end up launching it
+	 * If you call pg_enable/disable_data_checksums() twice in a row, before
+	 * the launcher has had a chance to start up, we still end up launching it
 	 * twice.  That's OK, the second invocation will see that a launcher is
 	 * already running and exit quickly.
 	 *
 	 * TODO: We could optimize here and skip launching the launcher, if we are
 	 * already in the desired state, i.e. if the checksums are already enabled
-	 * and you call pg_enable_checksums().
+	 * and you call pg_enable_data_checksums().
 	 */
 	if (!launcher_running)
 	{
@@ -816,7 +817,7 @@ DatachecksumsWorkerLauncherMain(Datum arg)
 
 	if (DatachecksumsWorkerShmem->launcher_running)
 	{
-		/* Launcher was already running. Let it finish. */
+		/* Launcher was already running, let it finish */
 		LWLockRelease(DatachecksumsWorkerLock);
 		return;
 	}
@@ -831,15 +832,17 @@ DatachecksumsWorkerLauncherMain(Datum arg)
 	LWLockRelease(DatachecksumsWorkerLock);
 
 	/*
-	 * The target state can change while we are busy enabling/disabling checksums,
-	 * if the user calls pg_disable/enable_checksums() before we are finished with
-	 * the previous request. In that case, we will loop back here, to process the
-	 * new request.
+	 * The target state can change while we are busy enabling/disabling
+	 * checksums, if the user calls pg_disable/enable_data_checksums() before
+	 * we are finished with the previous request. In that case, we will loop
+	 * back here, to process the new request.
 	 */
 again:
 
 	memset(operations, 0, sizeof(operations));
 
+	HOLD_INTERRUPTS();
+
 	/*
 	 * If we're asked to enable checksums, we need to check if processing was
 	 * previously interrupted such that we should resume rather than start
@@ -915,6 +918,8 @@ again:
 		operations[1] = RESET_STATE;
 	}
 
+	RESUME_INTERRUPTS();
+
 	for (int i = 0; i < MAX_OPS; i++)
 	{
 		current = operations[i];
@@ -939,15 +944,41 @@ again:
 			case RESET_STATE:
 				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
 				if (!status)
+				{
+					/*
+					 * If the target state changed during processing then it's
+					 * not a failure, so restart processing instead.
+					 */
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+					{
+						LWLockRelease(DatachecksumsWorkerLock);
+						goto done;
+					}
+					LWLockRelease(DatachecksumsWorkerLock);
 					ereport(ERROR,
 							(errmsg("unable to reset catalog checksum state")));
+				}
 				break;
 
 			case ENABLE_CHECKSUMS:
 				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
 				if (!status)
+				{
+					/*
+					 * If the target state changed during processing then it's
+					 * not a failure, so restart processing instead.
+					 */
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+					{
+						LWLockRelease(DatachecksumsWorkerLock);
+						goto done;
+					}
+					LWLockRelease(DatachecksumsWorkerLock);
 					ereport(ERROR,
 							(errmsg("unable to enable checksums in cluster")));
+				}
 				break;
 
 			default:
@@ -965,6 +996,7 @@ done:
 	if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
 	{
 		DatachecksumsWorkerShmem->enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
+		enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
 		DatachecksumsWorkerShmem->cost_delay = DatachecksumsWorkerShmem->launch_cost_delay;
 		DatachecksumsWorkerShmem->cost_limit = DatachecksumsWorkerShmem->launch_cost_limit;
 		LWLockRelease(DatachecksumsWorkerLock);
-- 
2.21.1 (Apple Git-122.3)
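The launcher logic above retries from the top when the requested target state flips while processing is underway, rather than reporting an error. A toy model of that retry loop (hypothetical Python sketch, not the patch's C code; the class and method names are invented for illustration):

```python
# Toy model of the launcher retry behavior: if the requested target state
# flips while the worker is processing, the failed run is treated as a
# restart toward the new target rather than as an error.
# Hypothetical sketch only; names do not correspond to PostgreSQL APIs.

class ChecksumLauncher:
    def __init__(self):
        self.target_enabled = True      # analogous to launch_enable_checksums
        self.enabling = True            # analogous to enabling_checksums

    def request(self, enable):
        """User called pg_enable/disable_data_checksums() mid-run."""
        self.target_enabled = enable

    def run(self, process_all):
        while True:
            self.enabling = self.target_enabled
            ok = process_all(self.enabling)
            if not ok:
                # Target changed during processing: restart, not failure.
                if self.target_enabled != self.enabling:
                    continue
                raise RuntimeError("unable to change checksum state")
            if self.target_enabled == self.enabling:
                return self.enabling
```

In the patch, the equivalent check is performed under DatachecksumsWorkerLock by comparing launch_enable_checksums against the in-flight enabling_checksums flag.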

#80 Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#79)
1 attachment(s)
Re: Online checksums patch - once again

The previous v35 had a tiny conflict in pg_class.h, the attached v36 (which is
a squash of the 2 commits in v35) fixes that. No other changes are introduced
in this version.

--
Daniel Gustafsson https://vmware.com/

Attachments:

v36-0001-Support-checksum-enable-disable-in-a-running-clu.patch (application/octet-stream)
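The patch below hinges on a simple invariant: whether a backend writes and/or verifies checksums depends only on the cluster state it has locally absorbed, and checksums are written during both in-progress states but verified only when the state is fully "on" (see DataChecksumsNeedWrite() and DataChecksumsNeedVerify() in the diff). A minimal sketch of that decision table (plain Python model, not the patch's C code):

```python
# Simplified model of the write/verify decision table implemented by the
# patch's DataChecksumsNeedWrite()/DataChecksumsNeedVerify(). Checksums
# are written in every state except "off", but only verified once the
# state is fully "on".

STATES = ("off", "inprogress-on", "on", "inprogress-off")

def need_write(state):
    return state in ("on", "inprogress-on", "inprogress-off")

def need_verify(state):
    return state == "on"
```

This is why a backend that has not yet observed the transition away from "on" can still safely verify pages written by a backend already in "inprogress-off": the latter keeps writing valid checksums.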
From 8562228fa449e2af3f458c894aa3a0962b7fc11e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 27 Jan 2021 17:20:44 +0200
Subject: [PATCH v36] Support checksum enable/disable in a running cluster v36

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

Data checksums could previously only be enabled at initdb time, or
while the cluster is offline using pg_checksums. This commit introduces
functionality to enable, and disable, data checksums without the need
for turning off the cluster.

A dynamic background worker is responsible for launching a per-database
worker which will mark all buffers dirty for all relations with storage
in order for them to have data checksums on write. Once all relations
in all databases have been processed, the data_checksums state can be
set to "on" and the cluster will at that point be identical to one
which had checksums enabled from the start.

While the cluster is writing checksums on existing buffers, checksums
are written but not verified during reading to avoid false negatives.
The status of each relation is tracked with a new flag in pg_class,
relhaschecksums, which enables processing to be restarted in case
the cluster is restarted before checksums are enabled. Disabling
checksums will clear the relhaschecksums flag but will not touch any
buffers (but existing checksums cannot be re-used in case checksums
are immediately re-enabled). While disabling, checksums are again
written but not verified to ensure that concurrent backends which
haven't started disabling checksums will not incur a verification error.

New in-progress states are introduced for data_checksums which during
processing ensure that backends know whether to verify and write
checksums. All state changes across backends are synchronized using a
procsignalbarrier.

Authors: Daniel Gustafsson, Magnus Hagander
Reviewed-by: Heikki Linnakangas, Robert Haas, Andres Freund, Tomas Vondra, Michael Banck, Andrey Borodin
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
---
 doc/src/sgml/catalogs.sgml                   |   11 +
 doc/src/sgml/func.sgml                       |   68 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   57 +-
 src/backend/access/heap/heapam.c             |    9 +-
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/transam/xlog.c            |  452 ++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/heap.c                   |    7 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |   10 +
 src/backend/postmaster/datachecksumsworker.c | 1562 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |    5 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   33 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   29 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/cache/relcache.c           |   60 +-
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_class.h               |    3 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   30 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   92 ++
 src/test/checksum/t/002_restarts.pl          |  117 ++
 src/test/checksum/t/003_standby_checksum.pl  |  127 ++
 src/test/checksum/t/004_offline.pl           |  105 ++
 src/test/perl/PostgresNode.pm                |   36 +
 51 files changed, 3017 insertions(+), 87 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl
 create mode 100644 src/test/checksum/t/004_offline.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..56e041a8cc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2167,6 +2167,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>relhaschecksums</structfield> <type>bool</type>
+      </para>
+      <para>
+        True if relation has data checksums on all pages. This state is only
+        used during checksum processing; this field should never be consulted
+        for cluster checksum status.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>relrewrite</structfield> <type>oid</type>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b7150510ab..515e64a4e1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25903,6 +25903,74 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Initiates data checksums for the cluster. This will switch the data
+        checksums mode to <literal>inprogress-on</literal> as well as start a
+        background worker that will process all data in the database and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c602ee4427..c94faa11e0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3697,8 +3697,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3708,8 +3707,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b2..d879550e81 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations, regardless of any prior online processing.
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f8..48890ccc9d 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -247,9 +247,10 @@
   <para>
    Checksums are normally enabled when the cluster is initialized using <link
    linkend="app-initdb-data-checksums"><application>initdb</application></link>.
-   They can also be enabled or disabled at a later time as an offline
-   operation. Data checksums are enabled or disabled at the full cluster
-   level, and cannot be specified individually for databases or tables.
+   They can also be enabled or disabled at a later time either as an offline
+   operation or online in a running cluster allowing concurrent access. Data
+   checksums are enabled or disabled at the full cluster level, and cannot be
+   specified individually for databases or tables.
   </para>
 
   <para>
@@ -266,7 +267,7 @@
   </para>
 
   <sect2 id="checksums-offline-enable-disable">
-   <title>Off-line Enabling of Checksums</title>
+   <title>Offline Enabling of Checksums</title>
 
    <para>
     The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
@@ -275,6 +276,54 @@
    </para>
 
   </sect2>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>Online Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process; make sure that
+    <varname>max_worker_processes</varname> allows for at least one
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to be removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped while in <literal>inprogress-on</literal> mode, for
+    any reason, then this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
  </sect1>
 
   <sect1 id="wal-intro">
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd54..ffcd889908 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7927,7 +7927,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7948,11 +7948,18 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	/*
+	 * Hold interrupts for the duration of xlogging to avoid the state of data
+	 * checksums changing during the processing which would alter the premise
+	 * for xlogging hint bits.
+	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f03bd473e2..f2b37fae0b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -50,6 +51,7 @@
 #include "port/atomics.h"
 #include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -253,6 +255,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -900,6 +912,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XLogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1073,8 +1086,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1087,7 +1100,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4915,9 +4928,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4951,13 +4962,370 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Returns true iff data checksums are enabled or are in the process of being
+ * enabled.  During the "inprogress-on" and "inprogress-off" states checksums must
+ * be written even though they are not verified (see datachecksumsworker.c for
+ * a longer discussion).
+ *
+ * This function is intended for callsites which are about to write a data page
+ * to storage, and need to know whether to re-calculate the checksum for the
+ * page header. Interrupts must be held off while calling this and until the
+ * write operation has finished to avoid the risk of the checksum state
+ * changing. This implies that calling this function must be performed as close
+ * to the write operation as possible to keep the critical section short.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified (see datachecksumsworker.c for a longer discussion).
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid the
+ * risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that calling
+ * this function must be performed as close to the validation call as possible
+ * to keep the critical section short. This is in order to protect against
+ * time of check/time of use situations around data checksum validation.
+ */
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress-on" state for enabling checksums is used when the checksum
+ * worker is setting checksums on all pages; it can thus be used to check for
+ * aborted checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress-off" state for disabling checksums is used for when the
+ * worker resets the catalog state.  DataChecksumsNeedVerify() or
+ * DataChecksumsNeedWrite() should be used for deciding whether to read/write
+ * checksums.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * SetDataChecksumsOnInProgress
+ *		Sets the data checksum state to "inprogress-on" to enable checksums
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on". See
+ * SetDataChecksumsOn below for a description on how this state change works.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * Data checksum state can only be transitioned to "inprogress-on" from
+	 * "off", if data checksums are in any other state then exit.
+	 */
+	if (ControlFile->data_checksum_version != 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	/*
+	 * The state transition is performed in a critical section with
+	 * checkpoints held off to provide crash safety.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await state change in all backends to ensure that all backends are in
+	 * "inprogress-on". Once done we know that all backends are writing data
+	 * checksums.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one to
+ * set the checksums state to "inprogress-on" (which is performed by
+ * SetDataChecksumsOnInProgress()) and the second one to set the state to "on"
+ * (performed here).
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on".  This state
+ * requires data checksums to be written but not verified. This ensures that
+ * all data pages can be checksummed without the risk of false negatives in
+ * validation during the process.  When all existing pages are guaranteed to
+ * have checksums, and all new pages will be initiated with checksums, the
+ * state can be changed to "on". Once the state is "on" checksums will be both
+ * written and verified. See datachecksumsworker.c for a longer discussion on
+ * how data checksums can be enabled in a running cluster.
+ *
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOn(void)
 {
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The only allowed state transition to "on" is from "inprogress-on" since
+	 * that state ensures that all pages will have data checksums written.
+	 */
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Await the state transition to "on" in all backends. When done we know that
+	 * data checksums are enabled in all backends and data checksums are both
+	 * written and verified.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be emitted.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * "inprogress-off" state during which backends continue to write
+	 * checksums without verifying them. When all backends are in
+	 * "inprogress-off" the next transition to "off" can be performed, after
+	 * which all data checksum processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * or "inprogress-off" and we can transition directly to "off" from
+		 * there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that we don't incur a checkpoint during disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * ProcSignalBarrier absorption functions for enabling and disabling data
+ * checksums in a running cluster. The procsignalbarriers are emitted in the
+ * SetDataChecksums* functions.
+ */
+bool
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == 0 ||
+		   LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+	return true;
+}
+
+/*
+ * InitLocalControldata
+ *
+ * Set up backend-local caches of controldata variables which may change at
+ * any point during runtime and thus require special-cased locking. So far
+ * this only applies to data_checksum_version, but it's intended to be
+ * general-purpose enough to handle future cases.
+ */
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7994,6 +8362,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums in the "inprogress-on" state, we
+	 * notify the user that they need to manually restart the processing to
+	 * finish enabling checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here; it has to be launched from a
+	 * regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shut down,
+	 * we know that all backends had stopped validating checksums, and we can
+	 * move straight to the "off" state.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XLogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -9903,6 +10297,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XLogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10358,6 +10770,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 5e1aab319d..5d77be8a2d 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -784,3 +785,49 @@ pg_promote(PG_FUNCTION_ARGS)
 			(errmsg("server did not promote within %d seconds", wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Starts a background worker that turns off data checksums.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9abc4a1f55..87052b0693 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -974,10 +974,17 @@ InsertPgClassTuple(Relation pg_class_desc,
 	/* relpartbound is set by updating this tuple, if necessary */
 	nulls[Anum_pg_class_relpartbound - 1] = true;
 
+	/*
+	 * Hold off interrupts to ensure that the observed data checksum state
+	 * cannot change as we form and insert the tuple.
+	 */
+	HOLD_INTERRUPTS();
+	values[Anum_pg_class_relhaschecksums - 1] = BoolGetDatum(DataChecksumsNeedWrite());
 	tup = heap_form_tuple(RelationGetDescr(pg_class_desc), values, nulls);
 
 	/* finally insert the new tuple, update the indexes, and clean up */
 	CatalogTupleInsert(pg_class_desc, tup);
+	RESUME_INTERRUPTS();
 
 	heap_freetuple(tup);
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..516ae666b7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1264,6 +1264,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..8afbf762af 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,15 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
+	},
+	{
+		"ResetDataChecksumsStateInDatabase", ResetDataChecksumsStateInDatabase
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..cf6b876a57
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1562 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When data checksums are enabled at initdb time or with pg_checksums, no
+ * extra process is required, as each page is checksummed, and verified, when
+ * accessed.  When enabling checksums on an already running cluster which
+ * does not have them enabled, this worker will ensure that all pages are
+ * checksummed before verification of the checksums is turned on. In the
+ * case of disabling checksums, the state transition is recorded in the
+ * control file and catalog, and no changes are performed on the data pages
+ * themselves.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed as well as have its
+ * state recorded in the catalog to avoid the datachecksumsworker having to
+ * process it when already checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written. Once all data pages
+ * in a relation have been written, pg_class.relhaschecksums is set to true to
+ * indicate that the relation is done.
+ *
+ * If processing is interrupted by a cluster restart, it will be resumed from
+ * where it left off, since pg_class.relhaschecksums tracks the state of
+ * processed relations and the in-progress state ensures that all new writes
+ * are performed with checksums. Each database will be reprocessed, but
+ * relations where pg_class.relhaschecksums is true are skipped.
+ *
+ * If data checksums are enabled, then disabled, and then re-enabled, every
+ * relation's pg_class.relhaschecksums field will be reset to false before
+ * entering the in-progress mode.
+ *
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state can still
+ * validate data checksums. During "inprogress-off", the catalog state
+ * pg_class.relhaschecksums is cleared for all relations.
+ *
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *    - Backends SHALL NOT violate local datachecksum state
+ *    - Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *      currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring no
+ * incompatible objects and processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * WAL logging backend updating the global state in the controlfile will wait
+ * for all other backends to absorb the barrier. Barrier absorption will happen
+ * during interrupt processing, which means that connected backends will change
+ * state at different times. To prevent data checksum state changes when
+ * writing and verifying checksums, interrupts shall be held off before
+ * interrogating state and resumed when the IO operation has been performed.
+ *
+ *   When Enabling Data Checksums
+ *   ----------------------------
+ *   A process which fails to observe data checksums being enabled can induce
+ *   two types of errors: failing to write the checksum when modifying the page
+ *   and failing to validate the data checksum on the page when reading it.
+ *
+ *   When processing starts all backends belong to one of the below sets, with
+ *   one set being empty:
+ *
+ *   Bd: Backends in "off" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   If processing is started in an online cluster then all backends are in Bd.
+ *   If processing was halted by the cluster shutting down, the controlfile
+ *   state "inprogress-on" will be observed on system startup and all backends
+ *   will be in Bd. Backends transition Bd -> Bi via a procsignalbarrier.  When
+ *   the DataChecksumsWorker has finished writing checksums on all pages and
+ *   enables data checksums cluster-wide, there are four sets of backends, of
+ *   which Bd shall be empty:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting while Bg is waiting on the procsignalbarrier will
+ *   observe the global state being "on" and will thus automatically belong to
+ *   Be.  Checksums are enabled cluster-wide when Bi is an empty set. Bi and Be
+ *   are compatible sets while still operating based on their local state as
+ *   both write data checksums.
+ *
+ *   When Disabling Data Checksums
+ *   -----------------------------
+ *   A process which fails to observe that data checksums have been disabled
+ *   can induce two types of errors: writing the checksum when modifying the
+ *   page and validating a data checksum which is no longer correct due to
+ *   modifications to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bo: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bo -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bo write data checksums, but don't validate them, such that
+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bo. Once all backends are in Bo, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation any longer.
+ *
+ *
+ * Potential optimizations
+ * -----------------------
+ * Below are some potential optimizations and improvements which were brought
+ * up during reviews of this feature, but which weren't implemented in the
+ * initial version. These are ideas listed without any validation on their
+ * feasability or potential payoff. More discussion on these can be found on
+ * the -hackers threads linked to in the commit message of this feature.
+ *
+ *   * Launching datachecksumsworker for resuming operation from the startup
+ *     process: Currently users have to restart processing manually after a
+ *     restart since dynamic background worker cannot be started from the
+ *     postmaster. Changing to the startup process could make resuming the
+ *     processing automatic.
+ *   * Avoid dirtying the page when checksums already match: If the checksum
+ *     on the page already happens to match, we still dirty the page. It
+ *     should be enough to only do the log_newpage_buffer() call in that case.
+ *   * Invent a lightweight WAL record that doesn't contain the full-page
+ *     image but just the block number: On replay, the redo routine would read
+ *     the page from disk.
+ *   * Teach pg_checksums to avoid checksummed pages when pg_checksums is used
+ *     to enable checksums on a cluster which is in inprogress-on state and
+ *     may have checksummed pages (make pg_checksums be able to resume an
+ *     online operation).
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum DataChecksumOperation
+{
+	ENABLE_CHECKSUMS = 1,
+	DISABLE_CHECKSUMS,
+	RESET_STATE,
+	SET_INPROGRESS_ON,
+	SET_CHECKSUMS_ON
+}			DataChecksumOperation;
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+/*
+ * Signaling between backends calling pg_enable/disable_data_checksums, the
+ * checksums launcher process, and the checksums worker process.
+ *
+ * This struct is protected by DatachecksumsWorkerLock
+ */
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * These are set by pg_enable/disable_data_checksums, to tell the launcher
+	 * what the target state is.
+	 */
+	bool		launch_enable_checksums;	/* True if checksums are being
+											 * enabled, else false */
+	int			launch_cost_delay;
+	int			launch_cost_limit;
+
+	/*
+	 * Is a launcher process currently running?
+	 *
+	 * This is set by the launcher process, after it has read the above launch_*
+	 * parameters.
+	 */
+	bool		launcher_running;
+
+	/*
+	 * These fields indicate the target state that the launcher is currently
+	 * working towards. They can be different from the corresponding launch_*
+	 * fields, if a new pg_enable/disable_data_checksums() call was made while
+	 * the launcher/worker was already running.
+	 *
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	bool		enabling_checksums;	/* True if checksums are being enabled,
+									 * else false */
+	int			cost_delay;
+	int			cost_limit;
+
+	/*
+	 * Signaling between the launcher and the worker process.
+	 *
+	 * As there is only a single worker, and the launcher
+	 * won't read these until the worker exits, they can be accessed without
+	 * the need for a lock. If multiple workers are supported then this will
+	 * have to be revisited.
+	 */
+
+	/* result, set by worker before exiting */
+	DatachecksumsWorkerResult success;
+
+	/* tells the worker process whether it should also process the shared catalogs */
+	bool		process_shared_catalogs;
+} DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct *DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/*
+ * Flag set by the interrupt handler
+ */
+static volatile sig_atomic_t abort_requested = false;
+
+/*
+ * Have we set the DatachecksumsWorkerShmemStruct->launcher_running flag?
+ * If we have, we need to clear it before exiting!
+ */
+static volatile sig_atomic_t launcher_running = false;
+
+/*
+ * Are we enabling data checksums, or disabling them?
+ */
+static bool enabling_checksums;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name);
+static bool ProcessAllDatabases(bool *already_connected, const char *bgw_func_name);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void SetRelHasChecksums(Oid relOid);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Launch the datachecksumsworker launcher process
+ *
+ * The main entry point for starting data checksums processing, for enabling
+ * as well as disabling.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	bool		launcher_running;
+
+	/* the cost delay settings have no effect when disabling */
+	Assert(enable_checksums || cost_delay == 0);
+	Assert(enable_checksums || cost_limit == 0);
+
+	/*
+	 * Store the desired state in shared memory.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	DatachecksumsWorkerShmem->launch_enable_checksums = enable_checksums;
+	DatachecksumsWorkerShmem->launch_cost_delay = cost_delay;
+	DatachecksumsWorkerShmem->launch_cost_limit = cost_limit;
+
+	/* is the launcher already running? */
+	launcher_running = DatachecksumsWorkerShmem->launcher_running;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * Launch a new launcher process, if it's not running already.
+	 *
+	 * If the launcher is currently busy enabling the checksums, and we want
+	 * them disabled (or vice versa), the launcher will notice that at the
+	 * latest when it's about to exit, and will loop back to process the new
+	 * request.
+	 * So if the launcher is already running, we don't need to do anything
+	 * more here to abort it.
+	 *
+	 * If you call pg_enable/disable_data_checksums() twice in a row, before
+	 * the launcher has had a chance to start up, we still end up launching it
+	 * twice.  That's OK, the second invocation will see that a launcher is
+	 * already running and exit quickly.
+	 *
+	 * TODO: We could optimize here and skip launching the launcher, if we are
+	 * already in the desired state, i.e. if the checksums are already enabled
+	 * and you call pg_enable_data_checksums().
+	 */
+	if (!launcher_running)
+	{
+		/*
+		 * Prepare the BackgroundWorker and launch it.
+		 */
+		memset(&bgw, 0, sizeof(bgw));
+		bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+		bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+		snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+		snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+		snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+		bgw.bgw_restart_time = BGW_NEVER_RESTART;
+		bgw.bgw_notify_pid = MyProcPid;
+		bgw.bgw_main_arg = (Datum) 0;
+
+		if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+			ereport(ERROR,
+					(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %u/%u)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hint bits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums are first on, then
+		 * off, and then turned on again. If wal_level is set to "minimal",
+		 * this could be avoided if the checksum is calculated to be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check if we are asked to abort; the
+		 * abort will bubble up from here. It's safe to check this without
+		 * a lock, because if we miss it being set, we will try again soon.
+		 */
+		Assert(enabling_checksums);
+		if (!DatachecksumsWorkerShmem->launch_enable_checksums)
+			abort_requested = true;
+		if (abort_requested)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	if (!aborted)
+		SetRelHasChecksums(relationId);
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * SetRelHasChecksums
+ *
+ * Sets the pg_class.relhaschecksums flag for the relation specified by relOid
+ * to true. The corresponding function for clearing state is
+ * ResetDataChecksumsStateInDatabase, which operates on all relations in a
+ * database.
+ */
+static void
+SetRelHasChecksums(Oid relOid)
+{
+	Relation	rel;
+	Relation	heaprel;
+	Form_pg_class pg_class_tuple;
+	HeapTuple	tuple;
+
+	/*
+	 * If the relation has gone away since we checksummed it then that's not
+	 * an error case. Exit early and continue with the next relation instead.
+	 */
+	heaprel = try_relation_open(relOid, ShareUpdateExclusiveLock);
+	if (!heaprel)
+		return;
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+
+	tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relOid));
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for relation %u", relOid);
+
+	pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
+	pg_class_tuple->relhaschecksums = true;
+
+	CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+
+	heap_freetuple(tuple);
+
+	table_close(rel, RowExclusiveLock);
+	relation_close(heaprel, ShareUpdateExclusiveLock);
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase *db, const char *bgw_func_name)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", bgw_func_name);
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed, the database cannot end up processed, so we
+	 * have no alternative other than exiting. When enabling checksums we
+	 * won't at this time have changed the pg_control state to enabled, so
+	 * when the cluster comes back up processing will have to be resumed. When
+	 * disabling, the pg_control state will be set to off before this, so
+	 * when the cluster comes up checksums will be off as expected. In the
+	 * latter case we might have stale relhaschecksums flags in pg_class which
+	 * it would be nice to handle in some way. Enabling data checksums resets
+	 * the flags, so any stale flags won't cause problems at that point, but
+	 * they may cause confusion for users reading pg_class. TODO.
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksums processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+/*
+ * launcher_exit
+ *
+ * Internal routine for cleaning up state when the launcher process exits. We
+ * need to clear the launcher_running flag in shared memory to ensure that
+ * processing can be restarted again after it was previously aborted.
+ */
+static void
+launcher_exit(int code, Datum arg)
+{
+	if (launcher_running)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		launcher_running = false;
+		DatachecksumsWorkerShmem->launcher_running = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+}
+
+/*
+ * launcher_cancel_handler
+ *
+ * Internal routine for reacting to SIGINT and flagging the worker to abort.
+ * The worker won't be interrupted immediately but will check for abort flag
+ * between each block in a relation.
+ */
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	abort_requested = true;
+
+	/*
+	 * There is no sleeping in the main loop; the flag will be checked
+	 * periodically in ProcessSingleRelationFork. The worker does, however,
+	 * sleep when waiting for concurrent transactions to end, so we still
+	 * need to set the latch.
+	 */
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks until all current transactions have finished
+ *
+ * Returns when all transactions which were active when the function was
+ * called have ended. If the postmaster dies while waiting, the process
+ * exits with FATAL since processing cannot continue without it.
+ *
+ * NB: this will return early if aborted by SIGINT or if the target state
+ * is changed while we're running.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (TransactionIdPrecedes(GetOldestActiveTransactionId(), waitforxid))
+	{
+		char		activity[64];
+		int			rc;
+
+		/* Oldest running xid is older than us, so wait */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for current transactions to finish (waiting for %u)",
+				 waitforxid);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide, so exit and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+					(errmsg("postmaster exited during data checksum processing"),
+					 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_SHARED);
+		if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+			abort_requested = true;
+		LWLockRelease(DatachecksumsWorkerLock);
+		if (abort_requested)
+			break;
+	}
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+	return;
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function handles the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+	DataChecksumOperation current;
+	int			operations[MAX_OPS];
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	if (DatachecksumsWorkerShmem->launcher_running)
+	{
+		/* Launcher was already running, let it finish */
+		LWLockRelease(DatachecksumsWorkerLock);
+		return;
+	}
+
+	launcher_running = true;
+
+	enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
+	DatachecksumsWorkerShmem->launcher_running = true;
+	DatachecksumsWorkerShmem->enabling_checksums = enabling_checksums;
+	DatachecksumsWorkerShmem->cost_delay = DatachecksumsWorkerShmem->launch_cost_delay;
+	DatachecksumsWorkerShmem->cost_limit = DatachecksumsWorkerShmem->launch_cost_limit;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * The target state can change while we are busy enabling/disabling
+	 * checksums, if the user calls pg_disable/enable_data_checksums() before
+	 * we are finished with the previous request. In that case, we will loop
+	 * back here, to process the new request.
+	 */
+again:
+
+	memset(operations, 0, sizeof(operations));
+
+	HOLD_INTERRUPTS();
+
+	/*
+	 * If we're asked to enable checksums, we need to check if processing was
+	 * previously interrupted such that we should resume rather than start
+	 * from scratch.
+	 */
+	if (enabling_checksums)
+	{
+		/*
+		 * If we are asked to enable checksums in a cluster which already
+		 * has checksums enabled, exit immediately as there is nothing
+		 * more to do.
+		 */
+		if (DataChecksumsNeedVerify())
+			goto done;
+
+		/*
+		 * If the controlfile state is set to "inprogress-on" then we will
+		 * resume from where we left off based on the catalog state. This
+		 * will be safe since new relations created while the checksum worker
+		 * was disabled will have checksums enabled.
+		 */
+		else if (DataChecksumsOnInProgress())
+		{
+			operations[0] = ENABLE_CHECKSUMS;
+			operations[1] = SET_CHECKSUMS_ON;
+		}
+
+		/*
+		 * If the controlfile state is set to "inprogress-off" then we
+		 * were interrupted while the catalog state was being cleared. In
+		 * this case we need to first reset state and then continue with
+		 * enabling checksums.
+		 */
+		else if (DataChecksumsOffInProgress())
+		{
+			operations[0] = RESET_STATE;
+			operations[1] = SET_INPROGRESS_ON;
+			operations[2] = ENABLE_CHECKSUMS;
+			operations[3] = SET_CHECKSUMS_ON;
+		}
+
+		/*
+		 * Data checksums are off in the cluster, so we can proceed with
+		 * enabling them. Just in case, we start by resetting the
+		 * catalog state since we are doing this from scratch and we don't
+		 * want leftover catalog state to cause us to miss a relation.
+		 */
+		else
+		{
+			operations[0] = RESET_STATE;
+			operations[1] = SET_INPROGRESS_ON;
+			operations[2] = ENABLE_CHECKSUMS;
+			operations[3] = SET_CHECKSUMS_ON;
+		}
+	}
+	else
+	{
+		/*
+		 * Regardless of current state in the system, we go through the
+		 * motions when asked to disable checksums. The catalog state is
+		 * only defined to be relevant during the operation of enabling
+		 * checksums, and has no use at any other point in time. That
+		 * being said, a user who sees stale relhaschecksums entries in
+		 * the catalog might run this just in case.
+		 *
+		 * Resetting state must be performed after setting data checksum
+		 * state to off, as there otherwise might (depending on system
+		 * data checksum state) be a window between the catalog reset and
+		 * the state transition in which new relations are created with the
+		 * catalog state set to true.
+		 */
+		operations[0] = DISABLE_CHECKSUMS;
+		operations[1] = RESET_STATE;
+	}
+
+	RESUME_INTERRUPTS();
+
+	for (int i = 0; i < MAX_OPS; i++)
+	{
+		current = operations[i];
+
+		if (!current)
+			break;
+
+		switch (current)
+		{
+			case DISABLE_CHECKSUMS:
+				SetDataChecksumsOff();
+				break;
+
+			case SET_INPROGRESS_ON:
+				SetDataChecksumsOnInProgress();
+				break;
+
+			case SET_CHECKSUMS_ON:
+				SetDataChecksumsOn();
+				break;
+
+			case RESET_STATE:
+				status = ProcessAllDatabases(&connected, "ResetDataChecksumsStateInDatabase");
+				if (!status)
+				{
+					/*
+					 * If the target state changed during processing then it's
+					 * not a failure, so restart processing instead.
+					 */
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+					{
+						LWLockRelease(DatachecksumsWorkerLock);
+						goto done;
+					}
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to reset catalog checksum state")));
+				}
+				break;
+
+			case ENABLE_CHECKSUMS:
+				status = ProcessAllDatabases(&connected, "DatachecksumsWorkerMain");
+				if (!status)
+				{
+					/*
+					 * If the target state changed during processing then it's
+					 * not a failure, so restart processing instead.
+					 */
+					LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+					if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+					{
+						LWLockRelease(DatachecksumsWorkerLock);
+						goto done;
+					}
+					LWLockRelease(DatachecksumsWorkerLock);
+					ereport(ERROR,
+							(errmsg("unable to enable checksums in cluster")));
+				}
+				break;
+
+			default:
+				elog(ERROR, "unknown checksum operation requested");
+				break;
+		}
+	}
+
+done:
+	/*
+	 * All done. But before we exit, check if the target state was changed while
+	 * we were running. In that case we will have to start all over again.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+	{
+		DatachecksumsWorkerShmem->enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
+		enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
+		DatachecksumsWorkerShmem->cost_delay = DatachecksumsWorkerShmem->launch_cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = DatachecksumsWorkerShmem->launch_cost_limit;
+		LWLockRelease(DatachecksumsWorkerLock);
+		goto again;
+	}
+
+	launcher_running = false;
+	DatachecksumsWorkerShmem->launcher_running = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for either
+ * enabling checksums or resetting the checksum catalog tracking. Until no
+ * new databases are found, this will loop, computing a new list and
+ * comparing it to the ones already seen.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected, const char *bgw_func_name)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set up so that the first worker also processes the shared catalogs;
+	 * there is no need to repeat that in every database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	/*
+	 * Get a list of all databases to process. This may include databases that
+	 * were created during our runtime.  Since a database can be created as a
+	 * copy of any other database (which may not have existed in our last
+	 * run), we have to repeat this loop until no new databases show up in the
+	 * list.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db, bgw_func_name);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%i databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "process with restart" : "process completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+
+		/*
+		 * Re-generate the list of databases for another pass. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain
+		 * that there are no databases left without checksums.
+		 */
+		WaitForAllTransactionsToFinish();
+		DatabaseList = BuildDatabaseList();
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. A failure to enable checksums for a database can mean
+	 * either that processing actually failed for some reason, or that the
+	 * database was dropped between us getting the database list and trying
+	 * to process it.
+	 * Get a fresh list of databases to detect the second case where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists, but enabling checksums failed then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	WaitForAllTransactionsToFinish();
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResultEntry *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist; a
+		 * database which was dropped after we listed it is not an error.
+		 */
+		if (found && entry->result == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = found;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even if this is a redundant assignment, we want to be explicit about
+	 * our intent for readability, since we want to be able to query this
+	 * state after a restart.
+	 */
+	DatachecksumsWorkerShmem->launch_enable_checksums = false;
+	DatachecksumsWorkerShmem->launcher_running = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to. If the caller wants to ensure that no concurrently
+ * running CREATE DATABASE calls exist, this needs to be preceded by a call
+ * to WaitForAllTransactionsToFinish().
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which do not yet have data checksums
+ * are returned. If include_shared is true then shared relations are included
+ * as well in a non-temporary list. include_shared has no relevance when
+ * building a list of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relhaschecksums)
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * ResetDataChecksumsStateInDatabase
+ *		Main worker function for clearing checksums state in the catalog
+ *
+ * Resets the pg_class.relhaschecksums flag to false for all entries in the
+ * current database. This is required to be performed before adding checksums
+ * to a running cluster in order to track the state of the processing.
+ */
+void
+ResetDataChecksumsStateInDatabase(Datum arg)
+{
+	Relation	rel;
+	HeapTuple	tuple;
+	Oid			dboid = DatumGetObjectId(arg);
+	TableScanDesc scan;
+	Form_pg_class pgc;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("resetting catalog state for data checksums in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid, BGWORKER_BYPASS_ALLOWCONN);
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, RowExclusiveLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tuple = heap_getnext(scan, ForwardScanDirection)))
+	{
+		tuple = heap_copytuple(tuple);
+		pgc = (Form_pg_class) GETSTRUCT(tuple);
+
+		if (pgc->relhaschecksums)
+		{
+			pgc->relhaschecksums = false;
+			CatalogTupleUpdate(rel, &tuple->t_self, tuple);
+		}
+
+		heap_freetuple(tuple);
+	}
+
+	table_endscan(scan);
+	table_close(rel, RowExclusiveLock);
+
+	CommitTransactionCommand();
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	enabling_checksums = true;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we are done, since we
+	 * cannot access and modify these relations.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	Assert(DatachecksumsWorkerShmem->enabling_checksums);
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	for (;;)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 5000,
+						 WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		aborted = DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		if (aborted || abort_requested)
+		{
+			DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+			ereport(DEBUG1,
+					(errmsg("data checksum processing aborted in database OID %u",
+							dboid)));
+			return;
+		}
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..0fef097eb8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4017,6 +4017,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..cc494b6f13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d9c482454f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 561c212092..9362ec0018 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2944,8 +2944,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..c7928f3495 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c43cdd685b..a3720617f9 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "port/pg_bitutils.h"
 #include "commands/async.h"
 #include "miscadmin.h"
@@ -98,7 +99,6 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 static void ResetProcSignalBarrierBits(uint32 flags);
-static bool ProcessBarrierPlaceholder(void);
 
 /*
  * ProcSignalShmemSize
@@ -538,8 +538,17 @@ ProcessProcSignalBarrier(void)
 				type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
 				switch (type)
 				{
-					case PROCSIGNAL_BARRIER_PLACEHOLDER:
-						processed = ProcessBarrierPlaceholder();
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON:
+						processed = AbsorbChecksumsOnInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_ON:
+						processed = AbsorbChecksumsOnBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF:
+						processed = AbsorbChecksumsOffInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_OFF:
+						processed = AbsorbChecksumsOffBarrier();
 						break;
 				}
 
@@ -604,24 +613,6 @@ ResetProcSignalBarrierBits(uint32 flags)
 	InterruptPending = true;
 }
 
-static bool
-ProcessBarrierPlaceholder(void)
-{
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 *
-	 * The return value should be 'true' if the barrier was successfully
-	 * absorbed and 'false' if not. Note that returning 'false' can lead to
-	 * very frequent retries, so try hard to make that an uncommon case.
-	 */
-	return true;
-}
-
 /*
  * CheckProcSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called after receiving
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..5b083749d5 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae..8fbebd9870 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,13 +100,20 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		/*
+		 * Hold interrupts for the duration of the checksum check to ensure
+		 * that the data checksums state cannot change mid-check, which could
+		 * otherwise risk a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
 			if (checksum != p->pd_checksum)
 				checksum_failure = true;
 		}
+		RESUME_INTERRUPTS();
 
 		/*
 		 * The following checks don't prove the header is correct, only that
@@ -1394,10 +1401,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,8 +1410,17 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	HOLD_INTERRUPTS();
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
+		return (char *) page;
+	}
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
+	RESUME_INTERRUPTS();
 	return pageCopy;
 }
 
@@ -1421,9 +1433,14 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
+	HOLD_INTERRUPTS();
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+	{
+		RESUME_INTERRUPTS();
 		return;
+	}
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
+	RESUME_INTERRUPTS();
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 62bff52638..4ac396ccf1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1567,9 +1567,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1585,9 +1582,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..633821bae5 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -271,7 +271,8 @@ static void write_relcache_init_file(bool shared);
 static void write_item(const void *data, Size len, FILE *fp);
 
 static void formrdesc(const char *relationName, Oid relationReltype,
-					  bool isshared, int natts, const FormData_pg_attribute *attrs);
+					  bool isshared, int natts, const FormData_pg_attribute *attrs,
+					  bool haschecksums);
 
 static HeapTuple ScanPgRelation(Oid targetRelId, bool indexOK, bool force_non_historic);
 static Relation AllocateRelationDesc(Form_pg_class relp);
@@ -1828,7 +1829,8 @@ RelationInitTableAccessMethod(Relation relation)
 static void
 formrdesc(const char *relationName, Oid relationReltype,
 		  bool isshared,
-		  int natts, const FormData_pg_attribute *attrs)
+		  int natts, const FormData_pg_attribute *attrs,
+		  bool haschecksums)
 {
 	Relation	relation;
 	int			i;
@@ -1896,6 +1898,8 @@ formrdesc(const char *relationName, Oid relationReltype,
 	relation->rd_rel->relnatts = (int16) natts;
 	relation->rd_rel->relam = HEAP_TABLE_AM_OID;
 
+	relation->rd_rel->relhaschecksums = haschecksums;
+
 	/*
 	 * initialize attribute tuple form
 	 *
@@ -3548,6 +3552,27 @@ RelationBuildLocalRelation(const char *relname,
 		relkind == RELKIND_MATVIEW)
 		RelationInitTableAccessMethod(rel);
 
+	/*
+	 * Set the data checksum state. Since the data checksum state can change
+	 * at any time, the fetched value might be out of date by the time the
+	 * relation is built.  DataChecksumsNeedWrite returns true when data
+	 * checksums are enabled, are in the process of being enabled (state:
+	 * "inprogress-on"), or are in the process of being disabled (state:
+	 * "inprogress-off"). Since relhaschecksums is only used to track progress
+	 * when data checksums are being enabled, and going from disabled to
+	 * enabled will clear relhaschecksums before starting, it is safe to use
+	 * this value during a concurrent state transition to off.
+	 *
+	 * If DataChecksumsNeedWrite returns false and is concurrently changed to
+	 * true, then that implies that checksums are being enabled. Worst case,
+	 * this will lead to the relation being processed for checksums even
+	 * though each page written will have them already.  Performing this step
+	 * last shortens the window, but doesn't avoid it.
+	 */
+	HOLD_INTERRUPTS();
+	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+	RESUME_INTERRUPTS();
+
 	/*
 	 * Okay to insert into the relcache hash table.
 	 *
@@ -3813,6 +3838,7 @@ void
 RelationCacheInitializePhase2(void)
 {
 	MemoryContext oldcxt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3837,16 +3863,24 @@ RelationCacheInitializePhase2(void)
 	 */
 	if (!load_relcache_init_file(true))
 	{
+		/*
+		 * Our local state can't change at this point, so we can cache the
+		 * checksum state.
+		 */
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
+
 		formrdesc("pg_database", DatabaseRelation_Rowtype_Id, true,
-				  Natts_pg_database, Desc_pg_database);
+				  Natts_pg_database, Desc_pg_database, haschecksums);
 		formrdesc("pg_authid", AuthIdRelation_Rowtype_Id, true,
-				  Natts_pg_authid, Desc_pg_authid);
+				  Natts_pg_authid, Desc_pg_authid, haschecksums);
 		formrdesc("pg_auth_members", AuthMemRelation_Rowtype_Id, true,
-				  Natts_pg_auth_members, Desc_pg_auth_members);
+				  Natts_pg_auth_members, Desc_pg_auth_members, haschecksums);
 		formrdesc("pg_shseclabel", SharedSecLabelRelation_Rowtype_Id, true,
-				  Natts_pg_shseclabel, Desc_pg_shseclabel);
+				  Natts_pg_shseclabel, Desc_pg_shseclabel, haschecksums);
 		formrdesc("pg_subscription", SubscriptionRelation_Rowtype_Id, true,
-				  Natts_pg_subscription, Desc_pg_subscription);
+				  Natts_pg_subscription, Desc_pg_subscription, haschecksums);
 
 #define NUM_CRITICAL_SHARED_RELS	5	/* fix if you change list above */
 	}
@@ -3875,6 +3909,7 @@ RelationCacheInitializePhase3(void)
 	RelIdCacheEnt *idhentry;
 	MemoryContext oldcxt;
 	bool		needNewCacheFile = !criticalSharedRelcachesBuilt;
+	bool		haschecksums;
 
 	/*
 	 * relation mapper needs initialized too
@@ -3895,15 +3930,18 @@ RelationCacheInitializePhase3(void)
 		!load_relcache_init_file(false))
 	{
 		needNewCacheFile = true;
+		HOLD_INTERRUPTS();
+		haschecksums = DataChecksumsNeedWrite();
+		RESUME_INTERRUPTS();
 
 		formrdesc("pg_class", RelationRelation_Rowtype_Id, false,
-				  Natts_pg_class, Desc_pg_class);
+				  Natts_pg_class, Desc_pg_class, haschecksums);
 		formrdesc("pg_attribute", AttributeRelation_Rowtype_Id, false,
-				  Natts_pg_attribute, Desc_pg_attribute);
+				  Natts_pg_attribute, Desc_pg_attribute, haschecksums);
 		formrdesc("pg_proc", ProcedureRelation_Rowtype_Id, false,
-				  Natts_pg_proc, Desc_pg_proc);
+				  Natts_pg_proc, Desc_pg_proc, haschecksums);
 		formrdesc("pg_type", TypeRelation_Rowtype_Id, false,
-				  Natts_pg_type, Desc_pg_type);
+				  Natts_pg_type, Desc_pg_type, haschecksums);
 
 #define NUM_CRITICAL_LOCAL_RELS 4	/* fix if you change list above */
 	}
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc5..045da21904 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e5965bc517..92367ece4b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -606,6 +606,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up backend local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eafdb1118e..3d108a2348 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -500,6 +502,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -609,7 +622,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums;
 static bool integer_datetimes;
 static bool assert_enabled;
 static bool in_hot_standby;
@@ -1910,17 +1923,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4830,6 +4832,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3f029f41e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..1298857458 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker have yet to finish, then disallow upgrading. The
+	 * user should either let the process finish, or turn off checksums,
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..b35cd4d503 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..6947c09591 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern bool AbsorbChecksumsOnInProgressBarrier(void);
+extern bool AbsorbChecksumsOffInProgressBarrier(void);
+extern bool AbsorbChecksumsOnBarrier(void);
+extern bool AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 224cae0246..adbe81e890 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index bb6938caa2..1533b0c6c9 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -119,6 +119,9 @@ CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,Relat
 	/* is relation a partition? */
 	bool		relispartition BKI_DEFAULT(f);
 
+	/* does the relation have checksums enabled */
+	bool		relhaschecksums BKI_DEFAULT(f);
+
 	/* link to original rel during table rewrite; otherwise 0 */
 	Oid			relrewrite BKI_DEFAULT(0) BKI_LOOKUP_OPT(pg_class);
 
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..d8229422af 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4e0c9be58c..5087b5355e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11313,6 +11313,22 @@
   proname => 'jsonb_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'jsonb_subscript_handler' },
 
+{ oid => '9258',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '9257',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bdc97e308..f013acba76 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -324,6 +324,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..0974dfadfe 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -963,6 +963,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..845f6bceaa
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,30 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for the data checksums background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Start the background processes for enabling or disabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+void		ResetDataChecksumsStateInDatabase(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f..c35b747520 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d2359192..f736b12f98 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..d865796d04 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index f7859c2fd5..f468709e7e 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..fd60f7e97f
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: In the case of "check", this creates a temporary installation
+with multiple nodes (master and/or standbys) for the purpose of
+the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..3b229de915
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,92 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# No relation in pg_class should have relhaschecksums at this point
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;");
+is($result, '0', 'ensure no entries in pg_class have checksums recorded');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Check that relations with storage have been marked with relhaschecksums in
+# pg_class
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'ensure all relations are correctly flagged in the catalog');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again, which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+# Wait for checksums to be disabled. Disabling checksums clears the catalog
+# relhaschecksums state, so wait for that before calling it done.
+$result = $node->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the relhaschecksums flags in the
+# catalog aren't tricking processing into skipping previously checksummed
+# relations
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..41a4d64037
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,117 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# restarting the processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in    = '';
+my $out   = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');",
+	'1');
+is($result, 1, 'ensure there is a single table left');
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+	"SELECT wait_event FROM pg_stat_activity WHERE backend_type = 'datachecksumsworker worker';"
+);
+is($result, 'ChecksumEnableFinishCondition', 'test for correct wait event');
+
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '1',
+	'doublecheck that there is a single table left before restarting');
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$result = $node->safe_psql('postgres',
+		"SELECT count(*) FROM pg_catalog.pg_class WHERE NOT relhaschecksums "
+	  . "AND relkind IN ('r', 'i', 'S', 't', 'm');");
+is($result, '0', 'no temporary tables this time around');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..1555a1694b
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,127 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+cmp_ok(
+	$result, '~~',
+	[ "inprogress-on", "on" ],
+	'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary 2');
+$result = $node_primary->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until('postgres',
+	"SELECT count(*) FROM pg_catalog.pg_class WHERE relhaschecksums;", '0');
+is($result, '1', 'ensure no entries in pg_class have checksums recorded');
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
diff --git a/src/test/checksum/t/004_offline.pl b/src/test/checksum/t/004_offline.pl
new file mode 100644
index 0000000000..2dfca4df23
--- /dev/null
+++ b/src/test/checksum/t/004_offline.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums offline from various states
+# of checksum processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Enable checksums offline using pg_checksums
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are enabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums offline again using pg_checksums
+$node->stop();
+$node->checksum_disable_offline();
+$node->start();
+
+# Ensure that checksums are disabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in    = '';
+my $out   = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'inprogress-on');
+is($result, 1, 'ensure checksums are in the process of being enabled');
+
+# Turn the cluster off and enable checksums offline, then start back up
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are now enabled even though processing wasn't
+# restarted
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..b7431a7600 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2221,6 +2221,42 @@ sub pg_recvlogical_upto
 	}
 }
 
+=item $node->checksum_enable_offline()
+
+Enable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_enable_offline
+{
+	my ($self) = @_;
+
+	print "# Enabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-e');
+	print "# Checksums enabled\n";
+	return;
+}
+
+=item $node->checksum_disable_offline()
+
+Disable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_disable_offline
+{
+	my ($self) = @_;
+
+	print "# Disabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-d');
+	print "# Checksums disabled\n";
+	return;
+}
+
 =pod
 
 =back
-- 
2.21.1 (Apple Git-122.3)

#81Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Daniel Gustafsson (#80)
Re: Online checksums patch - once again

(I may have said this before, but) My overall high-level impression of
this patch is that it's really cmmplex for a feature that you use maybe
once in the lifetime of a cluster. I'm happy to review but I'm not
planning to commit this myself. I don't object if some other committer
picks this up (Magnus?).

Now to the latest patch version:

On 03/02/2021 18:15, Daniel Gustafsson wrote:

The previous v35 had a tiny conflict in pg_class.h, the attached v36 (which is
a squash of the 2 commits in v35) fixes that. No other changes are introduced
in this version.

/*
* Check to see if my copy of RedoRecPtr is out of date. If so, may have
* to go back and have the caller recompute everything. This can only
* happen just after a checkpoint, so it's better to be slow in this case
* and fast otherwise.
*
* Also check to see if fullPageWrites or forcePageWrites was just turned
* on, or if we are in the process of enabling checksums in the cluster;
* if we weren't already doing full-page writes then go back and recompute.
*
* If we aren't doing full-page writes then RedoRecPtr doesn't actually
* affect the contents of the XLOG record, so we'll update our local copy
* but not force a recomputation. (If doPageWrites was just turned off,
* we could recompute the record without full pages, but we choose not to
* bother.)
*/
if (RedoRecPtr != Insert->RedoRecPtr)
{
Assert(RedoRecPtr < Insert->RedoRecPtr);
RedoRecPtr = Insert->RedoRecPtr;
}
doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());

Why does this use DataChecksumsOnInProgress() instead of
DataChecksumsNeedWrite()? If checksums are enabled, you always need
full-page writes, don't you? If not, then why is it needed in the
inprogress-on state?

We also set doPageWrites in InitXLOGAccess(). That should match the
condition above (although it doesn't matter for correctness).

/*
* DataChecksumsNeedVerify
* Returns whether data checksums must be verified or not
*
* Data checksums are only verified if they are fully enabled in the cluster.
* During the "inprogress-on" and "inprogress-off" states they are only
* updated, not verified (see datachecksumsworker.c for a longer discussion).
*
* This function is intended for callsites which have read data and are about
* to perform checksum validation based on the result of this. To avoid the
* risk of the checksum state changing between reading and performing the
* validation (or not), interrupts must be held off. This implies that calling
* this function must be performed as close to the validation call as possible
* to keep the critical section short. This is in order to protect against
* time of check/time of use situations around data checksum validation.
*/
bool
DataChecksumsNeedVerify(void)
{
Assert(InterruptHoldoffCount > 0);
return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
}

What exactly is the intended call pattern here? Something like this?

smgrread() a data page
HOLD_INTERRUPTS();
if (DataChecksumsNeedVerify())
{
if (pg_checksum_page((char *) page, blkno) != expected)
elog(ERROR, "bad checksum");
}
RESUME_INTERRUPTS();

That seems to be what the code currently does. What good does holding
interrupts do here? If checksums were not fully enabled at the
smgrread() call, the page might have incorrect checksums, and if the
state transitions from inprogress-on to on between the smgrread() call
and the DataChecksumsNeedVerify() call, you'll get an error. I think you
need to hold the interrupts *before* the smgrread() call.

/*
* Set checksum for a page in private memory.
*
* This must only be used when we know that no other process can be modifying
* the page buffer.
*/
void
PageSetChecksumInplace(Page page, BlockNumber blkno)
{
HOLD_INTERRUPTS();
/* If we don't need a checksum, just return */
if (PageIsNew(page) || !DataChecksumsNeedWrite())
{
RESUME_INTERRUPTS();
return;
}

((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
RESUME_INTERRUPTS();
}

The checksums state might change just after this call, before the caller
has actually performed the smgrwrite() or smgrextend() call. The caller
needs to hold interrupts across this function and the
smgrwrite/smgrextend() call. It is a bad idea to HOLD_INTERRUPTS() here,
because that just masks bugs where the caller isn't holding the
interrupts. Same in PageSetChecksumCopy().

- Heikki

#82Michael Paquier
michael@paquier.xyz
In reply to: Heikki Linnakangas (#81)
Re: Online checksums patch - once again

On Tue, Feb 09, 2021 at 10:54:50AM +0200, Heikki Linnakangas wrote:

(I may have said this before, but) My overall high-level impression of this
patch is that it's really complex for a feature that you use maybe once in
the lifetime of a cluster. I'm happy to review but I'm not planning to
commit this myself. I don't object if some other committer picks this up
(Magnus?).

I was just looking at the latest patch set as a matter of curiosity,
and I have a shared feeling. I think that this is a lot of
complication in-core for what would be a one-time operation,
particularly knowing that there are other ways to do it already with
the offline checksum tool, even if that is more costly:
- Involve logical replication after initializing the new instance with
--data-checksums, or in an upgrade scenario with pg_upgrade.
- Involve physical replication: stop the standby cleanly, enable
checksums on it and do a switchover.

Another thing we could do is to improve pg_checksums with a parallel
mode. The main design question would be how to distribute the I/O,
and that would mean balancing at least across tablespaces.
--
Michael

#83Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#82)
Offline activation of checksums via standby switchover (was: Online checksums patch - once again)

Hi,

Am Mittwoch, den 10.02.2021, 15:06 +0900 schrieb Michael Paquier:

On Tue, Feb 09, 2021 at 10:54:50AM +0200, Heikki Linnakangas wrote:

(I may have said this before, but) My overall high-level impression of this
patch is that it's really complex for a feature that you use maybe once in
the lifetime of a cluster. I'm happy to review but I'm not planning to
commit this myself. I don't object if some other committer picks this up
(Magnus?).

I was just looking at the latest patch set as a matter of curiosity,
and I have a shared feeling.

I think this still would be a useful feature; not least for the online
deactivation - having to shut down the instance is sometimes just not an
option in production, even for just a few seconds.

However, there is also the shoot-the-whole-database-into-WAL (at least,
that is what happens, AIUI) issue which has not been discussed that much
either, the patch allows throttling, but I think the impact on actual
production workloads is not very clear yet.

I think that this is a lot of complication in-core for what would be a
one-time operation, particularly knowing that there are other ways to
do it already with the offline checksum tool, even if that is more
costly:
- Involve logical replication after initializing the new instance with
--data-checksums, or in an upgrade scenario with pg_upgrade.

Logical replication is still somewhat impractical for such a (possibly)
routine task, and I don't understand your pg_upgrade scenario; can you
expand on that a bit?

- Involve physical replication: stop the standby cleanly, enable
checksums on it and do a switchover.

I would like to focus on this, so I changed the subject in order not to
derail the online activation patch thread.

If this is something we support, then we should document it.

I have to admit that this possibility escaped me when we first committed
offline (de)activation, it was brought to my attention via
https://twitter.com/samokhvalov/status/1281312586219188224 and the
following discussion.

So if we think this (to recap: shut down the standby, run pg_checksums
on it, start it up again, wait until it is back in sync, then
switchover) is a safe way to activate checksums on a streaming
replication setup, then we should document it I think. However, I have
only seen sorta hand-waving on this so far and no deeper analysis of
what could possibly go wrong (but doesn't).

Has anybody done further work/tests on this and/or has something written
up to contribute to the documentation? Or do we think this is not
appropriate to document? I think once we agree this is safe, it is not
more complicated than the rsync-the-standby-after-pg_upgrade recipe we
did document.

Another thing we could do is to improve pg_checksums with a parallel
mode. The main design question would be how to distribute the I/O,
and that would mean balancing at least across tablespaces.

Right. I thought about this a while ago, but didn't have time to work on
it so far.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de


#84Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#81)
Re: Online checksums patch - once again

On Tue, Feb 9, 2021 at 9:54 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

(I may have said this before, but) My overall high-level impression of
this patch is that it's really complex for a feature that you use maybe
once in the lifetime of a cluster. I'm happy to review but I'm not
planning to commit this myself. I don't object if some other committer
picks this up (Magnus?).

A fairly large amount of this complexity comes out of the fact that it
now supports restarting and tracks checksums on a per-table basis. We
skipped this in the original patch for exactly this reason (that's not
to say there isn't a fair amount of complexity even without it, but it
did substantially increase both the size and the complexity of the
patch), but in the review of that i was specifically asked for having
that added. I personally don't think it's worth that complexity but at
the time that seemed to be a pretty strong argument. So I'm not
entirely sure how to move forward with that...

Is your impression that it would still be too complicated, even without that?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#85Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#84)
Re: Online checksums patch - once again

On Wed, Feb 10, 2021 at 03:25:58PM +0100, Magnus Hagander wrote:

On Tue, Feb 9, 2021 at 9:54 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

(I may have said this before, but) My overall high-level impression of
this patch is that it's really complex for a feature that you use maybe
once in the lifetime of a cluster. I'm happy to review but I'm not
planning to commit this myself. I don't object if some other committer
picks this up (Magnus?).

A fairly large amount of this complexity comes out of the fact that it
now supports restarting and tracks checksums on a per-table basis. We
skipped this in the original patch for exactly this reason (that's not
to say there isn't a fair amount of complexity even without it, but it
did substantially increase both the size and the complexity of the
patch), but in the review of that i was specifically asked for having
that added. I personally don't think it's worth that complexity but at
the time that seemed to be a pretty strong argument. So I'm not
entirely sure how to move forward with that...

Is your impression that it would still be too complicated, even without that?

I was wondering why this feature has stalled for so long --- now I know.
This does highlight the risk of implementing too many additions to a
feature. I am working against this dynamic in the cluster file
encryption feature I am working on.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#86Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Magnus Hagander (#84)
Re: Online checksums patch - once again

On 10/02/2021 16:25, Magnus Hagander wrote:

On Tue, Feb 9, 2021 at 9:54 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

(I may have said this before, but) My overall high-level impression of
this patch is that it's really complex for a feature that you use maybe
once in the lifetime of a cluster. I'm happy to review but I'm not
planning to commit this myself. I don't object if some other committer
picks this up (Magnus?).

A fairly large amount of this complexity comes out of the fact that it
now supports restarting and tracks checksums on a per-table basis. We
skipped this in the original patch for exactly this reason (that's not
to say there isn't a fair amount of complexity even without it, but it
did substantially increase both the size and the complexity of the
patch), but in the review of that i was specifically asked for having
that added. I personally don't think it's worth that complexity but at
the time that seemed to be a pretty strong argument. So I'm not
entirely sure how to move forward with that...

Is your impression that it would still be too complicated, even without that?

I'm not sure. It would certainly be a lot better.

Wrt. restartability, I'm also not very happy with the way that works -
or rather doesn't :-) - in this patch. After shutting down and
restarting the cluster, you have to manually call
pg_enable_data_checksums() again to restart the checksumming process.

- Heikki

#87Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#85)
Re: Online checksums patch - once again

On Wed, Feb 10, 2021 at 01:26:18PM -0500, Bruce Momjian wrote:

On Wed, Feb 10, 2021 at 03:25:58PM +0100, Magnus Hagander wrote:

A fairly large amount of this complexity comes out of the fact that it
now supports restarting and tracks checksums on a per-table basis. We
skipped this in the original patch for exactly this reason (that's not
to say there isn't a fair amount of complexity even without it, but it
did substantially increase both the size and the complexity of the
patch), but in the review of that i was specifically asked for having
that added. I personally don't think it's worth that complexity but at
the time that seemed to be a pretty strong argument. So I'm not
entirely sure how to move forward with that...

Is your impression that it would still be too complicated, even without that?

I was wondering why this feature has stalled for so long --- now I know.
This does highlight the risk of implementing too many additions to a
feature. I am working against this dynamic in the cluster file
encryption feature I am working on.

Oh, I think another reason this patchset has had problems is related to
something I mentioned in 2018:

/messages/by-id/20180801163613.GA2267@momjian.us

This patchset is weird because it is perhaps our first case of trying to
change the state of the server while it is running. We just don't have
an established protocol for how to orchestrate that, so we are limping
along toward a solution. Forcing a restart is probably part of that
primitive orchestration. We will probably have similar challenges if we
ever allowed Postgres to change its data format on the fly. These
challenges are one reason pg_upgrade only modifies the new cluster,
never the old one.

I don't think anyone has done anything wrong --- rather, it is what we
are _trying_ to do that is complex. Adding restartability to this just
added to the challenge.


#88Daniel Gustafsson
daniel@yesql.se
In reply to: Heikki Linnakangas (#81)
1 attachment(s)
Re: Online checksums patch - once again

On 9 Feb 2021, at 09:54, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

(I may have said this before, but) My overall high-level impression of this patch is that it's really complex for a feature that you use maybe once in the lifetime of a cluster.

The frequency of using a feature seems like a suboptimal metric of the value
and usefulness of it. I don't disagree that this patch is quite invasive and
complicated though, that's a side-effect of performing global state changes in
a distributed system which is hard to get around.

As was discussed downthread, some of the complexity stems from restartability
and the catalog state persistence. The attached version does away with this to
see how big the difference is. Personally I don't think it's enough to move
the needle but YMMV.

I'm happy to review but I'm not planning to commit this myself.

Regardless, thanks for all review and thoughtful comments!

doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress());

Why does this use DataChecksumsOnInProgress() instead of DataChecksumsNeedWrite()? If checksums are enabled, you always need full-page writes, don't you? If not, then why is it needed in the inprogress-on state?

Correct, that's a thinko, it should be DataChecksumsNeedWrite.

We also set doPageWrites in InitXLOGAccess(). That should match the condition above (although it doesn't matter for correctness).

Good point, fixed.

I think you need to hold the interrupts *before* the smgrread() call.

Fixed.

/*
* Set checksum for a page in private memory.
*
* This must only be used when we know that no other process can be modifying
* the page buffer.
*/
void
PageSetChecksumInplace(Page page, BlockNumber blkno)
{
HOLD_INTERRUPTS();
/* If we don't need a checksum, just return */
if (PageIsNew(page) || !DataChecksumsNeedWrite())
{
RESUME_INTERRUPTS();
return;
}
((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
RESUME_INTERRUPTS();
}

The checksums state might change just after this call, before the caller has actually performed the smgrwrite() or smgrextend() call. The caller needs to hold interrupts across this function and the smgrwrite/smgrextend() call. It is a bad idea to HOLD_INTERRUPTS() here, because that just masks bugs where the caller isn't holding the interrupts. Same in PageSetChecksumCopy().

Looking at the cases which can arise when a state change B happens between the
two calls:

PageSetChecksumInplace
B <---- state change
smgrwrite

The possible state transitions which can happen, and the consequences of them
are:

1) B = (off -> inprogress-on): We will write without a checksum. This relation
will then be processed by the DatachecksumsWorker. Readers will not verify
checksums.

2) B = (inprogress-on -> on): We will write with a checksum. Both those states
write checksums. Readers will verify the checksum.

3) B = (on -> inprogress-off): We will write with a checksum. Both these
states write checksums. Readers will not verify checksums.

4) B = (inprogress-off -> off): We will write with a checksum which is wasteful
and strictly speaking incorrect. Readers will not verify checksums.

The above assumes there is only a single state transition between the call to
PageSetChecksumInplace and smgrwrite/extend, which of course isn't guaranteed
to be the case (albeit a single transition is a lot more likely than two).

The (off -> inprogress-on -> on) and (inprogress-on -> on -> inprogress-off)
transitions cannot happen here since the DatachecksumsWorker will wait for
ongoing transactions to finish before a transition to on can be initiated.

5) B = (on -> inprogress-off -> off): the checksum will be written when it
shouldn't be, but readers won't verify it.

6) B = (inprogress-off -> off -> inprogress-on): checksums are written without
being verified in both these states.

So these corner cases can happen, but ending up with an incorrect verification is
likely hard, which is probably why they haven't been shaken out during testing.
I've updated the patch to hold interrupts across the smgrwrite/extend call.

--
Daniel Gustafsson https://vmware.com/

Attachments:

v37-0001-Support-checksum-enable-disable-in-a-running-clu.patchapplication/octet-stream; name=v37-0001-Support-checksum-enable-disable-in-a-running-clu.patch; x-unix-mode=0644Download
From 2eaee827433f319abd97969da8e2a43b05565679 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <daniel@yesql.se>
Date: Mon, 15 Feb 2021 11:03:12 +0100
Subject: [PATCH v37] Support checksum enable/disable in a running cluster v37

This allows data checksums to be enabled, or disabled, in a running
cluster without restricting access to the cluster during processing.

Prior to this, data checksums could only be enabled at initdb time, or
when the cluster is offline using pg_checksums. This commit introduces
functionality to enable, and disable, data checksums without the need
to shut down the cluster.

A dynamic background worker is responsible for launching a per-database
worker which will mark all buffers dirty for all relations with storage
in order for them to have data checksums on write. Once all relations
in all databases have been processed, the data_checksums state can be
set to "on" and the cluster will at that point be identical to one
which had checksums enabled from the start.

While the cluster is writing checksums on existing buffers, checksums
are written but not verified during reading to avoid false negatives.
Disabling checksums will not touch any buffers (but existing checksums
cannot be re-used in case checksums are immediately re-enabled). While
disabling, checksums are again written but not verified, to ensure that
concurrent backends which haven't yet started disabling checksums do
not incur verification errors.

New in-progress states are introduced for data_checksums which during
processing ensures that backends know whether to verify and write
checksums. All state changes across backends are synchronized using a
procsignalbarrier.

Authors: Daniel Gustafsson, Magnus Hagander
Reviewed-by: Heikki Linnakangas, Robert Haas, Andres Freund, Tomas Vondra, Michael Banck, Andrey Borodin
Discussion: https://postgr.es/m/CABUevExz9hUUOLnJVr2kpw9Cx=o4MCr1SVKwbupzuxP7ckNutA@mail.gmail.com
Discussion: https://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
Discussion: https://postgr.es/m/CABUevEwE3urLtwxxqdgd5O2oQz9J717ZzMbh+ziCSa5YLLU_BA@mail.gmail.com
---
 contrib/bloom/blinsert.c                     |    2 +
 doc/src/sgml/func.sgml                       |   68 +
 doc/src/sgml/monitoring.sgml                 |    6 +-
 doc/src/sgml/ref/pg_checksums.sgml           |    6 +
 doc/src/sgml/wal.sgml                        |   57 +-
 src/backend/access/gist/gistbuild.c          |    4 +
 src/backend/access/hash/hashpage.c           |    2 +
 src/backend/access/heap/heapam.c             |    9 +-
 src/backend/access/heap/visibilitymap.c      |    2 +
 src/backend/access/nbtree/nbtree.c           |    2 +
 src/backend/access/nbtree/nbtsort.c          |    4 +
 src/backend/access/rmgrdesc/xlogdesc.c       |   18 +
 src/backend/access/spgist/spginsert.c        |    6 +
 src/backend/access/transam/xlog.c            |  438 +++++-
 src/backend/access/transam/xlogfuncs.c       |   47 +
 src/backend/catalog/storage.c                |    9 +
 src/backend/catalog/system_views.sql         |    5 +
 src/backend/postmaster/Makefile              |    1 +
 src/backend/postmaster/bgworker.c            |    7 +
 src/backend/postmaster/datachecksumsworker.c | 1327 ++++++++++++++++++
 src/backend/postmaster/pgstat.c              |    6 +
 src/backend/replication/basebackup.c         |    9 +-
 src/backend/replication/logical/decode.c     |    1 +
 src/backend/storage/buffer/bufmgr.c          |   13 +
 src/backend/storage/buffer/localbuf.c        |    3 +
 src/backend/storage/freespace/freespace.c    |    2 +
 src/backend/storage/ipc/ipci.c               |    3 +
 src/backend/storage/ipc/procsignal.c         |   33 +-
 src/backend/storage/lmgr/lwlocknames.txt     |    1 +
 src/backend/storage/page/README              |    4 +-
 src/backend/storage/page/bufpage.c           |   12 +-
 src/backend/utils/adt/pgstatfuncs.c          |    6 -
 src/backend/utils/init/miscinit.c            |    6 +
 src/backend/utils/init/postinit.c            |    5 +
 src/backend/utils/misc/guc.c                 |   37 +-
 src/bin/pg_checksums/pg_checksums.c          |    2 +-
 src/bin/pg_upgrade/controldata.c             |    9 +
 src/bin/pg_upgrade/pg_upgrade.h              |    2 +-
 src/include/access/xlog.h                    |   19 +-
 src/include/access/xlog_internal.h           |    7 +
 src/include/catalog/pg_control.h             |    1 +
 src/include/catalog/pg_proc.dat              |   16 +
 src/include/miscadmin.h                      |    2 +
 src/include/pgstat.h                         |    2 +
 src/include/postmaster/datachecksumsworker.h |   29 +
 src/include/storage/bufpage.h                |    3 +
 src/include/storage/checksum.h               |    8 +
 src/include/storage/procsignal.h             |   10 +-
 src/test/Makefile                            |    2 +-
 src/test/checksum/.gitignore                 |    2 +
 src/test/checksum/Makefile                   |   23 +
 src/test/checksum/README                     |   22 +
 src/test/checksum/t/001_basic.pl             |   74 +
 src/test/checksum/t/002_restarts.pl          |   94 ++
 src/test/checksum/t/003_standby_checksum.pl  |  121 ++
 src/test/checksum/t/004_offline.pl           |  105 ++
 src/test/perl/PostgresNode.pm                |   36 +
 57 files changed, 2673 insertions(+), 77 deletions(-)
 create mode 100644 src/backend/postmaster/datachecksumsworker.c
 create mode 100644 src/include/postmaster/datachecksumsworker.h
 create mode 100644 src/test/checksum/.gitignore
 create mode 100644 src/test/checksum/Makefile
 create mode 100644 src/test/checksum/README
 create mode 100644 src/test/checksum/t/001_basic.pl
 create mode 100644 src/test/checksum/t/002_restarts.pl
 create mode 100644 src/test/checksum/t/003_standby_checksum.pl
 create mode 100644 src/test/checksum/t/004_offline.pl

diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index d37ceef753..bbcb2ce037 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -177,9 +177,11 @@ blbuildempty(Relation index)
 	 * XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE record.  Therefore, we need
 	 * this even when wal_level=minimal.
 	 */
+	HOLD_INTERRUPTS();
 	PageSetChecksumInplace(metapage, BLOOM_METAPAGE_BLKNO);
 	smgrwrite(index->rd_smgr, INIT_FORKNUM, BLOOM_METAPAGE_BLKNO,
 			  (char *) metapage, true);
+	RESUME_INTERRUPTS();
 	log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
 				BLOOM_METAPAGE_BLKNO, metapage, true);
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1ab31a9056..1638ade9ae 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25903,6 +25903,74 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   </sect2>
 
+  <sect2 id="functions-admin-checksum">
+   <title>Data Checksum Functions</title>
+
+   <para>
+    The functions shown in <xref linkend="functions-checksums-table" /> can
+    be used to enable or disable data checksums in a running cluster.
+    See <xref linkend="checksums" /> for details.
+   </para>
+
+   <table id="functions-checksums-table">
+    <title>Data Checksum Functions</title>
+    <tgroup cols="1">
+     <thead>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        Function
+       </para>
+       <para>
+        Description
+       </para></entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_enable_data_checksums</primary>
+        </indexterm>
+        <function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Initiates data checksums for the cluster. This will switch the data
+        checksums mode to <literal>inprogress-on</literal> as well as start a
+        background worker that will process all data in the database and enable
+        checksums for it. When all data pages have had checksums enabled, the
+        cluster will automatically switch data checksums mode to
+        <literal>on</literal>.
+       </para>
+       <para>
+        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+        specified, the speed of the process is throttled using the same principles as
+        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+       </para></entry>
+      </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_disable_data_checksums</primary>
+        </indexterm>
+        <function>pg_disable_data_checksums</function> ()
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Disables data checksums for the cluster. This will switch the data
+        checksum mode to <literal>inprogress-off</literal> while data checksums
+        are being disabled. When all active backends have ceased to validate
+        data checksums, the data checksum mode will be changed to <literal>off</literal>.
+       </para></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+  </sect2>
+
   <sect2 id="functions-admin-dbobject">
    <title>Database Object Management Functions</title>
 
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c602ee4427..c94faa11e0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3697,8 +3697,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Number of data page checksum failures detected in this
-       database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       database.
       </para></entry>
      </row>
 
@@ -3708,8 +3707,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       </para>
       <para>
        Time at which the last data page checksum failure was detected in
-       this database (or on a shared object), or NULL if data checksums are not
-       enabled.
+       this database (or on a shared object).
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index c84bc5c5b2..d879550e81 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -45,6 +45,12 @@ PostgreSQL documentation
    exit status is nonzero if the operation failed.
   </para>
 
+  <para>
+   When enabling checksums, if checksums were in the process of being enabled
+   when the cluster was shut down, <application>pg_checksums</application>
+   will still process all relations, regardless of any progress made by the
+  </para>
+
   <para>
    When verifying checksums, every file in the cluster is scanned. When
    enabling checksums, every file in the cluster is rewritten in-place.
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 66de1ee2f8..48890ccc9d 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -247,9 +247,10 @@
   <para>
    Checksums are normally enabled when the cluster is initialized using <link
    linkend="app-initdb-data-checksums"><application>initdb</application></link>.
-   They can also be enabled or disabled at a later time as an offline
-   operation. Data checksums are enabled or disabled at the full cluster
-   level, and cannot be specified individually for databases or tables.
+   They can also be enabled or disabled at a later time either as an offline
+   operation or online in a running cluster allowing concurrent access. Data
+   checksums are enabled or disabled at the full cluster level, and cannot be
+   specified individually for databases or tables.
   </para>
 
   <para>
@@ -266,7 +267,7 @@
   </para>
 
   <sect2 id="checksums-offline-enable-disable">
-   <title>Off-line Enabling of Checksums</title>
+   <title>Offline Enabling of Checksums</title>
 
    <para>
     The <link linkend="app-pgchecksums"><application>pg_checksums</application></link>
@@ -275,6 +276,54 @@
    </para>
 
   </sect2>
+
+  <sect2 id="checksums-online-enable-disable">
+   <title>Online Enabling of Checksums</title>
+
+   <para>
+    Checksums can be enabled or disabled online, by calling the appropriate
+    <link linkend="functions-admin-checksum">functions</link>.
+   </para>
+
+   <para>
+    Enabling checksums will put the cluster checksum mode in
+    <literal>inprogress-on</literal> mode.  During this time, checksums will be
+    written but not verified. In addition to this, a background worker process
+    is started that enables checksums on all existing data in the cluster. Once
+    this worker has completed processing all databases in the cluster, the
+    checksum mode will automatically switch to <literal>on</literal>. The
+    processing will consume a background worker process, make sure that
+    <varname>max_worker_processes</varname> allows for at least one more
+    additional process.
+   </para>
+
+   <para>
+    The process will initially wait for all open transactions to finish before
+    it starts, so that it can be certain that there are no tables that have been
+    created inside a transaction that has not committed yet and thus would not
+    be visible to the process enabling checksums. It will also, for each database,
+    wait for all pre-existing temporary tables to get removed before it finishes.
+    If long-lived temporary tables are used in the application it may be necessary
+    to terminate these application connections to allow the process to complete.
+   </para>
+
+   <para>
+    If the cluster is stopped for any reason while in <literal>inprogress-on</literal>
+    mode, this process must be restarted manually. To do this,
+    re-execute the function <function>pg_enable_data_checksums()</function>
+    once the cluster has been restarted. The background worker will attempt
+    to resume the work from where it was interrupted.
+   </para>
+
+   <note>
+    <para>
+     Enabling checksums can cause significant I/O to the system, as most of the
+     database pages will need to be rewritten, and will be written both to the
+     data files and the WAL.
+    </para>
+   </note>
+
+  </sect2>
  </sect1>
 
   <sect1 id="wal-intro">
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 1054f6f1f2..d46dceff31 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -452,9 +452,11 @@ gist_indexsortbuild(GISTBuildState *state)
 	/* Write out the root */
 	RelationOpenSmgr(state->indexrel);
 	PageSetLSN(pagestate->page, GistBuildLSN);
+	HOLD_INTERRUPTS();
 	PageSetChecksumInplace(pagestate->page, GIST_ROOT_BLKNO);
 	smgrwrite(state->indexrel->rd_smgr, MAIN_FORKNUM, GIST_ROOT_BLKNO,
 			  pagestate->page, true);
+	RESUME_INTERRUPTS();
 	if (RelationNeedsWAL(state->indexrel))
 		log_newpage(&state->indexrel->rd_node, MAIN_FORKNUM, GIST_ROOT_BLKNO,
 					pagestate->page, true);
@@ -574,8 +576,10 @@ gist_indexsortbuild_flush_ready_pages(GISTBuildState *state)
 			elog(ERROR, "unexpected block number to flush GiST sorting build");
 
 		PageSetLSN(page, GistBuildLSN);
+		HOLD_INTERRUPTS();
 		PageSetChecksumInplace(page, blkno);
 		smgrextend(state->indexrel->rd_smgr, MAIN_FORKNUM, blkno, page, true);
+		RESUME_INTERRUPTS();
 
 		state->pages_written++;
 	}
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 49a9867787..68b2f69fcb 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -1025,8 +1025,10 @@ _hash_alloc_buckets(Relation rel, BlockNumber firstblock, uint32 nblocks)
 					true);
 
 	RelationOpenSmgr(rel);
+	HOLD_INTERRUPTS();
 	PageSetChecksumInplace(page, lastblock);
 	smgrextend(rel->rd_smgr, MAIN_FORKNUM, lastblock, zerobuf.data, false);
+	RESUME_INTERRUPTS();
 
 	return true;
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd54..ffcd889908 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7927,7 +7927,7 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid,
  * and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
- * heap_buffer, if necessary.
+ * heap_buffer.
  */
 XLogRecPtr
 log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
@@ -7948,11 +7948,18 @@ log_heap_visible(RelFileNode rnode, Buffer heap_buffer, Buffer vm_buffer,
 	XLogRegisterBuffer(0, vm_buffer, 0);
 
 	flags = REGBUF_STANDARD;
+	/*
+	 * Hold interrupts for the duration of xlogging to avoid the state of data
+	 * checksums changing during the processing which would alter the premise
+	 * for xlogging hint bits.
+	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded())
 		flags |= REGBUF_NO_IMAGE;
 	XLogRegisterBuffer(1, heap_buffer, flags);
 
 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
+	RESUME_INTERRUPTS();
 
 	return recptr;
 }
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e198df65d8..ff449728c3 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -652,10 +652,12 @@ vm_extend(Relation rel, BlockNumber vm_nblocks)
 	/* Now extend the file */
 	while (vm_nblocks_now < vm_nblocks)
 	{
+		HOLD_INTERRUPTS();
 		PageSetChecksumInplace((Page) pg.data, vm_nblocks_now);
 
 		smgrextend(rel->rd_smgr, VISIBILITYMAP_FORKNUM, vm_nblocks_now,
 				   pg.data, false);
+		RESUME_INTERRUPTS();
 		vm_nblocks_now++;
 	}
 
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 289bd3c15d..efc3435c3f 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -175,9 +175,11 @@ btbuildempty(Relation index)
 	 * XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE record.  Therefore, we need
 	 * this even when wal_level=minimal.
 	 */
+	HOLD_INTERRUPTS();
 	PageSetChecksumInplace(metapage, BTREE_METAPAGE);
 	smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE,
 			  (char *) metapage, true);
+	RESUME_INTERRUPTS();
 	log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
 				BTREE_METAPAGE, metapage, true);
 
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 5683daa34d..fd319488c4 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -664,6 +664,8 @@ _bt_blwritepage(BTWriteState *wstate, Page page, BlockNumber blkno)
 				   true);
 	}
 
+	HOLD_INTERRUPTS();
+
 	PageSetChecksumInplace(page, blkno);
 
 	/*
@@ -684,6 +686,8 @@ _bt_blwritepage(BTWriteState *wstate, Page page, BlockNumber blkno)
 				  (char *) page, true);
 	}
 
+	RESUME_INTERRUPTS();
+
 	pfree(page);
 }
 
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 92cc7ea073..fa074c6046 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -18,6 +18,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/pg_control.h"
+#include "storage/bufpage.h"
 #include "utils/guc.h"
 #include "utils/timestamp.h"
 
@@ -140,6 +141,20 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 						 xlrec.ThisTimeLineID, xlrec.PrevTimeLineID,
 						 timestamptz_to_str(xlrec.end_time));
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state xlrec;
+
+		memcpy(&xlrec, rec, sizeof(xl_checksum_state));
+		if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			appendStringInfo(buf, "on");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			appendStringInfo(buf, "inprogress-off");
+		else if (xlrec.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			appendStringInfo(buf, "inprogress-on");
+		else
+			appendStringInfo(buf, "off");
+	}
 }
 
 const char *
@@ -185,6 +200,9 @@ xlog_identify(uint8 info)
 		case XLOG_FPI_FOR_HINT:
 			id = "FPI_FOR_HINT";
 			break;
+		case XLOG_CHECKSUMS:
+			id = "CHECKSUMS";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index 0ca621450e..addcfa1908 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -168,27 +168,33 @@ spgbuildempty(Relation index)
 	 * of their existing content when the corresponding create records are
 	 * replayed.
 	 */
+	HOLD_INTERRUPTS();
 	PageSetChecksumInplace(page, SPGIST_METAPAGE_BLKNO);
 	smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_METAPAGE_BLKNO,
 			  (char *) page, true);
+	RESUME_INTERRUPTS();
 	log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
 				SPGIST_METAPAGE_BLKNO, page, true);
 
 	/* Likewise for the root page. */
 	SpGistInitPage(page, SPGIST_LEAF);
 
+	HOLD_INTERRUPTS();
 	PageSetChecksumInplace(page, SPGIST_ROOT_BLKNO);
 	smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_ROOT_BLKNO,
 			  (char *) page, true);
+	RESUME_INTERRUPTS();
 	log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
 				SPGIST_ROOT_BLKNO, page, true);
 
 	/* Likewise for the null-tuples root page. */
 	SpGistInitPage(page, SPGIST_LEAF | SPGIST_NULLS);
 
+	HOLD_INTERRUPTS();
 	PageSetChecksumInplace(page, SPGIST_NULL_BLKNO);
 	smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_NULL_BLKNO,
 			  (char *) page, true);
+	RESUME_INTERRUPTS();
 	log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
 				SPGIST_NULL_BLKNO, page, true);
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8e3b5df7dc..d20a61b0e6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -38,6 +38,7 @@
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
 #include "catalog/catversion.h"
+#include "catalog/pg_class.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/progress.h"
@@ -50,6 +51,7 @@
 #include "port/atomics.h"
 #include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -253,6 +255,16 @@ static bool LocalPromoteIsTriggered = false;
  */
 static int	LocalXLogInsertAllowed = -1;
 
+/*
+ * Local state for Controlfile data_checksum_version. After initialization,
+ * this is only updated when absorbing a procsignal barrier during interrupt
+ * processing.  The reason for keeping a copy in backend-private memory is to
+ * avoid locking for interrogating checksum state.  Possible values are the
+ * checksum versions defined in storage/bufpage.h and zero for when checksums
+ * are disabled.
+ */
+static uint32 LocalDataChecksumVersion = 0;
+
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
  * ie. signal files were present. When InArchiveRecovery is set, we are
@@ -900,6 +912,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XLogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
 								TimeLineID prevTLI);
 static void LocalSetXLogInsertAllowed(void);
@@ -1073,8 +1086,8 @@ XLogInsertRecord(XLogRecData *rdata,
 	 * and fast otherwise.
 	 *
 	 * Also check to see if fullPageWrites or forcePageWrites was just turned
-	 * on; if we weren't already doing full-page writes then go back and
-	 * recompute.
+	 * on, or if we are in the process of enabling checksums in the cluster;
+	 * if we weren't already doing full-page writes then go back and recompute.
 	 *
 	 * If we aren't doing full-page writes then RedoRecPtr doesn't actually
 	 * affect the contents of the XLOG record, so we'll update our local copy
@@ -1087,7 +1100,7 @@ XLogInsertRecord(XLogRecData *rdata,
 		Assert(RedoRecPtr < Insert->RedoRecPtr);
 		RedoRecPtr = Insert->RedoRecPtr;
 	}
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsNeedWrite());
 
 	if (doPageWrites &&
 		(!prevDoPageWrites ||
@@ -4915,9 +4928,7 @@ ReadControlFile(void)
 
 	CalculateCheckpointSegments();
 
-	/* Make the initdb settings visible as GUC variables, too */
-	SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
-					PGC_INTERNAL, PGC_S_OVERRIDE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
 }
 
 /*
@@ -4951,13 +4962,354 @@ GetMockAuthenticationNonce(void)
 }
 
 /*
- * Are checksums enabled for data pages?
+ * DataChecksumsNeedWrite
+ *		Returns whether data checksums must be written or not
+ *
+ * Returns true iff data checksums are enabled or are in the process of being
+ * enabled.   During "inprogress-on" and "inprogress-off" states checksums must
+ * be written even though they are not verified (see datachecksumsworker.c for
+ * a longer discussion).
+ *
+ * This function is intended for callsites which are about to write a data page
+ * to storage, and need to know whether to re-calculate the checksum for the
+ * page header. Interrupts must be held off from before calling this until the
+ * write operation has finished, to avoid the risk of the checksum state
+ * changing in between. This implies that this function must be called as close
+ * to the write operation as possible to keep the critical section short.
+ */
+bool
+DataChecksumsNeedWrite(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
+			LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * DataChecksumsNeedVerify
+ *		Returns whether data checksums must be verified or not
+ *
+ * Data checksums are only verified if they are fully enabled in the cluster.
+ * During the "inprogress-on" and "inprogress-off" states they are only
+ * updated, not verified (see datachecksumsworker.c for a longer discussion).
+ *
+ * This function is intended for callsites which have read data and are about
+ * to perform checksum validation based on the result of this. To avoid
+ * the risk of the checksum state changing between reading and performing the
+ * validation (or not), interrupts must be held off. This implies that calling
+ * this function must be performed as close to the validation call as possible
+ * to keep the critical section short. This is in order to protect against
+ * time of check/time of use situations around data checksum validation.
+ */
+bool
+DataChecksumsNeedVerify(void)
+{
+	Assert(InterruptHoldoffCount > 0);
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
+}
+
+/*
+ * DataChecksumsOnInProgress
+ *		Returns whether data checksums are being enabled
+ *
+ * Most operations don't need to worry about the "inprogress" states, and
+ * should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite(). The
+ * "inprogress-on" state for enabling checksums is used when the checksum
+ * worker is setting checksums on all pages, it can thus be used to check for
+ * aborted checksum processing which needs to be restarted.
+ */
+inline bool
+DataChecksumsOnInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+}
+
+/*
+ * DataChecksumsOffInProgress
+ *		Returns whether data checksums are being disabled
+ *
+ * The "inprogress-off" state for disabling checksums is used for when the
+ * worker resets the catalog state.  DataChecksumsNeedVerify() or
+ * DataChecksumsNeedWrite() should be used for deciding whether to read/write
+ * checksums.
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsOffInProgress(void)
+{
+	return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+}
+
+/*
+ * SetDataChecksumsOnInProgress
+ *		Sets the data checksum state to "inprogress-on" to enable checksums
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on". See
+ * SetDataChecksumsOn below for a description of how this state change works.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOnInProgress(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile != NULL);
+
+	/*
+	 * The state transition is performed in a critical section with
+	 * checkpoints held off to provide crash safety.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Wait for all backends to acknowledge the state change to
+	 * "inprogress-on". Once done we know that all backends are writing data
+	 * checksums.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOn
+ *		Enables data checksums cluster-wide
+ *
+ * Enabling data checksums is performed using two barriers, the first one to
+ * set the checksums state to "inprogress-on" (which is performed by
+ * SetDataChecksumsOnInProgress()) and the second one to set the state to "on"
+ * (performed here).
+ *
+ * To start the process of enabling data checksums in a running cluster the
+ * data_checksum_version state must be changed to "inprogress-on".  This state
+ * requires data checksums to be written but not verified. This ensures that
+ * all data pages can be checksummed without the risk of false negatives in
+ * validation during the process.  When all existing pages are guaranteed to
+ * have checksums, and all new pages will be initialized with checksums, the
+ * state can be changed to "on". Once the state is "on" checksums will be both
+ * written and verified. See datachecksumsworker.c for a longer discussion on
+ * how data checksums can be enabled in a running cluster.
+ *
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOn(void)
 {
+	uint64		barrier;
+
 	Assert(ControlFile != NULL);
-	return (ControlFile->data_checksum_version > 0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/*
+	 * The only allowed state transition to "on" is from "inprogress-on" since
+	 * that state ensures that all pages will have data checksums written.
+	 */
+	if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+		elog(ERROR, "checksums not in \"inprogress-on\" mode");
+	}
+
+	LWLockRelease(ControlFileLock);
+
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(PG_DATA_CHECKSUM_VERSION);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	/*
+	 * Wait for the state transition to "on" in all backends. When done we
+	 * know that data checksums are enabled in all backends and that they are
+	 * both written and verified.
+	 */
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * SetDataChecksumsOff
+ *		Disables data checksums cluster-wide
+ *
+ * Disabling data checksums must be performed with two sets of barriers, each
+ * carrying a different state. The state is first set to "inprogress-off"
+ * during which checksums are still written but not verified. This ensures that
+ * backends which have yet to observe the state change from "on" won't get
+ * validation errors on concurrently modified pages. Once all backends have
+ * changed to "inprogress-off", the barrier for moving to "off" can be emitted.
+ * This function blocks until all backends in the cluster have acknowledged the
+ * state transition.
+ */
+void
+SetDataChecksumsOff(void)
+{
+	uint64		barrier;
+
+	Assert(ControlFile);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+	/* If data checksums are already disabled there is nothing to do */
+	if (ControlFile->data_checksum_version == 0)
+	{
+		LWLockRelease(ControlFileLock);
+		return;
+	}
+
+	/*
+	 * If data checksums are currently enabled we first transition to the
+	 * "inprogress-off" state during which backends continue to write
+	 * checksums without verifying them. When all backends are in
+	 * "inprogress-off" the next transition to "off" can be performed, after
+	 * which all data checksum processing is disabled.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+	{
+		LWLockRelease(ControlFileLock);
+
+		MyProc->delayChkpt = true;
+		START_CRIT_SECTION();
+
+		XLogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+
+		barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF);
+
+		END_CRIT_SECTION();
+		MyProc->delayChkpt = false;
+
+		/*
+		 * Update local state in all backends to ensure that any backend in
+		 * "on" state is changed to "inprogress-off".
+		 */
+		WaitForProcSignalBarrier(barrier);
+
+		/*
+		 * At this point we know that no backends are verifying data checksums
+		 * during reading. Next, we can safely move to state "off" to also
+		 * stop writing checksums.
+		 */
+	}
+	else
+	{
+		/*
+		 * Ending up here implies that the checksums state is "inprogress-on"
+		 * or "inprogress-off" and we can transition directly to "off" from
+		 * there.
+		 */
+		LWLockRelease(ControlFileLock);
+	}
+
+	/*
+	 * Ensure that we don't incur a checkpoint during disabling checksums.
+	 */
+	MyProc->delayChkpt = true;
+	START_CRIT_SECTION();
+
+	XLogChecksums(0);
+
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	ControlFile->data_checksum_version = 0;
+	UpdateControlFile();
+	LWLockRelease(ControlFileLock);
+
+	barrier = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF);
+
+	END_CRIT_SECTION();
+	MyProc->delayChkpt = false;
+
+	WaitForProcSignalBarrier(barrier);
+}
+
+/*
+ * ProcSignalBarrier absorption functions for enabling and disabling data
+ * checksums in a running cluster. The procsignalbarriers are emitted in the
+ * SetDataChecksums* functions.
+ */
+bool
+AbsorbChecksumsOnInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOnBarrier(void)
+{
+	Assert(LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION);
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffInProgressBarrier(void)
+{
+	LocalDataChecksumVersion = PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+	return true;
+}
+
+bool
+AbsorbChecksumsOffBarrier(void)
+{
+	LocalDataChecksumVersion = 0;
+	return true;
+}
+
+/*
+ * InitLocalControlData
+ *
+ * Set up backend local caches of controldata variables which may change at
+ * any point during runtime and thus require special cased locking. So far
+ * this only applies to data_checksum_version, but it's intended to be general
+ * purpose enough to handle future cases.
+ */
+void
+InitLocalControldata(void)
+{
+	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+	LocalDataChecksumVersion = ControlFile->data_checksum_version;
+	LWLockRelease(ControlFileLock);
+}
+
+/* guc hook */
+const char *
+show_data_checksums(void)
+{
+	if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION)
+		return "on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		return "inprogress-on";
+	else if (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+		return "inprogress-off";
+	else
+		return "off";
 }
 
 /*
@@ -7994,6 +8346,32 @@ StartupXLOG(void)
 	 */
 	CompleteCommitTsInitialization();
 
+	/*
+	 * If we reach this point with checksums being enabled ("inprogress-on"
+	 * state), we notify the user that they need to manually restart the
+	 * process to enable checksums. This is because we cannot launch a dynamic
+	 * background worker directly from here, it has to be launched from a
+	 * regular backend.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+		ereport(WARNING,
+				(errmsg("data checksums are being enabled, but no worker is running"),
+				 errhint("Either disable or enable data checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+
+	/*
+	 * If data checksums were being disabled when the cluster was shutdown, we
+	 * know that we have a state where all backends have stopped validating
+	 * checksums and we can move to off instead.
+	 */
+	if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+	{
+		XLogChecksums(0);
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = 0;
+		LWLockRelease(ControlFileLock);
+	}
+
 	/*
 	 * All done with end-of-recovery actions.
 	 *
@@ -8423,7 +8801,7 @@ InitXLOGAccess(void)
 	/* Use GetRedoRecPtr to copy the RedoRecPtr safely */
 	(void) GetRedoRecPtr();
 	/* Also update our copy of doPageWrites. */
-	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+	doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsNeedWrite());
 
 	/* Also initialize the working areas for constructing WAL records */
 	InitXLogInsert();
@@ -9902,6 +10280,24 @@ XLogReportParameters(void)
 	}
 }
 
+/*
+ * Log the new state of checksums
+ */
+static void
+XLogChecksums(ChecksumType new_type)
+{
+	xl_checksum_state xlrec;
+	XLogRecPtr	recptr;
+
+	xlrec.new_checksumtype = new_type;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, sizeof(xl_checksum_state));
+
+	recptr = XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
+	XLogFlush(recptr);
+}
+
 /*
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
@@ -10357,6 +10753,28 @@ xlog_redo(XLogReaderState *record)
 		/* Keep track of full_page_writes */
 		lastFullPageWrites = fpw;
 	}
+	else if (info == XLOG_CHECKSUMS)
+	{
+		xl_checksum_state state;
+
+		memcpy(&state, XLogRecGetData(record), sizeof(xl_checksum_state));
+
+		LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+		ControlFile->data_checksum_version = state.new_checksumtype;
+		UpdateControlFile();
+		LWLockRelease(ControlFileLock);
+		if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+		else if (state.new_checksumtype == PG_DATA_CHECKSUM_VERSION)
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_ON));
+		else
+		{
+			Assert(state.new_checksumtype == 0);
+			WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+		}
+	}
 }
 
 #ifdef WAL_DEBUG
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index d8c5bf6dc2..6c92ad9def 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -25,6 +25,7 @@
 #include "catalog/pg_type.h"
 #include "funcapi.h"
 #include "miscadmin.h"
+#include "postmaster/datachecksumsworker.h"
 #include "pgstat.h"
 #include "replication/walreceiver.h"
 #include "storage/fd.h"
@@ -787,3 +788,49 @@ pg_promote(PG_FUNCTION_ARGS)
 						   wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Starts a background worker that turns off data checksums.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+	PG_RETURN_VOID();
+}
+
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)
+{
+	int			cost_delay = PG_GETARG_INT32(0);
+	int			cost_limit = PG_GETARG_INT32(1);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errmsg("must be superuser")));
+
+	if (cost_delay < 0)
+		ereport(ERROR,
+				(errmsg("cost delay cannot be less than zero")));
+
+	if (cost_limit <= 0)
+		ereport(ERROR,
+				(errmsg("cost limit must be a positive value")));
+
+	StartDatachecksumsWorkerLauncher(true, cost_delay, cost_limit);
+
+	PG_RETURN_VOID();
+}
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index cba7a9ada0..d3f89d01fd 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -441,6 +441,13 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation dst,
 		/* If we got a cancel signal during the copy of the data, quit */
 		CHECK_FOR_INTERRUPTS();
 
+		/*
+		 * Hold interrupts for the duration of the I/O operation to ensure
+		 * that the data checksum state cannot change mid-operation, which
+		 * could otherwise cause a false positive or negative.
+		 */
+		HOLD_INTERRUPTS();
+
 		smgrread(src, forkNum, blkno, buf.data);
 
 		if (!PageIsVerifiedExtended(page, blkno,
@@ -469,6 +476,8 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation dst,
 		 * ourselves below.
 		 */
 		smgrextend(dst, forkNum, blkno, buf.data, true);
+
+		RESUME_INTERRUPTS();
 	}
 
 	/*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..516ae666b7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1264,6 +1264,11 @@ CREATE OR REPLACE FUNCTION
   RETURNS boolean STRICT VOLATILE LANGUAGE INTERNAL AS 'pg_promote'
   PARALLEL SAFE;
 
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS void STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;
+
 -- legacy definition for compatibility with 9.3
 CREATE OR REPLACE FUNCTION
   json_populate_record(base anyelement, from_json json, use_json_as_text boolean DEFAULT false)
diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile
index bfdf6a833d..59b82ee9ce 100644
--- a/src/backend/postmaster/Makefile
+++ b/src/backend/postmaster/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	bgworker.o \
 	bgwriter.o \
 	checkpointer.o \
+	datachecksumsworker.o \
 	fork_process.o \
 	interrupt.o \
 	pgarch.o \
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index dd3dad3de3..661edff5e0 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -18,6 +18,7 @@
 #include "pgstat.h"
 #include "port/atomics.h"
 #include "postmaster/bgworker_internals.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/interrupt.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
@@ -128,6 +129,12 @@ static const struct
 	},
 	{
 		"ApplyWorkerMain", ApplyWorkerMain
+	},
+	{
+		"DatachecksumsWorkerLauncherMain", DatachecksumsWorkerLauncherMain
+	},
+	{
+		"DatachecksumsWorkerMain", DatachecksumsWorkerMain
 	}
 };
 
diff --git a/src/backend/postmaster/datachecksumsworker.c b/src/backend/postmaster/datachecksumsworker.c
new file mode 100644
index 0000000000..34780f818d
--- /dev/null
+++ b/src/backend/postmaster/datachecksumsworker.c
@@ -0,0 +1,1327 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.c
+ *	  Background worker for enabling or disabling data checksums online
+ *
+ * When enabling data checksums on a database at initdb time or with
+ * pg_checksums, no extra process is required as each page is checksummed, and
+ * verified, when accessed.  When enabling checksums on an already running
+ * cluster, which does not run with checksums enabled, this worker will ensure
+ * that all pages are checksummed before verification of the checksums is
+ * turned on. In the case of disabling checksums, only the state transition in
+ * the control file is performed; the data pages themselves are not changed.
+ *
+ * Checksums can be either enabled or disabled cluster-wide, with on/off being
+ * the end state for data_checksums.
+ *
+ * Enabling checksums
+ * ------------------
+ * When enabling checksums in an online cluster, data_checksums will be set to
+ * "inprogress-on" which signals that write operations MUST compute and write
+ * the checksum on the data page, but during reading the checksum SHALL NOT be
+ * verified. This ensures that all objects created during checksumming will
+ * have checksums set, but no reads will fail due to an incorrect checksum. The
+ * DataChecksumsWorker will compile a list of databases which exist at the
+ * start of checksumming, and all of these which haven't been dropped during
+ * the processing MUST have been processed successfully in order for checksums
+ * to be enabled. Any new relation created during processing will see the
+ * in-progress state and will automatically be checksummed.
+ *
+ * For each database, all relations which have storage are read and every data
+ * page is marked dirty to force a write with the checksum. This will generate
+ * a lot of WAL as the entire database is read and written.
+ *
+ * If the processing is interrupted by a cluster restart, it will be restarted
+ * from the beginning, as progress isn't persisted across restarts.
+ *
+ * Disabling checksums
+ * -------------------
+ * When disabling checksums, data_checksums will be set to "inprogress-off"
+ * which signals that checksums are written but no longer verified. This
+ * ensures that backends which have yet to move from the "on" state can still
+ * validate data checksums without errors.
+ *
+ * Synchronization and Correctness
+ * -------------------------------
+ * The processes involved in enabling, or disabling, data checksums in an
+ * online cluster must be properly synchronized with the normal backends
+ * serving concurrent queries to ensure correctness. Correctness is defined
+ * as the following:
+ *
+ *    - Backends SHALL NOT violate local datachecksum state
+ *    - Data checksums SHALL NOT be considered enabled cluster-wide until all
+ *      currently connected backends have the local state "enabled"
+ *
+ * There are two levels of synchronization required for enabling data checksums
+ * in an online cluster: (i) changing state in the active backends ("on",
+ * "off", "inprogress-on" and "inprogress-off"), and (ii) ensuring that no
+ * incompatible objects or processes are left in a database when workers end.
+ * The former deals with cluster-wide agreement on data checksum state and the
+ * latter with ensuring that any concurrent activity cannot break the data
+ * checksum contract during processing.
+ *
+ * Synchronizing the state change is done with procsignal barriers, where the
+ * WAL logging backend updating the global state in the controlfile will wait
+ * for all other backends to absorb the barrier. Barrier absorption will happen
+ * during interrupt processing, which means that connected backends will change
+ * state at different times. To prevent data checksum state changes when
+ * writing and verifying checksums, interrupts shall be held off before
+ * interrogating state and resumed when the IO operation has been performed.
+ *
+ *   When Enabling Data Checksums
+ *   ----------------------------
+ *   A process which fails to observe data checksums being enabled can induce
+ *   two types of errors: failing to write the checksum when modifying the page
+ *   and failing to validate the data checksum on the page when reading it.
+ *
+ *   When processing starts all backends belong to one of the below sets, with
+ *   one set being empty:
+ *
+ *   Bd: Backends in "off" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   If processing is started in an online cluster then all backends are in Bd.
+ *   If processing was halted by the cluster shutting down, the controlfile
+ *   state "inprogress-on" will be observed on system startup and all backends
+ *   will be in Bd. Backends transition Bd -> Bi via a procsignalbarrier.  When
+ *   the DataChecksumsWorker has finished writing checksums on all pages and
+ *   enables data checksums cluster-wide, there are four sets of backends, of
+ *   which Bd shall be an empty set:
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bi: Backends in "inprogress-on" state
+ *
+ *   Backends in Bi and Be will write checksums when modifying a page, but only
+ *   backends in Be will verify the checksum during reading. The Bg backend is
+ *   blocked waiting for all backends in Bi to process interrupts and move to
+ *   Be. Any backend starting while Bg is waiting on the procsignalbarrier will
+ *   observe the global state being "on" and will thus automatically belong to
+ *   Be.  Checksums are enabled cluster-wide when Bi is an empty set. Bi and Be
+ *   are compatible sets while still operating based on their local state as
+ *   both write data checksums.
+ *
+ *   When Disabling Data Checksums
+ *   -----------------------------
+ *   A process which fails to observe that data checksums have been disabled
+ *   can induce two types of errors: writing the checksum when modifying the
+ *   page and validating a data checksum which is no longer correct due to
+ *   modifications to the page.
+ *
+ *   Bg: Backend updating the global state and emitting the procsignalbarrier
+ *   Bd: Backends in "off" state
+ *   Be: Backends in "on" state
+ *   Bo: Backends in "inprogress-off" state
+ *
+ *   Backends transition from the Be state to Bd like so: Be -> Bo -> Bd
+ *
+ *   The goal is to transition all backends to Bd making the others empty sets.
+ *   Backends in Bo write data checksums, but don't validate them, such that
+ *   backends still in Be can continue to validate pages until the barrier has
+ *   been absorbed such that they are in Bo. Once all backends are in Bo, the
+ *   barrier to transition to "off" can be raised and all backends can safely
+ *   stop writing data checksums as no backend is enforcing data checksum
+ *   validation any longer.
+ *
+ *
+ * Potential optimizations
+ * -----------------------
+ * Below are some potential optimizations and improvements which were brought
+ * up during reviews of this feature, but which weren't implemented in the
+ * initial version. These are ideas listed without any validation of their
+ * feasibility or potential payoff. More discussion on these can be found on
+ * the -hackers threads linked to in the commit message of this feature.
+ *
+ *   * Launching datachecksumsworker for resuming operation from the startup
+ *     process: Currently users have to restart processing manually after a
+ *     restart since dynamic background worker cannot be started from the
+ *     postmaster. Changing to the startup process could make resuming the
+ *     processing automatic.
+ *   * Avoid dirtying the page when checksums already match: Even if the
+ *     checksum on the page happens to already match we currently dirty the
+ *     page. It should be enough to only do the log_newpage_buffer() call in
+ *     that case.
+ *   * Invent a lightweight WAL record that doesn't contain the full-page
+ *     image but just the block number: On replay, the redo routine would read
+ *     the page from disk.
+ *   * Teach pg_checksums to avoid checksummed pages when pg_checksums is used
+ *     to enable checksums on a cluster which is in inprogress-on state and
+ *     may have checksummed pages (make pg_checksums be able to resume an
+ *     online operation).
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/postmaster/datachecksumsworker.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/htup_details.h"
+#include "access/xact.h"
+#include "catalog/indexing.h"
+#include "catalog/pg_class.h"
+#include "catalog/pg_database.h"
+#include "commands/vacuum.h"
+#include "common/relpath.h"
+#include "miscadmin.h"
+#include "pgstat.h"
+#include "postmaster/bgworker.h"
+#include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
+#include "storage/bufmgr.h"
+#include "storage/checksum.h"
+#include "storage/lmgr.h"
+#include "storage/ipc.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/ps_status.h"
+#include "utils/syscache.h"
+
+#define DATACHECKSUMSWORKER_MAX_DB_RETRIES 5
+
+#define MAX_OPS 4
+
+typedef enum
+{
+	DATACHECKSUMSWORKER_SUCCESSFUL = 0,
+	DATACHECKSUMSWORKER_ABORTED,
+	DATACHECKSUMSWORKER_FAILED,
+	DATACHECKSUMSWORKER_RETRYDB,
+}			DatachecksumsWorkerResult;
+
+/*
+ * Signaling between backends calling pg_enable/disable_data_checksums, the
+ * checksums launcher process, and the checksums worker process.
+ *
+ * This struct is protected by DatachecksumsWorkerLock
+ */
+typedef struct DatachecksumsWorkerShmemStruct
+{
+	/*
+	 * These are set by pg_enable/disable_data_checksums, to tell the launcher
+	 * what the target state is.
+	 */
+	bool		launch_enable_checksums;	/* True if checksums are being
+											 * enabled, else false */
+	int			launch_cost_delay;
+	int			launch_cost_limit;
+
+	/*
+	 * Is a launcher process is currently running?
+	 *
+	 * This is set by the launcher process, after it has read the above launch_*
+	 * parameters.
+	 */
+	bool		launcher_running;
+
+	/*
+	 * These fields indicate the target state that the launcher is currently
+	 * working towards. They can be different from the corresponding launch_*
+	 * fields, if a new pg_enable/disable_data_checksums() call was made while
+	 * the launcher/worker was already running.
+	 *
+	 * The below members are set when the launcher starts, and are only
+	 * accessed read-only by the single worker. Thus, we can access these
+	 * without a lock. If multiple workers, or dynamic cost parameters, are
+	 * supported at some point then this would need to be revisited.
+	 */
+	bool		enabling_checksums;	/* True if checksums are being enabled,
+									 * else false */
+	int			cost_delay;
+	int			cost_limit;
+
+	/*
+	 * Signaling between the launcher and the worker process.
+	 *
+	 * As there is only a single worker, and the launcher won't read these
+	 * until the worker exits, they can be accessed without the need for a
+	 * lock. If multiple workers are supported then this will have to be
+	 * revisited.
+	 */
+
+	/* result, set by worker before exiting */
+	DatachecksumsWorkerResult success;
+
+	/* tells the worker process whether it should also process the shared catalogs */
+	bool		process_shared_catalogs;
+} DatachecksumsWorkerShmemStruct;
+
+/* Shared memory segment for datachecksumsworker */
+static DatachecksumsWorkerShmemStruct *DatachecksumsWorkerShmem;
+
+/* Bookkeeping for work to do */
+typedef struct DatachecksumsWorkerDatabase
+{
+	Oid			dboid;
+	char	   *dbname;
+}			DatachecksumsWorkerDatabase;
+
+typedef struct DatachecksumsWorkerResultEntry
+{
+	Oid			dboid;
+	DatachecksumsWorkerResult result;
+	int			retries;
+}			DatachecksumsWorkerResultEntry;
+
+
+/*
+ * Flag set by the interrupt handler
+ */
+static volatile sig_atomic_t abort_requested = false;
+
+/*
+ * Have we set the DatachecksumsWorkerShmemStruct->launcher_running flag?
+ * If we have, we need to clear it before exiting!
+ */
+static volatile sig_atomic_t launcher_running = false;
+
+/*
+ * Are we enabling data checksums, or disabling them?
+ */
+static bool enabling_checksums;
+
+/* Prototypes */
+static List *BuildDatabaseList(void);
+static List *BuildRelationList(bool temp_relations, bool include_shared);
+static DatachecksumsWorkerResult ProcessDatabase(DatachecksumsWorkerDatabase *db);
+static bool ProcessAllDatabases(bool *already_connected);
+static bool ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy);
+static void launcher_cancel_handler(SIGNAL_ARGS);
+static void WaitForAllTransactionsToFinish(void);
+
+/*
+ * StartDatachecksumsWorkerLauncher
+ *		Launch the datachecksumsworker launcher process
+ *
+ * The main entry point for starting data checksums processing, for enabling
+ * as well as disabling.
+ */
+void
+StartDatachecksumsWorkerLauncher(bool enable_checksums, int cost_delay, int cost_limit)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	bool		launcher_running;
+
+	/* the cost delay settings have no effect when disabling */
+	Assert(enable_checksums || cost_delay == 0);
+	Assert(enable_checksums || cost_limit == 0);
+
+	/* store the desired state in shared memory */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	DatachecksumsWorkerShmem->launch_enable_checksums = enable_checksums;
+	DatachecksumsWorkerShmem->launch_cost_delay = cost_delay;
+	DatachecksumsWorkerShmem->launch_cost_limit = cost_limit;
+
+	/* is the launcher already running? */
+	launcher_running = DatachecksumsWorkerShmem->launcher_running;
+
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * Launch a new launcher process, if it's not running already.
+	 *
+	 * If the launcher is currently busy enabling the checksums, and we want
+	 * them disabled (or vice versa), the launcher will notice that at the
+	 * latest when it's about to exit, and will loop back to process the new
+	 * request.
+	 * So if the launcher is already running, we don't need to do anything
+	 * more here to abort it.
+	 *
+	 * If you call pg_enable/disable_data_checksums() twice in a row, before
+	 * the launcher has had a chance to start up, we still end up launching it
+	 * twice.  That's OK, the second invocation will see that a launcher is
+	 * already running and exit quickly.
+	 *
+	 * TODO: We could optimize here and skip launching the launcher, if we are
+	 * already in the desired state, i.e. if the checksums are already enabled
+	 * and you call pg_enable_data_checksums().
+	 */
+	if (!launcher_running)
+	{
+		/*
+		 * Prepare the BackgroundWorker and launch it.
+		 */
+		memset(&bgw, 0, sizeof(bgw));
+		bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+		bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+		snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+		snprintf(bgw.bgw_function_name, BGW_MAXLEN, "DatachecksumsWorkerLauncherMain");
+		snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker launcher");
+		snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker launcher");
+		bgw.bgw_restart_time = BGW_NEVER_RESTART;
+		bgw.bgw_notify_pid = MyProcPid;
+		bgw.bgw_main_arg = (Datum) 0;
+
+		if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+			ereport(ERROR,
+					(errmsg("failed to start background worker to process data checksums")));
+	}
+}
+
+/*
+ * ProcessSingleRelationFork
+ *		Enable data checksums in a single relation/fork.
+ *
+ * Returns true if successful, and false if *aborted*. On error, an actual
+ * error is raised in the lower levels.
+ */
+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
+{
+	BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
+	BlockNumber blknum;
+	char		activity[NAMEDATALEN * 2 + 128];
+	char	   *relns;
+
+	relns = get_namespace_name(RelationGetNamespace(reln));
+
+	if (!relns)
+		return false;
+
+	/*
+	 * We are looping over the blocks which existed at the time of process
+	 * start, which is safe since new blocks are created with checksums set
+	 * already due to the state being "inprogress-on".
+	 */
+	for (blknum = 0; blknum < numblocks; blknum++)
+	{
+		Buffer		buf = ReadBufferExtended(reln, forkNum, blknum, RBM_NORMAL, strategy);
+
+		/*
+		 * Report to pgstat every 100 blocks to keep from overwhelming the
+		 * activity reporting with close to identical reports.
+		 */
+		if ((blknum % 100) == 0)
+		{
+			snprintf(activity, sizeof(activity) - 1, "processing: %s.%s (%s block %d/%d)",
+					 relns, RelationGetRelationName(reln),
+					 forkNames[forkNum], blknum, numblocks);
+			pgstat_report_activity(STATE_RUNNING, activity);
+		}
+
+		/* Need to get an exclusive lock before we can flag as dirty */
+		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
+
+		/*
+		 * Mark the buffer as dirty and force a full page write.  We have to
+		 * re-write the page to WAL even if the checksum hasn't changed,
+		 * because if there is a replica it might have a slightly different
+		 * version of the page with an invalid checksum, caused by unlogged
+		 * changes (e.g. hintbits) on the master happening while checksums
+		 * were off. This can happen if there was a valid checksum on the page
+		 * at one point in the past, so only when checksums were first turned
+		 * on, then off, and then on again. Only if wal_level is set to
+		 * "minimal" could this be avoided, and then only if the existing
+		 * checksum is verified to be correct.
+		 */
+		START_CRIT_SECTION();
+		MarkBufferDirty(buf);
+		log_newpage_buffer(buf, false);
+		END_CRIT_SECTION();
+
+		UnlockReleaseBuffer(buf);
+
+		/*
+		 * This is the only place where we check whether we have been asked to
+		 * abort; the abort will bubble up from here. It's safe to check this
+		 * without a lock, because if we miss it being set, we will try again
+		 * soon.
+		 */
+		Assert(enabling_checksums);
+		if (!DatachecksumsWorkerShmem->launch_enable_checksums)
+			abort_requested = true;
+		if (abort_requested)
+			return false;
+
+		vacuum_delay_point();
+	}
+
+	pfree(relns);
+	return true;
+}
+
+/*
+ * ProcessSingleRelationByOid
+ *		Process a single relation based on oid.
+ *
+ * Returns true if successful, and false if *aborted*; on error, an actual
+ * error is raised at the lower levels.
+ */
+static bool
+ProcessSingleRelationByOid(Oid relationId, BufferAccessStrategy strategy)
+{
+	Relation	rel;
+	ForkNumber	fnum;
+	bool		aborted = false;
+
+	StartTransactionCommand();
+
+	elog(DEBUG2,
+		 "adding data checksums to relation with OID %u",
+		 relationId);
+
+	rel = try_relation_open(relationId, AccessShareLock);
+	if (rel == NULL)
+	{
+		/*
+		 * Relation no longer exists. We don't consider this an error since
+		 * there are no pages in it that need data checksums, and thus return
+		 * true. The worker operates off a list of relations generated at the
+		 * start of processing, so relations being dropped in the meantime is
+		 * to be expected.
+		 */
+		CommitTransactionCommand();
+		pgstat_report_activity(STATE_IDLE, NULL);
+		return true;
+	}
+	RelationOpenSmgr(rel);
+
+	for (fnum = 0; fnum <= MAX_FORKNUM; fnum++)
+	{
+		if (smgrexists(rel->rd_smgr, fnum))
+		{
+			if (!ProcessSingleRelationFork(rel, fnum, strategy))
+			{
+				aborted = true;
+				break;
+			}
+		}
+	}
+	relation_close(rel, AccessShareLock);
+	elog(DEBUG2,
+		 "data checksum processing done for relation with OID %u: %s",
+		 relationId, (aborted ? "aborted" : "finished"));
+
+	CommitTransactionCommand();
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return !aborted;
+}
+
+/*
+ * ProcessDatabase
+ *		Enable data checksums in a single database.
+ *
+ * We do this by launching a dynamic background worker into this database, and
+ * waiting for it to finish.  We have to do this in a separate worker, since
+ * each process can only be connected to one database during its lifetime.
+ */
+static DatachecksumsWorkerResult
+ProcessDatabase(DatachecksumsWorkerDatabase *db)
+{
+	BackgroundWorker bgw;
+	BackgroundWorkerHandle *bgw_handle;
+	BgwHandleStatus status;
+	pid_t		pid;
+	char		activity[NAMEDATALEN + 64];
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_FAILED;
+
+	memset(&bgw, 0, sizeof(bgw));
+	bgw.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+	bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+	snprintf(bgw.bgw_library_name, BGW_MAXLEN, "postgres");
+	snprintf(bgw.bgw_function_name, BGW_MAXLEN, "%s", "DatachecksumsWorkerMain");
+	snprintf(bgw.bgw_name, BGW_MAXLEN, "datachecksumsworker worker");
+	snprintf(bgw.bgw_type, BGW_MAXLEN, "datachecksumsworker worker");
+	bgw.bgw_restart_time = BGW_NEVER_RESTART;
+	bgw.bgw_notify_pid = MyProcPid;
+	bgw.bgw_main_arg = ObjectIdGetDatum(db->dboid);
+
+	/*
+	 * If there are no worker slots available, make sure we retry processing
+	 * this database. This will make the datachecksumsworker move on to the
+	 * next database and quite likely fail with the same problem. TODO: Maybe
+	 * we need a backoff to avoid running through all the databases here in
+	 * short order.
+	 */
+	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	{
+		ereport(WARNING,
+				(errmsg("failed to start worker for enabling data checksums in database \"%s\", retrying",
+						db->dbname),
+				 errhint("The max_worker_processes setting might be too low.")));
+		return DATACHECKSUMSWORKER_RETRYDB;
+	}
+
+	status = WaitForBackgroundWorkerStartup(bgw_handle, &pid);
+	if (status == BGWH_STOPPED)
+	{
+		ereport(WARNING,
+				(errmsg("could not start background worker for enabling data checksums in database \"%s\"",
+						db->dbname),
+				 errhint("More details on the error might be found in the server log.")));
+		return DATACHECKSUMSWORKER_FAILED;
+	}
+
+	/*
+	 * If the postmaster crashed we cannot finish processing this database, so
+	 * we have no alternative but to exit. When enabling checksums we won't at
+	 * this point have changed the pg_control version to enabled, so when the
+	 * cluster comes back up, processing will have to be restarted. When
+	 * disabling, the pg_control version will have been set to off before
+	 * this, so when the cluster comes up checksums will be off as expected.
+	 */
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("cannot enable data checksums without the postmaster process"),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	Assert(status == BGWH_STARTED);
+	ereport(DEBUG1,
+			(errmsg("initiating data checksum processing in database \"%s\"",
+					db->dbname)));
+
+	snprintf(activity, sizeof(activity) - 1,
+			 "Waiting for worker in database %s (pid %d)", db->dbname, pid);
+	pgstat_report_activity(STATE_RUNNING, activity);
+
+	status = WaitForBackgroundWorkerShutdown(bgw_handle);
+	if (status == BGWH_POSTMASTER_DIED)
+		ereport(FATAL,
+				(errmsg("postmaster exited during data checksum processing in \"%s\"",
+						db->dbname),
+				 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+	if (DatachecksumsWorkerShmem->success == DATACHECKSUMSWORKER_ABORTED)
+		ereport(LOG,
+				(errmsg("data checksum processing was aborted in database \"%s\"",
+						db->dbname)));
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+
+	return DatachecksumsWorkerShmem->success;
+}
+
+/*
+ * launcher_exit
+ *
+ * Internal routine for cleaning up state when the launcher process exits. We
+ * need to clean up the abort flag to ensure that processing can be restarted
+ * again after it was previously aborted.
+ */
+static void
+launcher_exit(int code, Datum arg)
+{
+	if (launcher_running)
+	{
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		launcher_running = false;
+		DatachecksumsWorkerShmem->launcher_running = false;
+		LWLockRelease(DatachecksumsWorkerLock);
+	}
+}
+
+/*
+ * launcher_cancel_handler
+ *
+ * Internal routine for reacting to SIGINT and flagging the worker to abort.
+ * The worker won't be interrupted immediately but will check for abort flag
+ * between each block in a relation.
+ */
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	abort_requested = true;
+
+	/*
+	 * There is no sleeping in the main loop, the flag will be checked
+	 * periodically in ProcessSingleRelationFork. The worker does however
+	 * sleep when waiting for concurrent transactions to end so we still need
+	 * to set the latch.
+	 */
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+}
+
+/*
+ * WaitForAllTransactionsToFinish
+ *		Blocks until all currently running transactions have finished
+ *
+ * Returns when all transactions which were active when the function was
+ * called have ended. If the postmaster dies while waiting, a FATAL error
+ * is raised and the process exits.
+ *
+ * NB: this will return early if aborted by SIGINT or if the target state
+ * is changed while we're running.
+ */
+static void
+WaitForAllTransactionsToFinish(void)
+{
+	TransactionId waitforxid;
+
+	LWLockAcquire(XidGenLock, LW_SHARED);
+	waitforxid = XidFromFullTransactionId(ShmemVariableCache->nextXid);
+	LWLockRelease(XidGenLock);
+
+	while (TransactionIdPrecedes(GetOldestActiveTransactionId(), waitforxid))
+	{
+		char		activity[64];
+		int			rc;
+
+		/* Oldest running xid is older than us, so wait */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for current transactions to finish (waiting for %u)",
+				 waitforxid);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		rc = WaitLatch(MyLatch,
+					   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+					   5000,
+					   WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION);
+
+		/*
+		 * If the postmaster died we won't be able to enable checksums
+		 * cluster-wide so abort and hope to continue when restarted.
+		 */
+		if (rc & WL_POSTMASTER_DEATH)
+			ereport(FATAL,
+					(errmsg("postmaster exited during data checksum processing"),
+					 errhint("Restart the database and restart data checksum processing by calling pg_enable_data_checksums().")));
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_SHARED);
+		if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+			abort_requested = true;
+		LWLockRelease(DatachecksumsWorkerLock);
+		if (abort_requested)
+			break;
+	}
+
+	pgstat_report_activity(STATE_IDLE, NULL);
+	return;
+}
+
+/*
+ * DatachecksumsWorkerLauncherMain
+ *
+ * Main function for launching dynamic background workers for processing data
+ * checksums in databases. This function has the bgworker management, with
+ * ProcessAllDatabases being responsible for looping over the databases and
+ * initiating processing.
+ */
+void
+DatachecksumsWorkerLauncherMain(Datum arg)
+{
+	bool		connected = false;
+	bool		status = false;
+
+	on_shmem_exit(launcher_exit, 0);
+
+	ereport(DEBUG1,
+			(errmsg("background worker \"datachecksumsworker\" launcher started")));
+
+	pqsignal(SIGTERM, die);
+	pqsignal(SIGINT, launcher_cancel_handler);
+
+	BackgroundWorkerUnblockSignals();
+
+	InitXLOGAccess();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_LAUNCHER;
+	init_ps_display(NULL);
+
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+	if (DatachecksumsWorkerShmem->launcher_running)
+	{
+		/* Launcher was already running, let it finish */
+		LWLockRelease(DatachecksumsWorkerLock);
+		return;
+	}
+
+	launcher_running = true;
+
+	enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
+	DatachecksumsWorkerShmem->launcher_running = true;
+	DatachecksumsWorkerShmem->enabling_checksums = enabling_checksums;
+	DatachecksumsWorkerShmem->cost_delay = DatachecksumsWorkerShmem->launch_cost_delay;
+	DatachecksumsWorkerShmem->cost_limit = DatachecksumsWorkerShmem->launch_cost_limit;
+	LWLockRelease(DatachecksumsWorkerLock);
+
+	/*
+	 * The target state can change while we are busy enabling/disabling
+	 * checksums, if the user calls pg_disable/enable_data_checksums() before
+	 * we are finished with the previous request. In that case, we will loop
+	 * back here, to process the new request.
+	 */
+again:
+
+	/*
+	 * If we're asked to enable checksums, we need to check if processing was
+	 * previously interrupted such that we should resume rather than start
+	 * from scratch.
+	 */
+	if (enabling_checksums)
+	{
+		/*
+		 * If we are asked to enable checksums in a cluster which already
+		 * has checksums enabled, exit immediately as there is nothing
+		 * more to do.
+		 */
+		HOLD_INTERRUPTS();
+		if (DataChecksumsNeedVerify())
+		{
+			RESUME_INTERRUPTS();
+			goto done;
+		}
+		RESUME_INTERRUPTS();
+
+		SetDataChecksumsOnInProgress();
+
+		status = ProcessAllDatabases(&connected);
+		if (!status)
+		{
+			/*
+			 * If the target state changed during processing then it's
+			 * not a failure, so restart processing instead.
+			 */
+			LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+			if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+			{
+				LWLockRelease(DatachecksumsWorkerLock);
+				goto done;
+			}
+			LWLockRelease(DatachecksumsWorkerLock);
+			ereport(ERROR,
+					(errmsg("unable to enable checksums in cluster")));
+		}
+
+		SetDataChecksumsOn();
+	}
+	else
+	{
+		SetDataChecksumsOff();
+	}
+
+done:
+	/*
+	 * All done. But before we exit, check if the target state was changed while
+	 * we were running. In that case we will have to start all over again.
+	 */
+	LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+	if (DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums)
+	{
+		DatachecksumsWorkerShmem->enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
+		enabling_checksums = DatachecksumsWorkerShmem->launch_enable_checksums;
+		DatachecksumsWorkerShmem->cost_delay = DatachecksumsWorkerShmem->launch_cost_delay;
+		DatachecksumsWorkerShmem->cost_limit = DatachecksumsWorkerShmem->launch_cost_limit;
+		LWLockRelease(DatachecksumsWorkerLock);
+		goto again;
+	}
+
+	launcher_running = false;
+	DatachecksumsWorkerShmem->launcher_running = false;
+	LWLockRelease(DatachecksumsWorkerLock);
+}
+
+/*
+ * ProcessAllDatabases
+ *		Compute the list of all databases and process checksums in each
+ *
+ * This will repeatedly generate a list of databases to process for enabling
+ * checksums. Until no new databases are found, this will loop around computing
+ * a new list and comparing it to the already seen ones.
+ */
+static bool
+ProcessAllDatabases(bool *already_connected)
+{
+	List	   *DatabaseList;
+	HTAB	   *ProcessedDatabases = NULL;
+	ListCell   *lc;
+	HASHCTL		hash_ctl;
+	bool		found_failed = false;
+
+	/* Initialize a hash tracking all processed databases */
+	memset(&hash_ctl, 0, sizeof(hash_ctl));
+	hash_ctl.keysize = sizeof(Oid);
+	hash_ctl.entrysize = sizeof(DatachecksumsWorkerResultEntry);
+	ProcessedDatabases = hash_create("Processed databases",
+									 64,
+									 &hash_ctl,
+									 HASH_ELEM | HASH_BLOBS);
+
+	/*
+	 * Initialize a connection to shared catalogs only.
+	 */
+	if (!*already_connected)
+		BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	*already_connected = true;
+
+	/*
+	 * Set up so that the first run processes the shared catalogs, which are
+	 * then skipped in every subsequent database.
+	 */
+	DatachecksumsWorkerShmem->process_shared_catalogs = true;
+
+	/*
+	 * Get a list of all databases to process. This may include databases that
+	 * were created during our runtime.  Since a database can be created as a
+	 * copy of any other database (which may not have existed in our last
+	 * run), we have to repeat this loop until no new databases show up in the
+	 * list.
+	 */
+	DatabaseList = BuildDatabaseList();
+
+	while (true)
+	{
+		int			processed_databases = 0;
+
+		foreach(lc, DatabaseList)
+		{
+			DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+			DatachecksumsWorkerResult result;
+			DatachecksumsWorkerResultEntry *entry;
+			bool		found;
+
+			elog(DEBUG1,
+				 "starting processing of database %s with oid %u",
+				 db->dbname, db->dboid);
+
+			entry = (DatachecksumsWorkerResultEntry *) hash_search(ProcessedDatabases, &db->dboid,
+																   HASH_FIND, NULL);
+
+			if (entry)
+			{
+				if (entry->result == DATACHECKSUMSWORKER_RETRYDB)
+				{
+					/*
+					 * Limit the number of retries to avoid infinite looping
+					 * in case there simply won't be enough workers in the
+					 * cluster to finish this operation.
+					 */
+					if (entry->retries > DATACHECKSUMSWORKER_MAX_DB_RETRIES)
+						entry->result = DATACHECKSUMSWORKER_FAILED;
+				}
+
+				/* Skip if this database has been processed already */
+				if (entry->result != DATACHECKSUMSWORKER_RETRYDB)
+				{
+					pfree(db->dbname);
+					pfree(db);
+					continue;
+				}
+			}
+
+			result = ProcessDatabase(db);
+			processed_databases++;
+
+			if (result == DATACHECKSUMSWORKER_SUCCESSFUL)
+			{
+				/*
+				 * If one database has completed shared catalogs, we don't
+				 * have to process them again.
+				 */
+				if (DatachecksumsWorkerShmem->process_shared_catalogs)
+					DatachecksumsWorkerShmem->process_shared_catalogs = false;
+			}
+			else if (result == DATACHECKSUMSWORKER_ABORTED)
+			{
+				/* Abort flag set, so exit the whole process */
+				return false;
+			}
+
+			entry = hash_search(ProcessedDatabases, &db->dboid, HASH_ENTER, &found);
+			entry->dboid = db->dboid;
+			entry->result = result;
+			if (!found)
+				entry->retries = 0;
+			else
+				entry->retries++;
+
+			pfree(db->dbname);
+			pfree(db);
+		}
+
+		elog(DEBUG1,
+			 "%i databases processed for data checksum enabling, %s",
+			 processed_databases,
+			 (processed_databases ? "process with restart" : "process completed"));
+
+		list_free(DatabaseList);
+
+		/*
+		 * If no databases were processed in this run of the loop, we have now
+		 * finished all databases and no concurrently created ones can exist.
+		 */
+		if (processed_databases == 0)
+			break;
+
+		/*
+		 * Re-generate the list of databases for another pass. Since we wait
+		 * for all pre-existing transactions to finish, we can be certain
+		 * that there are no databases left without checksums.
+		 */
+		WaitForAllTransactionsToFinish();
+		DatabaseList = BuildDatabaseList();
+	}
+
+	/*
+	 * ProcessedDatabases now has all databases and the results of their
+	 * processing. Failure to enable checksums for a database can occur either
+	 * because processing actually failed for some reason, or because the
+	 * database was dropped between us getting the database list and trying
+	 * to process it.
+	 * Get a fresh list of databases to detect the second case where the
+	 * database was dropped before we had started processing it. If a database
+	 * still exists, but enabling checksums failed then we fail the entire
+	 * checksumming process and exit with an error.
+	 */
+	WaitForAllTransactionsToFinish();
+	DatabaseList = BuildDatabaseList();
+
+	foreach(lc, DatabaseList)
+	{
+		DatachecksumsWorkerDatabase *db = (DatachecksumsWorkerDatabase *) lfirst(lc);
+		DatachecksumsWorkerResult *entry;
+		bool		found;
+
+		entry = hash_search(ProcessedDatabases, (void *) &db->dboid,
+							HASH_FIND, &found);
+
+		/*
+		 * We are only interested in failed databases which still exist.
+		 */
+		if (found && *entry == DATACHECKSUMSWORKER_FAILED)
+		{
+			ereport(WARNING,
+					(errmsg("failed to enable data checksums in \"%s\"",
+							db->dbname)));
+			found_failed = found;
+			continue;
+		}
+	}
+
+	if (found_failed)
+	{
+		/* Disable checksums on cluster, because we failed */
+		SetDataChecksumsOff();
+		ereport(ERROR,
+				(errmsg("checksums failed to get enabled in all databases, aborting"),
+				 errhint("The server log might have more information on the error.")));
+	}
+
+	/*
+	 * Force a checkpoint to get everything out to disk. TODO: we probably
+	 * don't want to use a CHECKPOINT_IMMEDIATE here but it's very convenient
+	 * for testing until the patch is fully baked, as it may otherwise make
+	 * tests take a lot longer.
+	 */
+	RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
+
+	return true;
+}
+
+/*
+ * DatachecksumsWorkerShmemSize
+ *		Compute required space for datachecksumsworker-related shared memory
+ */
+Size
+DatachecksumsWorkerShmemSize(void)
+{
+	Size		size;
+
+	size = sizeof(DatachecksumsWorkerShmemStruct);
+	size = MAXALIGN(size);
+
+	return size;
+}
+
+/*
+ * DatachecksumsWorkerShmemInit
+ *		Allocate and initialize datachecksumsworker-related shared memory
+ */
+void
+DatachecksumsWorkerShmemInit(void)
+{
+	bool		found;
+
+	DatachecksumsWorkerShmem = (DatachecksumsWorkerShmemStruct *)
+		ShmemInitStruct("DatachecksumsWorker Data",
+						DatachecksumsWorkerShmemSize(),
+						&found);
+
+	MemSet(DatachecksumsWorkerShmem, 0, DatachecksumsWorkerShmemSize());
+
+	/*
+	 * Even though these assignments are redundant after the MemSet above, we
+	 * make them explicitly for readability, since this state is queried when
+	 * processing is restarted.
+	 */
+	DatachecksumsWorkerShmem->launch_enable_checksums = false;
+	DatachecksumsWorkerShmem->launcher_running = false;
+}
+
+/*
+ * BuildDatabaseList
+ *		Compile a list of all currently available databases in the cluster
+ *
+ * This creates the list of databases for the datachecksumsworker workers to
+ * add checksums to. If the caller wants to ensure that no concurrently
+ * running CREATE DATABASE calls exist, this needs to be preceded by a call
+ * to WaitForAllTransactionsToFinish().
+ */
+static List *
+BuildDatabaseList(void)
+{
+	List	   *DatabaseList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(DatabaseRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_database pgdb = (Form_pg_database) GETSTRUCT(tup);
+		DatachecksumsWorkerDatabase *db;
+
+		oldctx = MemoryContextSwitchTo(ctx);
+
+		db = (DatachecksumsWorkerDatabase *) palloc(sizeof(DatachecksumsWorkerDatabase));
+
+		db->dboid = pgdb->oid;
+		db->dbname = pstrdup(NameStr(pgdb->datname));
+
+		DatabaseList = lappend(DatabaseList, db);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return DatabaseList;
+}
+
+/*
+ * BuildRelationList
+ *		Compile a list of relations in the database
+ *
+ * Returns a list of OIDs for the requested relation types. If temp_relations
+ * is true then only temporary relations are returned. If temp_relations is
+ * false then non-temporary relations which have storage are returned. If
+ * include_shared is true then shared relations are included as well in a
+ * non-temporary list. include_shared has no relevance when building a list
+ * of temporary relations.
+ */
+static List *
+BuildRelationList(bool temp_relations, bool include_shared)
+{
+	List	   *RelationList = NIL;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tup;
+	MemoryContext ctx = CurrentMemoryContext;
+	MemoryContext oldctx;
+
+	StartTransactionCommand();
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 0, NULL);
+
+	while (HeapTupleIsValid(tup = heap_getnext(scan, ForwardScanDirection)))
+	{
+		Form_pg_class pgc = (Form_pg_class) GETSTRUCT(tup);
+
+		/*
+		 * Only include temporary relations when asked for a temp relation
+		 * list.
+		 */
+		if (pgc->relpersistence == RELPERSISTENCE_TEMP)
+		{
+			if (!temp_relations)
+				continue;
+		}
+		else
+		{
+			if (temp_relations)
+				continue;
+
+			if (!RELKIND_HAS_STORAGE(pgc->relkind))
+				continue;
+
+			if (pgc->relisshared && !include_shared)
+				continue;
+		}
+
+		oldctx = MemoryContextSwitchTo(ctx);
+		RelationList = lappend_oid(RelationList, pgc->oid);
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+
+	CommitTransactionCommand();
+
+	return RelationList;
+}
+
+/*
+ * DatachecksumsWorkerMain
+ *
+ * Main function for enabling checksums in a single database. This is the
+ * function set as the bgw_function_name in the dynamic background worker
+ * process initiated for each database by the worker launcher. After enabling
+ * data checksums in each applicable relation in the database, it will wait for
+ * all temporary relations that were present when the function started to
+ * disappear before returning. This is required since we cannot rewrite
+ * existing temporary relations with data checksums.
+ */
+void
+DatachecksumsWorkerMain(Datum arg)
+{
+	Oid			dboid = DatumGetObjectId(arg);
+	List	   *RelationList = NIL;
+	List	   *InitialTempTableList = NIL;
+	ListCell   *lc;
+	BufferAccessStrategy strategy;
+	bool		aborted = false;
+
+	enabling_checksums = true;
+
+	pqsignal(SIGTERM, die);
+
+	BackgroundWorkerUnblockSignals();
+
+	MyBackendType = B_DATACHECKSUMSWORKER_WORKER;
+	init_ps_display(NULL);
+
+	ereport(DEBUG1,
+			(errmsg("starting data checksum processing in database with OID %u",
+					dboid)));
+
+	BackgroundWorkerInitializeConnectionByOid(dboid, InvalidOid,
+											  BGWORKER_BYPASS_ALLOWCONN);
+
+	/*
+	 * Get a list of all temp tables present in this database as we start. We
+	 * need to wait until they are all gone before we can finish, since we
+	 * cannot access and modify these relations from this process.
+	 */
+	InitialTempTableList = BuildRelationList(true, false);
+
+	/*
+	 * Enable vacuum cost delay, if any.
+	 */
+	Assert(DatachecksumsWorkerShmem->enabling_checksums);
+	VacuumCostDelay = DatachecksumsWorkerShmem->cost_delay;
+	VacuumCostLimit = DatachecksumsWorkerShmem->cost_limit;
+	VacuumCostActive = (VacuumCostDelay > 0);
+	VacuumCostBalance = 0;
+	VacuumPageHit = 0;
+	VacuumPageMiss = 0;
+	VacuumPageDirty = 0;
+
+	/*
+	 * Create and set the vacuum strategy as our buffer strategy.
+	 */
+	strategy = GetAccessStrategy(BAS_VACUUM);
+
+	RelationList = BuildRelationList(false,
+									 DatachecksumsWorkerShmem->process_shared_catalogs);
+	foreach(lc, RelationList)
+	{
+		Oid			reloid = lfirst_oid(lc);
+
+		if (!ProcessSingleRelationByOid(reloid, strategy))
+		{
+			aborted = true;
+			break;
+		}
+	}
+	list_free(RelationList);
+
+	if (aborted)
+	{
+		DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+		ereport(DEBUG1,
+				(errmsg("data checksum processing aborted in database OID %u",
+						dboid)));
+		return;
+	}
+
+	/*
+	 * Wait for all temp tables that existed when we started to go away. This
+	 * is necessary since we cannot "reach" them to enable checksums. Any temp
+	 * tables created after we started will already have checksums in them
+	 * (due to the "inprogress-on" state), so no need to wait for those.
+	 */
+	for (;;)
+	{
+		List	   *CurrentTempTables;
+		ListCell   *lc;
+		int			numleft;
+		char		activity[64];
+
+		CurrentTempTables = BuildRelationList(true, false);
+		numleft = 0;
+		foreach(lc, InitialTempTableList)
+		{
+			if (list_member_oid(CurrentTempTables, lfirst_oid(lc)))
+				numleft++;
+		}
+		list_free(CurrentTempTables);
+
+		if (numleft == 0)
+			break;
+
+		/* At least one temp table is left to wait for */
+		snprintf(activity,
+				 sizeof(activity),
+				 "Waiting for %d temp tables to be removed", numleft);
+		pgstat_report_activity(STATE_RUNNING, activity);
+
+		/* Retry every 5 seconds */
+		ResetLatch(MyLatch);
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						 5000,
+						 WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION);
+
+		LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+		aborted = DatachecksumsWorkerShmem->launch_enable_checksums != enabling_checksums;
+		LWLockRelease(DatachecksumsWorkerLock);
+
+		if (aborted || abort_requested)
+		{
+			DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_ABORTED;
+			ereport(DEBUG1,
+					(errmsg("data checksum processing aborted in database OID %u",
+							dboid)));
+			return;
+		}
+	}
+
+	list_free(InitialTempTableList);
+
+	DatachecksumsWorkerShmem->success = DATACHECKSUMSWORKER_SUCCESSFUL;
+	ereport(DEBUG1,
+			(errmsg("data checksum processing completed in database with OID %u",
+					dboid)));
+}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index f75b52719d..0fef097eb8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4017,6 +4017,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_CHECKPOINT_START:
 			event_name = "CheckpointStart";
 			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION:
+			event_name = "ChecksumEnableStartCondition";
+			break;
+		case WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION:
+			event_name = "ChecksumEnableFinishCondition";
+			break;
 		case WAIT_EVENT_EXECUTE_GATHER:
 			event_name = "ExecuteGather";
 			break;
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..cc494b6f13 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1612,7 +1612,7 @@ sendFile(const char *readfilename, const char *tarfilename,
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
-	if (!noverify_checksums && DataChecksumsEnabled())
+	if (!noverify_checksums)
 	{
 		char	   *filename;
 
@@ -1698,7 +1698,14 @@ sendFile(const char *readfilename, const char *tarfilename,
 				 */
 				if (!PageIsNew(page) && PageGetLSN(page) < startptr)
 				{
+					HOLD_INTERRUPTS();
+					if (!DataChecksumsNeedVerify())
+					{
+						RESUME_INTERRUPTS();
+						continue;
+					}
 					checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
+					RESUME_INTERRUPTS();
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afa1df00d0..d9c482454f 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -223,6 +223,7 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_FPW_CHANGE:
 		case XLOG_FPI_FOR_HINT:
 		case XLOG_FPI:
+		case XLOG_CHECKSUMS:
 			break;
 		default:
 			elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 561c212092..8c918df902 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -919,6 +919,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			if (track_io_timing)
 				INSTR_TIME_SET_CURRENT(io_start);
 
+			HOLD_INTERRUPTS();
+
 			smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
 
 			if (track_io_timing)
@@ -949,6 +951,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 									blockNum,
 									relpath(smgr->smgr_rnode, forkNum))));
 			}
+
+			RESUME_INTERRUPTS();
 		}
 	}
 
@@ -2809,6 +2813,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 	 * buffer, other processes might be updating hint bits in it, so we must
 	 * copy the page to private storage if we do checksumming.
 	 */
+	HOLD_INTERRUPTS();
 	bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);
 
 	if (track_io_timing)
@@ -2822,6 +2827,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 			  buf->tag.blockNum,
 			  bufToWrite,
 			  false);
+	RESUME_INTERRUPTS();
 
 	if (track_io_timing)
 	{
@@ -2944,8 +2950,13 @@ BufferGetLSNAtomic(Buffer buffer)
 	/*
 	 * If we don't need locking for correctness, fastpath out.
 	 */
+	HOLD_INTERRUPTS();
 	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer))
+	{
+		RESUME_INTERRUPTS();
 		return PageGetLSN(page);
+	}
+	RESUME_INTERRUPTS();
 
 	/* Make sure we've got a real buffer, and that we hold a pin on it. */
 	Assert(BufferIsValid(buffer));
@@ -3468,6 +3479,7 @@ FlushRelationBuffers(Relation rel)
 				errcallback.previous = error_context_stack;
 				error_context_stack = &errcallback;
 
+				HOLD_INTERRUPTS();
 				PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
 				smgrwrite(rel->rd_smgr,
@@ -3475,6 +3487,7 @@ FlushRelationBuffers(Relation rel)
 						  bufHdr->tag.blockNum,
 						  localpage,
 						  false);
+				RESUME_INTERRUPTS();
 
 				buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
 				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 04b3558ea3..f5a70d390d 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -18,6 +18,7 @@
 #include "access/parallel.h"
 #include "catalog/catalog.h"
 #include "executor/instrument.h"
+#include "miscadmin.h"
 #include "storage/buf_internals.h"
 #include "storage/bufmgr.h"
 #include "utils/guc.h"
@@ -217,6 +218,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 		/* Find smgr relation for buffer */
 		oreln = smgropen(bufHdr->tag.rnode, MyBackendId);
 
+		HOLD_INTERRUPTS();
 		PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
 		/* And write... */
@@ -225,6 +227,7 @@ LocalBufferAlloc(SMgrRelation smgr, ForkNumber forkNum, BlockNumber blockNum,
 				  bufHdr->tag.blockNum,
 				  localpage,
 				  false);
+		RESUME_INTERRUPTS();
 
 		/* Mark not-dirty now in case we error out below */
 		buf_state &= ~BM_DIRTY;
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 8c12dda238..5da952a794 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -637,10 +637,12 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
 
 	while (fsm_nblocks_now < fsm_nblocks)
 	{
+		HOLD_INTERRUPTS();
 		PageSetChecksumInplace((Page) pg.data, fsm_nblocks_now);
 
 		smgrextend(rel->rd_smgr, FSM_FORKNUM, fsm_nblocks_now,
 				   pg.data, false);
+		RESUME_INTERRUPTS();
 		fsm_nblocks_now++;
 	}
 
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..c7928f3495 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -28,6 +28,7 @@
 #include "postmaster/autovacuum.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/datachecksumsworker.h"
 #include "postmaster/postmaster.h"
 #include "replication/logicallauncher.h"
 #include "replication/origin.h"
@@ -149,6 +150,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, DatachecksumsWorkerShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -259,6 +261,7 @@ CreateSharedMemoryAndSemaphores(void)
 	WalSndShmemInit();
 	WalRcvShmemInit();
 	ApplyLauncherShmemInit();
+	DatachecksumsWorkerShmemInit();
 
 	/*
 	 * Set up other modules that need some shared memory space
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c43cdd685b..a3720617f9 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/parallel.h"
+#include "access/xlog.h"
 #include "port/pg_bitutils.h"
 #include "commands/async.h"
 #include "miscadmin.h"
@@ -98,7 +99,6 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 static void ResetProcSignalBarrierBits(uint32 flags);
-static bool ProcessBarrierPlaceholder(void);
 
 /*
  * ProcSignalShmemSize
@@ -538,8 +538,17 @@ ProcessProcSignalBarrier(void)
 				type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
 				switch (type)
 				{
-					case PROCSIGNAL_BARRIER_PLACEHOLDER:
-						processed = ProcessBarrierPlaceholder();
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON:
+						processed = AbsorbChecksumsOnInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_ON:
+						processed = AbsorbChecksumsOnBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF:
+						processed = AbsorbChecksumsOffInProgressBarrier();
+						break;
+					case PROCSIGNAL_BARRIER_CHECKSUM_OFF:
+						processed = AbsorbChecksumsOffBarrier();
 						break;
 				}
 
@@ -604,24 +613,6 @@ ResetProcSignalBarrierBits(uint32 flags)
 	InterruptPending = true;
 }
 
-static bool
-ProcessBarrierPlaceholder(void)
-{
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 *
-	 * The return value should be 'true' if the barrier was successfully
-	 * absorbed and 'false' if not. Note that returning 'false' can lead to
-	 * very frequent retries, so try hard to make that an uncommon case.
-	 */
-	return true;
-}
-
 /*
  * CheckProcSignal - check to see if a particular reason has been
  * signaled, and clear the signal flag.  Should be called after receiving
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index 6c7cf6c295..5b083749d5 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock					44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock				46
 NotifyQueueTailLock					47
+DatachecksumsWorkerLock				48
diff --git a/src/backend/storage/page/README b/src/backend/storage/page/README
index e30d7ac59a..78edf57adc 100644
--- a/src/backend/storage/page/README
+++ b/src/backend/storage/page/README
@@ -10,7 +10,9 @@ http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
 2010/12/22 on -hackers list.
 
 Current implementation requires this be enabled system-wide at initdb time, or
-by using the pg_checksums tool on an offline cluster.
+by using the pg_checksums tool on an offline cluster. Checksums can also be
+turned on and off using pg_enable_data_checksums()/pg_disable_data_checksums()
+at runtime.
 
 The checksum is not valid at all times on a data page!!
 The checksum is valid when the page leaves the shared pool and is checked
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 9ac556b4ae..d88f8c5e49 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -100,7 +100,7 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 	 */
 	if (!PageIsNew(page))
 	{
-		if (DataChecksumsEnabled())
+		if (DataChecksumsNeedVerify())
 		{
 			checksum = pg_checksum_page((char *) page, blkno);
 
@@ -1394,10 +1394,6 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 {
 	static char *pageCopy = NULL;
 
-	/* If we don't need a checksum, just return the passed-in data */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
-		return (char *) page;
-
 	/*
 	 * We allocate the copy space once and use it over on each subsequent
 	 * call.  The point of palloc'ing here, rather than having a static char
@@ -1407,6 +1403,10 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
 	if (pageCopy == NULL)
 		pageCopy = MemoryContextAlloc(TopMemoryContext, BLCKSZ);
 
+	/* If we don't need a checksum, just return the passed-in data */
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
+		return (char *) page;
+
 	memcpy(pageCopy, (char *) page, BLCKSZ);
 	((PageHeader) pageCopy)->pd_checksum = pg_checksum_page(pageCopy, blkno);
 	return pageCopy;
@@ -1422,7 +1422,7 @@ void
 PageSetChecksumInplace(Page page, BlockNumber blkno)
 {
 	/* If we don't need a checksum, just return */
-	if (PageIsNew(page) || !DataChecksumsEnabled())
+	if (PageIsNew(page) || !DataChecksumsNeedWrite())
 		return;
 
 	((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 62bff52638..4ac396ccf1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1567,9 +1567,6 @@ pg_stat_get_db_checksum_failures(PG_FUNCTION_ARGS)
 	int64		result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
@@ -1585,9 +1582,6 @@ pg_stat_get_db_checksum_last_failure(PG_FUNCTION_ARGS)
 	TimestampTz result;
 	PgStat_StatDBEntry *dbentry;
 
-	if (!DataChecksumsEnabled())
-		PG_RETURN_NULL();
-
 	if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
 		result = 0;
 	else
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 0f67b99cc5..045da21904 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -275,6 +275,12 @@ GetBackendTypeDesc(BackendType backendType)
 		case B_LOGGER:
 			backendDesc = "logger";
 			break;
+		case B_DATACHECKSUMSWORKER_LAUNCHER:
+			backendDesc = "datachecksumsworker launcher";
+			break;
+		case B_DATACHECKSUMSWORKER_WORKER:
+			backendDesc = "datachecksumsworker worker";
+			break;
 	}
 
 	return backendDesc;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index e5965bc517..92367ece4b 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -606,6 +606,11 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 	if (MyBackendId > MaxBackends || MyBackendId <= 0)
 		elog(FATAL, "bad backend ID: %d", MyBackendId);
 
+	/*
+	 * Set up backend local cache of Controldata values.
+	 */
+	InitLocalControldata();
+
 	/* Now that we have a BackendId, we can participate in ProcSignal */
 	ProcSignalInit(MyBackendId);
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eafdb1118e..3d108a2348 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -36,6 +36,7 @@
 #include "access/transam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
@@ -76,6 +77,7 @@
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/large_object.h"
@@ -500,6 +502,17 @@ static struct config_enum_entry shared_memory_options[] = {
 	{NULL, 0, false}
 };
 
+/*
+ * Options for data_checksums enum.
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+	{"on", DATA_CHECKSUMS_ON, true},
+	{"off", DATA_CHECKSUMS_OFF, true},
+	{"inprogress-on", DATA_CHECKSUMS_INPROGRESS_ON, true},
+	{"inprogress-off", DATA_CHECKSUMS_INPROGRESS_OFF, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -609,7 +622,7 @@ static int	max_identifier_length;
 static int	block_size;
 static int	segment_size;
 static int	wal_block_size;
-static bool data_checksums;
+static int	data_checksums;
 static bool integer_datetimes;
 static bool assert_enabled;
 static bool in_hot_standby;
@@ -1910,17 +1923,6 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
-			gettext_noop("Shows whether data checksums are turned on for this cluster."),
-			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
-		},
-		&data_checksums,
-		false,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
 			gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
@@ -4830,6 +4832,17 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+			gettext_noop("Shows whether data checksums are turned on for this cluster."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+		},
+		&data_checksums,
+		DATA_CHECKSUMS_OFF, data_checksum_options,
+		NULL, NULL, show_data_checksums
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 0223ee4408..f3f029f41e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -600,7 +600,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	if (ControlFile->data_checksum_version > 0 &&
+	if (ControlFile->data_checksum_version == DATA_CHECKSUMS_ON &&
 		mode == PG_MODE_ENABLE)
 	{
 		pg_log_error("data checksums are already enabled in cluster");
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
index 4f647cdf33..1298857458 100644
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -671,6 +671,15 @@ check_control_data(ControlData *oldctrl,
 	 * check_for_isn_and_int8_passing_mismatch().
 	 */
 
+	/*
+	 * If checksums have been turned on in the old cluster, but the
+	 * datachecksumsworker has yet to finish, then disallow upgrading. The
+	 * user should either let the process finish, or turn off checksums,
+	 * before retrying.
+	 */
+	if (oldctrl->data_checksum_version == 2)
+		pg_fatal("checksum enabling in old cluster is in progress\n");
+
 	/*
 	 * We might eventually allow upgrades from checksum to no-checksum
 	 * clusters.
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 919a7849fd..b35cd4d503 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -218,7 +218,7 @@ typedef struct
 	uint32		large_object;
 	bool		date_is_int;
 	bool		float8_pass_by_value;
-	bool		data_checksum_version;
+	uint32		data_checksum_version;
 } ControlData;
 
 /*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 75ec1073bd..6947c09591 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -198,8 +198,11 @@ extern PGDLLIMPORT int wal_level;
  * individual bits on a page, it's still consistent no matter what combination
  * of the bits make it to disk, but the checksum wouldn't match.  Also WAL-log
  * them if forced by wal_log_hints=on.
+ *
+ * Since XLogHintBitIsNeeded calls DataChecksumsNeedWrite, interrupts must be
+ * held off during this call.
  */
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())
 
 /* Do we need to WAL-log information required only for Hot Standby and logical replication? */
 #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -318,7 +321,19 @@ extern TimestampTz GetCurrentChunkReplayStartTime(void);
 extern void UpdateControlFile(void);
 extern uint64 GetSystemIdentifier(void);
 extern char *GetMockAuthenticationNonce(void);
-extern bool DataChecksumsEnabled(void);
+extern bool DataChecksumsNeedWrite(void);
+extern bool DataChecksumsNeedVerify(void);
+extern bool DataChecksumsOnInProgress(void);
+extern bool DataChecksumsOffInProgress(void);
+extern void SetDataChecksumsOnInProgress(void);
+extern void SetDataChecksumsOn(void);
+extern void SetDataChecksumsOff(void);
+extern bool AbsorbChecksumsOnInProgressBarrier(void);
+extern bool AbsorbChecksumsOffInProgressBarrier(void);
+extern bool AbsorbChecksumsOnBarrier(void);
+extern bool AbsorbChecksumsOffBarrier(void);
+extern const char *show_data_checksums(void);
+extern void InitLocalControldata(void);
 extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
 extern Size XLOGShmemSize(void);
 extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 224cae0246..adbe81e890 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -25,6 +25,7 @@
 #include "lib/stringinfo.h"
 #include "pgtime.h"
 #include "storage/block.h"
+#include "storage/checksum.h"
 #include "storage/relfilenode.h"
 
 
@@ -249,6 +250,12 @@ typedef struct xl_restore_point
 	char		rp_name[MAXFNAMELEN];
 } xl_restore_point;
 
+/* Information logged when checksum level is changed */
+typedef struct xl_checksum_state
+{
+	ChecksumType new_checksumtype;
+}			xl_checksum_state;
+
 /* End of recovery mark, when we don't do an END_OF_RECOVERY checkpoint */
 typedef struct xl_end_of_recovery
 {
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e3f48158ce..d8229422af 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -76,6 +76,7 @@ typedef struct CheckPoint
 #define XLOG_END_OF_RECOVERY			0x90
 #define XLOG_FPI_FOR_HINT				0xA0
 #define XLOG_FPI						0xB0
+#define XLOG_CHECKSUMS					0xC0
 
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1487710d59..e954abd4b6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11313,6 +11313,22 @@
   proname => 'jsonb_subscript_handler', prorettype => 'internal',
   proargtypes => 'internal', prosrc => 'jsonb_subscript_handler' },
 
+{ oid => '9258',
+  descr => 'disable data checksums',
+  proname => 'pg_disable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => '',
+  prosrc => 'disable_data_checksums' },
+
+{ oid => '9257',
+  descr => 'enable data checksums',
+  proname => 'pg_enable_data_checksums', provolatile => 'v', prorettype => 'void',
+  proparallel => 'r',
+  proargtypes => 'int4 int4', proallargtypes => '{int4,int4}',
+  proargmodes => '{i,i}',
+  proargnames => '{cost_delay,cost_limit}',
+  prosrc => 'enable_data_checksums' },
+
 # collation management functions
 { oid => '3445', descr => 'import collations from operating system',
   proname => 'pg_import_system_collations', procost => '100',
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 1bdc97e308..f013acba76 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -324,6 +324,8 @@ typedef enum BackendType
 	B_ARCHIVER,
 	B_STATS_COLLECTOR,
 	B_LOGGER,
+	B_DATACHECKSUMSWORKER_LAUNCHER,
+	B_DATACHECKSUMSWORKER_WORKER,
 } BackendType;
 
 extern BackendType MyBackendType;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 724068cf87..0974dfadfe 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -963,6 +963,8 @@ typedef enum
 	WAIT_EVENT_BTREE_PAGE,
 	WAIT_EVENT_CHECKPOINT_DONE,
 	WAIT_EVENT_CHECKPOINT_START,
+	WAIT_EVENT_CHECKSUM_ENABLE_STARTCONDITION,
+	WAIT_EVENT_CHECKSUM_ENABLE_FINISHCONDITION,
 	WAIT_EVENT_EXECUTE_GATHER,
 	WAIT_EVENT_HASH_BATCH_ALLOCATE,
 	WAIT_EVENT_HASH_BATCH_ELECT,
diff --git a/src/include/postmaster/datachecksumsworker.h b/src/include/postmaster/datachecksumsworker.h
new file mode 100644
index 0000000000..560f0aa5f2
--- /dev/null
+++ b/src/include/postmaster/datachecksumsworker.h
@@ -0,0 +1,29 @@
+/*-------------------------------------------------------------------------
+ *
+ * datachecksumsworker.h
+ *	  header file for the data checksums background worker
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/postmaster/datachecksumsworker.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef DATACHECKSUMSWORKER_H
+#define DATACHECKSUMSWORKER_H
+
+/* Shared memory */
+extern Size DatachecksumsWorkerShmemSize(void);
+extern void DatachecksumsWorkerShmemInit(void);
+
+/* Start the background processes for enabling or disabling checksums */
+void		StartDatachecksumsWorkerLauncher(bool enable_checksums,
+											 int cost_delay, int cost_limit);
+
+/* Background worker entrypoints */
+void		DatachecksumsWorkerLauncherMain(Datum arg);
+void		DatachecksumsWorkerMain(Datum arg);
+
+#endif							/* DATACHECKSUMSWORKER_H */
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 359b749f7f..c35b747520 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -198,6 +198,9 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION		4
 #define PG_DATA_CHECKSUM_VERSION	1
+#define PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION		2
+#define PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION		3
+
 
 /* ----------------------------------------------------------------
  *						page support macros
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 80d2359192..f736b12f98 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,6 +15,14 @@
 
 #include "storage/block.h"
 
+typedef enum ChecksumType
+{
+	DATA_CHECKSUMS_OFF = 0,
+	DATA_CHECKSUMS_ON,
+	DATA_CHECKSUMS_INPROGRESS_ON,
+	DATA_CHECKSUMS_INPROGRESS_OFF
+}			ChecksumType;
+
 /*
  * Compute the checksum for a Postgres page.  The page must be aligned on a
  * 4-byte boundary.
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 4ae7dc33b8..d865796d04 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -48,12 +48,10 @@ typedef enum
 
 typedef enum
 {
-	/*
-	 * XXX. PROCSIGNAL_BARRIER_PLACEHOLDER should be replaced when the first
-	 * real user of the ProcSignalBarrier mechanism is added. It's just here
-	 * for now because we can't have an empty enum.
-	 */
-	PROCSIGNAL_BARRIER_PLACEHOLDER = 0
+	PROCSIGNAL_BARRIER_CHECKSUM_OFF = 0,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_ON,
+	PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF,
+	PROCSIGNAL_BARRIER_CHECKSUM_ON
 } ProcSignalBarrierType;
 
 /*
diff --git a/src/test/Makefile b/src/test/Makefile
index f7859c2fd5..f468709e7e 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -13,7 +13,7 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
 SUBDIRS = perl regress isolation modules authentication recovery subscription \
-	  locale
+	  locale checksum
 
 # Test suites that are not safe by default but can be run if selected
 # by the user via the whitespace-separated list in variable
diff --git a/src/test/checksum/.gitignore b/src/test/checksum/.gitignore
new file mode 100644
index 0000000000..871e943d50
--- /dev/null
+++ b/src/test/checksum/.gitignore
@@ -0,0 +1,2 @@
+# Generated by test suite
+/tmp_check/
diff --git a/src/test/checksum/Makefile b/src/test/checksum/Makefile
new file mode 100644
index 0000000000..fd60f7e97f
--- /dev/null
+++ b/src/test/checksum/Makefile
@@ -0,0 +1,23 @@
+#-------------------------------------------------------------------------
+#
+# Makefile for src/test/checksum
+#
+# Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
+# src/test/checksum/Makefile
+#
+#-------------------------------------------------------------------------
+
+subdir = src/test/checksum
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
+
+check:
+	$(prove_check)
+
+installcheck:
+	$(prove_installcheck)
+
+clean distclean maintainer-clean:
+	rm -rf tmp_check
diff --git a/src/test/checksum/README b/src/test/checksum/README
new file mode 100644
index 0000000000..0f0317060b
--- /dev/null
+++ b/src/test/checksum/README
@@ -0,0 +1,22 @@
+src/test/checksum/README
+
+Regression tests for data checksums
+===================================
+
+This directory contains a test suite for enabling data checksums
+in a running cluster.
+
+Running the tests
+=================
+
+    make check
+
+or
+
+    make installcheck
+
+NOTE: "make check" creates a temporary installation with multiple
+nodes (a primary and one or more standbys) for the purpose of
+running the tests.
+
+NOTE: This requires the --enable-tap-tests argument to configure.
diff --git a/src/test/checksum/t/001_basic.pl b/src/test/checksum/t/001_basic.pl
new file mode 100644
index 0000000000..a81c1784b7
--- /dev/null
+++ b/src/test/checksum/t/001_basic.pl
@@ -0,0 +1,74 @@
+# Test suite for testing enabling data checksums in an online cluster
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are turned off
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Enable data checksums
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Wait for checksums to become enabled
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Enable data checksums again, which should be a no-op..
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+# ..and make sure we can still read/write data
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums again
+$node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are disabled');
+
+# Test reading again
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure previously checksummed pages can be read back');
+
+# Re-enable checksums and make sure that the underlying data has changed such
+# that checksums will be different.
+$node->safe_psql('postgres', "UPDATE t SET a = a + 1;");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$node->stop;
+
+done_testing();
diff --git a/src/test/checksum/t/002_restarts.pl b/src/test/checksum/t/002_restarts.pl
new file mode 100644
index 0000000000..46769c2b6f
--- /dev/null
+++ b/src/test/checksum/t/002_restarts.pl
@@ -0,0 +1,94 @@
+# Test suite for testing enabling data checksums in an online cluster,
+# restarting the processing partway through
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this with an interactive psql process which keeps the temporary
+# table alive while we enable checksums from another psql process.
+my $in    = '';
+my $out   = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyway, and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'inprogress-on');
+is($result, '1', "ensure checksums aren't enabled yet");
+
+$node->stop;
+$node->start;
+
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'inprogress-on', "ensure checksums aren't enabled yet");
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are turned on');
+
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+$result = $node->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are turned off');
+
+done_testing();
diff --git a/src/test/checksum/t/003_standby_checksum.pl b/src/test/checksum/t/003_standby_checksum.pl
new file mode 100644
index 0000000000..6495c66a67
--- /dev/null
+++ b/src/test/checksum/t/003_standby_checksum.pl
@@ -0,0 +1,121 @@
+# Test suite for testing enabling data checksums in an online cluster with
+# streaming replication
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+
+# Initialize primary node
+my $node_primary = get_new_node('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+my $backup_name = 'my_backup';
+
+# Take backup
+$node_primary->backup($backup_name);
+
+# Create streaming standby linking to primary
+my $node_standby_1 = get_new_node('standby_1');
+$node_standby_1->init_from_backup($node_primary, $backup_name,
+	has_streaming => 1);
+$node_standby_1->start;
+
+# Create some content on the primary to have un-checksummed data in the cluster
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Wait for standbys to catch up
+$node_primary->wait_for_catchup($node_standby_1, 'replay',
+	$node_primary->lsn('insert'));
+
+# Check that checksums are turned off on all nodes
+my $result = $node_primary->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, "off", 'ensure checksums are turned off on primary');
+
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, "off", 'ensure checksums are turned off on standby_1');
+
+# Enable checksums for the cluster
+$node_primary->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+# Ensure that the primary switches to "inprogress-on"
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	"inprogress-on");
+is($result, 1, 'ensure checksums are in progress on primary');
+
+# Wait for checksum enable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+
+# Ensure that the standby has switched to "inprogress-on" or "on".  Normally it
+# would be "inprogress-on", but it is theoretically possible for the primary to
+# complete the checksum enabling *and* have the standby replay that record
+# before we reach the check below.
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting = 'off' FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'f');
+is($result, 1, 'ensure standby has absorbed the inprogress-on barrier');
+$result = $node_standby_1->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+cmp_ok(
+	$result, '~~',
+	[ "inprogress-on", "on" ],
+	'ensure checksums are on, or in progress, on standby_1');
+
+# Insert some more data which should be checksummed on INSERT
+$node_primary->safe_psql('postgres',
+	"INSERT INTO t VALUES (generate_series(1, 10000));");
+
+# Wait for checksums enabled on the primary
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the primary');
+
+# Wait for checksums enabled on the standby
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'on');
+is($result, 1, 'ensure checksums are enabled on the standby');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, '20000', 'ensure we can safely read all data with checksums');
+
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE backend_type LIKE 'datachecksumsworker%';",
+	'0');
+is($result, 1, 'await datachecksums worker/launcher termination');
+
+# Disable checksums and ensure it's propagated to standby and that we can
+# still read all data
+$node_primary->safe_psql('postgres', "SELECT pg_disable_data_checksums();");
+# Wait for checksum disable to be replayed
+$node_primary->wait_for_catchup($node_standby_1, 'replay');
+$result = $node_primary->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure data checksums are disabled on the primary');
+
+# Ensure that the standby has switched to off
+$result = $node_standby_1->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'off');
+is($result, 1, 'ensure checksums are off on standby_1');
+
+$result = $node_primary->safe_psql('postgres', "SELECT count(a) FROM t");
+is($result, "20000", 'ensure we can safely read all data without checksums');
+
+done_testing();
diff --git a/src/test/checksum/t/004_offline.pl b/src/test/checksum/t/004_offline.pl
new file mode 100644
index 0000000000..2dfca4df23
--- /dev/null
+++ b/src/test/checksum/t/004_offline.pl
@@ -0,0 +1,105 @@
+# Test suite for testing enabling data checksums offline from various states
+# of checksum processing
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More;
+use IPC::Run qw(pump finish timer);
+
+# If we don't have IO::Pty, forget it, because IPC::Run depends on that
+# to support pty connections
+eval { require IO::Pty; };
+if ($@)
+{
+	plan skip_all => 'IO::Pty is needed to run this test';
+}
+
+# Initialize node with checksums disabled.
+my $node = get_new_node('main');
+$node->init();
+$node->start();
+
+# Create some content to have un-checksummed data in the cluster
+$node->safe_psql('postgres',
+	"CREATE TABLE t AS SELECT generate_series(1,10000) AS a;");
+
+# Ensure that checksums are disabled
+my $result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Enable checksums offline using pg_checksums
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are enabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+# Disable checksums offline again using pg_checksums
+$node->stop();
+$node->checksum_disable_offline();
+$node->start();
+
+# Ensure that checksums are disabled
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'off', 'ensure checksums are disabled');
+
+# Create a barrier for checksumming to block on, in this case a pre-existing
+# temporary table which is kept open while processing is started. We can
+# accomplish this by setting up an interactive psql process which keeps the
+# temporary table created as we enable checksums in another psql process.
+my $in    = '';
+my $out   = '';
+my $timer = timer(5);
+
+my $h = $node->interactive_psql('postgres', \$in, \$out, $timer);
+
+$out = '';
+$timer->start(5);
+
+$in .= "CREATE TEMPORARY TABLE tt (a integer);\n";
+pump $h until ($out =~ /CREATE TABLE/ || $timer->is_expired);
+
+# In another session, make sure we can see the blocking temp table but start
+# processing anyways and check that we are blocked with a proper wait event.
+$result = $node->safe_psql('postgres',
+	"SELECT relpersistence FROM pg_catalog.pg_class WHERE relname = 'tt';");
+is($result, 't', 'ensure we can see the temporary table');
+
+$node->safe_psql('postgres', "SELECT pg_enable_data_checksums();");
+
+$result = $node->poll_query_until(
+	'postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';",
+	'inprogress-on');
+is($result, 1, 'ensure checksums are in the process of being enabled');
+
+# Turn the cluster off and enable checksums offline, then start back up
+$node->stop();
+$node->checksum_enable_offline();
+$node->start();
+
+# Ensure that checksums are now enabled even though processing wasn't
+# restarted
+$result = $node->safe_psql('postgres',
+	"SELECT setting FROM pg_catalog.pg_settings WHERE name = 'data_checksums';"
+);
+is($result, 'on', 'ensure checksums are enabled');
+
+# Run a dummy query just to make sure we can read back some data
+$result = $node->safe_psql('postgres', "SELECT count(*) FROM t");
+is($result, '10000', 'ensure checksummed pages can be read back');
+
+done_testing();
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..b7431a7600 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2221,6 +2221,42 @@ sub pg_recvlogical_upto
 	}
 }
 
+=item $node->checksum_enable_offline()
+
+Enable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_enable_offline
+{
+	my ($self) = @_;
+
+	print "# Enabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-e');
+	print "# Checksums enabled\n";
+	return;
+}
+
+=item $node->checksum_disable_offline()
+
+Disable data page checksums in an offline cluster with B<pg_checksums>. The
+caller is responsible for ensuring that the cluster is in the right state for
+this operation.
+
+=cut
+
+sub checksum_disable_offline
+{
+	my ($self) = @_;
+
+	print "# Disabling checksums in \"" . $self->data_dir . "\"\n";
+	TestLib::system_or_bail('pg_checksums', '-D', $self->data_dir, '-d');
+	print "# Checksums disabled\n";
+	return;
+}
+
 =pod
 
 =back
-- 
2.21.1 (Apple Git-122.3)

#89Daniel Gustafsson
daniel@yesql.se
In reply to: Bruce Momjian (#87)
Re: Online checksums patch - once again

On 11 Feb 2021, at 14:10, Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Feb 10, 2021 at 01:26:18PM -0500, Bruce Momjian wrote:

On Wed, Feb 10, 2021 at 03:25:58PM +0100, Magnus Hagander wrote:

A fairly large amount of this complexity comes out of the fact that it
now supports restarting and tracks checksums on a per-table basis. We
skipped this in the original patch for exactly this reason (that's not
to say there isn't a fair amount of complexity even without it, but it
did substantially i increase both the size and the complexity of the
patch), but in the review of that i was specifically asked for having
that added. I personally don't think it's worth that complexity but at
the time that seemed to be a pretty strong argument. So I'm not
entirely sure how to move forward with that...

is your impression that it would still be too complicated, even without that?

I was wondering why this feature has stalled for so long --- now I know.
This does highlight the risk of implementing too many additions to a
feature. I am working against this dynamic in the cluster file
encryption feature I am working on.

Oh, I think another reason this patchset has had problems is related to
something I mentioned in 2018:

/messages/by-id/20180801163613.GA2267@momjian.us

This patchset is weird because it is perhaps our first case of trying to
change the state of the server while it is running. We just don't have
an established protocol for how to orchestrate that, so we are limping
along toward a solution. Forcing a restart is probably part of that
primitive orchestration. We will probably have similar challenges if we
ever allowed Postgres to change its data format on the fly. These
challenges are one reason pg_upgrade only modifies the new cluster,
never the old one.

I don't think anyone has done anything wrong --- rather, it is what we
are _trying_ to do that is complex.

Global state changes in a cluster are complicated, and are unlikely to ever
be otherwise. By writing patches that attempt such state changes we can see which
pieces of infrastructure we're lacking to reduce complexity. A good example is
the ProcSignalBarrier work that Andres and Robert did, inspired in part by this
patch IIUC. The more we do, the more we learn.

--
Daniel Gustafsson https://vmware.com/

#90Bruce Momjian
bruce@momjian.us
In reply to: Daniel Gustafsson (#89)
Re: Online checksums patch - once again

On Mon, Feb 15, 2021 at 02:02:02PM +0100, Daniel Gustafsson wrote:

On 11 Feb 2021, at 14:10, Bruce Momjian <bruce@momjian.us> wrote:
I don't think anyone has done anything wrong --- rather, it is what we
are _trying_ to do that is complex.

Global state changes in a cluster are complicated, and are unlikely to never
not be. By writing patches to attempts such state changes we can see which
pieces of infrastructure we're lacking to reduce complexity. A good example is
the ProcSignalBarrier work that Andres and Robert did, inspired in part by this
patch IIUC. The more we do, the more we learn.

Do we support or document the ability to create a standby with checksums
from a primary without it, and is that a better approach?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#91Daniel Gustafsson
daniel@yesql.se
In reply to: Bruce Momjian (#90)
Re: Online checksums patch - once again

On 9 Mar 2021, at 19:12, Bruce Momjian <bruce@momjian.us> wrote:

Since this patch is de-facto rejected I'll mark it withdrawn in the CF app to
save on cfbot bandwidth.

Do we support or document the ability to create a standby with checksums
from a primary without it, and is that a better approach?

Michael Banck started a new thread for that forking off of this one on message
id 8f193f949b39817b9c642623e1fe7ccb94137ce4.camel@credativ.de so it's probably
better to continue the discussion of that over there.

--
Daniel Gustafsson https://vmware.com/