[PATCH] Add pg_disable_checksums() and supporting infrastructure
Extracted from a larger patch, this patch provides the basic infrastructure for turning data
checksums off in a cluster. This also sets up the necessary pg_control fields to support the
necessary multiple states for handling the transitions.
We do a few things:
- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
the potentially-required states for this feature (as well as enabling).
- Add pg_control support for parsing/displaying the new setting.
- Distinguish between the possible checksum states and be specific about whether we care about
checksum read failures or write failures at all call sites, turning DataChecksumsEnabled() into two
functions: DataChecksumsEnabledReads() and DataChecksumsEnabledWrites().
- Create a superuser function pg_disable_checksums() to perform the actual disabling of this in the
cluster.
I have *not* changed the default in initdb to enable checksums, but this would be trivial.
Attachments:
disable-checksums.patchapplication/octet-stream; name=disable-checksums.patch; x-unix-mode=0644Download
From 6b5c6001dbf6f34fcc7432e5274f63aef6e5f717 Mon Sep 17 00:00:00 2001
From: David Christensen <david@endpoint.com>
Date: Wed, 15 Feb 2017 15:18:57 -0600
Subject: [PATCH] Add pg_disable_checksums() and supporting infrastructure
Extracted from a larger patch, this patch provides the basic infrastructure for turning data
checksums off in a cluster. This also sets up the necessary pg_control fields to support the
necessary multiple states for handling the transitions.
We do a few things:
- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
the potentially-required states for this feature (as well as enabling).
- Add pg_control support for parsing/displaying the new setting.
- Distinguish between the possible checksum states and be specific about whether we care about
checksum read failures or write failures at all call sites, turning DataChecksumsEnabled() into two
functions: DataChecksumsEnabledReads() and DataChecksumsEnabledWrites().
- Create a superuser function pg_disable_checksums() to perform the actual disabling of this in the
cluster.
I have *not* changed the default in initdb to enable checksums, but this would be trivial.
---
src/backend/access/transam/xlog.c | 77 ++++++++++++++++++++++++++++++---
src/backend/commands/dbcommands.c | 1 +
src/backend/storage/page/bufpage.c | 6 +--
src/backend/storage/page/checksum.c | 3 ++
src/backend/utils/misc/guc.c | 51 +++++++++++++++++-----
src/bin/pg_controldata/pg_controldata.c | 19 ++++++++
src/bin/pg_resetwal/pg_resetwal.c | 2 +
src/include/access/xlog.h | 7 ++-
src/include/catalog/pg_control.h | 10 ++++-
src/include/catalog/pg_proc.h | 2 +
src/include/storage/checksum.h | 15 +++++++
11 files changed, 169 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f23e108..79c3c17 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4628,8 +4628,7 @@ ReadControlFile(void)
#endif
/* Make the initdb settings visible as GUC variables, too */
- SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
- PGC_INTERNAL, PGC_S_OVERRIDE);
+ data_checksums = ControlFile->data_checksum_state;
}
void
@@ -4685,13 +4684,36 @@ GetSystemIdentifier(void)
}
/*
- * Are checksums enabled for data pages?
+ * Returns the checksum state
*/
+ChecksumState
+DataChecksumsState(void)
+{
+ Assert(ControlFile != NULL);
+ return ControlFile->data_checksum_state;
+}
+
+/*
+ * Are read checksums enabled for data pages?
+ */
+bool
+DataChecksumsEnabledReads(void)
+{
+ Assert(ControlFile != NULL);
+ return (ControlFile->data_checksum_version > 0 &&
+ ControlFile->data_checksum_state >= CHECKSUMS_ENFORCING);
+}
+
+/*
+ * Are write checksums enabled for data pages?
+ */
+
bool
-DataChecksumsEnabled(void)
+DataChecksumsEnabledWrites(void)
{
Assert(ControlFile != NULL);
- return (ControlFile->data_checksum_version > 0);
+ return (ControlFile->data_checksum_version > 0 &&
+ (ControlFile->data_checksum_state != CHECKSUMS_DISABLED));
}
/*
@@ -5075,6 +5097,7 @@ BootStrapXLOG(void)
ControlFile->wal_log_hints = wal_log_hints;
ControlFile->track_commit_timestamp = track_commit_timestamp;
ControlFile->data_checksum_version = bootstrap_data_checksum_version;
+ ControlFile->data_checksum_state = bootstrap_data_checksum_version == 0 ? CHECKSUMS_DISABLED : CHECKSUMS_ENFORCING;
/* some additional ControlFile fields are set in WriteControlFile() */
@@ -12016,3 +12039,47 @@ XLogRequestWalReceiverReply(void)
{
doRequestWalReceiverReply = true;
}
+
+/*
+ * Disable checksums in a cluster
+ * We might need a "DISABLING" state, but hopefully not
+ */
+Datum
+pg_disable_checksums(PG_FUNCTION_ARGS)
+{
+ bool status = false;
+
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to disable checksums"))));
+
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+ switch (ControlFile->data_checksum_state)
+ {
+ case CHECKSUMS_DISABLED:
+ ereport(NOTICE,
+ (errmsg("checksums already disabled; skipping")));
+ break;
+ case CHECKSUMS_ENABLING:
+ case CHECKSUMS_ENFORCING:
+ case CHECKSUMS_REVALIDATING:
+
+
+ ControlFile->data_checksum_state = CHECKSUMS_DISABLED;
+ /* ControlFile->data_checksum_version = 0; */
+ data_checksums = CHECKSUMS_DISABLED;
+ UpdateControlFile();
+ status = true;
+
+ ereport(NOTICE,
+ (errmsg("disabled checksums")));
+ break;
+
+ }
+ LWLockRelease(ControlFileLock);
+
+ PG_RETURN_BOOL(status);
+}
+
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 1ebacbc..59f19d8 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -56,6 +56,7 @@
#include "storage/ipc.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
+#include "storage/checksum.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 6fc5fa4..0a0ed51 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,7 @@ PageIsVerified(Page page, BlockNumber blkno)
*/
if (!PageIsNew(page))
{
- if (DataChecksumsEnabled())
+ if (DataChecksumsEnabledReads())
{
checksum = pg_checksum_page((char *) page, blkno);
@@ -1145,7 +1145,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
static char *pageCopy = NULL;
/* If we don't need a checksum, just return the passed-in data */
- if (PageIsNew(page) || !DataChecksumsEnabled())
+ if (PageIsNew(page) || !DataChecksumsEnabledWrites())
return (char *) page;
/*
@@ -1172,7 +1172,7 @@ void
PageSetChecksumInplace(Page page, BlockNumber blkno)
{
/* If we don't need a checksum, just return */
- if (PageIsNew(page) || !DataChecksumsEnabled())
+ if (PageIsNew(page) || !DataChecksumsEnabledWrites())
return;
((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/storage/page/checksum.c b/src/backend/storage/page/checksum.c
index b96440a..9fe8959 100644
--- a/src/backend/storage/page/checksum.c
+++ b/src/backend/storage/page/checksum.c
@@ -21,3 +21,6 @@
* that file from the exported Postgres headers. (Compare our CRC code.)
*/
#include "storage/checksum_impl.h"
+
+/* global variable to store global checksum state */
+ChecksumState data_checksums = CHECKSUMS_DISABLED;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5d8fb2e..1b790ca 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -67,6 +67,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "storage/bufmgr.h"
+#include "storage/checksum.h"
#include "storage/dsm_impl.h"
#include "storage/standby.h"
#include "storage/fd.h"
@@ -116,6 +117,7 @@ extern char *default_tablespace;
extern char *temp_tablespaces;
extern bool ignore_checksum_failure;
extern bool synchronize_seqscans;
+extern ChecksumState data_checksums;
#ifdef TRACE_SYNCSCAN
extern bool trace_syncscan;
@@ -190,6 +192,7 @@ static void assign_application_name(const char *newval, void *extra);
static bool check_cluster_name(char **newval, void **extra, GucSource source);
static const char *show_unix_socket_permissions(void);
static const char *show_log_file_mode(void);
+static const char *show_data_checksum_mode(void);
/* Private functions in guc-file.l that need to be called from guc.c */
static ConfigVariable *ProcessConfigFileInternal(GucContext context,
@@ -417,6 +420,17 @@ static const struct config_enum_entry password_encryption_options[] = {
{"no", PASSWORD_TYPE_PLAINTEXT, true},
{"1", PASSWORD_TYPE_MD5, true},
{"0", PASSWORD_TYPE_PLAINTEXT, true},
+};
+
+/*
+ * Although only "on", "off", and "force" are documented, we
+ * accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+ {"disabled", CHECKSUMS_DISABLED, false},
+ {"enabling", CHECKSUMS_ENABLING, false},
+ {"enforcing", CHECKSUMS_ENFORCING, false},
+ {"revalidating", CHECKSUMS_REVALIDATING, false},
{NULL, 0, false}
};
@@ -515,7 +529,6 @@ static int max_identifier_length;
static int block_size;
static int segment_size;
static int wal_block_size;
-static bool data_checksums;
static int wal_segment_size;
static bool integer_datetimes;
static bool assert_enabled;
@@ -1631,17 +1644,6 @@ static struct config_bool ConfigureNamesBool[] =
},
{
- {"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
- gettext_noop("Shows whether data checksums are turned on for this cluster."),
- NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
- },
- &data_checksums,
- false,
- NULL, NULL, NULL
- },
-
- {
{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
NULL
@@ -3847,6 +3849,18 @@ static struct config_enum ConfigureNamesEnum[] =
},
{
+ {"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+ gettext_noop("Shows the status of data_checksums in this cluster."),
+ NULL,
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+
+ },
+ &data_checksums,
+ CHECKSUMS_DISABLED, data_checksum_options,
+ NULL, NULL, show_data_checksum_mode
+ },
+
+ {
{"force_parallel_mode", PGC_USERSET, QUERY_TUNING_OTHER,
gettext_noop("Forces use of parallel query facilities."),
gettext_noop("If possible, run query using a parallel worker and with parallel restrictions.")
@@ -10457,4 +10471,17 @@ show_log_file_mode(void)
return buf;
}
+
+static const char *
+show_data_checksum_mode(void)
+{
+ static char *states[] = {"disabled", "enabling", "enabled", "revalidating", "invalid"};
+ int state = DataChecksumsState();
+
+ if (state < CHECKSUMS_DISABLED || state > CHECKSUMS_REVALIDATING)
+ state = 4; /* invalid */
+
+ return states[state];
+}
+
#include "guc-file.c"
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 20077a6..8563b45 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -67,6 +67,23 @@ dbState(DBState state)
}
static const char *
+checksumState(ChecksumState state)
+{
+ switch (state)
+ {
+ case CHECKSUMS_DISABLED:
+ return _("disabled");
+ case CHECKSUMS_ENABLING:
+ return _("enabling");
+ case CHECKSUMS_ENFORCING:
+ return _("enforcing");
+ case CHECKSUMS_REVALIDATING:
+ return _("revalidating");
+ }
+ return _("unrecognized status code");
+}
+
+static const char *
wal_level_str(WalLevel wal_level)
{
switch (wal_level)
@@ -301,5 +318,7 @@ main(int argc, char *argv[])
(ControlFile->float8ByVal ? _("by value") : _("by reference")));
printf(_("Data page checksum version: %u\n"),
ControlFile->data_checksum_version);
+ printf(_("Data page checksum status: %s\n"),
+ checksumState(ControlFile->data_checksum_state));
return 0;
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 96b7097..a50768f 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -696,6 +696,8 @@ PrintControlValues(bool guessed)
(ControlFile.float8ByVal ? _("by value") : _("by reference")));
printf(_("Data page checksum version: %u\n"),
ControlFile.data_checksum_version);
+ printf(_("Data page checksum state: %u\n"),
+ ControlFile.data_checksum_state);
}
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9f036c7..98c018e 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -19,6 +19,7 @@
#include "lib/stringinfo.h"
#include "nodes/pg_list.h"
#include "storage/fd.h"
+#include "storage/checksum.h"
/* Sync methods */
@@ -153,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
* of the bits make it to disk, but the checksum wouldn't match. Also WAL-log
* them if forced by wal_log_hints=on.
*/
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsEnabledWrites() || wal_log_hints)
/* Do we need to WAL-log information required only for Hot Standby and logical replication? */
#define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -256,7 +257,9 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
extern void UpdateControlFile(void);
extern uint64 GetSystemIdentifier(void);
-extern bool DataChecksumsEnabled(void);
+extern ChecksumState DataChecksumsState(void);
+extern bool DataChecksumsEnabledReads(void);
+extern bool DataChecksumsEnabledWrites(void);
extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
extern Size XLOGShmemSize(void);
extern void XLOGShmemInit(void);
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 23731e9..db57d62 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -18,10 +18,10 @@
#include "access/xlogdefs.h"
#include "pgtime.h" /* for pg_time_t */
#include "port/pg_crc32c.h"
-
+#include "storage/checksum.h"
/* Version identifier for this pg_control format */
-#define PG_CONTROL_VERSION 960
+#define PG_CONTROL_VERSION 1000
/*
* Body of CheckPoint XLOG records. This is declared here because we keep
@@ -225,6 +225,12 @@ typedef struct ControlFileData
/* Are data pages protected by checksums? Zero if no checksum version */
uint32 data_checksum_version;
+ /*
+ * What state the cluster is in for the in-place upgrade of the checksum
+ * setting
+ */
+ ChecksumState data_checksum_state;
+
/* CRC of all above ... MUST BE LAST! */
pg_crc32c crc;
} ControlFileData;
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index bb7053a..8c9a891 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3173,6 +3173,8 @@ DESCR("export a snapshot");
DATA(insert OID = 3810 ( pg_is_in_recovery PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_is_in_recovery _null_ _null_ _null_ ));
DESCR("true if server is in recovery");
+DATA(insert OID = 3353 ( pg_disable_checksums PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_disable_checksums _null_ _null_ _null_ ));
+DESCR("disables cluster checksums");
DATA(insert OID = 3820 ( pg_last_wal_receive_location PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ pg_last_wal_receive_location _null_ _null_ _null_ ));
DESCR("current wal flush location");
DATA(insert OID = 3821 ( pg_last_wal_replay_location PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ pg_last_wal_replay_location _null_ _null_ _null_ ));
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 682043f..14229cc 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,10 +15,25 @@
#include "storage/block.h"
+
+/*
+ * Checksum state indicator. Note this is stored in pg_control; if you change
+ * it, you must bump PG_CONTROL_VERSION
+ */
+typedef enum ChecksumState
+{
+ CHECKSUMS_DISABLED = 0,
+ CHECKSUMS_ENABLING,
+ CHECKSUMS_ENFORCING,
+ CHECKSUMS_REVALIDATING
+} ChecksumState;
+
+
/*
* Compute the checksum for a Postgres page. The page must be aligned on a
* 4-byte boundary.
*/
extern uint16 pg_checksum_page(char *page, BlockNumber blkno);
+extern ChecksumState data_checksums;
#endif /* CHECKSUM_H */
--
2.8.4 (Apple Git-73)
On Thu, Feb 16, 2017 at 9:58 PM, David Christensen <david@endpoint.com>
wrote:
Extracted from a larger patch, this patch provides the basic
infrastructure for turning data
checksums off in a cluster. This also sets up the necessary pg_control
fields to support the
necessary multiple states for handling the transitions.We do a few things:
- Change "data_checksums" from a simple boolean to "data_checksum_state",
an enum type for all of
the potentially-required states for this feature (as well as enabling).- Add pg_control support for parsing/displaying the new setting.
- Distinguish between the possible checksum states and be specific about
whether we care about
checksum read failures or write failures at all call sites, turning
DataChecksumsEnabled() into two
functions: DataChecksumsEnabledReads() and DataChecksumsEnabledWrites().- Create a superuser function pg_disable_checksums() to perform the actual
disabling of this in the
cluster.I have *not* changed the default in initdb to enable checksums, but this
would be trivial.
Per the point made by somebody else (I think Simon?) on the other thread, I
think it also needs WAL support. Otherwise you turn it off on the master,
but it remains on on a replica which will cause failures once datablocks
without checksum starts replicating.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Feb 17, 2017, at 10:31 AM, Magnus Hagander <magnus@hagander.net> wrote:
Per the point made by somebody else (I think Simon?) on the other thread, I think it also needs WAL support. Otherwise you turn it off on the master, but it remains on on a replica which will cause failures once datablocks without checksum starts replicating.
Good point; I’ll send a revision soon.
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Feb 17, 2017, at 10:31 AM, Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Feb 16, 2017 at 9:58 PM, David Christensen <david@endpoint.com> wrote:
Extracted from a larger patch, this patch provides the basic infrastructure for turning data
checksums off in a cluster. This also sets up the necessary pg_control fields to support the
necessary multiple states for handling the transitions.We do a few things:
- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
the potentially-required states for this feature (as well as enabling).- Add pg_control support for parsing/displaying the new setting.
- Distinguish between the possible checksum states and be specific about whether we care about
checksum read failures or write failures at all call sites, turning DataChecksumsEnabled() into two
functions: DataChecksumsEnabledReads() and DataChecksumsEnabledWrites().- Create a superuser function pg_disable_checksums() to perform the actual disabling of this in the
cluster.I have *not* changed the default in initdb to enable checksums, but this would be trivial.
Per the point made by somebody else (I think Simon?) on the other thread, I think it also needs WAL support. Otherwise you turn it off on the master, but it remains on on a replica which will cause failures once datablocks without checksum starts replicating.
Enclosed is a version which supports WAL logging and proper application of the checksum state change. I have verified it works with a replica as far as applying the updated data_checksum_state, though I had to manually call pg_switch_wal() on the master to get it to show up on the replica. (I don’t know if this is a flaw in my patch or just a natural consequence of testing on a low-volume local cluster.)
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171
Attachments:
disable-checksums.patchapplication/octet-stream; name=disable-checksums.patch; x-unix-mode=0644Download
From 5718e5e5e1d40abdcefac99866f907db93989587 Mon Sep 17 00:00:00 2001
From: David Christensen <david@endpoint.com>
Date: Wed, 15 Feb 2017 15:18:57 -0600
Subject: [PATCH] Add pg_disable_checksums() and supporting infrastructure
Extracted from a larger patch, this patch provides the basic infrastructure for turning
data checksums off in a cluster. This also sets up the necessary pg_control fields to
support the necessary multiple states for handling the transitions.
We do a few things:
- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type
for all of the potentially-required states for this feature (as well as enabling).
- Add pg_control support for parsing/displaying the new setting.
- Distinguish between the possible checksum states and be specific about whether we care
about checksum read failures or write failures at all call sites, turning
DataChecksumsEnabled() into two functions: DataChecksumsEnabledReads() and
DataChecksumsEnabledWrites().
- Create a superuser function pg_disable_checksums() to perform the actual disabling of
this in the cluster.
- Add WAL replay/support of the checksum state.
I have *not* changed the default in initdb to enable checksums, but this would be
trivial.
---
src/backend/access/rmgrdesc/Makefile | 3 +-
src/backend/access/rmgrdesc/checksumdesc.c | 51 ++++++++++
src/backend/access/transam/rmgr.c | 1 +
src/backend/access/transam/xlog.c | 158 ++++++++++++++++++++++++++++-
src/backend/commands/dbcommands.c | 1 +
src/backend/storage/page/bufpage.c | 6 +-
src/backend/storage/page/checksum.c | 3 +
src/backend/utils/misc/guc.c | 51 +++++++---
src/bin/pg_controldata/pg_controldata.c | 19 ++++
src/bin/pg_resetwal/pg_resetwal.c | 2 +
src/bin/pg_waldump/rmgrdesc.c | 1 +
src/include/access/checksumxlog.h | 28 +++++
src/include/access/rmgrlist.h | 1 +
src/include/access/xlog.h | 7 +-
src/include/access/xlog_internal.h | 2 +-
src/include/catalog/pg_control.h | 10 +-
src/include/catalog/pg_proc.h | 2 +
src/include/storage/checksum.h | 15 +++
18 files changed, 335 insertions(+), 26 deletions(-)
create mode 100644 src/backend/access/rmgrdesc/checksumdesc.c
create mode 100644 src/include/access/checksumxlog.h
diff --git a/src/backend/access/rmgrdesc/Makefile b/src/backend/access/rmgrdesc/Makefile
index 5514db1..e07e79d 100644
--- a/src/backend/access/rmgrdesc/Makefile
+++ b/src/backend/access/rmgrdesc/Makefile
@@ -11,6 +11,7 @@ include $(top_builddir)/src/Makefile.global
OBJS = brindesc.o clogdesc.o committsdesc.o dbasedesc.o genericdesc.o \
gindesc.o gistdesc.o hashdesc.o heapdesc.o logicalmsgdesc.o \
mxactdesc.o nbtdesc.o relmapdesc.o replorigindesc.o seqdesc.o \
- smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o
+ smgrdesc.o spgdesc.o standbydesc.o tblspcdesc.o xactdesc.o xlogdesc.o \
+ checksumdesc.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/rmgrdesc/checksumdesc.c b/src/backend/access/rmgrdesc/checksumdesc.c
new file mode 100644
index 0000000..e056f2b
--- /dev/null
+++ b/src/backend/access/rmgrdesc/checksumdesc.c
@@ -0,0 +1,51 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumdesc.c
+ * rmgr descriptor routines for commands/checksum.c
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/access/rmgrdesc/checksumdesc.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/checksumxlog.h"
+
+void
+checksum_desc(StringInfo buf, XLogReaderState *record)
+{
+ uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+ /* not sure if we need anything here */
+ return;
+}
+
+const char *
+checksum_identify(uint8 info)
+{
+ const char *id = NULL;
+
+ switch (info & ~XLR_INFO_MASK)
+ {
+ case XLOG_CHECKSUM_DISABLE:
+ id = "DISABLE";
+ break;
+ case XLOG_CHECKSUM_ENABLE:
+ id = "ENABLE";
+ break;
+ case XLOG_CHECKSUM_ENFORCE:
+ id = "ENFORCE";
+ break;
+ case XLOG_CHECKSUM_REVALIDATE:
+ id = "REVALIDATE";
+ break;
+ }
+
+ return id;
+}
+
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index 9368b56..038ca2b 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -7,6 +7,7 @@
*/
#include "postgres.h"
+#include "access/checksumxlog.h"
#include "access/clog.h"
#include "access/commit_ts.h"
#include "access/ginxlog.h"
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f23e108..ff6a30f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -23,6 +23,7 @@
#include "access/clog.h"
#include "access/commit_ts.h"
+#include "access/checksumxlog.h"
#include "access/multixact.h"
#include "access/rewriteheap.h"
#include "access/subtrans.h"
@@ -916,6 +917,8 @@ static void WALInsertLockAcquireExclusive(void);
static void WALInsertLockRelease(void);
static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt);
+static void LogChecksumStateChange(ChecksumState state);
+
/*
* Insert an XLOG record represented by an already-constructed chain of data
* chunks. This is a low-level routine; to construct the WAL record header
@@ -4628,8 +4631,7 @@ ReadControlFile(void)
#endif
/* Make the initdb settings visible as GUC variables, too */
- SetConfigOption("data_checksums", DataChecksumsEnabled() ? "yes" : "no",
- PGC_INTERNAL, PGC_S_OVERRIDE);
+ data_checksums = ControlFile->data_checksum_state;
}
void
@@ -4685,13 +4687,36 @@ GetSystemIdentifier(void)
}
/*
- * Are checksums enabled for data pages?
+ * Returns the checksum state
+ */
+ChecksumState
+DataChecksumsState(void)
+{
+ Assert(ControlFile != NULL);
+ return ControlFile->data_checksum_state;
+}
+
+/*
+ * Are read checksums enabled for data pages?
+ */
+bool
+DataChecksumsEnabledReads(void)
+{
+ Assert(ControlFile != NULL);
+ return (ControlFile->data_checksum_version > 0 &&
+ ControlFile->data_checksum_state >= CHECKSUMS_ENFORCING);
+}
+
+/*
+ * Are write checksums enabled for data pages?
*/
+
bool
-DataChecksumsEnabled(void)
+DataChecksumsEnabledWrites(void)
{
Assert(ControlFile != NULL);
- return (ControlFile->data_checksum_version > 0);
+ return (ControlFile->data_checksum_version > 0 &&
+ (ControlFile->data_checksum_state != CHECKSUMS_DISABLED));
}
/*
@@ -5075,6 +5100,7 @@ BootStrapXLOG(void)
ControlFile->wal_log_hints = wal_log_hints;
ControlFile->track_commit_timestamp = track_commit_timestamp;
ControlFile->data_checksum_version = bootstrap_data_checksum_version;
+ ControlFile->data_checksum_state = bootstrap_data_checksum_version == 0 ? CHECKSUMS_DISABLED : CHECKSUMS_ENFORCING;
/* some additional ControlFile fields are set in WriteControlFile() */
@@ -12016,3 +12042,125 @@ XLogRequestWalReceiverReply(void)
{
doRequestWalReceiverReply = true;
}
+
+/*
+ * Disable checksums in a cluster
+ * We might need a "DISABLING" state, but hopefully not
+ */
+Datum
+pg_disable_checksums(PG_FUNCTION_ARGS)
+{
+ bool status = false;
+
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to disable checksums"))));
+
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+ switch (ControlFile->data_checksum_state)
+ {
+ case CHECKSUMS_DISABLED:
+ ereport(NOTICE,
+ (errmsg("checksums already disabled; skipping")));
+ break;
+ case CHECKSUMS_ENABLING:
+ case CHECKSUMS_ENFORCING:
+ case CHECKSUMS_REVALIDATING:
+
+
+ ControlFile->data_checksum_state = CHECKSUMS_DISABLED;
+ /* ControlFile->data_checksum_version = 0; */
+ data_checksums = CHECKSUMS_DISABLED;
+ UpdateControlFile();
+ status = true;
+
+ ereport(NOTICE,
+ (errmsg("disabled checksums")));
+ break;
+
+ }
+ LWLockRelease(ControlFileLock);
+
+ LogChecksumStateChange(CHECKSUMS_DISABLED);
+
+ PG_RETURN_BOOL(status);
+}
+
+
+/*
+ * Log a change in the checksum state
+ */
+void LogChecksumStateChange(ChecksumState state)
+{
+ uint8 message;
+
+ Assert(state >= CHECKSUMS_DISABLED && state <= CHECKSUMS_REVALIDATING);
+ XLogBeginInsert();
+
+ switch (state) {
+ case CHECKSUMS_DISABLED:
+ message = XLOG_CHECKSUM_DISABLE;
+ break;
+ case CHECKSUMS_ENABLING:
+ message = XLOG_CHECKSUM_ENABLE;
+ break;
+ case CHECKSUMS_ENFORCING:
+ message = XLOG_CHECKSUM_ENFORCE;
+ break;
+ case CHECKSUMS_REVALIDATING:
+ message = XLOG_CHECKSUM_REVALIDATE;
+ break;
+ }
+
+ XLogInsert(RM_CHECKSUM_ID, message);
+}
+
+
+/*
+ * Replay the ChecksumState state change
+ *
+ * Essentially we just change the pg_control field, which means we need the
+ * lock and then change the field in question.
+ */
+
+void
+checksum_redo(XLogReaderState *record) {
+ uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+ ChecksumState state, oldState;
+
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+ oldState = ControlFile->data_checksum_state;
+ elog(NOTICE, "checksum_redo: running redo on info %u", info);
+
+ /* TODO: some sanity-checking of the checksum states here? */
+
+ switch (info)
+ {
+ case XLOG_CHECKSUM_DISABLE:
+ state = CHECKSUMS_DISABLED;
+ break;
+ case XLOG_CHECKSUM_ENABLE:
+ state = CHECKSUMS_ENABLING;
+ break;
+ case XLOG_CHECKSUM_ENFORCE:
+ state = CHECKSUMS_ENFORCING;
+ break;
+ case XLOG_CHECKSUM_REVALIDATE:
+ state = CHECKSUMS_REVALIDATING;
+ break;
+ default:
+ LWLockRelease(ControlFileLock);
+ elog(PANIC, "checksum_redo: unknown op code %u", info);
+ break;
+ }
+
+ ControlFile->data_checksum_state = state;
+
+ data_checksums = state;
+
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+}
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 1ebacbc..59f19d8 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -56,6 +56,7 @@
#include "storage/ipc.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
+#include "storage/checksum.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index 6fc5fa4..0a0ed51 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -93,7 +93,7 @@ PageIsVerified(Page page, BlockNumber blkno)
*/
if (!PageIsNew(page))
{
- if (DataChecksumsEnabled())
+ if (DataChecksumsEnabledReads())
{
checksum = pg_checksum_page((char *) page, blkno);
@@ -1145,7 +1145,7 @@ PageSetChecksumCopy(Page page, BlockNumber blkno)
static char *pageCopy = NULL;
/* If we don't need a checksum, just return the passed-in data */
- if (PageIsNew(page) || !DataChecksumsEnabled())
+ if (PageIsNew(page) || !DataChecksumsEnabledWrites())
return (char *) page;
/*
@@ -1172,7 +1172,7 @@ void
PageSetChecksumInplace(Page page, BlockNumber blkno)
{
/* If we don't need a checksum, just return */
- if (PageIsNew(page) || !DataChecksumsEnabled())
+ if (PageIsNew(page) || !DataChecksumsEnabledWrites())
return;
((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, blkno);
diff --git a/src/backend/storage/page/checksum.c b/src/backend/storage/page/checksum.c
index b96440a..9fe8959 100644
--- a/src/backend/storage/page/checksum.c
+++ b/src/backend/storage/page/checksum.c
@@ -21,3 +21,6 @@
* that file from the exported Postgres headers. (Compare our CRC code.)
*/
#include "storage/checksum_impl.h"
+
+/* global variable to store global checksum state */
+ChecksumState data_checksums = CHECKSUMS_DISABLED;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5d8fb2e..1b790ca 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -67,6 +67,7 @@
#include "replication/walreceiver.h"
#include "replication/walsender.h"
#include "storage/bufmgr.h"
+#include "storage/checksum.h"
#include "storage/dsm_impl.h"
#include "storage/standby.h"
#include "storage/fd.h"
@@ -116,6 +117,7 @@ extern char *default_tablespace;
extern char *temp_tablespaces;
extern bool ignore_checksum_failure;
extern bool synchronize_seqscans;
+extern ChecksumState data_checksums;
#ifdef TRACE_SYNCSCAN
extern bool trace_syncscan;
@@ -190,6 +192,7 @@ static void assign_application_name(const char *newval, void *extra);
static bool check_cluster_name(char **newval, void **extra, GucSource source);
static const char *show_unix_socket_permissions(void);
static const char *show_log_file_mode(void);
+static const char *show_data_checksum_mode(void);
/* Private functions in guc-file.l that need to be called from guc.c */
static ConfigVariable *ProcessConfigFileInternal(GucContext context,
@@ -417,6 +420,17 @@ static const struct config_enum_entry password_encryption_options[] = {
{"no", PASSWORD_TYPE_PLAINTEXT, true},
{"1", PASSWORD_TYPE_MD5, true},
{"0", PASSWORD_TYPE_PLAINTEXT, true},
+};
+
+/*
+ * Although only "on", "off", and "force" are documented, we
+ * accept all the likely variants of "on" and "off".
+ */
+static const struct config_enum_entry data_checksum_options[] = {
+ {"disabled", CHECKSUMS_DISABLED, false},
+ {"enabling", CHECKSUMS_ENABLING, false},
+ {"enforcing", CHECKSUMS_ENFORCING, false},
+ {"revalidating", CHECKSUMS_REVALIDATING, false},
{NULL, 0, false}
};
@@ -515,7 +529,6 @@ static int max_identifier_length;
static int block_size;
static int segment_size;
static int wal_block_size;
-static bool data_checksums;
static int wal_segment_size;
static bool integer_datetimes;
static bool assert_enabled;
@@ -1631,17 +1644,6 @@ static struct config_bool ConfigureNamesBool[] =
},
{
- {"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
- gettext_noop("Shows whether data checksums are turned on for this cluster."),
- NULL,
- GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
- },
- &data_checksums,
- false,
- NULL, NULL, NULL
- },
-
- {
{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
gettext_noop("Add sequence number to syslog messages to avoid duplicate suppression."),
NULL
@@ -3847,6 +3849,18 @@ static struct config_enum ConfigureNamesEnum[] =
},
{
+ {"data_checksums", PGC_INTERNAL, PRESET_OPTIONS,
+ gettext_noop("Shows the status of data_checksums in this cluster."),
+ NULL,
+ GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+
+ },
+ &data_checksums,
+ CHECKSUMS_DISABLED, data_checksum_options,
+ NULL, NULL, show_data_checksum_mode
+ },
+
+ {
{"force_parallel_mode", PGC_USERSET, QUERY_TUNING_OTHER,
gettext_noop("Forces use of parallel query facilities."),
gettext_noop("If possible, run query using a parallel worker and with parallel restrictions.")
@@ -10457,4 +10471,17 @@ show_log_file_mode(void)
return buf;
}
+
+static const char *
+show_data_checksum_mode(void)
+{
+ static char *states[] = {"disabled", "enabling", "enabled", "revalidating", "invalid"};
+ int state = DataChecksumsState();
+
+ if (state < CHECKSUMS_DISABLED || state > CHECKSUMS_REVALIDATING)
+ state = 4; /* invalid */
+
+ return states[state];
+}
+
#include "guc-file.c"
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 20077a6..8563b45 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -67,6 +67,23 @@ dbState(DBState state)
}
static const char *
+checksumState(ChecksumState state)
+{
+ switch (state)
+ {
+ case CHECKSUMS_DISABLED:
+ return _("disabled");
+ case CHECKSUMS_ENABLING:
+ return _("enabling");
+ case CHECKSUMS_ENFORCING:
+ return _("enforcing");
+ case CHECKSUMS_REVALIDATING:
+ return _("revalidating");
+ }
+ return _("unrecognized status code");
+}
+
+static const char *
wal_level_str(WalLevel wal_level)
{
switch (wal_level)
@@ -301,5 +318,7 @@ main(int argc, char *argv[])
(ControlFile->float8ByVal ? _("by value") : _("by reference")));
printf(_("Data page checksum version: %u\n"),
ControlFile->data_checksum_version);
+ printf(_("Data page checksum status: %s\n"),
+ checksumState(ControlFile->data_checksum_state));
return 0;
}
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index 96b7097..a50768f 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -696,6 +696,8 @@ PrintControlValues(bool guessed)
(ControlFile.float8ByVal ? _("by value") : _("by reference")));
printf(_("Data page checksum version: %u\n"),
ControlFile.data_checksum_version);
+ printf(_("Data page checksum state: %u\n"),
+ ControlFile.data_checksum_state);
}
diff --git a/src/bin/pg_waldump/rmgrdesc.c b/src/bin/pg_waldump/rmgrdesc.c
index 852d8ca..e1988bc 100644
--- a/src/bin/pg_waldump/rmgrdesc.c
+++ b/src/bin/pg_waldump/rmgrdesc.c
@@ -9,6 +9,7 @@
#include "postgres.h"
#include "access/brin_xlog.h"
+#include "access/checksumxlog.h"
#include "access/clog.h"
#include "access/commit_ts.h"
#include "access/generic_xlog.h"
diff --git a/src/include/access/checksumxlog.h b/src/include/access/checksumxlog.h
new file mode 100644
index 0000000..1488b9c
--- /dev/null
+++ b/src/include/access/checksumxlog.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * checksumxlog.h
+ * checksum xlog routines
+ *
+ * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/access/checksumxlog.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CHECKSUM_XLOG_H
+#define CHECKSUM_XLOG_H
+
+#include "access/xlogreader.h"
+#include "lib/stringinfo.h"
+
+#define XLOG_CHECKSUM_DISABLE 0x00
+#define XLOG_CHECKSUM_ENABLE 0x10
+#define XLOG_CHECKSUM_ENFORCE 0x20
+#define XLOG_CHECKSUM_REVALIDATE 0x30
+
+extern void checksum_redo(XLogReaderState *record);
+extern void checksum_desc(StringInfo buf, XLogReaderState *record);
+extern const char *checksum_identify(uint8 info);
+
+#endif
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
index b892aea..87e8a76 100644
--- a/src/include/access/rmgrlist.h
+++ b/src/include/access/rmgrlist.h
@@ -47,3 +47,4 @@ PG_RMGR(RM_COMMIT_TS_ID, "CommitTs", commit_ts_redo, commit_ts_desc, commit_ts_i
PG_RMGR(RM_REPLORIGIN_ID, "ReplicationOrigin", replorigin_redo, replorigin_desc, replorigin_identify, NULL, NULL, NULL)
PG_RMGR(RM_GENERIC_ID, "Generic", generic_redo, generic_desc, generic_identify, NULL, NULL, generic_mask)
PG_RMGR(RM_LOGICALMSG_ID, "LogicalMessage", logicalmsg_redo, logicalmsg_desc, logicalmsg_identify, NULL, NULL, NULL)
+PG_RMGR(RM_CHECKSUM_ID, "Checksum", checksum_redo, checksum_desc, checksum_identify, NULL, NULL, NULL)
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 9f036c7..98c018e 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -19,6 +19,7 @@
#include "lib/stringinfo.h"
#include "nodes/pg_list.h"
#include "storage/fd.h"
+#include "storage/checksum.h"
/* Sync methods */
@@ -153,7 +154,7 @@ extern PGDLLIMPORT int wal_level;
* of the bits make it to disk, but the checksum wouldn't match. Also WAL-log
* them if forced by wal_log_hints=on.
*/
-#define XLogHintBitIsNeeded() (DataChecksumsEnabled() || wal_log_hints)
+#define XLogHintBitIsNeeded() (DataChecksumsEnabledWrites() || wal_log_hints)
/* Do we need to WAL-log information required only for Hot Standby and logical replication? */
#define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_REPLICA)
@@ -256,7 +257,9 @@ extern char *XLogFileNameP(TimeLineID tli, XLogSegNo segno);
extern void UpdateControlFile(void);
extern uint64 GetSystemIdentifier(void);
-extern bool DataChecksumsEnabled(void);
+extern ChecksumState DataChecksumsState(void);
+extern bool DataChecksumsEnabledReads(void);
+extern bool DataChecksumsEnabledWrites(void);
extern XLogRecPtr GetFakeLSNForUnloggedRel(void);
extern Size XLOGShmemSize(void);
extern void XLOGShmemInit(void);
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index 578bff5..d2aba7d 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -31,7 +31,7 @@
/*
* Each page of XLOG file has a header like this:
*/
-#define XLOG_PAGE_MAGIC 0xD095 /* can be used as WAL version indicator */
+#define XLOG_PAGE_MAGIC 0xD096 /* can be used as WAL version indicator */
typedef struct XLogPageHeaderData
{
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 23731e9..db57d62 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -18,10 +18,10 @@
#include "access/xlogdefs.h"
#include "pgtime.h" /* for pg_time_t */
#include "port/pg_crc32c.h"
-
+#include "storage/checksum.h"
/* Version identifier for this pg_control format */
-#define PG_CONTROL_VERSION 960
+#define PG_CONTROL_VERSION 1000
/*
* Body of CheckPoint XLOG records. This is declared here because we keep
@@ -225,6 +225,12 @@ typedef struct ControlFileData
/* Are data pages protected by checksums? Zero if no checksum version */
uint32 data_checksum_version;
+ /*
+ * What state the cluster is in for the in-place upgrade of the checksum
+ * setting
+ */
+ ChecksumState data_checksum_state;
+
/* CRC of all above ... MUST BE LAST! */
pg_crc32c crc;
} ControlFileData;
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index bb7053a..8c9a891 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3173,6 +3173,8 @@ DESCR("export a snapshot");
DATA(insert OID = 3810 ( pg_is_in_recovery PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_is_in_recovery _null_ _null_ _null_ ));
DESCR("true if server is in recovery");
+DATA(insert OID = 3353 ( pg_disable_checksums PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 16 "" _null_ _null_ _null_ _null_ _null_ pg_disable_checksums _null_ _null_ _null_ ));
+DESCR("disables cluster checksums");
DATA(insert OID = 3820 ( pg_last_wal_receive_location PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ pg_last_wal_receive_location _null_ _null_ _null_ ));
DESCR("current wal flush location");
DATA(insert OID = 3821 ( pg_last_wal_replay_location PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ pg_last_wal_replay_location _null_ _null_ _null_ ));
diff --git a/src/include/storage/checksum.h b/src/include/storage/checksum.h
index 682043f..14229cc 100644
--- a/src/include/storage/checksum.h
+++ b/src/include/storage/checksum.h
@@ -15,10 +15,25 @@
#include "storage/block.h"
+
+/*
+ * Checksum state indicator. Note this is stored in pg_control; if you change
+ * it, you must bump PG_CONTROL_VERSION
+ */
+typedef enum ChecksumState
+{
+ CHECKSUMS_DISABLED = 0,
+ CHECKSUMS_ENABLING,
+ CHECKSUMS_ENFORCING,
+ CHECKSUMS_REVALIDATING
+} ChecksumState;
+
+
/*
* Compute the checksum for a Postgres page. The page must be aligned on a
* 4-byte boundary.
*/
extern uint16 pg_checksum_page(char *page, BlockNumber blkno);
+extern ChecksumState data_checksums;
#endif /* CHECKSUM_H */
--
2.8.4 (Apple Git-73)
On Fri, Feb 17, 2017 at 2:28 AM, David Christensen <david@endpoint.com> wrote:
- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
the potentially-required states for this feature (as well as enabling).
Color me skeptical. I don't know what CHECKSUMS_ENABLING,
CHECKSUMS_ENFORCING, and CHECKSUMS_REVALIDATING are intended to
represent -- and there's no comments in the patch explaining it -- but
if we haven't yet written the code to enable checksums, how do we know
for sure which states it will require?
If we're going to accept a patch to disable checksums without also
having the capability to enable checksums, I think we should leave out
the speculative elements about what might be needed on the "enable"
side and just implement the minimal "disable" side.
However, FWIW, I don't accept that being able to disable checksums
online is a sufficient advance to justify enabling checksums by
default. Tomas had some good points on another thread about what
might be needed to really make that a good choice, and I'm still
skeptical about whether checksums catch any meaningful number of
errors that wouldn't be caught otherwise, and about the degree to
which any complaints it issues are actionable. I'm not really against
this patch on its own merits, but I think it's a small advance in an
area that needs a lot of work. I think it would be a lot more useful
if we had a way to *enable* checksums online. Then people who find
out that checksums exist and want them have an easier way of getting
them, and anyone who uses the functionality in this patch and then
regrets it has a way to get back.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Feb 19, 2017, at 5:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Feb 17, 2017 at 2:28 AM, David Christensen <david@endpoint.com> wrote:
- Change "data_checksums" from a simple boolean to "data_checksum_state", an enum type for all of
the potentially-required states for this feature (as well as enabling).Color me skeptical. I don't know what CHECKSUMS_ENABLING,
CHECKSUMS_ENFORCING, and CHECKSUMS_REVALIDATING are intended to
represent -- and there's no comments in the patch explaining it -- but
if we haven't yet written the code to enable checksums, how do we know
for sure which states it will require?If we're going to accept a patch to disable checksums without also
having the capability to enable checksums, I think we should leave out
the speculative elements about what might be needed on the "enable"
side and just implement the minimal "disable" side.However, FWIW, I don't accept that being able to disable checksums
online is a sufficient advance to justify enabling checksums by
default. Tomas had some good points on another thread about what
might be needed to really make that a good choice, and I'm still
skeptical about whether checksums catch any meaningful number of
errors that wouldn't be caught otherwise, and about the degree to
which any complaints it issues are actionable. I'm not really against
this patch on its own merits, but I think it's a small advance in an
area that needs a lot of work. I think it would be a lot more useful
if we had a way to *enable* checksums online. Then people who find
out that checksums exist and want them have an easier way of getting
them, and anyone who uses the functionality in this patch and then
regrets it has a way to get back.
Hi Robert, this is part of a larger patch which *does* enable the checksums online; I’ve been extracting the necessary pieces out with the understanding that some people thought the disable code might be useful in its own merit. I can add documentation for the various states. The CHECKSUM_REVALIDATE is the only one I feel is a little wibbly-wobbly; the other states are required because of the fact that enabling checksums requires distinguishing between “in process of enabling” and “everything is enabled”.
My design notes for the patch were submitted to the list with little comment; see: /messages/by-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8@endpoint.com
I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).]
Best,
David
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/19/17 11:02 AM, David Christensen wrote:
My design notes for the patch were submitted to the list with little comment; see: /messages/by-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8@endpoint.com
I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).]
A couple notes:
- AFAIK unlogged tables get checksummed today if checksums are enabled;
the same should hold true if someone enables checksums on the whole cluster.
- Shared relations should be handled as well; you don't mention them.
- If an entire cluster is going to be considered as checksummed, then
even databases that don't allow connections would need to get enabled.
I like the idea of revalidation, but I'd suggest leaving that off of the
first pass.
It might be easier on a first pass to look at supporting per-database
checksums (in this case, essentially treating shared catalogs as their
own database). All normal backends do per-database stuff (such as
setting current_database) during startup anyway. That doesn't really
help for things like recovery and replication though. :/ And there's
still the question of SLRUs (or are those not checksum'd today??).
BTW, it occurs to me that this is related to the problem we have with
trying to make changes that break page binary compatibility. If we had a
method for handling that it would probably be useful for enabling
checksums as well. You'd essentially treat an un-checksum'd page as if
it was an "old page version". The biggest problem there is dealing with
the potential that the new page needs to be larger than the old one was,
but maybe there's some useful progress to be had in this area before
tackling the "page too small" problem.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Feb 19, 2017, at 8:14 PM, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:
On 2/19/17 11:02 AM, David Christensen wrote:
My design notes for the patch were submitted to the list with little comment; see: /messages/by-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8@endpoint.com
I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).]
A couple notes:
- AFAIK unlogged tables get checksummed today if checksums are enabled; the same should hold true if someone enables checksums on the whole cluster.
Agreed; AFAIK this should already be working if it’s using the PageIsVerified() API, since I just effectively modified the logic there, depending on state.
- Shared relations should be handled as well; you don't mention them.
I agree that they should be handled as well; I thought I had mentioned it later in the design doc, though TBH I’m not sure if there is more involved than just visiting the global relations in pg_class. In addition we need to visit all forks of each relation. I’m not certain if loading those into shared_buffers would be sufficient; I think so, but I’d be glad to be informed otherwise. (As long as they’re already using the PageIsVerified() API call they get this logic for free.
- If an entire cluster is going to be considered as checksummed, then even databases that don't allow connections would need to get enabled.
Yeah, the workaround for now would be to require “datallowconn" to be set to ’t' for all databases before proceeding, unless there’s a way to connect to those databases internally that bypasses that check. Open to ideas, though for a first pass seems like the “datallowconn” approach is the least amount of work.
I like the idea of revalidation, but I'd suggest leaving that off of the first pass.
Yeah, agreed.
It might be easier on a first pass to look at supporting per-database checksums (in this case, essentially treating shared catalogs as their own database). All normal backends do per-database stuff (such as setting current_database) during startup anyway. That doesn't really help for things like recovery and replication though. :/ And there's still the question of SLRUs (or are those not checksum'd today??).
So you’re suggesting that the data_checksums GUC get set per-database context, so once it’s fully enabled in the specific database it treats it as in enforcing state, even if the rest of the cluster hasn’t completed? Hmm, might think on that a bit, but it seems pretty straightforward.
What issues do you see affecting replication and recovery specifically (other than the entire cluster not being complete)? Since the checksum changes are WAL logged, seems you be no worse the wear on a standby if you had to change things.
BTW, it occurs to me that this is related to the problem we have with trying to make changes that break page binary compatibility. If we had a method for handling that it would probably be useful for enabling checksums as well. You'd essentially treat an un-checksum'd page as if it was an "old page version". The biggest problem there is dealing with the potential that the new page needs to be larger than the old one was, but maybe there's some useful progress to be had in this area before tackling the "page too small" problem.
I agree it’s very similar; my issue is I don’t want to have to postpone handling a specific case for some future infrastructure. That being said, what I could see being a general case is the piece which basically walks all pages in the cluster; as long as the page checksum/format validation happens at Page read time, we could do page version checks/conversions at the same time, and the only special code we’d need is to keep track of which files still need to be visited and how to minimize the impact on the cluster via some sort of throttling/leveling. (It’s also similar to vacuum in that way, however we have been going out of our way to make vacuum smart enough to *avoid* visiting every page, so I think it is a different enough use case that we can’t tie the two systems together, nor do I feel like taking that project on.)
We could call the checksum_cycle something else (page_walk_cycle? bikeshed time!) and basically have the infrastructure to initiate online/gradual conversion/processing of all pages for free.
Thoughts?
David
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2/20/17 11:22 AM, David Christensen wrote:
- If an entire cluster is going to be considered as checksummed, then even databases that don't allow connections would need to get enabled.
Yeah, the workaround for now would be to require “datallowconn" to be set to ’t' for all databases before proceeding, unless there’s a way to connect to those databases internally that bypasses that check. Open to ideas, though for a first pass seems like the “datallowconn” approach is the least amount of work.
The problem with ignoring datallowconn is any database where that's
false is fair game for CREATE DATABASE. I think just enforcing that
everything's connectable is good enough for now.
I like the idea of revalidation, but I'd suggest leaving that off of the first pass.
Yeah, agreed.
It might be easier on a first pass to look at supporting per-database checksums (in this case, essentially treating shared catalogs as their own database). All normal backends do per-database stuff (such as setting current_database) during startup anyway. That doesn't really help for things like recovery and replication though. :/ And there's still the question of SLRUs (or are those not checksum'd today??).
So you’re suggesting that the data_checksums GUC get set per-database context, so once it’s fully enabled in the specific database it treats it as in enforcing state, even if the rest of the cluster hasn’t completed? Hmm, might think on that a bit, but it seems pretty straightforward.
Something like that, yeah.
What issues do you see affecting replication and recovery specifically (other than the entire cluster not being complete)? Since the checksum changes are WAL logged, seems you be no worse the wear on a standby if you had to change things.
I'm specifically worried about the entire cluster not being complete.
That makes it harder for replicas to know what blocks they can and can't
verify the checksum on.
That *might* still be simpler than trying to handle converting the
entire cluster in one shot. If it's not simpler I certainly wouldn't do
it right now.
BTW, it occurs to me that this is related to the problem we have with trying to make changes that break page binary compatibility. If we had a method for handling that it would probably be useful for enabling checksums as well. You'd essentially treat an un-checksum'd page as if it was an "old page version". The biggest problem there is dealing with the potential that the new page needs to be larger than the old one was, but maybe there's some useful progress to be had in this area before tackling the "page too small" problem.
I agree it’s very similar; my issue is I don’t want to have to postpone handling a specific case for some future infrastructure.
Yeah, I was just mentioning it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Feb 19, 2017 at 12:02 PM, David Christensen <david@endpoint.com> wrote:
Hi Robert, this is part of a larger patch which *does* enable the checksums online; I’ve been extracting the necessary pieces out with the understanding that some people thought the disable code might be useful in its own merit. I can add documentation for the various states. The CHECKSUM_REVALIDATE is the only one I feel is a little wibbly-wobbly; the other states are required because of the fact that enabling checksums requires distinguishing between “in process of enabling” and “everything is enabled”.
Ah, sorry, I had missed that patch.
My design notes for the patch were submitted to the list with little comment; see: /messages/by-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8@endpoint.com
I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).]
I think the concept of a checksum cycle might be overkill. It would
be useful if we ever wanted to change the checksum algorithm, but I
don't see any particular reason why we need to build infrastructure
for that. If we wanted to change the checksum algorithm, I think it
likely that we'd do that in the course of, say, widening it to 4 bytes
as part of a bigger change in the page format, and this infrastructure
would be too narrow to help with that.
While I'm glad to see work finally going on to allow enabling and
disabling checksums, I think it's too late to put this in v10. We
have a rough rule that large new patches shouldn't be appearing for
the first time in the last CommitFest, and I think this falls into
that category. I also think it would be unwise to commit just the
bits of the infrastructure that let us disable checksums without
having time for a thorough review of the whole thing; to me, the
chances that we'll make decisions that we later regret seem fairly
high. I would rather wait for v11 and have a little more time to try
to get everything right.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mar 9, 2017, at 1:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Feb 19, 2017 at 12:02 PM, David Christensen <david@endpoint.com> wrote:
Hi Robert, this is part of a larger patch which *does* enable the checksums online; I’ve been extracting the necessary pieces out with the understanding that some people thought the disable code might be useful in its own merit. I can add documentation for the various states. The CHECKSUM_REVALIDATE is the only one I feel is a little wibbly-wobbly; the other states are required because of the fact that enabling checksums requires distinguishing between “in process of enabling” and “everything is enabled”.
Ah, sorry, I had missed that patch.
My design notes for the patch were submitted to the list with little comment; see: /messages/by-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8@endpoint.com
I have since added the WAL logging of checksum states, however I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).]
I think the concept of a checksum cycle might be overkill. It would
be useful if we ever wanted to change the checksum algorithm, but I
don't see any particular reason why we need to build infrastructure
for that. If we wanted to change the checksum algorithm, I think it
likely that we'd do that in the course of, say, widening it to 4 bytes
as part of a bigger change in the page format, and this infrastructure
would be too narrow to help with that.
I hear what you are saying, however a boolean on pg_class/pg_database to show if the relation in question is insufficient if we allow arbitrary enabling/disabling while enabling is in progress. In particular, if we disable checksums while the enabling was in progress and had only a boolean to indicate whether the checksums are complete for a relation there will have been a window when new pages in a relation will *not* be checksummed but the system table flag will indicate that the checksum is up-to-date, which is false. This would lead to checksum failures when those pages are encountered. Similar failures will occur as well when doing a pg_upgrade of an in-progress enabling. Saying you can’t disable/cancel the checksum process while it’s running seems undesirable too, as what happens if you have a system failure mid-process.
We could certainly avoid this problem by just saying that we have to start over with the entire cluster and process every page from scratch rather than trying to preserve/track state that we know is good, perhaps only dirtying buffers which have a non-matching checksum while we’re in the conversion state, but this seems undesirable to me, plus if we made it work in a single session we’d have a long-running snapshot open (presumably) which would wreak havoc while it processes the entire database (as someone had suggested by doing it in a single function that just hangs while it’s running)
I think we still need a way to short-circuit the process/incrementally update and note which tables have been checksummed and which ones haven’t. (Perhaps I could make the code which currently checks DataChecksumsEnabled() a little smarter, depending on the relation it’s looking at.)
As per one of Jim’s suggestions, I’ve been looking at making the data_checkum_state localized per database at postinit time, but really it probably doesn’t matter to anything but the buffer code whether it’s a global setting, a per-database setting or what.
While I'm glad to see work finally going on to allow enabling and
disabling checksums, I think it's too late to put this in v10. We
have a rough rule that large new patches shouldn't be appearing for
the first time in the last CommitFest, and I think this falls into
that category. I also think it would be unwise to commit just the
bits of the infrastructure that let us disable checksums without
having time for a thorough review of the whole thing; to me, the
chances that we'll make decisions that we later regret seem fairly
high. I would rather wait for v11 and have a little more time to try
to get everything right.
Makes sense, but to that end I think we need to have at least some agreement on how this should work ahead of time. Obviously it’s easier to critique existing code, but I don’t want to go out in left field of a particular approach is going to be obviously unworkable per some of the more experienced devs here.
--
David Christensen
End Point Corporation
david@endpoint.com
785-727-1171
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers