[patch] Fix pg_checksums to allow checking of offline base backup directories

Started by Michael Banckalmost 6 years ago3 messages

michael.banck@credativ.de

almost 6 years ago

1 attachment(s)

Hi,

Right now, pg_checksums cannot check a base backup directory taken by
pg_basebackup:

initdb -k data > /dev/null
pg_ctl -D data -l logfile start > /dev/null
pg_basebackup -D data_backup
pg_checksums -D data_backup
pg_checksums: error: cluster must be shut down

So users need to start and then stop postgres on the base backup
directory in order to run pg_checksums on it. This is due to this check
in pg_checksums.c:

if (ControlFile->state != DB_SHUTDOWNED &&
ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
{
pg_log_error("cluster must be shut down");

I think we can allow checking of base backups if we make sure
backup_label exists in the data directory or am I missing something?
I think we need to have similar checks about pages changed during base
backup, so this patch ignores checksum failures between the checkpoint
LSN and (as a reasonable upper bound) the last LSN of the last existing
transaction log file. If no xlog files exist (the --wal-method=none
case), the last LSN of the checkpoint WAL segment is taken.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachments:

0001-Allow-checking-base-backups-in-pg_checksums.patchtext/x-patch; charset=UTF-8; name=0001-Allow-checking-base-backups-in-pg_checksums.patchDownload

From ac4ba90d319727570be590e1602f2d58f7fe9330 Mon Sep 17 00:00:00 2001
From: Michael Banck <michael.banck@credativ.de>
Date: Mon, 6 Apr 2020 12:36:57 +0200
Subject: [PATCH] Allow checking base backups in pg_checksums.

If the cluster is still up according to pg_control, but a file backup_label
exists in the data directory, assume we are looking at a base backup and check
it just like a regular offline instance.

In order to avoid false-positives for concurrently written pages while taking
the base backup, a new function find_xlog_endptr() is added that searches the
transaction log directory for the last existing WAL file and returns the
start LSN of the next WAL file as an upper bound. Checksum failures in pages
with an LSN between the latest checkpoint and this upper bound are then ignored
as they will be replayed from WAL during startup anyway.

This also Adds a TAP tests for checking base backups with pg_checksums.
---
 src/bin/pg_checksums/pg_checksums.c   | 93 +++++++++++++++++++++++++--
 src/bin/pg_checksums/t/002_actions.pl | 16 ++++-
 2 files changed, 101 insertions(+), 8 deletions(-)

diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 9aa9f756f6..86501bb603 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -35,6 +35,8 @@ static int64 files = 0;
 static int64 blocks = 0;
 static int64 badblocks = 0;
 static ControlFileData *ControlFile;
+static XLogRecPtr checkPointLSN = InvalidXLogRecPtr;
+static XLogRecPtr insertLimitLSN = InvalidXLogRecPtr;
 
 static char *only_filenode = NULL;
 static bool do_sync = true;
@@ -184,6 +186,48 @@ skipfile(const char *fn)
 	return false;
 }
 
+/*
+ * Try to find the latest WAL file, starting from the latest checkpoint's WAL
+ * file. Returns the first XLogRecPtr for the following WAL file as an upper
+ * bound to the WAL insert record pointer. If no WAL files are found at all,
+ * still return the first XLogRecPtr for the WAL file following the checkpoint
+ * WAL segement according to pg_control.
+ */
+static XLogRecPtr
+find_xlog_endptr(const char *basedir)
+{
+	char		fpath[MAXPGPATH];
+	char		xlogfilename[MAXFNAMELEN];
+	struct stat st;
+	TimeLineID	timeline_id;
+	XLogSegNo	segno;
+	uint32		WalSegSz = ControlFile->xlog_seg_size;
+	XLogRecPtr	EndPtr = InvalidXLogRecPtr;
+
+	/* Get timeline ID and WAL segment size from checkpoint data */
+	timeline_id = ControlFile->checkPointCopy.ThisTimeLineID;
+	XLByteToSeg(ControlFile->checkPointCopy.redo, segno, WalSegSz);
+
+	for (;;)
+	{
+		/* Determine WAL file name an assemble its path */
+		XLogFileName(xlogfilename, timeline_id, segno, WalSegSz);
+		snprintf(fpath, MAXPGPATH, "%s/%s/%s", basedir, XLOGDIR, xlogfilename);
+		if (lstat(fpath, &st) < 0)
+		{
+			/*
+			 * No more WAL files, set EndPtr to start of next WAL
+			 * segment.
+			 */
+			XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, EndPtr);
+			break;
+		}
+		/* Increment WAL segment number */
+		segno++;
+	}
+	return EndPtr;
+}
+
 static void
 scan_file(const char *fn, BlockNumber segmentno)
 {
@@ -236,10 +280,27 @@ scan_file(const char *fn, BlockNumber segmentno)
 		{
 			if (csum != header->pd_checksum)
 			{
-				if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
-					pg_log_error("checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X",
-								 fn, blockno, csum, header->pd_checksum);
-				badblocks++;
+				/*
+				 * If the pg_control state is `in production', we are dealing
+				 * with a base backup. In this case, if the page's LSN is newer
+				 * than the latest checkpoint but older than the last LSN of
+				 * the last existing WAL file, we are looking at a changed page
+				 * which will be fixed by WAL replay, so ignore this checksum
+				 * failure.
+				 */
+				if (ControlFile->state == DB_IN_PRODUCTION &&
+				   (PageGetLSN(buf.data) > checkPointLSN) &&
+				   (PageGetLSN(buf.data) < insertLimitLSN))
+				{
+					pg_log_info("checksum verification failed in file \"%s\", block %u, but skipping block due to LSN", fn, blockno);
+				}
+				else
+				{
+					if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+						pg_log_error("checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X",
+									 fn, blockno, csum, header->pd_checksum);
+					badblocks++;
+				}
 			}
 		}
 		else if (mode == PG_MODE_ENABLE)
@@ -453,6 +514,7 @@ main(int argc, char *argv[])
 	};
 
 	char	   *DataDir = NULL;
+	char		backup_label_path[MAXPGPATH];
 	int			c;
 	int			option_index;
 	bool		crc_ok;
@@ -574,13 +636,30 @@ main(int argc, char *argv[])
 	/*
 	 * Check if cluster is running.  A clean shutdown is required to avoid
 	 * random checksum failures caused by torn pages.  Note that this doesn't
-	 * guard against someone starting the cluster concurrently.
+	 * guard against someone starting the cluster concurrently.  If
+	 * backup_label is present, we are looking at a base backup, so checking
+	 * that is fine.
 	 */
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{
-		pg_log_error("cluster must be shut down");
-		exit(1);
+		snprintf(backup_label_path, sizeof(backup_label_path), "%s/%s",
+						 DataDir, "backup_label");
+		if (mode != PG_MODE_CHECK || access(backup_label_path, F_OK) == -1)
+		{
+			pg_log_error("cluster must be shut down");
+			exit(1);
+		}
+		else
+		{
+			pg_log_warning("cluster was not shut down but backup_label exists, assuming a base backup");
+			/*
+			 * Get checkpoint and insert limit LSNs as lower and
+			 * upper bound for base backup WAL records.
+			 */
+			checkPointLSN = ControlFile->checkPoint;
+			insertLimitLSN = find_xlog_endptr(DataDir);
+		}
 	}
 
 	if (ControlFile->data_checksum_version == 0 &&
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 4e4934532a..255b47f420 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 63;
+use Test::More tests => 65;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -178,6 +178,20 @@ $node->start;
 command_fails([ 'pg_checksums', '--check', '-D', $pgdata ],
 	"fails with online cluster");
 
+# Adjust config so that pg_basebackup can run
+open my $conf, '>>', "$pgdata/postgresql.conf";
+print $conf "max_replication_slots = 10\n";
+print $conf "max_wal_senders = 10\n";
+print $conf "wal_level = replica\n";
+close $conf;
+$node->restart;
+
+# Make a base backup of the cluster and verify checksums in it
+my $pgdata_backup = $pgdata."_backup";
+$node->command_ok(['pg_basebackup', '-D', $pgdata_backup, '-c', 'fast', '-X', 'none']);
+command_ok([ 'pg_checksums', '--check', '-D', $pgdata_backup ],
+        "succeeds with base backup");
+
 # Check corruption of table on default tablespace.
 check_relation_corruption($node, 'corrupt1', 'pg_default');
 
-- 
2.20.1

Michael Paquier

michael@paquier.xyz

almost 6 years ago

In reply to: Michael Banck (#1)

Re: [patch] Fix pg_checksums to allow checking of offline base backup directories

On Mon, Apr 06, 2020 at 01:26:17PM +0200, Michael Banck wrote:

I think we can allow checking of base backups if we make sure
backup_label exists in the data directory or am I missing something?
I think we need to have similar checks about pages changed during base
backup, so this patch ignores checksum failures between the checkpoint
LSN and (as a reasonable upper bound) the last LSN of the last existing
transaction log file. If no xlog files exist (the --wal-method=none
case), the last LSN of the checkpoint WAL segment is taken.

Have you considered that backup_label files can exist in the data
directory of a live cluster? That's not the case with pg_basebackup
or non-exclusive backups with the SQL interface, but that's possible
with the SQL interface and an exclusive backup running.

FWIW, my take on this matter is that you should consider checksum
verification as one step to check the sanity of a base backup, meaning
that you have to restore the base backup first, then let it reach its
consistent LSN position, and finally stop the cluster cleanly to make
sure that everything is safely flushed on disk and consistent.
Attempting to verify checksums from a raw base backup would most
likely lead to false positives, and my guess is that your patch has
issues in this area. Hint at quick glance: the code path setting
insertLimitLSN where you actually don't use any APIs from
xlogreader.h.
--
Michael

Michael Banck

michael.banck@credativ.de

almost 6 years ago

In reply to: Michael Paquier (#2)

Re: [patch] Fix pg_checksums to allow checking of offline base backup directories

Hi,

Am Dienstag, den 07.04.2020, 17:07 +0900 schrieb Michael Paquier:

On Mon, Apr 06, 2020 at 01:26:17PM +0200, Michael Banck wrote:

I think we can allow checking of base backups if we make sure
backup_label exists in the data directory or am I missing something?
I think we need to have similar checks about pages changed during base
backup, so this patch ignores checksum failures between the checkpoint
LSN and (as a reasonable upper bound) the last LSN of the last existing
transaction log file. If no xlog files exist (the --wal-method=none
case), the last LSN of the checkpoint WAL segment is taken.

Have you considered that backup_label files can exist in the data
directory of a live cluster? That's not the case with pg_basebackup
or non-exclusive backups with the SQL interface, but that's possible
with the SQL interface and an exclusive backup running.

I see, that's what I was missing. I think it is unfortunate that
pg_control does not record an ongoing (base)backup in the state or
elsewhere. Maybe one could look at the `BACKUP METHOD' field in
backup_label, which is (always?) `pg_start_backup' for the non-exclusive
backup and `streamed' for pg_basebackup.

FWIW, my take on this matter is that you should consider checksum
verification as one step to check the sanity of a base backup, meaning
that you have to restore the base backup first, then let it reach its
consistent LSN position, and finally stop the cluster cleanly to make
sure that everything is safely flushed on disk and consistent.

That's a full restore and it should certainly be encouraged that
organizations do full restore tests regularly, but (not only) if you
have lots of big instances, that is often not the case.

So I think making it easier to check plain base backups would be
helpful, even if some part of recently changed data might not get
checked.

Attempting to verify checksums from a raw base backup would most
likely lead to false positives, and my guess is that your patch has
issues in this area. Hint at quick glance: the code path setting
insertLimitLSN where you actually don't use any APIs from
xlogreader.h.

I evaluated using xlogreader to fetch the BACKUP STOP position from the
WAL but then discarded that for now as possibly being overkill and went
the route of a slightly larger upper bound by taking the following WAL
segment and not the BACKUP STOP position. But I can take a look at
implementing the more fine-grained method if needed.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz