[PATCH] Verify Checksums during Basebackups

Started by Michael Banck, 37 messages
#1 Michael Banck
michael.banck@credativ.de
1 attachment(s)

Hi,

some installations have data which is only rarely read, and if they are
so large that dumps are not routinely taken, data corruption would only
be detected with some large delay even with checksums enabled.

The attached small patch verifies checksums (in case they are enabled)
during a basebackup. The rationale is that we are reading every block in
this case anyway, so this is a good opportunity to check them as well.
Other and complementary ways of checking the checksums are possible of
course, like the offline checking tool that Magnus just submitted.

It probably makes sense to use the same approach for determining the
segment numbers as Magnus did in his patch, or refactor that out in a
utility function, but I'm sick right now so wanted to submit this for
v11 first.

I did some light benchmarking and it seems that the performance
degradation is minimal, but this could well be platform or
architecture-dependent. Right now, the checksums are always checked but
maybe this could be made optional, probably by extending the replication
protocol.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachments:

0001-basebackup-verify-checksum.patch (text/x-diff; charset=us-ascii)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 185f32a5f9..841471cfce 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -16,6 +16,7 @@
 #include <unistd.h>
 #include <time.h>
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"	/* for pg_start/stop_backup */
 #include "catalog/catalog.h"
 #include "catalog/pg_type.h"
@@ -30,6 +31,8 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -1199,10 +1202,18 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		 bool missing_ok)
 {
 	FILE	   *fp;
+	BlockNumber blkno = 0;
 	char		buf[TAR_SEND_SIZE];
+	uint16		checksum;
 	size_t		cnt;
+	char	   *filename;
+	int			i;
 	pgoff_t		len = 0;
+	char		page[BLCKSZ];
 	size_t		pad;
+	PageHeader  phdr;
+	BlockNumber segno = 0;
+	bool		verify_checksum;
 
 	fp = AllocateFile(readfilename, "rb");
 	if (fp == NULL)
@@ -1214,10 +1225,54 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 				 errmsg("could not open file \"%s\": %m", readfilename)));
 	}
 
+	/*
+	 * Verify checksums only when checksums are enabled, and the file to
+	 * send is either in one of the default tablespaces (`./global' or
+	 * `./base'), or in an external tablespace with an absolute pathname
+	 * (starting with `/').
+	 */
+	if (DataChecksumsEnabled() &&
+		(strncmp(readfilename, "./global/", 9) == 0 ||
+		strncmp(readfilename, "./base/", 7) == 0 ||
+		strncmp(readfilename, "/", 1) == 0))
+		verify_checksum = 1;
+	else
+		verify_checksum = 0;
+
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
+	/*
+	 * Determine segment number for possible checksum verification, as block
+	 * numbers are ongoing. The initial block number of a multi-segment file is
+	 * the segment number times RELSEG_SIZE.
+	 */
+	filename = basename(readfilename);
+	if (strstr(filename, "."))
+		segno = atoi(strstr(filename, ".")+1);
+	blkno = segno * RELSEG_SIZE;
+
 	while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0)
 	{
+		if (verify_checksum)
+		{
+			/*
+			 * The checksums are verified at block level, so we iterate over
+			 * the buffer in chunks of BLCKSZ.
+			 */
+			for (i = 0; i < cnt / BLCKSZ; i++)
+			{
+				memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
+				checksum = pg_checksum_page((char *) page, blkno);
+				phdr = (PageHeader) page;
+				if (phdr->pd_checksum != checksum)
+					ereport(WARNING,
+							(errmsg("checksum mismatch in file \"%s\", block %d: "
+							"expected %x, found %x", readfilename, blkno,
+							checksum, phdr->pd_checksum)));
+				blkno++;
+			}
+		}
+
 		/* Send the chunk as a CopyData message */
 		if (pq_putmessage('d', buf, cnt))
 			ereport(ERROR,
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index cdf4f5be37..b410311791 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
 use Config;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 79;
+use Test::More tests => 83;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -15,7 +15,7 @@ my $tempdir = TestLib::tempdir;
 my $node = get_new_node('main');
 
 # Initialize node without replication settings
-$node->init;
+$node->init(extra => [ '--data-checksums' ]);
 $node->start;
 my $pgdata = $node->data_dir;
 
@@ -317,3 +317,19 @@ like(
 	slurp_file("$tempdir/backupxs_sl_R/recovery.conf"),
 	qr/^primary_slot_name = 'slot1'\n/m,
 	'recovery.conf sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+
+# induce corruption
+my $pg_class = $node->safe_psql('postgres',
+	q{SELECT pg_relation_filepath('pg_class')}
+);
+system_or_bail 'dd', 'conv=notrunc', 'oflag=seek_bytes', 'seek=4000', 'bs=9', 'count=1', 'if=/dev/zero', "of=$pgdata/$pg_class";
+
+$node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt"],
+	0,
+	[qr{^$}],
+	[qr/^WARNING.*checksum.mismatch/s],
+	'pg_basebackup reports checksum mismatch'
+);
#2 David Steele
david@pgmasters.net
In reply to: Michael Banck (#1)
Re: [PATCH] Verify Checksums during Basebackups

On 2/28/18 1:08 PM, Michael Banck wrote:

The attached small patch verifies checksums (in case they are enabled)
during a basebackup. The rationale is that we are reading every block in
this case anyway, so this is a good opportunity to check them as well.
Other and complementary ways of checking the checksums are possible of
course, like the offline checking tool that Magnus just submitted.

+1. I've done some work in this area so I have signed up to review.

--
-David
david@pgmasters.net

#3 Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#1)
Re: [PATCH] Verify Checksums during Basebackups

On Wed, Feb 28, 2018 at 7:08 PM, Michael Banck <michael.banck@credativ.de>
wrote:

Hi,

some installations have data which is only rarely read, and if they are
so large that dumps are not routinely taken, data corruption would only
be detected with some large delay even with checksums enabled.

I think this is a very common scenario. Particularly when you take into
account indexes and things like that.

The attached small patch verifies checksums (in case they are enabled)
during a basebackup. The rationale is that we are reading every block in
this case anyway, so this is a good opportunity to check them as well.
Other and complementary ways of checking the checksums are possible of
course, like the offline checking tool that Magnus just submitted.

It probably makes sense to use the same approach for determining the
segment numbers as Magnus did in his patch, or refactor that out in a
utility function, but I'm sick right now so wanted to submit this for
v11 first.

I did some light benchmarking and it seems that the performance
degradation is minimal, but this could well be platform or
architecture-dependent. Right now, the checksums are always checked but
maybe this could be made optional, probably by extending the replication
protocol.

I think it should be.

I think it would also be a good idea to have this a three-mode setting,
with "no check", "check and warning", "check and error". Where "check and
error" should be the default, but you could turn off that in "save whatever
is left mode". But I think it's better if pg_basebackup simply fails on a
checksum error, because that will make it glaringly obvious that there is a
problem -- which is the main point of checksums in the first place. And
then an option to turn it off completely in cases where performance is the
thing.

Another quick note -- we need to assert that the size of the buffer is
actually divisible by BLCKSZ. I don't think it's a common scenario, but it
could break badly if somebody changes BLCKSZ. Either that or perhaps just
change TAR_SEND_SIZE to be a multiple of BLCKSZ.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#4 Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#3)
Re: [PATCH] Verify Checksums during Basebackups

On Fri, Mar 2, 2018 at 6:23 AM, Magnus Hagander <magnus@hagander.net> wrote:

Another quick note -- we need to assert that the size of the buffer is
actually divisible by BLCKSZ. I don't think it's a common scenario, but it
could break badly if somebody changes BLCKSZ. Either that or perhaps just
change TAR_SEND_SIZE to be a multiple of BLCKSZ.

I think that this patch needs to support all block sizes that are
otherwise supported -- failing an assertion doesn't seem like a
reasonable option, unless it only happens for block sizes we don't
support anyway.

+1 for the feature in general. I think this would help a lot of people.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5 Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#4)
Re: [PATCH] Verify Checksums during Basebackups

On Fri, Mar 2, 2018 at 7:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 2, 2018 at 6:23 AM, Magnus Hagander <magnus@hagander.net>
wrote:

Another quick note -- we need to assert that the size of the buffer is
actually divisible by BLCKSZ. I don't think it's a common scenario, but it
could break badly if somebody changes BLCKSZ. Either that or perhaps just
change TAR_SEND_SIZE to be a multiple of BLCKSZ.

I think that this patch needs to support all block sizes that are
otherwise supported -- failing an assertion doesn't seem like a
reasonable option, unless it only happens for block sizes we don't
support anyway.

That's not what I meant. What I meant is to fail on an assertion if
TAR_SEND_SIZE is not evenly divisible by BLCKSZ. (Or well, maybe not an
assertion, but an actual compile-time error.) Since BLCKSZ is changed only
at compile time, we can either trap the case already at compile time, or
just define it away. But we should handle it.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#6 Stephen Frost
sfrost@snowman.net
In reply to: Magnus Hagander (#3)
Re: [PATCH] Verify Checksums during Basebackups

Greetings Magnus, all,

* Magnus Hagander (magnus@hagander.net) wrote:

I think it would also be a good idea to have this a three-mode setting,
with "no check", "check and warning", "check and error". Where "check and
error" should be the default, but you could turn off that in "save whatever
is left mode". But I think it's better if pg_basebackup simply fails on a
checksum error, because that will make it glaringly obvious that there is a
problem -- which is the main point of checksums in the first place. And
then an option to turn it off completely in cases where performance is the
thing.

When we implemented page-level checksum checking in pgBackRest, David
and I had a good long discussion about exactly this question of "warn"
vs. "error" and came to a different conclusion- you want a backup to
always back up as much as it can even in the face of corruption. If the
user has set up their backups in such a way that they don't see the
warnings being thrown, it's a good bet they won't see failed backups
happening either, in which case they might go from having "mostly" good
backups to not having any. Note that I *do* think a checksum failure
should result in a non-zero exit-code result from pg_basebackup,
indicating that there was something which went wrong.

One difference is that with pgBackRest, we manage the backups and a
backup with page-level checksum failures isn't considered "valid", so we won't
expire old backups if a new backup has a checksum failure, but I'm not
sure that's really enough to change my mind on if pg_basebackup should
outright fail on a checksum error or if it should throw big warnings but
still try to perform the backup. If the admin sets things up in a way
that a warning and error-exit code from pg_basebackup is ignored and
they still expire out their old backups, then even having an actual
error result wouldn't change that.

As an admin, the first thing I would want in a checksum failure scenario
is a backup of everything, even the blocks which failed (and then a
report of which blocks failed...). I'd rather we think about that
use-case than the use-case where the admin sets up backups in such a way
that they don't see warnings being thrown from the backup.

Thanks!

Stephen

#7 Magnus Hagander
magnus@hagander.net
In reply to: Stephen Frost (#6)
Re: [PATCH] Verify Checksums during Basebackups

On Sun, Mar 4, 2018 at 3:49 PM, Stephen Frost <sfrost@snowman.net> wrote:

Greetings Magnus, all,

* Magnus Hagander (magnus@hagander.net) wrote:

I think it would also be a good idea to have this a three-mode setting,
with "no check", "check and warning", "check and error". Where "check and
error" should be the default, but you could turn off that in "save whatever
is left mode". But I think it's better if pg_basebackup simply fails on a
checksum error, because that will make it glaringly obvious that there is a
problem -- which is the main point of checksums in the first place. And
then an option to turn it off completely in cases where performance is the
thing.

When we implemented page-level checksum checking in pgBackRest, David
and I had a good long discussion about exactly this question of "warn"
vs. "error" and came to a different conclusion- you want a backup to
always back up as much as it can even in the face of corruption. If the
user has set up their backups in such a way that they don't see the
warnings being thrown, it's a good bet they won't see failed backups
happening either, in which case they might go from having "mostly" good
backups to not having any. Note that I *do* think a checksum failure
should result in a non-zero exit-code result from pg_basebackup,
indicating that there was something which went wrong.

I would argue that the likelihood of them seeing an error vs a warning is
orders of magnitude higher.

That said, if we want to exit pg_basebackup with an exit code but still
complete the backup, that would also be a workable way I guess. But I
strongly feel that we should make pg_basebackup scream at the user and
actually exit with an error -- it's the exit with error that will cause
their cronjobs to fail, and thus make somebody notice it.

One difference is that with pgBackRest, we manage the backups and a
backup with page-level checksum failures isn't considered "valid", so we won't
expire old backups if a new backup has a checksum failure, but I'm not
sure that's really enough to change my mind on if pg_basebackup should
outright fail on a checksum error or if it should throw big warnings but
still try to perform the backup. If the admin sets things up in a way
that a warning and error-exit code from pg_basebackup is ignored and
they still expire out their old backups, then even having an actual
error result wouldn't change that.

There is another important difference as well. In pgBackRest somebody will
have to explicitly enable checksum verification -- which already there
means that they are much more likely to actually check the logs from it.

As an admin, the first thing I would want in a checksum failure scenario
is a backup of everything, even the blocks which failed (and then a
report of which blocks failed...). I'd rather we think about that
use-case than the use-case where the admin sets up backups in such a way
that they don't see warnings being thrown from the backup.

I agree. But I absolutely don't want my backup to be listed as successful,
because then I might expire old ones.

So sure, if we go with WARNING + exit with an errorcode, that is perhaps
the best combination of the two.

That said, it probably still makes sense to implement all modes. Or at
least to implement a "don't bother verifying the checksums" mode. This
mainly controls what the default would be.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#8 Stephen Frost
sfrost@snowman.net
In reply to: Magnus Hagander (#7)
Re: [PATCH] Verify Checksums during Basebackups

Greetings Magnus, all,

* Magnus Hagander (magnus@hagander.net) wrote:

On Sun, Mar 4, 2018 at 3:49 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Magnus Hagander (magnus@hagander.net) wrote:

I think it would also be a good idea to have this a three-mode setting,
with "no check", "check and warning", "check and error". Where "check and
error" should be the default, but you could turn off that in "save whatever
is left mode". But I think it's better if pg_basebackup simply fails on a
checksum error, because that will make it glaringly obvious that there is a
problem -- which is the main point of checksums in the first place. And
then an option to turn it off completely in cases where performance is the
thing.

When we implemented page-level checksum checking in pgBackRest, David
and I had a good long discussion about exactly this question of "warn"
vs. "error" and came to a different conclusion- you want a backup to
always back up as much as it can even in the face of corruption. If the
user has set up their backups in such a way that they don't see the
warnings being thrown, it's a good bet they won't see failed backups
happening either, in which case they might go from having "mostly" good
backups to not having any. Note that I *do* think a checksum failure
should result in a non-zero exit-code result from pg_basebackup,
indicating that there was something which went wrong.

I would argue that the likelihood of them seeing an error vs a warning is
orders of magnitude higher.

That said, if we want to exit pg_basebackup with an exit code but still
complete the backup, that would also be a workable way I guess. But I
strongly feel that we should make pg_basebackup scream at the user and
actually exit with an error -- it's the exit with error that will cause
their cronjobs to fail, and thus make somebody notice it.

Yes, we need to have it exit with a non-zero exit code, I definitely
agree with that. Any indication that the backup may not be valid should
do that, imv. I don't believe we should just abort the backup and throw
away whatever effort has gone into getting the data thus far and then
leave an incomplete backup in place- someone might think it's complete.
I certainly hope you weren't thinking that pg_basebackup
would then go through and remove the backup that it had been running
when it reached the checksum failure- that would be a dangerous and
rarely tested code path, after all.

One difference is that with pgBackRest, we manage the backups and a
backup with page-level checksum failures isn't considered "valid", so we won't
expire old backups if a new backup has a checksum failure, but I'm not
sure that's really enough to change my mind on if pg_basebackup should
outright fail on a checksum error or if it should throw big warnings but
still try to perform the backup. If the admin sets things up in a way
that a warning and error-exit code from pg_basebackup is ignored and
they still expire out their old backups, then even having an actual
error result wouldn't change that.

There is another important difference as well. In pgBackRest somebody will
have to explicitly enable checksum verification -- which already there
means that they are much more likely to actually check the logs from it.

That's actually not correct- we automatically check page-level checksums
when the C library is available (and it's now required as part of 2.0)
and the database has checksums enabled (that's required of both methods,
of course...), so I don't see the difference you're suggesting here.
pgBackRest does have an option to *require* checksum-checking be done,
and one to disable checksumming, but by default it's enabled. See:

https://pgbackrest.org/command.html#command-backup/category-command/option-checksum-page

As an admin, the first thing I would want in a checksum failure scenario
is a backup of everything, even the blocks which failed (and then a
report of which blocks failed...). I'd rather we think about that
use-case than the use-case where the admin sets up backups in such a way
that they don't see warnings being thrown from the backup.

I agree. But I absolutely don't want my backup to be listed as successful,
because then I might expire old ones.

So sure, if we go with WARNING + exit with an errorcode, that is perhaps
the best combination of the two.

Right, that's what we settled on for pgBackRest also and definitely
makes the most sense to me.

That said, it probably still makes sense to implement all modes. Or at
least to implement a "don't bother verifying the checksums" mode. This
mainly controls what the default would be.

Yes, I'm fine with a "don't verify checksums" option, but I believe the
default should be to verify checksums when the database is configured
with them and, on a checksum failure, throw warnings and exit with an
exit-code that's non-zero and means "page-level checksum verification
failed."

Thanks!

Stephen

#9 Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#7)
Re: [PATCH] Verify Checksums during Basebackups

Hi,

On Sun, Mar 04, 2018 at 06:19:00PM +0100, Magnus Hagander wrote:

So sure, if we go with WARNING + exit with an errorcode, that is perhaps
the best combination of the two.

I had a look at how to go about this, but it appears to be a bit
complicated; the first problem is that sendFile() and sendDir() don't
have status return codes that could be set on checksum verification
failure. So I added a global variable and threw an ereport(ERROR) at the
end of perform_base_backup(), but then I realized that `pg_basebackup'
the client program purges the datadir it created if it gets an error:

|pg_basebackup: final receive failed: ERROR: Checksum mismatch during
|basebackup
|
|pg_basebackup: removing data directory "data2"

So I guess this would have to be sent back via the replication protocol,
but I don't see an off-hand way to do this easily?

Another option would be to see whether it is possible to verify the
checksum on the client side, but then only pg_basebackup (and no other
possible external tools using BASE_BACKUP) would profit.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

#10 Stephen Frost
sfrost@snowman.net
In reply to: Michael Banck (#9)
Re: [PATCH] Verify Checksums during Basebackups

Michael,

* Michael Banck (michael.banck@credativ.de) wrote:

On Sun, Mar 04, 2018 at 06:19:00PM +0100, Magnus Hagander wrote:

So sure, if we go with WARNING + exit with an errorcode, that is perhaps
the best combination of the two.

I had a look at how to go about this, but it appears to be a bit
complicated; the first problem is that sendFile() and sendDir() don't
have status return codes that could be set on checksum verification
failure. So I added a global variable and threw an ereport(ERROR) at the
end of perform_base_backup(), but then I realized that `pg_basebackup'
the client program purges the datadir it created if it gets an error:

|pg_basebackup: final receive failed: ERROR: Checksum mismatch during
|basebackup
|
|pg_basebackup: removing data directory "data2"

Oh, ugh.

So I guess this would have to be sent back via the replication protocol,
but I don't see an off-hand way to do this easily?

The final ordinary result set could be extended to include the
information about checksum failures..? I'm a bit concerned about what
to do when there are a lot of checksum failures though.. Ideally, you'd
identify all of the pages in all of the files where a checksum failed
(just throwing an error such as the one proposed above is really rather
terrible since you have no idea what block, or even what table, failed
the checksum...).

I realize this is moving the goalposts a long way, but I had actually
always envisioned having file-by-file pg_basebackup being put in at some
point, instead of tablespace-by-tablespace, which would allow for both
an ordinary result set being returned for each file that could contain
information such as the checksum failure and what pages failed, and be a
stepping stone for parallel pg_basebackup..

Another option would be to see whether it is possible to verify the
checksum on the client side, but then only pg_basebackup (and no other
possible external tools using BASE_BACKUP) would profit.

Reviewing the original patch and considering this issue, I believe there
may be a larger problem- while very unlikely, there's been concern that
it's possible to read a half-written page (and possibly only the second
half) and end up with a checksum failure due to that. In pgBackRest, we
address that by doing another read of the page and by checking the LSN
vs. where we started the backup (if the LSN is more recent than when the
backup started then we don't have to care about the page- it'll be in
the WAL).

If we're going to solve that issue the same way pgBackRest does, then
you'd really have to do it server-side, I'm afraid.. Either that, or
add a way for the client to request individual blocks be re-sent, but
that would be awfully difficult for pg_basebackup to update in the
resulting tar file if it was compressed..

Thanks!

Stephen

#11 Michael Banck
michael.banck@credativ.de
In reply to: Stephen Frost (#10)
1 attachment(s)
Re: [PATCH] Verify Checksums during Basebackups

Hi,

Am Montag, den 05.03.2018, 06:36 -0500 schrieb Stephen Frost:

Michael,

* Michael Banck (michael.banck@credativ.de) wrote:

On Sun, Mar 04, 2018 at 06:19:00PM +0100, Magnus Hagander wrote:

So sure, if we go with WARNING + exit with an errorcode, that is perhaps
the best combination of the two.

I had a look at how to go about this, but it appears to be a bit
complicated; the first problem is that sendFile() and sendDir() don't
have status return codes that could be set on checksum verification
failure. So I added a global variable and threw an ereport(ERROR) at the
end of perform_base_backup(), but then I realized that `pg_basebackup'
the client program purges the datadir it created if it gets an error:

pg_basebackup: final receive failed: ERROR: Checksum mismatch during
basebackup

pg_basebackup: removing data directory "data2"

Oh, ugh.

I came up with the attached patch, which sets a checksum_failure
variable in both basebackup.c and pg_basebackup.c, and emits an ereport
with (for now) ERRCODE_DATA_CORRUPTED at the end of
perform_base_backup(), which gets caught in pg_basebackup and then used
to not clean up the datadir, but to exit with a non-zero exit code.

Does that seem feasible?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachments:

checksum_basebackup_error_checking.patch (text/x-patch; charset=UTF-8)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 361cde3291..11fb009b59 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -100,6 +100,9 @@ static TimeOffset elapsed_min_unit;
 /* The last check of the transfer rate. */
 static TimestampTz throttled_last;
 
+/* Whether a checksum failure occurred. */
+static bool checksum_failure;
+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves are
@@ -208,6 +211,8 @@ perform_base_backup(basebackup_options *opt)
 	labelfile = makeStringInfo();
 	tblspc_map_file = makeStringInfo();
 
+	checksum_failure = false;
+
 	startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
 								  labelfile, &tablespaces,
 								  tblspc_map_file,
@@ -566,6 +571,12 @@ perform_base_backup(basebackup_options *opt)
 		pq_putemptymessage('c');
 	}
 	SendXlogRecPtrResult(endptr, endtli);
+
+	if (checksum_failure == true)
+		ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			errmsg("Checksum mismatch during basebackup\n")));
+
 }
 
 /*
@@ -1295,10 +1306,13 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 				checksum = pg_checksum_page((char *) page, blkno + segmentno*RELSEG_SIZE);
 				phdr = (PageHeader) page;
 				if (phdr->pd_checksum != checksum)
+				{
 					ereport(WARNING,
-							(errmsg("checksum mismatch in file \"%s\", block %d: "
-							"expected %x, found %x", readfilename, blkno,
+							(errmsg("checksum mismatch in file \"%s\", segment %d, block %d: "
+							"expected %x, found %x", readfilename, segmentno, blkno,
 							phdr->pd_checksum, checksum)));
+					checksum_failure = true;
+				}
 				blkno++;
 			}
 		}
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1b32592063..e1b364b4ef 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -39,6 +39,7 @@
 #include "replication/basebackup.h"
 #include "streamutil.h"
 
+#define ERRCODE_DATA_CORRUPTED	"XX001"
 
 typedef struct TablespaceListCell
 {
@@ -81,6 +82,7 @@ static char *xlog_dir = NULL;
 static char format = 'p';		/* p(lain)/t(ar) */
 static char *label = "pg_basebackup base backup";
 static bool noclean = false;
+static bool checksum_failure = false;
 static bool showprogress = false;
 static int	verbose = 0;
 static int	compresslevel = 0;
@@ -155,7 +157,7 @@ cleanup_directories_atexit(void)
 	if (success || in_log_streamer)
 		return;
 
-	if (!noclean)
+	if (!noclean && !checksum_failure)
 	{
 		if (made_new_pgdata)
 		{
@@ -195,7 +197,7 @@ cleanup_directories_atexit(void)
 	}
 	else
 	{
-		if (made_new_pgdata || found_existing_pgdata)
+		if ((made_new_pgdata || found_existing_pgdata) && !checksum_failure)
 			fprintf(stderr,
 					_("%s: data directory \"%s\" not removed at user's request\n"),
 					progname, basedir);
@@ -1970,8 +1972,17 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
-		fprintf(stderr, _("%s: final receive failed: %s"),
-				progname, PQerrorMessage(conn));
+		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+		if (sqlstate &&
+			strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+		{
+			fprintf(stderr, _("%s: checksum error occurred\n"),
+					progname);
+			checksum_failure = true;
+		} else {
+			fprintf(stderr, _("%s: final receive failed: %s"),
+					progname, PQerrorMessage(conn));
+		}
 		disconnect_and_exit(1);
 	}
 
#12 Stephen Frost
sfrost@snowman.net
In reply to: Michael Banck (#11)
Re: [PATCH] Verify Checksums during Basebackups

Michael,

* Michael Banck (michael.banck@credativ.de) wrote:

On Monday, 2018-03-05 at 06:36 -0500, Stephen Frost wrote:

* Michael Banck (michael.banck@credativ.de) wrote:

On Sun, Mar 04, 2018 at 06:19:00PM +0100, Magnus Hagander wrote:

So sure, if we go with WARNING + exit with an errorcode, that is perhaps
the best combination of the two.

I had a look at how to go about this, but it appears to be a bit
complicated; the first problem is that sendFile() and sendDir() don't
have status return codes that could be set on checksum verification
failure. So I added a global variable and threw an ereport(ERROR) at the
end of perform_base_backup(), but then I realized that `pg_basebackup'
the client program purges the datadir it created if it gets an error:

pg_basebackup: final receive failed: ERROR: Checksum mismatch during
basebackup

pg_basebackup: removing data directory "data2"

Oh, ugh.

I came up with the attached patch, which sets a checksum_failure
variable in both basebackup.c and pg_basebackup.c, and emits an ereport
with (for now) ERRCODE_DATA_CORRUPTED at the end of
perform_base_backup(), which gets caught in pg_basebackup and then used
to not cleanup the datadir, but exit with a non-zero exit code.

Does that seem feasible?

Ah, yes, I had thought about using a WARNING or NOTICE or similar also
to pass back the info about the checksum failure during the backup, that
seems like it would work as long as pg_basebackup captures that
information and puts it into a log or on stdout or similar.

I'm a bit on the fence about whether we shouldn't just have pg_basebackup
always return a non-zero exit code on a WARNING being seen during the
backup instead. Given that there's a pretty clear SQL code for this
case, perhaps throwing an ERROR and then checking the SQL code isn't
an issue though.

Thanks!

Stephen

#13David Steele
david@pgmasters.net
In reply to: Stephen Frost (#10)
Re: [PATCH] Verify Checksums during Basebackups

Hi Michael,

On 3/5/18 6:36 AM, Stephen Frost wrote:

* Michael Banck (michael.banck@credativ.de) wrote:

So I guess this would have to be sent back via the replication protocol,
but I don't see an off-hand way to do this easily?

The final ordinary result set could be extended to include the
information about checksum failures..? I'm a bit concerned about what
to do when there are a lot of checksum failures though.. Ideally, you'd
identify all of the pages in all of the files where a checksum failed
(just throwing an error such as the one proposed above is really rather
terrible since you have no idea what block, or even what table, failed
the checksum...).

I agree that knowing the name of the file that failed validation is
really important, with a list of the pages that failed validation being
a nice thing to have as well, though I would be fine having the latter
added in a future version.

For instance, in pgBackRest we output validation failures this way:

[from a regression test]
WARN: invalid page checksums found in file
[TEST_PATH]/db-primary/db/base/base/32768/33001 at pages 0, 3-5, 7

Note that we collate ranges of errors to keep the output from being too
overwhelming.
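To illustrate, collating a sorted list of failing block numbers into a compact range string like the output above ("0, 3-5, 7") can be sketched in C roughly as follows (a hypothetical helper, not pgBackRest's actual implementation):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Collate a sorted list of failed block numbers into a compact
 * range string such as "0, 3-5, 7".  Illustrative sketch only.
 */
static void
collate_ranges(const int *blocks, int n, char *out, size_t outlen)
{
	out[0] = '\0';
	for (int i = 0; i < n;)
	{
		int		j = i;
		char	tmp[64];

		/* extend the run while block numbers are consecutive */
		while (j + 1 < n && blocks[j + 1] == blocks[j] + 1)
			j++;

		if (j > i)
			snprintf(tmp, sizeof(tmp), "%s%d-%d",
					 i ? ", " : "", blocks[i], blocks[j]);
		else
			snprintf(tmp, sizeof(tmp), "%s%d",
					 i ? ", " : "", blocks[i]);
		strncat(out, tmp, outlen - strlen(out) - 1);
		i = j + 1;
	}
}
```

Runs of consecutive blocks collapse into "first-last", so even a badly corrupted file produces one short line of output.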

I think the file names are very important because there's a rather large
chance that corruption may happen in an index, unlogged table, or
something else that can be rebuilt or reloaded. Knowing where the
corruption is can save a lot of headaches.

Reviewing the original patch and considering this issue, I believe there
may be a larger problem- while very unlikely, there's been concern that
it's possible to read a half-written page (and possibly only the second
half) and end up with a checksum failure due to that. In pgBackRest, we
address that by doing another read of the page and by checking the LSN
vs. where we started the backup (if the LSN is more recent than when the
backup started then we don't have to care about the page- it'll be in
the WAL).

The need to reread pages can be drastically reduced by skipping
validation of any page that has an LSN >= the backup start LSN because
they will be replayed from WAL during recovery.

The rereads are still necessary because of the possible transposition of
page read vs. page write as Stephen notes above. We have not been able
to reproduce this case but can't discount it.
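A minimal sketch of that policy, with the read results passed in as booleans for illustration (hypothetical helper, not pgBackRest's actual code): pages at or past the backup start LSN are skipped because WAL replay rewrites them, and a first-read mismatch gets one re-read before the page is declared corrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Decide whether a page should be reported as corrupt.  A page whose
 * LSN is >= the backup start LSN will be replayed from WAL, so it is
 * never reported; a checksum mismatch on the first read triggers one
 * re-read to rule out having caught a half-written page.
 */
static bool
page_is_corrupt(XLogRecPtr page_lsn, XLogRecPtr backup_start_lsn,
				bool first_read_ok, bool reread_ok)
{
	if (page_lsn >= backup_start_lsn)
		return false;			/* will be replayed from WAL anyway */
	if (first_read_ok)
		return false;			/* checksum matched */
	return !reread_ok;			/* corrupt only if the re-read fails too */
}
```

Only the combination "old LSN, first read bad, re-read bad" is reported, which is exactly the case a torn read cannot explain.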

Regards,
--
-David
david@pgmasters.net

#14Michael Banck
michael.banck@credativ.de
In reply to: Michael Banck (#1)
1 attachment(s)
Re: [PATCH] Verify Checksums during Basebackups

Hi,

On Wednesday, 2018-02-28 at 19:08 +0100, Michael Banck wrote:

some installations have data which is only rarely read, and if they are
so large that dumps are not routinely taken, data corruption would only
be detected with some large delay even with checksums enabled.

The attached small patch verifies checksums (in case they are enabled)
during a basebackup. The rationale is that we are reading every block in
this case anyway, so this is a good opportunity to check them as well.
Other and complementary ways of checking the checksums are possible of
course, like the offline checking tool that Magnus just submitted.

I've attached a second version of this patch. Changes are:

1. I've included some code from Magnus' patch, notably the way the
segment numbers are determined and the skipfile() function, along with
the array of files to skip.

2. I am now checking the LSN in the pageheader and compare it against
the LSN of the basebackup start, so that no checksums are verified for
pages changed after basebackup start. I am not sure whether this
addresses all concerns by Stephen and David, as I am not re-reading the
page on a checksum mismatch as they are doing in pgbackrest.

3. pg_basebackup now exits with 1 if a checksum mismatch occurred, but it
keeps the data around.

4. I added an Assert() that the TAR_SEND_SIZE is a multiple of BLCKSZ.
AFAICT we support block sizes of 1, 2, 4, 8, 16 and 32 KB, while
TAR_SEND_SIZE is set to 32 KB, so this should be fine unless somebody
mucks around with BLCKSZ manually, in which case the Assert should fire.
I compiled --with-blocksize=32 and checked that this still works as
intended.

5. I also check that the buffer we read is a multiple of BLCKSZ. If that
is not the case I emit a WARNING that the checksum cannot be checked and
pg_basebackup will exit with 1 as well.
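The segment-number arithmetic and the buffer-size precondition from points 1, 4 and 5 can be sketched as follows (constants and names here are illustrative, assuming the default 8 kB block size and 1 GB segments):

```c
#include <assert.h>
#include <stdbool.h>

#define SEG_BLOCKS		131072	/* blocks per segment: 1 GB / 8 kB */
#define READ_BLCKSZ		8192	/* page size assumed for this sketch */

/*
 * The page checksum is seeded with the block's position within the
 * whole relation, so for segment files ("12345.1", "12345.2", ...)
 * the per-segment block number must be offset by the segment number
 * times the blocks-per-segment count.
 */
static unsigned
absolute_blkno(unsigned segmentno, unsigned blkno_in_segment)
{
	return segmentno * SEG_BLOCKS + blkno_in_segment;
}

/*
 * Checksums can only be verified when the read buffer holds a whole
 * number of pages; a short read makes block iteration impossible.
 */
static bool
buffer_verifiable(unsigned read_len)
{
	return read_len % READ_BLCKSZ == 0;
}
```

With the defaults, block 0 of segment 2 is relation block 262144, and a 32 kB read buffer (four pages) passes the multiple-of-BLCKSZ check.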

This is how it looks right now from pg_basebackup's POV:

postgres@fock:~$ initdb -k --pgdata=data1 1> /dev/null

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
postgres@fock:~$ pg_ctl --pgdata=data1 --log=pg1.log start
waiting for server to start.... done
server started
postgres@fock:~$ psql -h /tmp -c "SELECT pg_relation_filepath('pg_class')"
pg_relation_filepath
----------------------
base/12367/1259
(1 row)

postgres@fock:~$ echo -n "Bang!" | dd conv=notrunc oflag=seek_bytes seek=4000 bs=9 count=1 of=data1/base/12367/1259
0+1 records in
0+1 records out
5 bytes copied, 3.7487e-05 s, 133 kB/s
postgres@fock:~$ pg_basebackup --pgdata=data2 -h /tmp
WARNING: checksum mismatch in file "./base/12367/1259", segment 0, block 0: expected CC05, found CA4D
pg_basebackup: checksum error occured
postgres@fock:~$ echo $?
1
postgres@fock:~$ ls data2
backup_label pg_dynshmem pg_multixact pg_snapshots pg_tblspc pg_xact
base pg_hba.conf pg_notify pg_stat pg_twophase postgresql.auto.conf
global pg_ident.conf pg_replslot pg_stat_tmp PG_VERSION postgresql.conf
pg_commit_ts pg_logical pg_serial pg_subtrans pg_wal
postgres@fock:~$

Possibly open questions:

1. So far I have not changed the replication protocol to make verifying
checksums optional. I can go about that next if the consensus is that we
need such an option (and cannot just check it every time)?

2. The TAP test uses dd (similar to the above); is that allowed, or do I
have to come up with some internal Perl thing that also works on
Windows?

3. I am using basename() to get the filename; I haven't seen it used a
lot in the codebase (nor did I find an obvious internal implementation).
Is that fine?

Cheers,

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachments:

basebackup-verify-checksum-V2.patchtext/x-patch; charset=UTF-8; name=basebackup-verify-checksum-V2.patchDownload
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 185f32a5f9..8e84c7c38e 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -16,6 +16,7 @@
 #include <unistd.h>
 #include <time.h>
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"	/* for pg_start/stop_backup */
 #include "catalog/catalog.h"
 #include "catalog/pg_type.h"
@@ -30,6 +31,8 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -75,6 +78,9 @@ static bool backup_started_in_recovery = false;
 /* Relative path of temporary statistics directory */
 static char *statrelpath = NULL;
 
+/* The starting XLOG position of the base backup */
+static XLogRecPtr startptr;
+
 /*
  * Size of each block sent into the tar stream for larger files.
  */
@@ -97,6 +103,9 @@ static TimeOffset elapsed_min_unit;
 /* The last check of the transfer rate. */
 static TimestampTz throttled_last;
 
+/* Whether a checksum failure occured. */
+static bool checksum_failure;
+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves are
@@ -189,7 +198,6 @@ base_backup_cleanup(int code, Datum arg)
 static void
 perform_base_backup(basebackup_options *opt)
 {
-	XLogRecPtr	startptr;
 	TimeLineID	starttli;
 	XLogRecPtr	endptr;
 	TimeLineID	endtli;
@@ -205,6 +213,8 @@ perform_base_backup(basebackup_options *opt)
 	labelfile = makeStringInfo();
 	tblspc_map_file = makeStringInfo();
 
+	checksum_failure = false;
+
 	startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
 								  labelfile, &tablespaces,
 								  tblspc_map_file,
@@ -563,6 +573,12 @@ perform_base_backup(basebackup_options *opt)
 		pq_putemptymessage('c');
 	}
 	SendXlogRecPtrResult(endptr, endtli);
+
+	if (checksum_failure == true)
+		ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			errmsg("checksum mismatch during basebackup")));
+
 }
 
 /*
@@ -1185,6 +1201,28 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
  * Copied from pg_dump, but modified to work with libpq for sending
  */
 
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
 
 /*
  * Given the member, write the TAR header & send the file.
@@ -1199,10 +1237,19 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		 bool missing_ok)
 {
 	FILE	   *fp;
+	BlockNumber blkno = 0;
 	char		buf[TAR_SEND_SIZE];
+	uint16		checksum;
 	size_t		cnt;
+	char	   *filename;
+	int			i;
 	pgoff_t		len = 0;
+	char		page[BLCKSZ];
 	size_t		pad;
+	PageHeader  phdr;
+	int			segmentno = 0;
+	char	   *segmentpath;
+	bool		verify_checksum;
 
 	fp = AllocateFile(readfilename, "rb");
 	if (fp == NULL)
@@ -1214,10 +1261,87 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 				 errmsg("could not open file \"%s\": %m", readfilename)));
 	}
 
+	/*
+	 * Verify checksums only when checksums are enabled, and the file to
+	 * send is either in one of the default tablespaces (`./global' or
+	 * `./base'), or in an external tablespace with an absolute pathname
+	 * (starting with `/') and is not in the list of non-heap files to be
+	 * skipped.
+	 */
+	filename = basename(pstrdup(readfilename));
+	if (DataChecksumsEnabled() &&
+		!skipfile(filename) &&
+		(strncmp(readfilename, "./global/", 9) == 0 ||
+		strncmp(readfilename, "./base/", 7) == 0 ||
+		strncmp(readfilename, "/", 1) == 0))
+		verify_checksum = true;
+	else
+		verify_checksum = false;
+
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
+	/*
+	 * Cut off at the segment boundary (".") to get the segment number in order
+	 * to mix it into the checksum.
+	 */
+	segmentpath = strstr(filename, ".");
+	if (verify_checksum && segmentpath != NULL)
+	{
+		*segmentpath++ = '\0';
+		segmentno = atoi(segmentpath);
+		if (segmentno == 0)
+			ereport(ERROR,
+					(errmsg("invalid segment number %d in filename %s\n",
+					segmentno, filename)));
+	}
+
 	while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0)
 	{
+		if (verify_checksum)
+		{
+			/*
+			 * The checksums are verified at block level, so we iterate over
+			 * the buffer in chunks of BLCKSZ, after making sure that
+			 * TAR_SEND_SIZE/buf is divisible by BLCKSZ and we read
+			 * a multiple of BLCKSZ bytes.
+			 */
+			Assert(TAR_SEND_SIZE % BLCKSZ == 0);
+			if (cnt % BLCKSZ != 0)
+			{
+				ereport(WARNING,
+						(errmsg("cannot verify checksum in file \"%s\", block "
+								"%d: read buffer size %d and page size %d "
+								"differ",
+								readfilename, blkno, (int)cnt, BLCKSZ)));
+				checksum_failure = true;
+				verify_checksum = false;
+				continue;
+			}
+			for (i = 0; i < cnt / BLCKSZ; i++)
+			{
+				memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
+				/*
+				 * Only check pages which have not been modified since the
+				 * start of the base backup.
+				 */
+				if (PageGetLSN(page) < startptr)
+				{
+					checksum = pg_checksum_page((char *) page, blkno + segmentno*RELSEG_SIZE);
+					phdr = (PageHeader) page;
+					if (phdr->pd_checksum != checksum)
+					{
+						ereport(WARNING,
+								(errmsg("checksum mismatch in file \"%s\", "
+										"block %d: expected %X, found %X",
+										readfilename, blkno, phdr->pd_checksum,
+										checksum)));
+						checksum_failure = true;
+					}
+				}
+				blkno++;
+			}
+		}
+
 		/* Send the chunk as a CopyData message */
 		if (pq_putmessage('d', buf, cnt))
 			ereport(ERROR,
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1b32592063..e1b364b4ef 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -39,6 +39,7 @@
 #include "replication/basebackup.h"
 #include "streamutil.h"
 
+#define ERRCODE_DATA_CORRUPTED	"XX001"
 
 typedef struct TablespaceListCell
 {
@@ -81,6 +82,7 @@ static char *xlog_dir = NULL;
 static char format = 'p';		/* p(lain)/t(ar) */
 static char *label = "pg_basebackup base backup";
 static bool noclean = false;
+static bool checksum_failure = false;
 static bool showprogress = false;
 static int	verbose = 0;
 static int	compresslevel = 0;
@@ -155,7 +157,7 @@ cleanup_directories_atexit(void)
 	if (success || in_log_streamer)
 		return;
 
-	if (!noclean)
+	if (!noclean && !checksum_failure)
 	{
 		if (made_new_pgdata)
 		{
@@ -195,7 +197,7 @@ cleanup_directories_atexit(void)
 	}
 	else
 	{
-		if (made_new_pgdata || found_existing_pgdata)
+		if ((made_new_pgdata || found_existing_pgdata) && !checksum_failure)
 			fprintf(stderr,
 					_("%s: data directory \"%s\" not removed at user's request\n"),
 					progname, basedir);
@@ -1970,8 +1972,17 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
-		fprintf(stderr, _("%s: final receive failed: %s"),
-				progname, PQerrorMessage(conn));
+		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+		if (sqlstate &&
+			strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+		{
+			fprintf(stderr, _("%s: checksum error occured\n"),
+					progname);
+			checksum_failure = true;
+		} else {
+			fprintf(stderr, _("%s: final receive failed: %s"),
+					progname, PQerrorMessage(conn));
+		}
 		disconnect_and_exit(1);
 	}
 
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index cdf4f5be37..60561362e1 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
 use Config;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 79;
+use Test::More tests => 83;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -15,7 +15,7 @@ my $tempdir = TestLib::tempdir;
 my $node = get_new_node('main');
 
 # Initialize node without replication settings
-$node->init;
+$node->init(extra => [ '--data-checksums' ]);
 $node->start;
 my $pgdata = $node->data_dir;
 
@@ -317,3 +317,19 @@ like(
 	slurp_file("$tempdir/backupxs_sl_R/recovery.conf"),
 	qr/^primary_slot_name = 'slot1'\n/m,
 	'recovery.conf sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+
+# induce corruption
+my $pg_class = $node->safe_psql('postgres',
+	q{SELECT pg_relation_filepath('pg_class')}
+);
+system_or_bail 'dd', 'conv=notrunc', 'oflag=seek_bytes', 'seek=4000', 'bs=9', 'count=1', 'if=/dev/zero', "of=$pgdata/$pg_class";
+
+$node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt"],
+	1,
+	[qr{^$}],
+	[qr/^WARNING.*checksum.mismatch/s],
+	'pg_basebackup reports checksum mismatch'
+);
#15Michael Banck
michael.banck@credativ.de
In reply to: Michael Banck (#14)
1 attachment(s)
Re: [PATCH] Verify Checksums during Basebackups

Hi,

On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:

Possibly open questions:

1. I have not so far changed the replication protocol to make verifying
checksums optional. I can go about that next if the consensus is that we
need such an option (and cannot just check it every time)?

I think most people (including those I had off-list discussions about
this with) were of the opinion that such an option should be there, so I
added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
replication command and also an option -k / --no-verify-checksums to
pg_basebackup to trigger this.

Updated patch attached.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachments:

basebackup-verify-checksum-V3.patchtext/x-diff; charset=us-asciiDownload
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 4fd61d7c2d..9c907db262 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2382,7 +2382,7 @@ The commands accepted in replication mode are:
   </varlistentry>
 
   <varlistentry>
-    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ]
+    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ]
      <indexterm><primary>BASE_BACKUP</primary></indexterm>
     </term>
     <listitem>
@@ -2481,6 +2481,17 @@ The commands accepted in replication mode are:
          </para>
         </listitem>
        </varlistentry>
+
+       <varlistentry>
+        <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+        <listitem>
+         <para>
+          By default, checksums are verified during a base backup if they are
+          enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
+          this verification.
+         </para>
+        </listitem>
+       </varlistentry>
       </variablelist>
      </para>
      <para>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index e8921b1bb4..95045669c9 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -506,6 +506,22 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-k</option></term>
+      <term><option>--no-verify-checksums</option></term>
+      <listitem>
+       <para>
+        Disables verification of checksums, if they are enabled on the server
+        the base backup is taken from. 
+       </para>
+       <para>
+        By default, checksums are verified and checksum failures will result in
+        a non-zero exit status. However, the base backup will not be removed in
+        this case, as if the <literal>--no-clean</literal> option was used.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 185f32a5f9..9f735a2c07 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -16,6 +16,7 @@
 #include <unistd.h>
 #include <time.h>
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"	/* for pg_start/stop_backup */
 #include "catalog/catalog.h"
 #include "catalog/pg_type.h"
@@ -30,6 +31,8 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -97,6 +100,15 @@ static TimeOffset elapsed_min_unit;
 /* The last check of the transfer rate. */
 static TimestampTz throttled_last;
 
+/* The starting XLOG position of the base backup. */
+static XLogRecPtr startptr;
+
+/* Whether a checksum failure occured. */
+static bool checksum_failure;
+
+/* Do not verify checksums. */
+static bool noverify_checksums = false;
+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves are
@@ -189,7 +201,6 @@ base_backup_cleanup(int code, Datum arg)
 static void
 perform_base_backup(basebackup_options *opt)
 {
-	XLogRecPtr	startptr;
 	TimeLineID	starttli;
 	XLogRecPtr	endptr;
 	TimeLineID	endtli;
@@ -205,6 +216,8 @@ perform_base_backup(basebackup_options *opt)
 	labelfile = makeStringInfo();
 	tblspc_map_file = makeStringInfo();
 
+	checksum_failure = false;
+
 	startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
 								  labelfile, &tablespaces,
 								  tblspc_map_file,
@@ -563,6 +576,12 @@ perform_base_backup(basebackup_options *opt)
 		pq_putemptymessage('c');
 	}
 	SendXlogRecPtrResult(endptr, endtli);
+
+	if (checksum_failure == true)
+		ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			errmsg("checksum mismatch during basebackup")));
+
 }
 
 /*
@@ -592,6 +611,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool		o_wal = false;
 	bool		o_maxrate = false;
 	bool		o_tablespace_map = false;
+	bool		o_noverify_checksums = false;
 
 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -671,6 +691,15 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			opt->sendtblspcmapfile = true;
 			o_tablespace_map = true;
 		}
+		else if (strcmp(defel->defname, "noverify_checksums") == 0)
+		{
+			if (o_noverify_checksums)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("duplicate option \"%s\"", defel->defname)));
+			noverify_checksums = true;
+			o_noverify_checksums = true;
+		}
 		else
 			elog(ERROR, "option \"%s\" not recognized",
 				 defel->defname);
@@ -1185,6 +1214,28 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
  * Copied from pg_dump, but modified to work with libpq for sending
  */
 
+static const char *skip[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+skipfile(char *fn)
+{
+	const char **f;
+
+	if (strcmp(fn, ".") == 0 ||
+		strcmp(fn, "..") == 0)
+		return true;
+
+	for (f = skip; *f; f++)
+		if (strcmp(*f, fn) == 0)
+			return true;
+	return false;
+}
 
 /*
  * Given the member, write the TAR header & send the file.
@@ -1199,10 +1250,19 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		 bool missing_ok)
 {
 	FILE	   *fp;
+	BlockNumber blkno = 0;
 	char		buf[TAR_SEND_SIZE];
+	uint16		checksum;
 	size_t		cnt;
+	char	   *filename;
+	int			i;
 	pgoff_t		len = 0;
+	char		page[BLCKSZ];
 	size_t		pad;
+	PageHeader  phdr;
+	int			segmentno = 0;
+	char	   *segmentpath;
+	bool		verify_checksum = false;
 
 	fp = AllocateFile(readfilename, "rb");
 	if (fp == NULL)
@@ -1214,10 +1274,86 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 				 errmsg("could not open file \"%s\": %m", readfilename)));
 	}
 
+	/*
+	 * Verify checksums only when checksums are enabled, and the file to
+	 * send is either in one of the default tablespaces (`./global' or
+	 * `./base'), or in an external tablespace with an absolute pathname
+	 * (starting with `/') and is not in the list of non-heap files to be
+	 * skipped.
+	 */
+	filename = basename(pstrdup(readfilename));
+	if (!noverify_checksums && DataChecksumsEnabled() &&
+		!skipfile(filename) &&
+		(strncmp(readfilename, "./global/", 9) == 0 ||
+		strncmp(readfilename, "./base/", 7) == 0 ||
+		strncmp(readfilename, "/", 1) == 0))
+		verify_checksum = true;
+
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
+	/*
+	 * Cut off at the segment boundary (".") to get the segment number in order
+	 * to mix it into the checksum.
+	 */
+	segmentpath = strstr(filename, ".");
+	if (verify_checksum && segmentpath != NULL)
+	{
+		*segmentpath++ = '\0';
+		segmentno = atoi(segmentpath);
+		if (segmentno == 0)
+			ereport(ERROR,
+					(errmsg("invalid segment number %d in filename %s\n",
+					segmentno, filename)));
+	}
+
 	while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0)
 	{
+		if (verify_checksum)
+		{
+			/*
+			 * The checksums are verified at block level, so we iterate over
+			 * the buffer in chunks of BLCKSZ, after making sure that
+			 * TAR_SEND_SIZE/buf is divisible by BLCKSZ and we read
+			 * a multiple of BLCKSZ bytes.
+			 */
+			Assert(TAR_SEND_SIZE % BLCKSZ == 0);
+			if (cnt % BLCKSZ != 0)
+			{
+				ereport(WARNING,
+						(errmsg("cannot verify checksum in file \"%s\", block "
+								"%d: read buffer size %d and page size %d "
+								"differ",
+								readfilename, blkno, (int)cnt, BLCKSZ)));
+				checksum_failure = true;
+				verify_checksum = false;
+				continue;
+			}
+			for (i = 0; i < cnt / BLCKSZ; i++)
+			{
+				memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
+				/*
+				 * Only check pages which have not been modified since the
+				 * start of the base backup.
+				 */
+				if (PageGetLSN(page) < startptr)
+				{
+					checksum = pg_checksum_page((char *) page, blkno + segmentno*RELSEG_SIZE);
+					phdr = (PageHeader) page;
+					if (phdr->pd_checksum != checksum)
+					{
+						ereport(WARNING,
+								(errmsg("checksum verification failed in file "
+										"\"%s\", block %d: calculated %X but "
+										"expected %X",
+										readfilename, blkno, checksum,
+										phdr->pd_checksum)));
+						checksum_failure = true;
+					}
+				}
+				blkno++;
+			}
+		}
+
 		/* Send the chunk as a CopyData message */
 		if (pq_putmessage('d', buf, cnt))
 			ereport(ERROR,
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index beb2c2877b..843a878ff3 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -77,6 +77,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_MAX_RATE
 %token K_WAL
 %token K_TABLESPACE_MAP
+%token K_NOVERIFY_CHECKSUMS
 %token K_TIMELINE
 %token K_PHYSICAL
 %token K_LOGICAL
@@ -154,7 +155,7 @@ var_name:	IDENT	{ $$ = $1; }
 
 /*
  * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
- * [MAX_RATE %d] [TABLESPACE_MAP]
+ * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
  */
 base_backup:
 			K_BASE_BACKUP base_backup_opt_list
@@ -208,6 +209,11 @@ base_backup_opt:
 				  $$ = makeDefElem("tablespace_map",
 								   (Node *)makeInteger(true), -1);
 				}
+			| K_NOVERIFY_CHECKSUMS
+				{
+				  $$ = makeDefElem("noverify_checksums",
+								   (Node *)makeInteger(true), -1);
+				}
 			;
 
 create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index fb48241e48..7bb501f594 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -92,6 +92,7 @@ PROGRESS			{ return K_PROGRESS; }
 MAX_RATE		{ return K_MAX_RATE; }
 WAL			{ return K_WAL; }
 TABLESPACE_MAP			{ return K_TABLESPACE_MAP; }
+NOVERIFY_CHECKSUMS	{ return K_NOVERIFY_CHECKSUMS; }
 TIMELINE			{ return K_TIMELINE; }
 START_REPLICATION	{ return K_START_REPLICATION; }
 CREATE_REPLICATION_SLOT		{ return K_CREATE_REPLICATION_SLOT; }
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1b32592063..63ffa76b33 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -39,6 +39,7 @@
 #include "replication/basebackup.h"
 #include "streamutil.h"
 
+#define ERRCODE_DATA_CORRUPTED	"XX001"
 
 typedef struct TablespaceListCell
 {
@@ -81,6 +82,7 @@ static char *xlog_dir = NULL;
 static char format = 'p';		/* p(lain)/t(ar) */
 static char *label = "pg_basebackup base backup";
 static bool noclean = false;
+static bool checksum_failure = false;
 static bool showprogress = false;
 static int	verbose = 0;
 static int	compresslevel = 0;
@@ -95,6 +97,7 @@ static char *replication_slot = NULL;
 static bool temp_replication_slot = true;
 static bool create_slot = false;
 static bool no_slot = false;
+static bool verify_checksums = true;
 
 static bool success = false;
 static bool made_new_pgdata = false;
@@ -155,7 +158,7 @@ cleanup_directories_atexit(void)
 	if (success || in_log_streamer)
 		return;
 
-	if (!noclean)
+	if (!noclean && !checksum_failure)
 	{
 		if (made_new_pgdata)
 		{
@@ -195,7 +198,7 @@ cleanup_directories_atexit(void)
 	}
 	else
 	{
-		if (made_new_pgdata || found_existing_pgdata)
+		if ((made_new_pgdata || found_existing_pgdata) && !checksum_failure)
 			fprintf(stderr,
 					_("%s: data directory \"%s\" not removed at user's request\n"),
 					progname, basedir);
@@ -360,6 +363,8 @@ usage(void)
 	printf(_("  -P, --progress         show progress information\n"));
 	printf(_("  -S, --slot=SLOTNAME    replication slot to use\n"));
 	printf(_("      --no-slot          prevent creation of temporary replication slot\n"));
+	printf(_("  -k, --no-verify-checksums\n"
+			 "                         do not verify checksums\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -?, --help             show this help, then exit\n"));
@@ -1808,14 +1813,15 @@ BaseBackup(void)
 	}
 
 	basebkp =
-		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s",
+		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
 				 escaped_label,
 				 showprogress ? "PROGRESS" : "",
 				 includewal == FETCH_WAL ? "WAL" : "",
 				 fastcheckpoint ? "FAST" : "",
 				 includewal == NO_WAL ? "" : "NOWAIT",
 				 maxrate_clause ? maxrate_clause : "",
-				 format == 't' ? "TABLESPACE_MAP" : "");
+				 format == 't' ? "TABLESPACE_MAP" : "",
+				 verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
 
 	if (PQsendQuery(conn, basebkp) == 0)
 	{
@@ -1970,8 +1976,17 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
-		fprintf(stderr, _("%s: final receive failed: %s"),
-				progname, PQerrorMessage(conn));
+		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+		if (sqlstate &&
+			strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+		{
+			fprintf(stderr, _("%s: checksum error occurred\n"),
+					progname);
+			checksum_failure = true;
+		} else {
+			fprintf(stderr, _("%s: final receive failed: %s"),
+					progname, PQerrorMessage(conn));
+		}
 		disconnect_and_exit(1);
 	}
 
@@ -2140,6 +2155,7 @@ main(int argc, char **argv)
 		{"progress", no_argument, NULL, 'P'},
 		{"waldir", required_argument, NULL, 1},
 		{"no-slot", no_argument, NULL, 2},
+		{"no-verify-checksums", no_argument, NULL, 'k'},
 		{NULL, 0, NULL, 0}
 	};
 	int			c;
@@ -2166,7 +2182,7 @@ main(int argc, char **argv)
 
 	atexit(cleanup_directories_atexit);
 
-	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWvP",
+	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
 							long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -2308,6 +2324,9 @@ main(int argc, char **argv)
 			case 'P':
 				showprogress = true;
 				break;
+			case 'k':
+				verify_checksums = false;
+				break;
 			default:
 
 				/*
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index cdf4f5be37..0803b1e702 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
 use Config;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 79;
+use Test::More tests => 84;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -15,7 +15,7 @@ my $tempdir = TestLib::tempdir;
 my $node = get_new_node('main');
 
 # Initialize node without replication settings
-$node->init;
+$node->init(extra => [ '--data-checksums' ]);
 $node->start;
 my $pgdata = $node->data_dir;
 
@@ -317,3 +317,24 @@ like(
 	slurp_file("$tempdir/backupxs_sl_R/recovery.conf"),
 	qr/^primary_slot_name = 'slot1'\n/m,
 	'recovery.conf sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+
+# induce corruption
+my $pg_class = $node->safe_psql('postgres',
+	q{SELECT pg_relation_filepath('pg_class')}
+);
+system_or_bail 'dd', 'conv=notrunc', 'oflag=seek_bytes', 'seek=4000', 'bs=9', 'count=1', 'if=/dev/zero', "of=$pgdata/$pg_class";
+
+$node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt"],
+	1,
+	[qr{^$}],
+	[qr/^WARNING.*checksum.verification.failed/s],
+	'pg_basebackup reports checksum mismatch'
+);
+
+# do not verify checksums, should return ok
+$node->command_ok(
+	[   'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-k' ],
+	'pg_basebackup with -k does not report checksum mismatch');
#16Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#15)
Re: [PATCH] Verify Checksums during Basebackups

On Sat, Mar 17, 2018 at 10:34 PM, Michael Banck <michael.banck@credativ.de>
wrote:

Hi,

On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:

Possibly open questions:

1. I have not so far changed the replication protocol to make verifying
checksums optional. I can go about that next if the consensus is that we
need such an option (and cannot just check it every time)?

I think most people (including those I had off-list discussions about
this with) were of the opinion that such an option should be there, so I
added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
replication command and also an option -k / --no-verify-checksums to
pg_basebackup to trigger this.

Updated patch attached.

Notes:

+ if (checksum_failure == true)

Really, just if(checksum_failure)

+ errmsg("checksum mismatch during basebackup")));

Should be "base backup" in messages.

+static const char *skip[] = {

I think that needs a much better name than just "skip". Skip for what? In
particular since we are just skipping it for checksums, and not for the
actual basebackup, that name is actively misinforming.

+   filename = basename(pstrdup(readfilename));
+   if (!noverify_checksums && DataChecksumsEnabled() &&
+       !skipfile(filename) &&
+       (strncmp(readfilename, "./global/", 9) == 0 ||
+       strncmp(readfilename, "./base/", 7) == 0 ||
+       strncmp(readfilename, "/", 1) == 0))
+       verify_checksum = true;

I would include the checks for global, base etc into the skipfile()
function as well (also renamed).

+                * Only check pages which have not been modified since the
+                * start of the base backup.

I think this needs a description of why, as well (without having read this
thread, this is a pretty subtle case).

+system_or_bail 'dd', 'conv=notrunc', 'oflag=seek_bytes', 'seek=4000',
'bs=9', 'count=1', 'if=/dev/zero', "of=$pgdata/$pg_class";

This part of the test will surely fail on Windows, not having a /dev/zero.
Can we easily implement this part natively in perl perhaps? Somebody who
knows more about which functionality is OK to use within this system can
perhaps comment?

Most of that stuff is trivial and can be cleaned up at commit time. Do you
want to send an updated patch with a few of those fixes, or should I clean
it?

The test thing is a stopper until we figure that one out though. And while
at it -- it seems we don't have any tests for the checksum feature in
general. It would probably make sense to consider that at the same time as
figuring out the right way to do this one.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#17David Steele
david@pgmasters.net
In reply to: Michael Banck (#15)
1 attachment(s)
Re: [PATCH] Verify Checksums during Basebackups

Hi Michael,

On 3/17/18 5:34 PM, Michael Banck wrote:

On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:

I think most people (including those I had off-list discussions about
this with) were of the opinion that such an option should be there, so I
added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
replication command and also an option -k / --no-verify-checksums to
pg_basebackup to trigger this.

Updated patch attached.

+ memcpy(page, (buf + BLCKSZ * i), BLCKSZ);

Why make a copy here? How about:

char *page = buf + BLCKSZ * i

I know pg_checksum_page manipulates the checksum field but I have found
it to be safe.

+ if (phdr->pd_checksum != checksum)

I've attached a patch that adds basic retry functionality. It's not
terribly efficient since it rereads the entire buffer for any block
error. A better way is to keep a bitmap for each block in the buffer,
then on retry compare bitmaps. If the block is still bad, report it.
If the block was corrected, move on. If a block was good before but is
bad on retry it can be ignored.

+    ereport(WARNING,
+        (errmsg("checksum verification failed in file "

I'm worried about how verbose this warning could be since there are
131,072 blocks per segment. It's unlikely to have that many block
errors, but users do sometimes put files in PGDATA which look like they
should be validated. Since these warnings all go to the server log it
could get pretty bad.

We should either stop warning after the first failure, or aggregate the
failures for a file into a single message.

Some tests with multi-block errors should be added to test these scenarios.

Thanks,
--
-David
david@pgmasters.net

Attachments:

reread.patch (text/plain)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 9f735a2c07..48bacfe5dd 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1258,6 +1258,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 	int			i;
 	pgoff_t		len = 0;
 	char		page[BLCKSZ];
+	int			block_error = -1;
 	size_t		pad;
 	PageHeader  phdr;
 	int			segmentno = 0;
@@ -1328,9 +1329,16 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 				verify_checksum = false;
 				continue;
 			}
-			for (i = 0; i < cnt / BLCKSZ; i++)
+
+			/*
+			 * Validate all blocks in the buffer.  If checking after the buffer
+			 * was reread then start at the block that caused the reread.
+             */
+			for (i = block_error == -1 ? 0 : block_error; i < cnt / BLCKSZ; i++)
 			{
+				blkno = len / BLCKSZ + i;
 				memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
+
 				/*
 				 * Only check pages which have not been modified since the
 				 * start of the base backup.
@@ -1341,6 +1349,18 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
+						/*
+						 * If first time encountering an error on this block
+						 * then read the last buffer again to see if the
+						 * checksum is corrected.
+						 */
+						if (block_error == -1)
+						{
+							block_error = i;
+							fseek(fp, -cnt, SEEK_CUR);
+							break;
+						}
+
 						ereport(WARNING,
 								(errmsg("checksum verification failed in file "
 										"\"%s\", block %d: calculated %X but "
@@ -1350,8 +1370,12 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 						checksum_failure = true;
 					}
 				}
-				blkno++;
+				block_error = -1;
 			}
+
+			/* Read the buffer again if there were checksum errors */
+			if (block_error != -1)
+				continue;
 		}
 
 		/* Send the chunk as a CopyData message */
#18Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#16)
1 attachment(s)
Re: [PATCH] Verify Checksums during Basebackups

Hi Magnus,

thanks a lot for looking at my patch!

On Thursday, 2018-03-22 at 15:07 +0100, Magnus Hagander wrote:

On Sat, Mar 17, 2018 at 10:34 PM, Michael Banck <michael.banck@credativ.de> wrote:

On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:

Possibly open questions:

1. I have not so far changed the replication protocol to make verifying
checksums optional. I can go about that next if the consensus is that we
need such an option (and cannot just check it every time)?

I think most people (including those I had off-list discussions about
this with) were of the opinion that such an option should be there, so I
added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
replication command and also an option -k / --no-verify-checksums to
pg_basebackup to trigger this.

Updated patch attached.

Notes:

+   if (checksum_failure == true)

Really, just if(checksum_failure)

+           errmsg("checksum mismatch during basebackup")));

Should be "base backup" in messages.

I've changed both.

+static const char *skip[] = {

I think that needs a much better name than just "skip". Skip for what?
In particular since we are just skipping it for checksums, and not for
the actual basebackup, that name is actively misinforming.

I have copied that verbatim from the online checksum patch, but of
course this is in src/backend/replication and not src/bin so warrants
more scrutiny. If you plan to commit both for v11, it might make sense
to have that separated out to a more central place?

But I guess what we mean is a test for "is a heap file". Do you have a
good suggestion where it should end up so that pg_verify_checksums can
use it as well?

In the meantime, I've changed the skip[] array to no_heap_files[] and
the skipfile() function to is_heap_file(), also reversing the logic. If
it helps pg_verify_checksums, we could make is_not_a_heap_file()
instead.

+   filename = basename(pstrdup(readfilename));
+   if (!noverify_checksums && DataChecksumsEnabled() &&
+       !skipfile(filename) &&
+       (strncmp(readfilename, "./global/", 9) == 0 ||
+       strncmp(readfilename, "./base/", 7) == 0 ||
+       strncmp(readfilename, "/", 1) == 0))
+       verify_checksum = true;

I would include the checks for global, base etc into the skipfile()
function as well (also renamed).

Check. I had to change the way (the previous) skipfile() works a bit,
because it was expecting a filename as argument, while we check
pathnames in the above.

+                * Only check pages which have not been modified since the
+                * start of the base backup.

I think this needs a description of why, as well (without having read
this thread, this is a pretty subtle case).

I tried to expand on this some more.

+system_or_bail 'dd', 'conv=notrunc', 'oflag=seek_bytes', 'seek=4000', 'bs=9', 'count=1', 'if=/dev/zero', "of=$pgdata/$pg_class";

This part of the test will surely fail on Windows, not having a
/dev/zero. Can we easily implement this part natively in perl perhaps?

Right, this was one of the open questions. I now came up with a perl 4-
liner that seems to do the trick, but I can't test it on Windows.

Most of that stuff is trivial and can be cleaned up at commit time. Do
you want to send an updated patch with a few of those fixes, or should
I clean it?

I've attached a new patch, but I have not addressed the question whether
skipfile()/is_heap_file() should be moved somewhere else yet.

I found one more cosmetic issue: if there is an external tablespace, and
pg_basebackup encounters corruption, you would get the message
"pg_basebackup: changes to tablespace directories will not be undone"
from cleanup_directories_atexit(), which I now also suppress in case of
checksum failures.

The test thing is a stopper until we figure that one out though. And
while at it -- it seems we don't have any tests for the checksum
feature in general. It would probably make sense to consider that at
the same time as figuring out the right way to do this one.

I don't want to deflect work, but it seems to me the online checksums
patch would be in a better position to generally test checksums while
it's at it. Or did you mean something related to pg_basebackup?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachments:

basebackup-verify-checksum-V4.patch (text/x-patch)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 366b684293..ab9f2fb93a 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2382,7 +2382,7 @@ The commands accepted in replication mode are:
   </varlistentry>
 
   <varlistentry>
-    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ]
+    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ]
      <indexterm><primary>BASE_BACKUP</primary></indexterm>
     </term>
     <listitem>
@@ -2481,6 +2481,17 @@ The commands accepted in replication mode are:
          </para>
         </listitem>
        </varlistentry>
+
+       <varlistentry>
+        <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+        <listitem>
+         <para>
+          By default, checksums are verified during a base backup if they are
+          enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
+          this verification.
+         </para>
+        </listitem>
+       </varlistentry>
       </variablelist>
      </para>
      <para>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index e8921b1bb4..95045669c9 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -506,6 +506,22 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-k</option></term>
+      <term><option>--no-verify-checksums</option></term>
+      <listitem>
+       <para>
+        Disables verification of checksums, if they are enabled on the server
+        the base backup is taken from. 
+       </para>
+       <para>
+        By default, checksums are verified and checksum failures will result in
+        a non-zero exit status. However, the base backup will not be removed in
+        this case, as if the <literal>--no-clean</literal> option was used.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 185f32a5f9..8ea6682a45 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -16,6 +16,7 @@
 #include <unistd.h>
 #include <time.h>
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"	/* for pg_start/stop_backup */
 #include "catalog/catalog.h"
 #include "catalog/pg_type.h"
@@ -30,6 +31,8 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -97,6 +100,15 @@ static TimeOffset elapsed_min_unit;
 /* The last check of the transfer rate. */
 static TimestampTz throttled_last;
 
+/* The starting XLOG position of the base backup. */
+static XLogRecPtr startptr;
+
+/* Whether a checksum failure occurred. */
+static bool checksum_failure;
+
+/* Do not verify checksums. */
+static bool noverify_checksums = false;
+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves are
@@ -189,7 +201,6 @@ base_backup_cleanup(int code, Datum arg)
 static void
 perform_base_backup(basebackup_options *opt)
 {
-	XLogRecPtr	startptr;
 	TimeLineID	starttli;
 	XLogRecPtr	endptr;
 	TimeLineID	endtli;
@@ -205,6 +216,8 @@ perform_base_backup(basebackup_options *opt)
 	labelfile = makeStringInfo();
 	tblspc_map_file = makeStringInfo();
 
+	checksum_failure = false;
+
 	startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
 								  labelfile, &tablespaces,
 								  tblspc_map_file,
@@ -563,6 +576,12 @@ perform_base_backup(basebackup_options *opt)
 		pq_putemptymessage('c');
 	}
 	SendXlogRecPtrResult(endptr, endtli);
+
+	if (checksum_failure)
+		ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			errmsg("checksum mismatch during base backup")));
+
 }
 
 /*
@@ -592,6 +611,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool		o_wal = false;
 	bool		o_maxrate = false;
 	bool		o_tablespace_map = false;
+	bool		o_noverify_checksums = false;
 
 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -671,6 +691,15 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			opt->sendtblspcmapfile = true;
 			o_tablespace_map = true;
 		}
+		else if (strcmp(defel->defname, "noverify_checksums") == 0)
+		{
+			if (o_noverify_checksums)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("duplicate option \"%s\"", defel->defname)));
+			noverify_checksums = true;
+			o_noverify_checksums = true;
+		}
 		else
 			elog(ERROR, "option \"%s\" not recognized",
 				 defel->defname);
@@ -1179,13 +1208,47 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 	return size;
 }
 
+static const char *no_heap_files[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+is_heap_file(const char *pn)
+{
+	const char **f;
+	const char *fn = basename(pstrdup(pn));
+
+	/* Skip current and parent directory */
+	if (strcmp(pn, ".") == 0 ||
+		strcmp(pn, "..") == 0)
+		return false;
+
+	/* Check that the file is in a tablespace */
+	if (strncmp(pn, "./global/", 9) == 0 ||
+		strncmp(pn, "./base/", 7) == 0 ||
+		strncmp(pn, "/", 1) == 0)
+	{
+		/* Compare file against no_heap_files skiplist */
+		for (f = no_heap_files; *f; f++)
+			if (strcmp(*f, fn) == 0)
+				return false;
+
+		return true;
+	}
+	else
+		return false;
+}
+
 /*****
  * Functions for handling tar file format
  *
  * Copied from pg_dump, but modified to work with libpq for sending
  */
 
-
 /*
  * Given the member, write the TAR header & send the file.
  *
@@ -1199,10 +1262,19 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		 bool missing_ok)
 {
 	FILE	   *fp;
+	BlockNumber blkno = 0;
 	char		buf[TAR_SEND_SIZE];
+	uint16		checksum;
 	size_t		cnt;
+	char	   *filename;
+	int			i;
 	pgoff_t		len = 0;
+	char		page[BLCKSZ];
 	size_t		pad;
+	PageHeader  phdr;
+	int			segmentno = 0;
+	char	   *segmentpath;
+	bool		verify_checksum = false;
 
 	fp = AllocateFile(readfilename, "rb");
 	if (fp == NULL)
@@ -1214,10 +1286,86 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 				 errmsg("could not open file \"%s\": %m", readfilename)));
 	}
 
+	/*
+	 * Verify checksums only when checksums are enabled, and the file to
+	 * send is either in one of the default tablespaces (`./global' or
+	 * `./base'), or in an external tablespace with an absolute pathname
+	 * (starting with `/') and is not in the list of non-heap files to be
+	 * skipped.
+	 */
+	if (!noverify_checksums && DataChecksumsEnabled() &&
+		is_heap_file(readfilename))
+		verify_checksum = true;
+
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
+	/*
+	 * Cut off at the segment boundary (".") to get the segment number in order
+	 * to mix it into the checksum.
+	 */
+	filename = basename(pstrdup(readfilename));
+	segmentpath = strstr(filename, ".");
+	if (verify_checksum && segmentpath != NULL)
+	{
+		*segmentpath++ = '\0';
+		segmentno = atoi(segmentpath);
+		if (segmentno == 0)
+			ereport(ERROR,
+					(errmsg("invalid segment number %d in filename %s",
+					segmentno, filename)));
+	}
+
 	while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0)
 	{
+		if (verify_checksum)
+		{
+			/*
+			 * The checksums are verified at block level, so we iterate over
+			 * the buffer in chunks of BLCKSZ, after making sure that
+			 * TAR_SEND_SIZE/buf is divisible by BLCKSZ and we read
+			 * a multiple of BLCKSZ bytes.
+			 */
+			Assert(TAR_SEND_SIZE % BLCKSZ == 0);
+			if (cnt % BLCKSZ != 0)
+			{
+				ereport(WARNING,
+						(errmsg("cannot verify checksum in file \"%s\", block "
+								"%d: read buffer size %d and page size %d "
+								"differ",
+								readfilename, blkno, (int)cnt, BLCKSZ)));
+				checksum_failure = true;
+				verify_checksum = false;
+				continue;
+			}
+			for (i = 0; i < cnt / BLCKSZ; i++)
+			{
+				memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
+				/*
+				 * Only check pages which have not been modified since the
+				 * start of the base backup. Otherwise, they might have been
+				 * written only halfway and the checksum would not be valid.
+				 * However, replaying WAL would reinstate the correct page in
+				 * this case.
+				 */
+				if (PageGetLSN(page) < startptr)
+				{
+					checksum = pg_checksum_page((char *) page, blkno + segmentno*RELSEG_SIZE);
+					phdr = (PageHeader) page;
+					if (phdr->pd_checksum != checksum)
+					{
+						ereport(WARNING,
+								(errmsg("checksum verification failed in file "
+										"\"%s\", block %d: calculated %X but "
+										"expected %X",
+										readfilename, blkno, checksum,
+										phdr->pd_checksum)));
+						checksum_failure = true;
+					}
+				}
+				blkno++;
+			}
+		}
+
 		/* Send the chunk as a CopyData message */
 		if (pq_putmessage('d', buf, cnt))
 			ereport(ERROR,
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index beb2c2877b..843a878ff3 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -77,6 +77,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_MAX_RATE
 %token K_WAL
 %token K_TABLESPACE_MAP
+%token K_NOVERIFY_CHECKSUMS
 %token K_TIMELINE
 %token K_PHYSICAL
 %token K_LOGICAL
@@ -154,7 +155,7 @@ var_name:	IDENT	{ $$ = $1; }
 
 /*
  * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
- * [MAX_RATE %d] [TABLESPACE_MAP]
+ * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
  */
 base_backup:
 			K_BASE_BACKUP base_backup_opt_list
@@ -208,6 +209,11 @@ base_backup_opt:
 				  $$ = makeDefElem("tablespace_map",
 								   (Node *)makeInteger(true), -1);
 				}
+			| K_NOVERIFY_CHECKSUMS
+				{
+				  $$ = makeDefElem("noverify_checksums",
+								   (Node *)makeInteger(true), -1);
+				}
 			;
 
 create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index fb48241e48..7bb501f594 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -92,6 +92,7 @@ PROGRESS			{ return K_PROGRESS; }
 MAX_RATE		{ return K_MAX_RATE; }
 WAL			{ return K_WAL; }
 TABLESPACE_MAP			{ return K_TABLESPACE_MAP; }
+NOVERIFY_CHECKSUMS	{ return K_NOVERIFY_CHECKSUMS; }
 TIMELINE			{ return K_TIMELINE; }
 START_REPLICATION	{ return K_START_REPLICATION; }
 CREATE_REPLICATION_SLOT		{ return K_CREATE_REPLICATION_SLOT; }
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1b32592063..39013ff7e0 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -39,6 +39,7 @@
 #include "replication/basebackup.h"
 #include "streamutil.h"
 
+#define ERRCODE_DATA_CORRUPTED	"XX001"
 
 typedef struct TablespaceListCell
 {
@@ -81,6 +82,7 @@ static char *xlog_dir = NULL;
 static char format = 'p';		/* p(lain)/t(ar) */
 static char *label = "pg_basebackup base backup";
 static bool noclean = false;
+static bool checksum_failure = false;
 static bool showprogress = false;
 static int	verbose = 0;
 static int	compresslevel = 0;
@@ -95,6 +97,7 @@ static char *replication_slot = NULL;
 static bool temp_replication_slot = true;
 static bool create_slot = false;
 static bool no_slot = false;
+static bool verify_checksums = true;
 
 static bool success = false;
 static bool made_new_pgdata = false;
@@ -155,7 +158,7 @@ cleanup_directories_atexit(void)
 	if (success || in_log_streamer)
 		return;
 
-	if (!noclean)
+	if (!noclean && !checksum_failure)
 	{
 		if (made_new_pgdata)
 		{
@@ -195,7 +198,7 @@ cleanup_directories_atexit(void)
 	}
 	else
 	{
-		if (made_new_pgdata || found_existing_pgdata)
+		if ((made_new_pgdata || found_existing_pgdata) && !checksum_failure)
 			fprintf(stderr,
 					_("%s: data directory \"%s\" not removed at user's request\n"),
 					progname, basedir);
@@ -206,7 +209,7 @@ cleanup_directories_atexit(void)
 					progname, xlog_dir);
 	}
 
-	if (made_tablespace_dirs || found_tablespace_dirs)
+	if ((made_tablespace_dirs || found_tablespace_dirs) && !checksum_failure)
 		fprintf(stderr,
 				_("%s: changes to tablespace directories will not be undone\n"),
 				progname);
@@ -360,6 +363,8 @@ usage(void)
 	printf(_("  -P, --progress         show progress information\n"));
 	printf(_("  -S, --slot=SLOTNAME    replication slot to use\n"));
 	printf(_("      --no-slot          prevent creation of temporary replication slot\n"));
+	printf(_("  -k, --no-verify-checksums\n"
+			 "                         do not verify checksums\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -?, --help             show this help, then exit\n"));
@@ -1808,14 +1813,15 @@ BaseBackup(void)
 	}
 
 	basebkp =
-		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s",
+		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
 				 escaped_label,
 				 showprogress ? "PROGRESS" : "",
 				 includewal == FETCH_WAL ? "WAL" : "",
 				 fastcheckpoint ? "FAST" : "",
 				 includewal == NO_WAL ? "" : "NOWAIT",
 				 maxrate_clause ? maxrate_clause : "",
-				 format == 't' ? "TABLESPACE_MAP" : "");
+				 format == 't' ? "TABLESPACE_MAP" : "",
+				 verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
 
 	if (PQsendQuery(conn, basebkp) == 0)
 	{
@@ -1970,8 +1976,17 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
-		fprintf(stderr, _("%s: final receive failed: %s"),
-				progname, PQerrorMessage(conn));
+		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+		if (sqlstate &&
+			strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+		{
+			fprintf(stderr, _("%s: checksum error occurred\n"),
+					progname);
+			checksum_failure = true;
+		} else {
+			fprintf(stderr, _("%s: final receive failed: %s"),
+					progname, PQerrorMessage(conn));
+		}
 		disconnect_and_exit(1);
 	}
 
@@ -2140,6 +2155,7 @@ main(int argc, char **argv)
 		{"progress", no_argument, NULL, 'P'},
 		{"waldir", required_argument, NULL, 1},
 		{"no-slot", no_argument, NULL, 2},
+		{"no-verify-checksums", no_argument, NULL, 'k'},
 		{NULL, 0, NULL, 0}
 	};
 	int			c;
@@ -2166,7 +2182,7 @@ main(int argc, char **argv)
 
 	atexit(cleanup_directories_atexit);
 
-	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWvP",
+	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
 							long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -2308,6 +2324,9 @@ main(int argc, char **argv)
 			case 'P':
 				showprogress = true;
 				break;
+			case 'k':
+				verify_checksums = false;
+				break;
 			default:
 
 				/*
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index cdf4f5be37..d4c47f1377 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
 use Config;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 79;
+use Test::More tests => 84;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -15,7 +15,7 @@ my $tempdir = TestLib::tempdir;
 my $node = get_new_node('main');
 
 # Initialize node without replication settings
-$node->init;
+$node->init(extra => [ '--data-checksums' ]);
 $node->start;
 my $pgdata = $node->data_dir;
 
@@ -317,3 +317,27 @@ like(
 	slurp_file("$tempdir/backupxs_sl_R/recovery.conf"),
 	qr/^primary_slot_name = 'slot1'\n/m,
 	'recovery.conf sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+
+# induce corruption
+my $pg_class = $node->safe_psql('postgres',
+	q{SELECT pg_relation_filepath('pg_class')}
+);
+open my $file, '+<', "$pgdata/$pg_class";
+seek($file, 4000, 0);
+syswrite($file, "\0\0\0\0\0\0\0\0\0");
+close $file;
+
+$node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt"],
+	1,
+	[qr{^$}],
+	[qr/^WARNING.*checksum.verification.failed/s],
+	'pg_basebackup reports checksum mismatch'
+);
+
+# do not verify checksums, should return ok
+$node->command_ok(
+	[   'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-k' ],
+	'pg_basebackup with -k does not report checksum mismatch');
#19Michael Banck
michael.banck@credativ.de
In reply to: David Steele (#17)
Re: [PATCH] Verify Checksums during Basebackups

Hi David,

thanks for the review!

On Thursday, 2018-03-22 at 12:22 -0400, David Steele wrote:

On 3/17/18 5:34 PM, Michael Banck wrote:

On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
I think most people (including those I had off-list discussions about
this with) were of the opinion that such an option should be there, so I
added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
replication command and also an option -k / --no-verify-checksums to
pg_basebackup to trigger this.

Updated patch attached.

+ memcpy(page, (buf + BLCKSZ * i), BLCKSZ);

Why make a copy here? How about:

char *page = buf + BLCKSZ * i

Right, ok.

I know pg_checksum_page manipulates the checksum field but I have found
it to be safe.

+ if (phdr->pd_checksum != checksum)

I've attached a patch that adds basic retry functionality. It's not
terribly efficient since it rereads the entire buffer for any block
error. A better way is to keep a bitmap for each block in the buffer,
then on retry compare bitmaps. If the block is still bad, report it.
If the block was corrected, move on. If a block was good before but is
bad on retry, it can be ignored.

I have to admit I find it a bit convoluted and non-obvious on first
reading, but I'll try to check it out some more.

I think it would be very useful if we could come up with a testcase
showing this problem, but I guess this will be quite hard to hit
reproducibly, right?

+    ereport(WARNING,
+        (errmsg("checksum verification failed in file "

I'm worried about how verbose this warning could be since there are
131,072 blocks per segment. It's unlikely to have that many block
errors, but users do sometimes put files in PGDATA which look like they
should be validated. Since these warnings all go to the server log it
could get pretty bad.

We only verify checksums of files in tablespaces, and I don't think
dropping random files in those is supported in any way, as opposed to
maybe the top-level PGDATA directory. So I would say that this is not a
real concern.

We should either stop warning after the first failure, or aggregate the
failures for a file into a single message.

I agree that major corruption could make the whole output blow up but I
would prefer to keep this feature simple for now, which implies possibly
printing out a lot of WARNING or maybe just stopping after the first
one (or first few, dunno).

Michael

--
Michael Banck

#20David Steele
david@pgmasters.net
In reply to: Michael Banck (#19)
1 attachment(s)
Re: [PATCH] Verify Checksums during Basebackups

Hi Michael,

On 3/23/18 5:36 AM, Michael Banck wrote:

On Thursday, 2018-03-22 at 12:22 -0400, David Steele wrote:

+ if (phdr->pd_checksum != checksum)

I've attached a patch that adds basic retry functionality. It's not
terribly efficient since it rereads the entire buffer for any block
error. A better way is to keep a bitmap for each block in the buffer,
then on retry compare bitmaps. If the block is still bad, report it.
If the block was corrected, move on. If a block was good before but is
bad on retry, it can be ignored.

I have to admit I find it a bit convoluted and non-obvious on first
reading, but I'll try to check it out some more.

Yeah, I think I was influenced too much by how pgBackRest does things,
which doesn't work as well here. Attached is a simpler version.

I think it would be very useful if we could come up with a testcase
showing this problem, but I guess this will be quite hard to hit
reproducibly, right?

This was brought up by Robert in [1] when discussing validating
checksums during backup. I don't know of any way to reproduce this
issue but it seems perfectly possible, if highly unlikely.

+    ereport(WARNING,
+        (errmsg("checksum verification failed in file "

I'm worried about how verbose this warning could be since there are
131,072 blocks per segment. It's unlikely to have that many block
errors, but users do sometimes put files in PGDATA which look like they
should be validated. Since these warnings all go to the server log it
could get pretty bad.

We only verify checksums of files in tablespaces, and I don't think
dropping random files in those is supported in any way, as opposed to
maybe the top-level PGDATA directory. So I would say that this is not a
real concern.

Perhaps, but a very corrupt file is still going to spew lots of warnings
into the server log.

We should either stop warning after the first failure, or aggregate the
failures for a file into a single message.

I agree that major corruption could make the whole output blow up but I
would prefer to keep this feature simple for now, which implies possibly
printing out a lot of WARNING or maybe just stopping after the first
one (or first few, dunno).

In my experience actual block errors are relatively rare, so there
aren't likely to be more than a few in a file. More common are
overwritten or transposed files, rogue files, etc. These produce a lot
of output.

Maybe stop after five?

Regards,
--
-David
david@pgmasters.net

[1]: /messages/by-id/CA+TgmobHd+-yVJHofSWg=g+=A3EiCN2wsAiEyj7dj1hhevNq9Q@mail.gmail.com

Attachments:

reread-v2.patch (text/plain)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 9f735a2c07..fe7285f7a2 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1258,6 +1258,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 	int			i;
 	pgoff_t		len = 0;
 	char		page[BLCKSZ];
+	bool		block_retry = false;
 	size_t		pad;
 	PageHeader  phdr;
 	int			segmentno = 0;
@@ -1341,6 +1342,50 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
+						/*
+						 * Retry the block on the first failure.  It's possible
+						 * that we read the first 4K page of the block just
+						 * before postgres updated the entire block so it ends
+						 * up looking torn to us.  We only need to retry once
+						 * because the LSN should be updated to something we can
+						 * ignore on the next pass.  If the error happens again
+						 * then it is a true validation failure.
+						 */
+						if (block_retry == false)
+						{
+							/* Reread the failed block */
+							if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1)
+							{
+								ereport(ERROR,
+										(errcode_for_file_access(),
+										 errmsg("could not fseek in file \"%s\": %m",
+										 readfilename)));
+							}
+
+							if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
+							{
+								ereport(ERROR,
+										(errcode_for_file_access(),
+										 errmsg("could not reread block %d of file \"%s\": %m",
+										 blkno, readfilename)));
+							}
+
+							if (fseek(fp, cnt - BLCKSZ * i - BLCKSZ, SEEK_CUR) == -1)
+							{
+								ereport(ERROR,
+										(errcode_for_file_access(),
+										 errmsg("could not fseek in file \"%s\": %m",
+										 readfilename)));
+							}
+
+							/* Set flag so we know a retry was attempted */
+							block_retry = true;
+
+							/* Reset loop to validate the block again */
+							i--;
+							continue;
+						}
+
 						ereport(WARNING,
 								(errmsg("checksum verification failed in file "
 										"\"%s\", block %d: calculated %X but "
@@ -1350,6 +1395,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 						checksum_failure = true;
 					}
 				}
+				block_retry = false;
 				blkno++;
 			}
 		}
#21Michael Banck
michael.banck@credativ.de
In reply to: David Steele (#20)
Re: [PATCH] Verify Checksums during Basebackups

Hi David,

On Friday, 2018-03-23 at 10:54 -0400, David Steele wrote:

On 3/23/18 5:36 AM, Michael Banck wrote:

On Thursday, 2018-03-22 at 12:22 -0400, David Steele wrote:

+ if (phdr->pd_checksum != checksum)

I've attached a patch that adds basic retry functionality. It's not
terribly efficient since it rereads the entire buffer for any block
error. A better way is to keep a bitmap for each block in the buffer,
then on retry compare bitmaps. If the block is still bad, report it.
If the block was corrected, move on. If a block was good before but is
bad on retry, it can be ignored.

I have to admit I find it a bit convoluted and non-obvious on first
reading, but I'll try to check it out some more.

Yeah, I think I was influenced too much by how pgBackRest does things,
which doesn't work as well here. Attached is a simpler version.

This looks much cleaner to me, yeah.

I agree that major corruption could make the whole output blow up but I
would prefer to keep this feature simple for now, which implies possibly
printing out a lot of WARNING or maybe just stopping after the first
one (or first few, dunno).

In my experience actual block errors are relatively rare, so there
aren't likely to be more than a few in a file. More common are
overwritten or transposed files, rogue files, etc. These produce a lot
of output.

Maybe stop after five?

I'm on board with this, but I have the feeling that this is not a very
common pattern in Postgres, or might not be project style at all. I
can't remember ever seeing an error message like that.

Anybody know whether we're doing this in a similar fashion elsewhere?

Michael

--
Michael Banck

#22Michael Banck
michael.banck@credativ.de
In reply to: Michael Banck (#21)
1 attachment(s)
Re: [PATCH] Verify Checksums during Basebackups

Hi,

On Friday, 2018-03-23 at 17:43 +0100, Michael Banck wrote:

On Friday, 2018-03-23 at 10:54 -0400, David Steele wrote:

In my experience actual block errors are relatively rare, so there
aren't likely to be more than a few in a file. More common are
overwritten or transposed files, rogue files, etc. These produce a lot
of output.

Maybe stop after five?

The attached patch does that, and outputs the total number of
verification failures of that file after it got sent.

I'm on board with this, but I have the feeling that this is not a very
common pattern in Postgres, or might not be project style at all. I
can't remember ever seeing an error message like that.

Anybody know whether we're doing this in a similar fashion elsewhere?

I tried to have a look around and couldn't find any examples, so I'm not
sure that patch should go in. On the other hand, we abort on checksum
failures usually (in pg_dump e.g.), so limiting the number of warnings
does make sense.

I guess we need to see what others think.

Michael

--
Michael Banck

Attachments:

basebackup-verify-checksum_suppress_warnings.patch (text/x-patch)
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 8ea6682a45..1cb17636e6 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1265,6 +1265,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 	BlockNumber blkno = 0;
 	char		buf[TAR_SEND_SIZE];
 	uint16		checksum;
+	int			checksum_failures = 0;
 	size_t		cnt;
 	char	   *filename;
 	int			i;
@@ -1353,13 +1354,21 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 					phdr = (PageHeader) page;
 					if (phdr->pd_checksum != checksum)
 					{
-						ereport(WARNING,
-								(errmsg("checksum verification failed in file "
-										"\"%s\", block %d: calculated %X but "
-										"expected %X",
-										readfilename, blkno, checksum,
-										phdr->pd_checksum)));
+						if (checksum_failures <= 5)
+							ereport(WARNING,
+									(errmsg("checksum verification failed in "
+											"file \"%s\", block %d: calculated "
+											"%X but expected %X",
+											readfilename, blkno, checksum,
+											phdr->pd_checksum)));
+						if (checksum_failures == 5)
+							ereport(WARNING,
+									(errmsg("further checksum verification "
+											"failures in file \"%s\" will not "
+											"be reported", readfilename)));
+
 						checksum_failure = true;
+						checksum_failures++;
 					}
 				}
 				blkno++;
@@ -1383,6 +1392,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 			 */
 			break;
 		}
+
 	}
 
 	/* If the file was truncated while we were sending it, pad it with zeros */
@@ -1411,6 +1421,11 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	FreeFile(fp);
 
+	if (checksum_failures > 5)
+		ereport(WARNING,
+				(errmsg("file \"%s\" has a total of %d checksum verification "
+						"failures", readfilename, checksum_failures)));
+
 	return true;
 }
 
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index d4c47f1377..7955747ae0 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -4,7 +4,7 @@ use Cwd;
 use Config;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 84;
+use Test::More tests => 87;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -337,7 +337,22 @@ $node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt"],
 	'pg_basebackup reports checksum mismatch'
 );
 
+# induce further corruption
+open $file, '+<', "$pgdata/$pg_class";
+for (my $offset = 4000 + 8*1024; $offset < 50000; $offset += 8*1024) {
+  seek($file, $offset, 0);
+  syswrite($file, '\0\0\0\0\0\0\0\0\0');
+}
+close $file;
+
+$node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2"],
+        1,
+        [qr{^$}],
+        [qr/^WARNING.*further.*failures.will.not.be.reported/s],
+        'pg_basebackup does not report more than 5 checksum mismatches'
+);
+
 # do not verify checksums, should return ok
 $node->command_ok(
-	[   'pg_basebackup', '-D', "$tempdir/backup_corrupt2", '-k' ],
+	[   'pg_basebackup', '-D', "$tempdir/backup_corrupt3", '-k' ],
 	'pg_basebackup with -k does not report checksum mismatch');
#23David Steele
david@pgmasters.net
In reply to: Michael Banck (#22)
Re: [PATCH] Verify Checksums during Basebackups

On 3/24/18 10:32 AM, Michael Banck wrote:

On Friday, 2018-03-23 at 17:43 +0100, Michael Banck wrote:

On Friday, 2018-03-23 at 10:54 -0400, David Steele wrote:

In my experience actual block errors are relatively rare, so there
aren't likely to be more than a few in a file. More common are
overwritten or transposed files, rogue files, etc. These produce a lot
of output.

Maybe stop after five?

The attached patch does that, and outputs the total number of
verification failures of that file after it got sent.

I'm on board with this, but I have the feeling that this is not a very
common pattern in Postgres, or might not be project style at all. I
can't remember ever seeing an error message like that.

Anybody know whether we're doing this in a similar fashion elsewhere?

I tried to have a look around and couldn't find any examples, so I'm not
sure that patch should go in. On the other hand, we abort on checksum
failures usually (in pg_dump e.g.), so limiting the number of warnings
does make sense.

I guess we need to see what others think.

Well, at this point I would say silence more or less gives consent.

Can you provide a rebased patch with the validation retry and warning
limiting logic added? I would like to take another pass through it but I
think this is getting close.

Thanks,
--
-David
david@pgmasters.net

#24Magnus Hagander
magnus@hagander.net
In reply to: David Steele (#23)
Re: [PATCH] Verify Checksums during Basebackups

On Fri, Mar 30, 2018 at 5:35 AM, David Steele <david@pgmasters.net> wrote:

On 3/24/18 10:32 AM, Michael Banck wrote:

On Friday, 2018-03-23 at 17:43 +0100, Michael Banck wrote:

On Friday, 2018-03-23 at 10:54 -0400, David Steele wrote:

In my experience actual block errors are relatively rare, so there
aren't likely to be more than a few in a file. More common are
overwritten or transposed files, rogue files, etc. These produce a lot
of output.

Maybe stop after five?

The attached patch does that, and outputs the total number of
verification failures of that file after it got sent.

I'm on board with this, but I have the feeling that this is not a very
common pattern in Postgres, or might not be project style at all. I
can't remember ever seeing an error message like that.

Anybody know whether we're doing this in a similar fashion elsewhere?

I tried to have a look around and couldn't find any examples, so I'm not
sure that patch should go in. On the other hand, we abort on checksum
failures usually (in pg_dump e.g.), so limiting the number of warnings
does make sense.

I guess we need to see what others think.

Well, at this point I would say silence more or less gives consent.

Can you provide a rebased patch with the validation retry and warning
limiting logic added? I would like to take another pass through it but I
think this is getting close.

I was meaning to mention it, but ran out of cycles.

I think this is the right way to do it, except the 5 should be a #define
and not an inline hardcoded value :) We could argue whether it should be "5
total" or "5 per file". When I read the emails I thought it was going to be
5 total, but I see the implementation does 5 / file. In a super-damaged
system that will still lead to horrible amounts of logging, but I think
maybe if your system is in that bad shape, then it's a lost cause anyway.

I also think the "total number of checksum errors" should be logged if
they're >0, not >5. And I think *that* one should be logged at the end of
the entire process, not per file. That'd be the kind of output that would
be the most interesting, I think (e.g. if I have it spread out with 1 block
each across 4 files, I want that logged at the end because it's easy to
otherwise miss one or two of them that may have happened a long time apart).

I don't think we have a good comparison elsewhere, and that is, as David
mentioned, because other codepaths fail hard when they run into something like
that. And we explicitly want to *not* fail hard, per previous discussion.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#25Stephen Frost
sfrost@snowman.net
In reply to: Magnus Hagander (#24)
Re: [PATCH] Verify Checksums during Basebackups

Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:

On Fri, Mar 30, 2018 at 5:35 AM, David Steele <david@pgmasters.net> wrote:

On 3/24/18 10:32 AM, Michael Banck wrote:

On Friday, 2018-03-23 at 17:43 +0100, Michael Banck wrote:

On Friday, 2018-03-23 at 10:54 -0400, David Steele wrote:

In my experience actual block errors are relatively rare, so there
aren't likely to be more than a few in a file. More common are
overwritten or transposed files, rogue files, etc. These produce a lot
of output.

Maybe stop after five?

The attached patch does that, and outputs the total number of
verification failures of that file after it got sent.

I'm on board with this, but I have the feeling that this is not a very
common pattern in Postgres, or might not be project style at all. I
can't remember ever seeing an error message like that.

Anybody know whether we're doing this in a similar fashion elsewhere?

I tried to have a look around and couldn't find any examples, so I'm not
sure that patch should go in. On the other hand, we abort on checksum
failures usually (in pg_dump e.g.), so limiting the number of warnings
does make sense.

I guess we need to see what others think.

Well, at this point I would say silence more or less gives consent.

Can you provide a rebased patch with the validation retry and warning
limiting logic added? I would like to take another pass through it but I
think this is getting close.

I was meaning to mention it, but ran out of cycles.

I think this is the right way to do it, except the 5 should be a #define
and not an inline hardcoded value :) We could argue whether it should be "5
total" or "5 per file". When I read the emails I thought it was going to be
5 total, but I see the implementation does 5 / file. In a super-damaged
system that will still lead to horrible amounts of logging, but I think
maybe if your system is in that bad shape, then it's a lost cause anyway.

5/file seems reasonable to me as well.

I also think the "total number of checksum errors" should be logged if
they're >0, not >5. And I think *that* one should be logged at the end of
the entire process, not per file. That'd be the kind of output that would
be the most interesting, I think (e.g. if I have it spread out with 1 block
each across 4 files, I want that logged at the end because it's easy to
otherwise miss one or two of them that may have happened a long time apart).

I definitely like having a total # of checksum errors included at the
end, if there are any at all. When someone is looking to see why the
process returned a non-zero exit code, they're likely to start looking
at the end of the log, so having that easily available and clear as to
why the backup failed is definitely valuable.

I don't think we have a good comparison elsewhere, and that is, as David
mentioned, because other codepaths fail hard when they run into something like
that. And we explicitly want to *not* fail hard, per previous discussion.

Agreed.

Thanks!

Stephen

#26Michael Banck
michael.banck@credativ.de
In reply to: Stephen Frost (#25)
1 attachment(s)
Re: [PATCH] Verify Checksums during Basebackups

Hi,

On Fri, Mar 30, 2018 at 07:46:02AM -0400, Stephen Frost wrote:

* Magnus Hagander (magnus@hagander.net) wrote:

On Fri, Mar 30, 2018 at 5:35 AM, David Steele <david@pgmasters.net> wrote:

On 3/24/18 10:32 AM, Michael Banck wrote:

On Friday, 2018-03-23 at 17:43 +0100, Michael Banck wrote:

On Friday, 2018-03-23 at 10:54 -0400, David Steele wrote:

In my experience actual block errors are relatively rare, so there
aren't likely to be more than a few in a file. More common are
overwritten or transposed files, rogue files, etc. These produce a lot
of output.

Maybe stop after five?

The attached patch does that, and outputs the total number of
verification failures of that file after it got sent.

I'm on board with this, but I have the feeling that this is not a very
common pattern in Postgres, or might not be project style at all. I
can't remember ever seeing an error message like that.

Anybody know whether we're doing this in a similar fashion elsewhere?

I tried to have a look around and couldn't find any examples, so I'm not
sure that patch should go in. On the other hand, we abort on checksum
failures usually (in pg_dump e.g.), so limiting the number of warnings
does make sense.

I guess we need to see what others think.

Well, at this point I would say silence more or less gives consent.

Can you provide a rebased patch with the validation retry and warning
limiting logic added? I would like to take another pass through it but I
think this is getting close.

I was meaning to mention it, but ran out of cycles.

I think this is the right way to do it, except the 5 should be a #define
and not an inline hardcoded value :) We could argue whether it should be "5
total" or "5 per file". When I read the emails I thought it was going to be
5 total, but I see the implementation does 5 / file. In a super-damaged
system that will still lead to horrible amounts of logging, but I think
maybe if your system is in that bad shape, then it's a lost cause anyway.

5/file seems reasonable to me as well.

I also think the "total number of checksum errors" should be logged if
they're >0, not >5. And I think *that* one should be logged at the end of
the entire process, not per file. That'd be the kind of output that would
be the most interesting, I think (e.g. if I have it spread out with 1 block
each across 4 files, I want that logged at the end because it's easy to
otherwise miss one or two of them that may have happened a long time apart).

I definitely like having a total # of checksum errors included at the
end, if there are any at all. When someone is looking to see why the
process returned a non-zero exit code, they're likely to start looking
at the end of the log, so having that easily available and clear as to
why the backup failed is definitely valuable.

I don't think we have a good comparison elsewhere, and that is, as David
mentioned, because other codepaths fail hard when they run into something like
that. And we explicitly want to *not* fail hard, per previous discussion.

Agreed.

Attached is a new and rebased patch which does the above, plus
integrates the suggested changes by David Steele. The output is now:

$ initdb -k --pgdata=data1 1> /dev/null 2> /dev/null
$ pg_ctl --pgdata=data1 --log=pg1.log start > /dev/null
$ dd conv=notrunc oflag=seek_bytes seek=4000 bs=8 count=1 if=/dev/zero of=data1/base/12374/2610 2> /dev/null
$ for i in 4000 13000 21000 29000 37000 43000; do dd conv=notrunc oflag=seek_bytes seek=$i bs=8 count=1 if=/dev/zero of=data1/base/12374/1259; done 2> /dev/null
$ pg_basebackup -v -h /tmp --pgdata=data2
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000060 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_13882"
WARNING: checksum verification failed in file "./base/12374/2610", block 0: calculated C2C9 but expected EC78
WARNING: checksum verification failed in file "./base/12374/1259", block 0: calculated 8BAE but expected 46B8
WARNING: checksum verification failed in file "./base/12374/1259", block 1: calculated E413 but expected 7701
WARNING: checksum verification failed in file "./base/12374/1259", block 2: calculated 5DA9 but expected D5AA
WARNING: checksum verification failed in file "./base/12374/1259", block 3: calculated 5651 but expected 4F5E
WARNING: checksum verification failed in file "./base/12374/1259", block 4: calculated DF39 but expected DF00
WARNING: further checksum verification failures in file "./base/12374/1259" will not be reported
WARNING: file "./base/12374/1259" has a total of 6 checksum verification failures
WARNING: 7 total checksum verification failures
pg_basebackup: write-ahead log end point: 0/2000130
pg_basebackup: checksum error occurred
$ echo $?
1
$ du -s data2
38820 data2

I omitted the 'file "foo" has a total of X checksum verification
failures' if there is only one, as seen with file "./base/12374/2610"
above. Same with the "X total checksum verification failures" if there
was only one.

Is that ok with everybody?

Michael

--
Michael Banck

Attachments:

basebackup-verify-checksum-V5.patch (text/x-diff)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 8c488506fa..b94dd4ac65 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2382,7 +2382,7 @@ The commands accepted in replication mode are:
   </varlistentry>
 
   <varlistentry>
-    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ]
+    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ]
      <indexterm><primary>BASE_BACKUP</primary></indexterm>
     </term>
     <listitem>
@@ -2481,6 +2481,17 @@ The commands accepted in replication mode are:
          </para>
         </listitem>
        </varlistentry>
+
+       <varlistentry>
+        <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+        <listitem>
+         <para>
+          By default, checksums are verified during a base backup if they are
+          enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
+          this verification.
+         </para>
+        </listitem>
+       </varlistentry>
       </variablelist>
      </para>
      <para>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index e8921b1bb4..95045669c9 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -506,6 +506,22 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-k</option></term>
+      <term><option>--no-verify-checksums</option></term>
+      <listitem>
+       <para>
+        Disables verification of checksums, if they are enabled on the server
+        the base backup is taken from. 
+       </para>
+       <para>
+        By default, checksums are verified and checksum failures will result in
+        a non-zero exit status. However, the base backup will not be removed in
+        this case, as if the <literal>--no-clean</literal> option was used.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 516eea57f8..e2063571c8 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -16,6 +16,7 @@
 #include <unistd.h>
 #include <time.h>
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"	/* for pg_start/stop_backup */
 #include "catalog/catalog.h"
 #include "catalog/pg_type.h"
@@ -31,6 +32,8 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -99,6 +102,15 @@ static TimeOffset elapsed_min_unit;
 /* The last check of the transfer rate. */
 static TimestampTz throttled_last;
 
+/* The starting XLOG position of the base backup. */
+static XLogRecPtr startptr;
+
+/* Total number of checksum failures during base backup. */
+static int64 total_checksum_failures;
+
+/* Do not verify checksums. */
+static bool noverify_checksums = false;
+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves are
@@ -194,7 +206,6 @@ base_backup_cleanup(int code, Datum arg)
 static void
 perform_base_backup(basebackup_options *opt)
 {
-	XLogRecPtr	startptr;
 	TimeLineID	starttli;
 	XLogRecPtr	endptr;
 	TimeLineID	endtli;
@@ -210,6 +221,8 @@ perform_base_backup(basebackup_options *opt)
 	labelfile = makeStringInfo();
 	tblspc_map_file = makeStringInfo();
 
+	total_checksum_failures = 0;
+
 	startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
 								  labelfile, &tablespaces,
 								  tblspc_map_file,
@@ -568,6 +581,17 @@ perform_base_backup(basebackup_options *opt)
 		pq_putemptymessage('c');
 	}
 	SendXlogRecPtrResult(endptr, endtli);
+
+	if (total_checksum_failures)
+	{
+		if (total_checksum_failures > 1)
+			ereport(WARNING,
+				(errmsg("%ld total checksum verification failures", total_checksum_failures)));
+		ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			errmsg("checksum verification failure during base backup")));
+	}
+
 }
 
 /*
@@ -597,6 +621,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool		o_wal = false;
 	bool		o_maxrate = false;
 	bool		o_tablespace_map = false;
+	bool		o_noverify_checksums = false;
 
 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -676,6 +701,15 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			opt->sendtblspcmapfile = true;
 			o_tablespace_map = true;
 		}
+		else if (strcmp(defel->defname, "noverify_checksums") == 0)
+		{
+			if (o_noverify_checksums)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("duplicate option \"%s\"", defel->defname)));
+			noverify_checksums = true;
+			o_noverify_checksums = true;
+		}
 		else
 			elog(ERROR, "option \"%s\" not recognized",
 				 defel->defname);
@@ -1257,6 +1291,41 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 	return size;
 }
 
+static const char *no_heap_files[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+is_heap_file(const char *pn)
+{
+	const char **f;
+	const char *fn = basename(pstrdup(pn));
+
+	/* Skip current and parent directory */
+	if (strcmp(pn, ".") == 0 ||
+		strcmp(pn, "..") == 0)
+		return false;
+
+	/* Check that the file is in a tablespace */
+	if (strncmp(pn, "./global/", 9) == 0 ||
+		strncmp(pn, "./base/", 7) == 0 ||
+		strncmp(pn, "/", 1) == 0)
+	{
+		/* Compare file against no_heap_files skiplist */
+		for (f = no_heap_files; *f; f++)
+			if (strcmp(*f, fn) == 0)
+				return false;
+
+		return true;
+	}
+	else
+		return false;
+}
+
 /*****
  * Functions for handling tar file format
  *
@@ -1277,10 +1346,21 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		 bool missing_ok)
 {
 	FILE	   *fp;
+	BlockNumber blkno = 0;
+	bool		block_retry = false;
 	char		buf[TAR_SEND_SIZE];
+	uint16		checksum;
+	int			checksum_failures = 0;
 	size_t		cnt;
+	char	   *filename;
+	int			i;
 	pgoff_t		len = 0;
+	char	   *page;
 	size_t		pad;
+	PageHeader  phdr;
+	int			segmentno = 0;
+	char	   *segmentpath;
+	bool		verify_checksum = false;
 
 	fp = AllocateFile(readfilename, "rb");
 	if (fp == NULL)
@@ -1292,10 +1372,137 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 				 errmsg("could not open file \"%s\": %m", readfilename)));
 	}
 
+	/*
+	 * Verify checksums only when checksums are enabled, and the file to
+	 * send is either in one of the default tablespaces (`./global' or
+	 * `./base'), or in an external tablespace with an absolute pathname
+	 * (starting with `/') and is not in the list of non-heap files to be
+	 * skipped.
+	 */
+	if (!noverify_checksums && DataChecksumsEnabled() &&
+		is_heap_file(readfilename))
+		verify_checksum = true;
+
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
+	/*
+	 * Cut off at the segment boundary (".") to get the segment number in order
+	 * to mix it into the checksum.
+	 */
+	filename = basename(pstrdup(readfilename));
+	segmentpath = strstr(filename, ".");
+	if (verify_checksum && segmentpath != NULL)
+	{
+		*segmentpath++ = '\0';
+		segmentno = atoi(segmentpath);
+		if (segmentno == 0)
+			ereport(ERROR,
+					(errmsg("invalid segment number %d in filename %s\n",
+					segmentno, filename)));
+	}
+
 	while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0)
 	{
+		if (verify_checksum)
+		{
+			/*
+			 * The checksums are verified at block level, so we iterate over
+			 * the buffer in chunks of BLCKSZ, after making sure that
+			 * TAR_SEND_SIZE/buf is divisible by BLCKSZ and we read
+			 * a multiple of BLCKSZ bytes.
+			 */
+			Assert(TAR_SEND_SIZE % BLCKSZ == 0);
+			if (cnt % BLCKSZ != 0)
+			{
+				ereport(WARNING,
+						(errmsg("cannot verify checksum in file \"%s\", block "
+								"%d: read buffer size %d and page size %d "
+								"differ",
+								readfilename, blkno, (int)cnt, BLCKSZ)));
+				verify_checksum = false;
+				continue;
+			}
+			for (i = 0; i < cnt / BLCKSZ; i++)
+			{
+				page = buf + BLCKSZ * i;
+				/*
+				 * Only check pages which have not been modified since the
+				 * start of the base backup. Otherwise, they might have been
+				 * written only halfway and the checksum would not be valid.
+				 * However, replaying WAL would reinstate the correct page in
+				 * this case.
+				 */
+				if (PageGetLSN(page) < startptr)
+				{
+					checksum = pg_checksum_page((char *) page, blkno + segmentno*RELSEG_SIZE);
+					phdr = (PageHeader) page;
+					if (phdr->pd_checksum != checksum)
+					{
+						/*
+						 * Retry the block on the first failure.  It's possible
+						 * that we read the first 4K page of the block just
+						 * before postgres updated the entire block so it ends
+						 * up looking torn to us.  We only need to retry once
+						 * because the LSN should be updated to something we can
+						 * ignore on the next pass.  If the error happens again
+						 * then it is a true validation failure.
+						 */
+						if (block_retry == false)
+						{
+							/* Reread the failed block */
+							if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1)
+							{
+								ereport(ERROR,
+										(errcode_for_file_access(),
+										 errmsg("could not fseek in file \"%s\": %m",
+										 readfilename)));
+							}
+
+							if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
+							{
+								ereport(ERROR,
+										(errcode_for_file_access(),
+										 errmsg("could not reread block %d of file \"%s\": %m",
+										 blkno, readfilename)));
+							}
+
+							if (fseek(fp, cnt - BLCKSZ * i - BLCKSZ, SEEK_CUR) == -1)
+							{
+								ereport(ERROR,
+										(errcode_for_file_access(),
+										 errmsg("could not fseek in file \"%s\": %m",
+										 readfilename)));
+							}
+
+							/* Set flag so we know a retry was attempted */
+							block_retry = true;
+
+							/* Reset loop to validate the block again */
+							i--;
+							continue;
+						}
+
+						checksum_failures++;
+
+						if (checksum_failures <= 5)
+							ereport(WARNING,
+									(errmsg("checksum verification failed in "
+											"file \"%s\", block %d: calculated "
+											"%X but expected %X",
+											readfilename, blkno, checksum,
+											phdr->pd_checksum)));
+						if (checksum_failures == 5)
+							ereport(WARNING,
+									(errmsg("further checksum verification "
+											"failures in file \"%s\" will not "
+											"be reported", readfilename)));
+					}
+				}
+				block_retry = false;
+				blkno++;
+			}
+		}
+
 		/* Send the chunk as a CopyData message */
 		if (pq_putmessage('d', buf, cnt))
 			ereport(ERROR,
@@ -1313,6 +1520,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 			 */
 			break;
 		}
+
 	}
 
 	/* If the file was truncated while we were sending it, pad it with zeros */
@@ -1341,6 +1549,13 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	FreeFile(fp);
 
+	if (checksum_failures > 1) {
+		ereport(WARNING,
+				(errmsg("file \"%s\" has a total of %d checksum verification "
+						"failures", readfilename, checksum_failures)));
+	}
+	total_checksum_failures += checksum_failures;
+
 	return true;
 }
 
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index beb2c2877b..843a878ff3 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -77,6 +77,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_MAX_RATE
 %token K_WAL
 %token K_TABLESPACE_MAP
+%token K_NOVERIFY_CHECKSUMS
 %token K_TIMELINE
 %token K_PHYSICAL
 %token K_LOGICAL
@@ -154,7 +155,7 @@ var_name:	IDENT	{ $$ = $1; }
 
 /*
  * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
- * [MAX_RATE %d] [TABLESPACE_MAP]
+ * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
  */
 base_backup:
 			K_BASE_BACKUP base_backup_opt_list
@@ -208,6 +209,11 @@ base_backup_opt:
 				  $$ = makeDefElem("tablespace_map",
 								   (Node *)makeInteger(true), -1);
 				}
+			| K_NOVERIFY_CHECKSUMS
+				{
+				  $$ = makeDefElem("noverify_checksums",
+								   (Node *)makeInteger(true), -1);
+				}
 			;
 
 create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index fb48241e48..7bb501f594 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -92,6 +92,7 @@ PROGRESS			{ return K_PROGRESS; }
 MAX_RATE		{ return K_MAX_RATE; }
 WAL			{ return K_WAL; }
 TABLESPACE_MAP			{ return K_TABLESPACE_MAP; }
+NOVERIFY_CHECKSUMS	{ return K_NOVERIFY_CHECKSUMS; }
 TIMELINE			{ return K_TIMELINE; }
 START_REPLICATION	{ return K_START_REPLICATION; }
 CREATE_REPLICATION_SLOT		{ return K_CREATE_REPLICATION_SLOT; }
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1b32592063..39013ff7e0 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -39,6 +39,7 @@
 #include "replication/basebackup.h"
 #include "streamutil.h"
 
+#define ERRCODE_DATA_CORRUPTED	"XX001"
 
 typedef struct TablespaceListCell
 {
@@ -81,6 +82,7 @@ static char *xlog_dir = NULL;
 static char format = 'p';		/* p(lain)/t(ar) */
 static char *label = "pg_basebackup base backup";
 static bool noclean = false;
+static bool checksum_failure = false;
 static bool showprogress = false;
 static int	verbose = 0;
 static int	compresslevel = 0;
@@ -95,6 +97,7 @@ static char *replication_slot = NULL;
 static bool temp_replication_slot = true;
 static bool create_slot = false;
 static bool no_slot = false;
+static bool verify_checksums = true;
 
 static bool success = false;
 static bool made_new_pgdata = false;
@@ -155,7 +158,7 @@ cleanup_directories_atexit(void)
 	if (success || in_log_streamer)
 		return;
 
-	if (!noclean)
+	if (!noclean && !checksum_failure)
 	{
 		if (made_new_pgdata)
 		{
@@ -195,7 +198,7 @@ cleanup_directories_atexit(void)
 	}
 	else
 	{
-		if (made_new_pgdata || found_existing_pgdata)
+		if ((made_new_pgdata || found_existing_pgdata) && !checksum_failure)
 			fprintf(stderr,
 					_("%s: data directory \"%s\" not removed at user's request\n"),
 					progname, basedir);
@@ -206,7 +209,7 @@ cleanup_directories_atexit(void)
 					progname, xlog_dir);
 	}
 
-	if (made_tablespace_dirs || found_tablespace_dirs)
+	if ((made_tablespace_dirs || found_tablespace_dirs) && !checksum_failure)
 		fprintf(stderr,
 				_("%s: changes to tablespace directories will not be undone\n"),
 				progname);
@@ -360,6 +363,8 @@ usage(void)
 	printf(_("  -P, --progress         show progress information\n"));
 	printf(_("  -S, --slot=SLOTNAME    replication slot to use\n"));
 	printf(_("      --no-slot          prevent creation of temporary replication slot\n"));
+	printf(_("  -k, --no-verify-checksums\n"
+			 "                         do not verify checksums\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -?, --help             show this help, then exit\n"));
@@ -1808,14 +1813,15 @@ BaseBackup(void)
 	}
 
 	basebkp =
-		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s",
+		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
 				 escaped_label,
 				 showprogress ? "PROGRESS" : "",
 				 includewal == FETCH_WAL ? "WAL" : "",
 				 fastcheckpoint ? "FAST" : "",
 				 includewal == NO_WAL ? "" : "NOWAIT",
 				 maxrate_clause ? maxrate_clause : "",
-				 format == 't' ? "TABLESPACE_MAP" : "");
+				 format == 't' ? "TABLESPACE_MAP" : "",
+				 verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
 
 	if (PQsendQuery(conn, basebkp) == 0)
 	{
@@ -1970,8 +1976,17 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
-		fprintf(stderr, _("%s: final receive failed: %s"),
-				progname, PQerrorMessage(conn));
+		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+		if (sqlstate &&
+			strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+		{
+			fprintf(stderr, _("%s: checksum error occured\n"),
+					progname);
+			checksum_failure = true;
+		} else {
+			fprintf(stderr, _("%s: final receive failed: %s"),
+					progname, PQerrorMessage(conn));
+		}
 		disconnect_and_exit(1);
 	}
 
@@ -2140,6 +2155,7 @@ main(int argc, char **argv)
 		{"progress", no_argument, NULL, 'P'},
 		{"waldir", required_argument, NULL, 1},
 		{"no-slot", no_argument, NULL, 2},
+		{"no-verify-checksums", no_argument, NULL, 'k'},
 		{NULL, 0, NULL, 0}
 	};
 	int			c;
@@ -2166,7 +2182,7 @@ main(int argc, char **argv)
 
 	atexit(cleanup_directories_atexit);
 
-	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWvP",
+	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
 							long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -2308,6 +2324,9 @@ main(int argc, char **argv)
 			case 'P':
 				showprogress = true;
 				break;
+			case 'k':
+				verify_checksums = false;
+				break;
 			default:
 
 				/*
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 32d21ce644..3162cdcd01 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -5,7 +5,7 @@ use Config;
 use File::Basename qw(basename dirname);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 93;
+use Test::More tests => 104;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -16,7 +16,7 @@ my $tempdir = TestLib::tempdir;
 my $node = get_new_node('main');
 
 # Initialize node without replication settings
-$node->init;
+$node->init(extra => [ '--data-checksums' ]);
 $node->start;
 my $pgdata = $node->data_dir;
 
@@ -402,3 +402,61 @@ like(
 	slurp_file("$tempdir/backupxs_sl_R/recovery.conf"),
 	qr/^primary_slot_name = 'slot1'\n/m,
 	'recovery.conf sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+
+# get relfilenodes of relations to corrupt
+my $pg_class = $node->safe_psql('postgres',
+	q{SELECT pg_relation_filepath('pg_class')}
+);
+my $pg_index = $node->safe_psql('postgres',
+	q{SELECT pg_relation_filepath('pg_index')}
+);
+
+# induce corruption
+open $file, '+<', "$pgdata/$pg_class";
+seek($file, 4000, 0);
+syswrite($file, '\0\0\0\0\0\0\0\0\0');
+close $file;
+
+$node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt"],
+	1,
+	[qr{^$}],
+	[qr/^WARNING.*checksum verification failed/s],
+	'pg_basebackup reports checksum mismatch'
+);
+
+# induce further corruption in 5 more blocks
+open $file, '+<', "$pgdata/$pg_class";
+my @offsets = (12192, 20384, 28576, 36768, 44960);
+foreach my $offset (@offsets) {
+  seek($file, $offset, 0);
+  syswrite($file, '\0\0\0\0\0\0\0\0\0');
+}
+close $file;
+
+$node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2"],
+        1,
+        [qr{^$}],
+        [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+        'pg_basebackup does not report more than 5 checksum mismatches'
+);
+
+# induce corruption in a second file
+open $file, '+<', "$pgdata/$pg_index";
+seek($file, 4000, 0);
+syswrite($file, '\0\0\0\0\0\0\0\0\0');
+close $file;
+
+$node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3"],
+        1,
+        [qr{^$}],
+        [qr/^WARNING.*7 total checksum verification failures/s],
+        'pg_basebackup correctly report the total number of checksum mismatches'
+);
+
+# do not verify checksums, should return ok
+$node->command_ok(
+	[   'pg_basebackup', '-D', "$tempdir/backup_corrupt4", '-k' ],
+	'pg_basebackup with -k does not report checksum mismatch');
#27Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#26)
Re: [PATCH] Verify Checksums during Basebackups

On Sat, Mar 31, 2018 at 2:54 PM, Michael Banck <michael.banck@credativ.de>
wrote:

Hi,

On Fri, Mar 30, 2018 at 07:46:02AM -0400, Stephen Frost wrote:

* Magnus Hagander (magnus@hagander.net) wrote:

On Fri, Mar 30, 2018 at 5:35 AM, David Steele <david@pgmasters.net> wrote:

On 3/24/18 10:32 AM, Michael Banck wrote:

Am Freitag, den 23.03.2018, 17:43 +0100 schrieb Michael Banck:

Am Freitag, den 23.03.2018, 10:54 -0400 schrieb David Steele:

In my experience actual block errors are relatively rare, so there
aren't likely to be more than a few in a file. More common are
overwritten or transposed files, rogue files, etc. These produce a lot
of output.

Maybe stop after five?

The attached patch does that, and outputs the total number of
verification failures of that file after it got sent.

I'm on board with this, but I have the feeling that this is not a very
common pattern in Postgres, or might not be project style at all. I
can't remember ever having seen an error message like that.

Anybody know whether we're doing this in a similar fashion elsewhere?

I tried to have a look around and couldn't find any examples, so I'm
not sure that patch should go in. On the other hand, we usually abort
on checksum failures (in pg_dump e.g.), so limiting the number of
warnings does make sense.

I guess we need to see what others think.

Well, at this point I would say silence more or less gives consent.

Can you provide a rebased patch with the validation retry and warning
limiting logic added? I would like to take another pass through it but
I think this is getting close.

I was meaning to mention it, but ran out of cycles.

I think this is the right way to do it, except the 5 should be a
#define and not an inline hardcoded value :) We could argue whether it
should be "5 total" or "5 per file". When I read the emails I thought
it was going to be 5 total, but I see the implementation does 5 per
file. In a super-damaged system that will still lead to horrible
amounts of logging, but I think maybe if your system is in that bad
shape, then it's a lost cause anyway.

5/file seems reasonable to me as well.

I also think the "total number of checksum errors" should be logged if
they're >0, not >5. And I think *that* one should be logged at the end
of the entire process, not per file. That'd be the kind of output that
would be the most interesting, I think (e.g. if I have it spread out
with 1 block each across 4 files, I want that logged at the end because
it's easy to otherwise miss one or two of them that may have happened a
long time apart).

I definitely like having a total # of checksum errors included at the
end, if there are any at all. When someone is looking to see why the
process returned a non-zero exit code, they're likely to start looking
at the end of the log, so having that easily available and clear as to
why the backup failed is definitely valuable.

I don't think we have a good comparison elsewhere, and that is, as
David mentioned, because other codepaths fail hard when they run into
something like that. And we explicitly want to *not* fail hard, per
previous discussion.

Agreed.

Attached is a new and rebased patch which does the above, plus
integrates the suggested changes by David Steele. The output is now:

$ initdb -k --pgdata=data1 1> /dev/null 2> /dev/null
$ pg_ctl --pgdata=data1 --log=pg1.log start > /dev/null
$ dd conv=notrunc oflag=seek_bytes seek=4000 bs=8 count=1 if=/dev/zero
of=data1/base/12374/2610 2> /dev/null
$ for i in 4000 13000 21000 29000 37000 43000; do dd conv=notrunc
oflag=seek_bytes seek=$i bs=8 count=1 if=/dev/zero
of=data1/base/12374/1259; done 2> /dev/null
$ pg_basebackup -v -h /tmp --pgdata=data2
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000060 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_13882"
WARNING: checksum verification failed in file "./base/12374/2610", block
0: calculated C2C9 but expected EC78
WARNING: checksum verification failed in file "./base/12374/1259", block
0: calculated 8BAE but expected 46B8
WARNING: checksum verification failed in file "./base/12374/1259", block
1: calculated E413 but expected 7701
WARNING: checksum verification failed in file "./base/12374/1259", block
2: calculated 5DA9 but expected D5AA
WARNING: checksum verification failed in file "./base/12374/1259", block
3: calculated 5651 but expected 4F5E
WARNING: checksum verification failed in file "./base/12374/1259", block
4: calculated DF39 but expected DF00
WARNING: further checksum verification failures in file
"./base/12374/1259" will not be reported
WARNING: file "./base/12374/1259" has a total of 6 checksum verification
failures
WARNING: 7 total checksum verification failures
pg_basebackup: write-ahead log end point: 0/2000130
pg_basebackup: checksum error occured
$ echo $?
1
$ du -s data2
38820 data2

I omitted the 'file "foo" has a total of X checksum verification
failures' message if there is only one failure, as seen with file
"./base/12374/2610" above. Same with the "X total checksum verification
failures" message if there was only one in total.

Is that ok with everybody?
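[Editorial note: the reporting thresholds described above (warn on the first
five failures per file, then a suppression notice, and print per-file/grand
totals only when greater than one) can be sketched standalone. This is an
illustrative model, not the patch's actual code; the function names are
made up.]

```c
#define MAX_REPORTED_FAILURES 5		/* per-file warning cap, as discussed */

/*
 * Model of the warning output for a file with `failures` checksum
 * failures: one warning per failure up to the cap, one suppression
 * notice when the cap is hit, and a per-file total only if more than
 * one failure occurred.  Returns the number of warning lines emitted.
 */
static int
warning_lines_for(int failures)
{
	int			lines = 0;
	int			i;

	for (i = 1; i <= failures; i++)
	{
		if (i <= MAX_REPORTED_FAILURES)
			lines++;			/* per-block "checksum verification failed" */
		if (i == MAX_REPORTED_FAILURES)
			lines++;			/* "further failures ... will not be reported" */
	}
	if (failures > 1)
		lines++;				/* per-file failure total */
	return lines;
}
```

Under this model a clean file logs nothing, a single bad block logs exactly
one warning, and even a badly damaged file is capped at seven lines.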

I like most of this patch now :)

It might be a micro-optimisation, but ISTM that we shouldn't do the
basename(pstrdup(fn)) in is_heap_file() unless we actually need it -- so not
at the top of the function. (And surely "." and ".." should not occur here?
I think that's a result of copy/paste from the online checksum patch?)

We also do exactly the same basename(pstrdup(fn)) in sendFile(). Can we
find a way to avoid that duplication? Perhaps we want to do the split into
the two pieces out in sendFile() and pass them in as separate parameters?

If not, then this second one should at least only be done inside the if
(verify_checksums).

There is a bigger problem next to that though -- I believe basename() does
not exist on Win32. I haven't tested it, but there is zero documentation of
it existing, which usually indicates it doesn't. That's the part that
definitely needs to get fixed.

I think you need to look into the functionality in port/path.c, in
particular last_dir_separator()?
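[Editorial note: for readers unfamiliar with src/port/path.c, the idea is a
portable replacement for basename(). The sketch below is a simplified,
self-contained stand-in (handling both '/' and '\\' as separators, as
PostgreSQL does on Windows); it is not the actual last_dir_separator()
implementation, and the function names are illustrative.]

```c
#include <stddef.h>

/*
 * Return a pointer to the last path separator in `path`, or NULL if
 * there is none.  Simplified stand-in for last_dir_separator() from
 * src/port/path.c.
 */
static const char *
my_last_dir_separator(const char *path)
{
	const char *last = NULL;
	const char *p;

	for (p = path; *p; p++)
		if (*p == '/' || *p == '\\')
			last = p;
	return last;
}

/*
 * Filename component: everything after the last separator, or the
 * whole string if there is no separator.  Mirrors the
 * last_dir_separator(...) + 1 idiom used in the V6 patch (which can
 * rely on every path starting with "./" or "/").
 */
static const char *
my_filename_part(const char *path)
{
	const char *sep = my_last_dir_separator(path);

	return sep ? sep + 1 : path;
}
```

Unlike POSIX basename(), this never modifies its argument, so no pstrdup()
copy is needed before calling it.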

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#28Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#27)
1 attachment(s)
Re: [PATCH] Verify Checksums during Basebackups

Hi!

On Sun, Apr 01, 2018 at 05:27:21PM +0200, Magnus Hagander wrote:

It might be a micro-optimisation, but ISTM that we shouldn't do the
basename(pstrdup(fn)) in is_heap_file() unless we actually need it -- so not
at the top of the function. (And surely "." and ".." should not occur here?
I think that's a result of copy/paste from the online checksum patch?)

We also do exactly the same basename(pstrdup(fn)) in sendFile(). Can we
find a way to avoid that duplication? Perhaps we want to do the split into
the two pieces out in sendFile() and pass them in as separate parameters?

I've done the latter now, although it looks a bit weird that the second
argument's data is a subset of the first. I couldn't find another way to
avoid doing it twice, though.

If not then this second one should at least only be done inside the if
(verify_checksums).

We can't have both, as we need to call the is_heap_file() function in
order to determine whether we should verify the checksums.
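[Editorial note: to make the check under discussion concrete, here is a
standalone sketch of the is_heap_file() logic from the patch, simplified
and with an illustrative name. It keeps the V6 path prefixes and non-heap
skiplist; it is a model for illustration, not the committed code.]

```c
#include <string.h>

/* Known non-heap files that carry no page checksums (as in the patch). */
static const char *skip_list[] = {
	"pg_control",
	"pg_filenode.map",
	"pg_internal.init",
	"PG_VERSION",
	NULL,
};

/*
 * Decide whether a file should have its checksums verified: `pn` is the
 * full path as seen by sendFile(), `fn` its filename component.  The
 * file must live under ./global, ./base, or an absolute (external
 * tablespace) path, and must not be on the non-heap skiplist.
 */
static int
checksum_candidate(const char *pn, const char *fn)
{
	const char **f;

	/* Skip current and parent directory */
	if (strcmp(pn, ".") == 0 || strcmp(pn, "..") == 0)
		return 0;

	/* Only paths inside a tablespace qualify */
	if (strncmp(pn, "./global/", 9) == 0 ||
		strncmp(pn, "./base/", 7) == 0 ||
		pn[0] == '/')
	{
		for (f = skip_list; *f; f++)
			if (strcmp(*f, fn) == 0)
				return 0;
		return 1;
	}
	return 0;
}
```

This shows why the filename has to be computed before the verification
decision: the skiplist comparison needs `fn` even when checksums end up
being skipped.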

There is a bigger problem next to that though -- I believe basename() does
not exist on Win32. I haven't tested it, but there is zero documentation of
it existing, which usually indicates it doesn't. That's the part that
definitely needs to get fixed.

I think you need to look into the functionality in port/path.c, in
particular last_dir_separator()?

Thanks for the pointer, I've used that now; I mentioned before that
basename() might be a portability hazard, but couldn't find a good
substitute myself.

V6 of the patch is attached.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachments:

basebackup-verify-checksum-V6.patch (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 8c488506fa..b94dd4ac65 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2382,7 +2382,7 @@ The commands accepted in replication mode are:
   </varlistentry>
 
   <varlistentry>
-    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ]
+    <term><literal>BASE_BACKUP</literal> [ <literal>LABEL</literal> <replaceable>'label'</replaceable> ] [ <literal>PROGRESS</literal> ] [ <literal>FAST</literal> ] [ <literal>WAL</literal> ] [ <literal>NOWAIT</literal> ] [ <literal>MAX_RATE</literal> <replaceable>rate</replaceable> ] [ <literal>TABLESPACE_MAP</literal> ] [ <literal>NOVERIFY_CHECKSUMS</literal> ]
      <indexterm><primary>BASE_BACKUP</primary></indexterm>
     </term>
     <listitem>
@@ -2481,6 +2481,17 @@ The commands accepted in replication mode are:
          </para>
         </listitem>
        </varlistentry>
+
+       <varlistentry>
+        <term><literal>NOVERIFY_CHECKSUMS</literal></term>
+        <listitem>
+         <para>
+          By default, checksums are verified during a base backup if they are
+          enabled. Specifying <literal>NOVERIFY_CHECKSUMS</literal> disables
+          this verification.
+         </para>
+        </listitem>
+       </varlistentry>
       </variablelist>
      </para>
      <para>
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index e8921b1bb4..95045669c9 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -506,6 +506,22 @@ PostgreSQL documentation
       </listitem>
      </varlistentry>
 
+     <varlistentry>
+      <term><option>-k</option></term>
+      <term><option>--no-verify-checksums</option></term>
+      <listitem>
+       <para>
+        Disables verification of checksums, if they are enabled on the server
+        the base backup is taken from. 
+       </para>
+       <para>
+        By default, checksums are verified and checksum failures will result in
+        a non-zero exit status. However, the base backup will not be removed in
+        this case, as if the <literal>--no-clean</literal> option was used.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-v</option></term>
       <term><option>--verbose</option></term>
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 516eea57f8..3c1fd620ae 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -16,6 +16,7 @@
 #include <unistd.h>
 #include <time.h>
 
+#include "access/xlog.h"
 #include "access/xlog_internal.h"	/* for pg_start/stop_backup */
 #include "catalog/catalog.h"
 #include "catalog/pg_type.h"
@@ -31,6 +32,8 @@
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
@@ -99,6 +102,15 @@ static TimeOffset elapsed_min_unit;
 /* The last check of the transfer rate. */
 static TimestampTz throttled_last;
 
+/* The starting XLOG position of the base backup. */
+static XLogRecPtr startptr;
+
+/* Total number of checksum failures during base backup. */
+static int64 total_checksum_failures;
+
+/* Do not verify checksums. */
+static bool noverify_checksums = false;
+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves are
@@ -194,7 +206,6 @@ base_backup_cleanup(int code, Datum arg)
 static void
 perform_base_backup(basebackup_options *opt)
 {
-	XLogRecPtr	startptr;
 	TimeLineID	starttli;
 	XLogRecPtr	endptr;
 	TimeLineID	endtli;
@@ -210,6 +221,8 @@ perform_base_backup(basebackup_options *opt)
 	labelfile = makeStringInfo();
 	tblspc_map_file = makeStringInfo();
 
+	total_checksum_failures = 0;
+
 	startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
 								  labelfile, &tablespaces,
 								  tblspc_map_file,
@@ -568,6 +581,17 @@ perform_base_backup(basebackup_options *opt)
 		pq_putemptymessage('c');
 	}
 	SendXlogRecPtrResult(endptr, endtli);
+
+	if (total_checksum_failures)
+	{
+		if (total_checksum_failures > 1)
+			ereport(WARNING,
+				(errmsg("%ld total checksum verification failures", total_checksum_failures)));
+		ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			errmsg("checksum verification failure during base backup")));
+	}
+
 }
 
 /*
@@ -597,6 +621,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool		o_wal = false;
 	bool		o_maxrate = false;
 	bool		o_tablespace_map = false;
+	bool		o_noverify_checksums = false;
 
 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -676,6 +701,15 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			opt->sendtblspcmapfile = true;
 			o_tablespace_map = true;
 		}
+		else if (strcmp(defel->defname, "noverify_checksums") == 0)
+		{
+			if (o_noverify_checksums)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("duplicate option \"%s\"", defel->defname)));
+			noverify_checksums = true;
+			o_noverify_checksums = true;
+		}
 		else
 			elog(ERROR, "option \"%s\" not recognized",
 				 defel->defname);
@@ -1257,6 +1291,40 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 	return size;
 }
 
+static const char *no_heap_files[] = {
+	"pg_control",
+	"pg_filenode.map",
+	"pg_internal.init",
+	"PG_VERSION",
+	NULL,
+};
+
+static bool
+is_heap_file(const char *pn, const char *fn)
+{
+	const char **f;
+
+	/* Skip current and parent directory */
+	if (strcmp(pn, ".") == 0 ||
+		strcmp(pn, "..") == 0)
+		return false;
+
+	/* Check that the file is in a tablespace */
+	if (strncmp(pn, "./global/", 9) == 0 ||
+		strncmp(pn, "./base/", 7) == 0 ||
+		strncmp(pn, "/", 1) == 0)
+	{
+		/* Compare file against no_heap_files skiplist */
+		for (f = no_heap_files; *f; f++)
+			if (strcmp(*f, fn) == 0)
+				return false;
+
+		return true;
+	}
+	else
+		return false;
+}
+
 /*****
  * Functions for handling tar file format
  *
@@ -1277,10 +1345,21 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		 bool missing_ok)
 {
 	FILE	   *fp;
+	BlockNumber blkno = 0;
+	bool		block_retry = false;
 	char		buf[TAR_SEND_SIZE];
+	uint16		checksum;
+	int			checksum_failures = 0;
 	size_t		cnt;
+	char	   *filename;
+	int			i;
 	pgoff_t		len = 0;
+	char	   *page;
 	size_t		pad;
+	PageHeader  phdr;
+	int			segmentno = 0;
+	char	   *segmentpath;
+	bool		verify_checksum = false;
 
 	fp = AllocateFile(readfilename, "rb");
 	if (fp == NULL)
@@ -1294,8 +1373,143 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	_tarWriteHeader(tarfilename, NULL, statbuf, false);
 
+	/*
+	 * Filename (excluding path).  As last_dir_separator() includes
+	 * the last directory separator, we chop that off by incrementing the
+	 * pointer.
+	 */
+	filename = last_dir_separator(pstrdup(readfilename)) + 1;
+
+	/*
+	 * Verify checksums only when checksums are enabled, and the file to
+	 * send is either in one of the default tablespaces (`./global' or
+	 * `./base'), or in an external tablespace with an absolute pathname
+	 * (starting with `/') and is not in the list of non-heap files to be
+	 * skipped.
+	 */
+	if (!noverify_checksums && DataChecksumsEnabled() &&
+		is_heap_file(readfilename, filename))
+	{
+		verify_checksum = true;
+
+		/*
+		 * Cut off at the segment boundary (".") to get the segment number in order
+		 * to mix it into the checksum.
+		 */
+		segmentpath = strstr(filename, ".");
+		if (verify_checksum && segmentpath != NULL)
+		{
+			*segmentpath++ = '\0';
+			segmentno = atoi(segmentpath);
+			if (segmentno == 0)
+				ereport(ERROR,
+						(errmsg("invalid segment number %d in filen \"%s\"",
+						segmentno, filename)));
+		}
+	}
+
 	while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0)
 	{
+		if (verify_checksum)
+		{
+			/*
+			 * The checksums are verified at block level, so we iterate over
+			 * the buffer in chunks of BLCKSZ, after making sure that
+			 * TAR_SEND_SIZE/buf is divisible by BLCKSZ and we read
+			 * a multiple of BLCKSZ bytes.
+			 */
+			Assert(TAR_SEND_SIZE % BLCKSZ == 0);
+			if (cnt % BLCKSZ != 0)
+			{
+				ereport(WARNING,
+						(errmsg("cannot verify checksum in file \"%s\", block "
+								"%d: read buffer size %d and page size %d "
+								"differ",
+								readfilename, blkno, (int)cnt, BLCKSZ)));
+				verify_checksum = false;
+				continue;
+			}
+			for (i = 0; i < cnt / BLCKSZ; i++)
+			{
+				page = buf + BLCKSZ * i;
+				/*
+				 * Only check pages which have not been modified since the
+				 * start of the base backup. Otherwise, they might have been
+				 * written only halfway and the checksum would not be valid.
+				 * However, replaying WAL would reinstate the correct page in
+				 * this case.
+				 */
+				if (PageGetLSN(page) < startptr)
+				{
+					checksum = pg_checksum_page((char *) page, blkno + segmentno*RELSEG_SIZE);
+					phdr = (PageHeader) page;
+					if (phdr->pd_checksum != checksum)
+					{
+						/*
+						 * Retry the block on the first failure.  It's possible
+						 * that we read the first 4K page of the block just
+						 * before postgres updated the entire block so it ends
+						 * up looking torn to us.  We only need to retry once
+						 * because the LSN should be updated to something we can
+						 * ignore on the next pass.  If the error happens again
+						 * then it is a true validation failure.
+						 */
+						if (block_retry == false)
+						{
+							/* Reread the failed block */
+							if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1)
+							{
+								ereport(ERROR,
+										(errcode_for_file_access(),
+										 errmsg("could not fseek in file \"%s\": %m",
+										 readfilename)));
+							}
+
+							if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
+							{
+								ereport(ERROR,
+										(errcode_for_file_access(),
+										 errmsg("could not reread block %d of file \"%s\": %m",
+										 blkno, readfilename)));
+							}
+
+							if (fseek(fp, cnt - BLCKSZ * i - BLCKSZ, SEEK_CUR) == -1)
+							{
+								ereport(ERROR,
+										(errcode_for_file_access(),
+										 errmsg("could not fseek in file \"%s\": %m",
+										 readfilename)));
+							}
+
+							/* Set flag so we know a retry was attempted */
+							block_retry = true;
+
+							/* Reset loop to validate the block again */
+							i--;
+							continue;
+						}
+
+						checksum_failures++;
+
+						if (checksum_failures <= 5)
+							ereport(WARNING,
+									(errmsg("checksum verification failed in "
+											"file \"%s\", block %d: calculated "
+											"%X but expected %X",
+											readfilename, blkno, checksum,
+											phdr->pd_checksum)));
+						if (checksum_failures == 5)
+							ereport(WARNING,
+									(errmsg("further checksum verification "
+											"failures in file \"%s\" will not "
+											"be reported", readfilename)));
+					}
+				}
+				block_retry = false;
+				blkno++;
+			}
+		}
+
 		/* Send the chunk as a CopyData message */
 		if (pq_putmessage('d', buf, cnt))
 			ereport(ERROR,
@@ -1313,6 +1527,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 			 */
 			break;
 		}
+
 	}
 
 	/* If the file was truncated while we were sending it, pad it with zeros */
@@ -1341,6 +1556,13 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 	FreeFile(fp);
 
+	if (checksum_failures > 1) {
+		ereport(WARNING,
+				(errmsg("file \"%s\" has a total of %d checksum verification "
+						"failures", readfilename, checksum_failures)));
+	}
+	total_checksum_failures += checksum_failures;
+
 	return true;
 }
 
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index beb2c2877b..843a878ff3 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -77,6 +77,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_MAX_RATE
 %token K_WAL
 %token K_TABLESPACE_MAP
+%token K_NOVERIFY_CHECKSUMS
 %token K_TIMELINE
 %token K_PHYSICAL
 %token K_LOGICAL
@@ -154,7 +155,7 @@ var_name:	IDENT	{ $$ = $1; }
 
 /*
  * BASE_BACKUP [LABEL '<label>'] [PROGRESS] [FAST] [WAL] [NOWAIT]
- * [MAX_RATE %d] [TABLESPACE_MAP]
+ * [MAX_RATE %d] [TABLESPACE_MAP] [NOVERIFY_CHECKSUMS]
  */
 base_backup:
 			K_BASE_BACKUP base_backup_opt_list
@@ -208,6 +209,11 @@ base_backup_opt:
 				  $$ = makeDefElem("tablespace_map",
 								   (Node *)makeInteger(true), -1);
 				}
+			| K_NOVERIFY_CHECKSUMS
+				{
+				  $$ = makeDefElem("noverify_checksums",
+								   (Node *)makeInteger(true), -1);
+				}
 			;
 
 create_replication_slot:
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index fb48241e48..7bb501f594 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -92,6 +92,7 @@ PROGRESS			{ return K_PROGRESS; }
 MAX_RATE		{ return K_MAX_RATE; }
 WAL			{ return K_WAL; }
 TABLESPACE_MAP			{ return K_TABLESPACE_MAP; }
+NOVERIFY_CHECKSUMS	{ return K_NOVERIFY_CHECKSUMS; }
 TIMELINE			{ return K_TIMELINE; }
 START_REPLICATION	{ return K_START_REPLICATION; }
 CREATE_REPLICATION_SLOT		{ return K_CREATE_REPLICATION_SLOT; }
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1b32592063..39013ff7e0 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -39,6 +39,7 @@
 #include "replication/basebackup.h"
 #include "streamutil.h"
 
+#define ERRCODE_DATA_CORRUPTED	"XX001"
 
 typedef struct TablespaceListCell
 {
@@ -81,6 +82,7 @@ static char *xlog_dir = NULL;
 static char format = 'p';		/* p(lain)/t(ar) */
 static char *label = "pg_basebackup base backup";
 static bool noclean = false;
+static bool checksum_failure = false;
 static bool showprogress = false;
 static int	verbose = 0;
 static int	compresslevel = 0;
@@ -95,6 +97,7 @@ static char *replication_slot = NULL;
 static bool temp_replication_slot = true;
 static bool create_slot = false;
 static bool no_slot = false;
+static bool verify_checksums = true;
 
 static bool success = false;
 static bool made_new_pgdata = false;
@@ -155,7 +158,7 @@ cleanup_directories_atexit(void)
 	if (success || in_log_streamer)
 		return;
 
-	if (!noclean)
+	if (!noclean && !checksum_failure)
 	{
 		if (made_new_pgdata)
 		{
@@ -195,7 +198,7 @@ cleanup_directories_atexit(void)
 	}
 	else
 	{
-		if (made_new_pgdata || found_existing_pgdata)
+		if ((made_new_pgdata || found_existing_pgdata) && !checksum_failure)
 			fprintf(stderr,
 					_("%s: data directory \"%s\" not removed at user's request\n"),
 					progname, basedir);
@@ -206,7 +209,7 @@ cleanup_directories_atexit(void)
 					progname, xlog_dir);
 	}
 
-	if (made_tablespace_dirs || found_tablespace_dirs)
+	if ((made_tablespace_dirs || found_tablespace_dirs) && !checksum_failure)
 		fprintf(stderr,
 				_("%s: changes to tablespace directories will not be undone\n"),
 				progname);
@@ -360,6 +363,8 @@ usage(void)
 	printf(_("  -P, --progress         show progress information\n"));
 	printf(_("  -S, --slot=SLOTNAME    replication slot to use\n"));
 	printf(_("      --no-slot          prevent creation of temporary replication slot\n"));
+	printf(_("  -k, --no-verify-checksums\n"
+			 "                         do not verify checksums\n"));
 	printf(_("  -v, --verbose          output verbose messages\n"));
 	printf(_("  -V, --version          output version information, then exit\n"));
 	printf(_("  -?, --help             show this help, then exit\n"));
@@ -1808,14 +1813,15 @@ BaseBackup(void)
 	}
 
 	basebkp =
-		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s",
+		psprintf("BASE_BACKUP LABEL '%s' %s %s %s %s %s %s %s",
 				 escaped_label,
 				 showprogress ? "PROGRESS" : "",
 				 includewal == FETCH_WAL ? "WAL" : "",
 				 fastcheckpoint ? "FAST" : "",
 				 includewal == NO_WAL ? "" : "NOWAIT",
 				 maxrate_clause ? maxrate_clause : "",
-				 format == 't' ? "TABLESPACE_MAP" : "");
+				 format == 't' ? "TABLESPACE_MAP" : "",
+				 verify_checksums ? "" : "NOVERIFY_CHECKSUMS");
 
 	if (PQsendQuery(conn, basebkp) == 0)
 	{
@@ -1970,8 +1976,17 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
-		fprintf(stderr, _("%s: final receive failed: %s"),
-				progname, PQerrorMessage(conn));
+		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+		if (sqlstate &&
+			strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+		{
+			fprintf(stderr, _("%s: checksum error occured\n"),
+					progname);
+			checksum_failure = true;
+		} else {
+			fprintf(stderr, _("%s: final receive failed: %s"),
+					progname, PQerrorMessage(conn));
+		}
 		disconnect_and_exit(1);
 	}
 
@@ -2140,6 +2155,7 @@ main(int argc, char **argv)
 		{"progress", no_argument, NULL, 'P'},
 		{"waldir", required_argument, NULL, 1},
 		{"no-slot", no_argument, NULL, 2},
+		{"no-verify-checksums", no_argument, NULL, 'k'},
 		{NULL, 0, NULL, 0}
 	};
 	int			c;
@@ -2166,7 +2182,7 @@ main(int argc, char **argv)
 
 	atexit(cleanup_directories_atexit);
 
-	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWvP",
+	while ((c = getopt_long(argc, argv, "CD:F:r:RS:T:X:l:nNzZ:d:c:h:p:U:s:wWkvP",
 							long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -2308,6 +2324,9 @@ main(int argc, char **argv)
 			case 'P':
 				showprogress = true;
 				break;
+			case 'k':
+				verify_checksums = false;
+				break;
 			default:
 
 				/*
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 32d21ce644..3162cdcd01 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -5,7 +5,7 @@ use Config;
 use File::Basename qw(basename dirname);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 93;
+use Test::More tests => 104;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -16,7 +16,7 @@ my $tempdir = TestLib::tempdir;
 my $node = get_new_node('main');
 
 # Initialize node without replication settings
-$node->init;
+$node->init(extra => [ '--data-checksums' ]);
 $node->start;
 my $pgdata = $node->data_dir;
 
@@ -402,3 +402,61 @@ like(
 	slurp_file("$tempdir/backupxs_sl_R/recovery.conf"),
 	qr/^primary_slot_name = 'slot1'\n/m,
 	'recovery.conf sets primary_slot_name');
+
+my $checksum = $node->safe_psql('postgres', 'SHOW data_checksums;');
+is($checksum, 'on', 'checksums are enabled');
+
+# get relfilenodes of relations to corrupt
+my $pg_class = $node->safe_psql('postgres',
+	q{SELECT pg_relation_filepath('pg_class')}
+);
+my $pg_index = $node->safe_psql('postgres',
+	q{SELECT pg_relation_filepath('pg_index')}
+);
+
+# induce corruption
+open $file, '+<', "$pgdata/$pg_class";
+seek($file, 4000, 0);
+syswrite($file, '\0\0\0\0\0\0\0\0\0');
+close $file;
+
+$node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt"],
+	1,
+	[qr{^$}],
+	[qr/^WARNING.*checksum verification failed/s],
+	'pg_basebackup reports checksum mismatch'
+);
+
+# induce further corruption in 5 more blocks
+open $file, '+<', "$pgdata/$pg_class";
+my @offsets = (12192, 20384, 28576, 36768, 44960);
+foreach my $offset (@offsets) {
+  seek($file, $offset, 0);
+  syswrite($file, '\0\0\0\0\0\0\0\0\0');
+}
+close $file;
+
+$node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt2"],
+        1,
+        [qr{^$}],
+        [qr/^WARNING.*further.*failures.*will.not.be.reported/s],
+        'pg_basebackup does not report more than 5 checksum mismatches'
+);
+
+# induce corruption in a second file
+open $file, '+<', "$pgdata/$pg_index";
+seek($file, 4000, 0);
+syswrite($file, '\0\0\0\0\0\0\0\0\0');
+close $file;
+
+$node->command_checks_all([ 'pg_basebackup', '-D', "$tempdir/backup_corrupt3"],
+        1,
+        [qr{^$}],
+        [qr/^WARNING.*7 total checksum verification failures/s],
+        'pg_basebackup correctly report the total number of checksum mismatches'
+);
+
+# do not verify checksums, should return ok
+$node->command_ok(
+	[   'pg_basebackup', '-D', "$tempdir/backup_corrupt4", '-k' ],
+	'pg_basebackup with -k does not report checksum mismatch');
#29Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#28)
Re: [PATCH] Verify Checksums during Basebackups

On Mon, Apr 2, 2018 at 2:48 PM, Michael Banck <michael.banck@credativ.de>
wrote:

Hi!

On Sun, Apr 01, 2018 at 05:27:21PM +0200, Magnus Hagander wrote:

It might be a micro-optimisation, but ISTM that we shouldn't do the
basename(palloc(fn)) in is_heap_file() unless we actually need it -- so not
at the top of the function. (And surely "." and ".." should not occur here?

I think that's a result of copy/paste from the online checksum patch?

We also do exactly the same basename(palloc(fn)) in sendFile(). Can we
find a way to reuse that duplication? Perhaps we want to split it into the
two pieces already out in sendFile() and pass it in as separate parameters?

I've done the latter now, although it looks a bit weird that the second
argument's data is a subset of the first. I couldn't find another way to
not have it done twice, though.

I agree, but I think it's still cleaner.

On further look, there is actually no need to pstrdup() at all -- we never
used the modified part of the string anyway, because we don't care about
the oid (unlike pg_verify_checksums).

So I adjusted the patch by that.

If not then this second one should at least only be done inside the if
(verify_checksums).

We can't have both, as we need to call the is_heap_file() function in
order to determine whether we should verify the checksums.

Right. I realize that -- thus the "if not". But I guess I was not clear in
what I meant -- see attached file for it.

There is a bigger problem next to that though -- I believe basename() does
not exist on Win32. I haven't tested it, but there is zero documentation of
it existing, which usually indicates it doesn't. That's the part that
definitely needs to get fixed.

I think you need to look into the functionality in port/path.c, in
particular last_dir_separator()?

Thanks for the pointer, I've used that now; I mentioned before that
basename() might be a portability hazard, but couldn't find a good
substitute myself.

Yeah, I have a recollection somewhere of running into this before, but I
couldn't find any references. But the complete lack of docs about it on
msdn.microsoft.com is a clear red flag :)

V6 of the patch is attached.

Excellent. I've done some mangling on it:

* Changed the is_heap_file to is_checksummed_file (and the associated
struct name), because this is really what it's about (we for example verify
checksums on indexes, which are clearly not heaps)
* Moved the list of files to the top of the file next to the other lists of
files/directories
* Added missing function prototype at the top, and changed the parameter
names to be a bit more clear
* Added some comments
* Changed the logic around the checksum-check to avoid the pstrdup() and to
not call the path functions unless necessary (per comment above)
* "filen" -> "file" in message
* xlog.h does not need to be included
* pgindent

Remaining question:

The check for (cnt % BLCKSZ != 0) currently does "continue", which means
that this block of data isn't actually sent to the client at all, which
seems completely wrong. We only want to prevent checksum validations.

I have moved the check up a bit, and refactored it so it continues to do
the actual transmission of the file if this path is hit.

I have pushed an updated patch with those changes. Please review the result
and let me know I broke something :)

Thanks!!

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#30Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#29)
Re: [PATCH] Verify Checksums during Basebackups

On Tue, Apr 3, 2018 at 1:52 PM, Magnus Hagander <magnus@hagander.net> wrote:

Remaining question:

The check for (cnt % BLCKSZ != 0) currently does "continue", which means
that this block of data isn't actually sent to the client at all, which
seems completely wrong. We only want to prevent checksum validations.

I have moved the check up a bit, and refactored it so it continues to do
the actual transmission of the file if this path is hit.

And of course I forgot that particular part in the first push, so I've
pushed it as a separate commit.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#31Michael Banck
michael.banck@credativ.de
In reply to: Magnus Hagander (#30)
Re: [PATCH] Verify Checksums during Basebackups

Hi Magnus,

On 03.04.2018 at 13:59, Magnus Hagander <magnus@hagander.net> wrote:

And of course I forgot that particular part in the first push, so I've
pushed it as a separate commit.

Many thanks for cleaning up the patch and committing it!


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de
credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

#32Magnus Hagander
magnus@hagander.net
In reply to: Michael Banck (#31)
Re: [PATCH] Verify Checksums during Basebackups

On Tue, Apr 3, 2018 at 4:00 PM, Michael Banck <michael.banck@credativ.de>
wrote:

Hi Magnus,

On 03.04.2018 at 13:59, Magnus Hagander <magnus@hagander.net> wrote:

And of course I forgot that particular part in the first push, so I've
pushed it as a separate commit.

Many thanks for cleaning up the patch and committing it!

Seems the tests are failing on prairiedog:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prairiedog&dt=2018-04-03%2012%3A15%3A27&stg=pg_basebackup-check

I don't have time to dive in right now, but that seems interesting -- it's
reporting the failures, but it's then reporting the total number of
failures as 0...

Note that prairedog is a PowerPC machine -- I bet that has something to do
with it.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#32)
Re: [PATCH] Verify Checksums during Basebackups

Magnus Hagander <magnus@hagander.net> writes:

Seems the tests are failing on prairiedog:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prairiedog&dt=2018-04-03%2012%3A15%3A27&stg=pg_basebackup-check
I don't have time to dive in right now, but that seems interesting -- it's
reporting the failures, but it's then reporting the total number of
failures as 0...
Note that prairiedog is a PowerPC machine -- I bet that has something to do
with it.

Endianness issue? I can look closer if you can point me to where to look.

regards, tom lane

#34Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#33)
Re: [PATCH] Verify Checksums during Basebackups

On Tue, Apr 3, 2018 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:

Seems the tests are failing on prairiedog:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prairiedog&dt=2018-04-03%2012%3A15%3A27&stg=pg_basebackup-check

I don't have time to dive in right now, but that seems interesting -- it's
reporting the failures, but it's then reporting the total number of
failures as 0...

Note that prairiedog is a PowerPC machine -- I bet that has something to do
with it.

Endianness issue? I can look closer if you can point me to where to look.

I think the problem comes from:
	if (total_checksum_failures > 1)
		ereport(WARNING,
				(errmsg("%ld total checksum verification failures",
						total_checksum_failures)));

That one actually logs a zero in the text, which should not possibly ever
print "0 total checksum verification failures".

Unless.. %ld is the wrong thing to print:
static int64 total_checksum_failures;

We should perhaps be using something other than %ld to print that?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#34)
Re: [PATCH] Verify Checksums during Basebackups

Magnus Hagander <magnus@hagander.net> writes:

Unless.. %ld is the wrong thing to print:
static int64 total_checksum_failures;
We should perhaps be using something other than %ld to print that?

INT64_FORMAT.

regards, tom lane

#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#35)
Re: [PATCH] Verify Checksums during Basebackups

I wrote:

Magnus Hagander <magnus@hagander.net> writes:

Unless.. %ld is the wrong thing to print:
static int64 total_checksum_failures;
We should perhaps be using something other than %ld to print that?

INT64_FORMAT.

BTW, don't just stick INT64_FORMAT into the message-to-be-translated,
or you'll break things for translation. Good practice is to sprintf
into a local char array with INT64_FORMAT, then include the number
into the displayed message with %s. You can find examples easily
by grepping for INT64_FORMAT.

regards, tom lane

#37Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#36)
Re: [PATCH] Verify Checksums during Basebackups

On Tue, Apr 3, 2018 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Magnus Hagander <magnus@hagander.net> writes:

Unless.. %ld is the wrong thing to print:
static int64 total_checksum_failures;
We should perhaps be using something other than %ld to print that?

INT64_FORMAT.

BTW, don't just stick INT64_FORMAT into the message-to-be-translated,
or you'll break things for translation. Good practice is to sprintf
into a local char array with INT64_FORMAT, then include the number
into the displayed message with %s. You can find examples easily
by grepping for INT64_FORMAT.

Thanks for the hint. I've pushed a fix along this line, let's see if that
clears things.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/