pgsql: Add TAP tests for pg_verify_checksums

Started by Michael Paquierover 7 years ago76 messageshackers
Jump to latest
#1Michael Paquier
michael@paquier.xyz

Add TAP tests for pg_verify_checksums

All options available in the utility get coverage:
- Tests with disabled page checksums.
- Tests with enabled test checksums.
- Emulation of corruption and broken checksums with a full scan and
single relfilenode scan.

This patch has been contributed mainly by Michael Banck and Magnus
Hagander with things presented on various threads, and I have gathered
all the contents into a single patch.

Author: Michael Banck, Magnus Hagander, Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: /messages/by-id/20181005012645.GE1629@paquier.xyz

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/b34e84f160a47d9070b304bfc1baf17596d71865

Modified Files
--------------
src/bin/initdb/t/001_initdb.pl | 14 +++++-
src/bin/pg_verify_checksums/.gitignore | 2 +
src/bin/pg_verify_checksums/Makefile | 6 +++
src/bin/pg_verify_checksums/t/001_basic.pl | 8 ++++
src/bin/pg_verify_checksums/t/002_actions.pl | 69 ++++++++++++++++++++++++++++
5 files changed, 98 insertions(+), 1 deletion(-)

#2Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#1)
Re: pgsql: Add TAP tests for pg_verify_checksums

On Fri, Oct 12, 2018 at 12:17:57AM +0000, Michael Paquier wrote:

Add TAP tests for pg_verify_checksums

All options available in the utility get coverage:
- Tests with disabled page checksums.
- Tests with enabled test checksums.
- Emulation of corruption and broken checksums with a full scan and
single relfilenode scan.

culicidae is failing after this commit in an interesting way, so we
really needed some tests for this utility:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=culicidae&br=HEAD

ok 7 - fails with corrupted data status (got 1 vs expected 1)
not ok 8 - fails with corrupted data stdout /(?^:Bad checksums:.*1)/

# Failed test 'fails with corrupted data stdout /(?^:Bad checksums:.*1)/'
# at t/002_actions.pl line 57.
# ''
# doesn't match '(?^:Bad checksums:.*1)'
not ok 9 - fails with corrupted data stderr /(?^:checksum verification failed)/

So the checksum failure is happening, but the output is not present.
What's not clear to me is that all the other hosts running Debian SID
don't complain, and that the follow-up test on a single relfilenode is
able to pass with a test much similar to the previous one. Is the issue
that we should just call fflush() on stderr and stdout at the end of
main() in pg_verify_checksum.c? Shouldn't we back-patch that?
--
Michael

#3Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#2)
Re: pgsql: Add TAP tests for pg_verify_checksums

Hi,

On 2018-10-12 09:56:14 +0900, Michael Paquier wrote:

On Fri, Oct 12, 2018 at 12:17:57AM +0000, Michael Paquier wrote:

Add TAP tests for pg_verify_checksums

All options available in the utility get coverage:
- Tests with disabled page checksums.
- Tests with enabled test checksums.
- Emulation of corruption and broken checksums with a full scan and
single relfilenode scan.

culicidae is failing after this commit in an interesting way, so we
really needed some tests for this utility:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=culicidae&br=HEAD

ok 7 - fails with corrupted data status (got 1 vs expected 1)
not ok 8 - fails with corrupted data stdout /(?^:Bad checksums:.*1)/

# Failed test 'fails with corrupted data stdout /(?^:Bad checksums:.*1)/'
# at t/002_actions.pl line 57.
# ''
# doesn't match '(?^:Bad checksums:.*1)'
not ok 9 - fails with corrupted data stderr /(?^:checksum verification failed)/

So the checksum failure is happening, but the output is not present.
What's not clear to me is that all the other hosts running Debian SID
don't complain, and that the follow-up test on a single relfilenode is
able to pass with a test much similar to the previous one.

culicidae tests EXEC_BACKEND, so there's an explanation as to why it
sometimes behaves differently. But here I don't immediately see how
that'd matter. Probably still worth verifying that it's not somehow
caused by that.

Is the issue
that we should just call fflush() on stderr and stdout at the end of
main() in pg_verify_checksum.c? Shouldn't we back-patch that?

I don't see how that'd would change anything. A "missing" fflush() isn't
a licence to simply throw away buffer contents, so I doubt it's just that.

Greetings,

Andres Freund

#4Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#3)
Re: pgsql: Add TAP tests for pg_verify_checksums

On Thu, Oct 11, 2018 at 06:04:11PM -0700, Andres Freund wrote:

culicidae tests EXEC_BACKEND, so there's an explanation as to why it
sometimes behaves differently. But here I don't immediately see how
that'd matter. Probably still worth verifying that it's not somehow
caused by that.

Thanks, that's the point of detail I needed about culicidae (you will
need to explain me one day face-to-face how you pronounce it). I have
been able to reproduce the problem, and that's a bug within
pg_verify_checksums as it fails to consider that config_exec_params is
not a file it should scan when using EXEC_BACKEND. The same can happen
with config_exec_params.new.

The attached, which fixes the issue for me, needs to be back-patched to
v11.
--
Michael

Attachments:

checksum-exec-backend.patchtext/x-diff; charset=us-asciiDownload+4-0
#5Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#4)
Re: pgsql: Add TAP tests for pg_verify_checksums

Hi,

On 2018-10-12 10:39:18 +0900, Michael Paquier wrote:

On Thu, Oct 11, 2018 at 06:04:11PM -0700, Andres Freund wrote:

culicidae tests EXEC_BACKEND, so there's an explanation as to why it
sometimes behaves differently. But here I don't immediately see how
that'd matter. Probably still worth verifying that it's not somehow
caused by that.

Thanks, that's the point of detail I needed about culicidae

Note that that's also mentioned on the BF page when you hover over the
yellow stick-it symbol.

(you will need to explain me one day face-to-face how you pronounce
it).

I didn't choose it ;)

I have been able to reproduce the problem, and that's a bug within
pg_verify_checksums as it fails to consider that config_exec_params is
not a file it should scan when using EXEC_BACKEND. The same can
happen with config_exec_params.new.

Ugh, so it's actively borked on windows right now?

The attached, which fixes the issue for me, needs to be back-patched to
v11.

Looks consistent with what we have rn. But:

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 1bc020ab6c..a6c884f149 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -54,6 +54,10 @@ static const char *const skip[] = {
"pg_filenode.map",
"pg_internal.init",
"PG_VERSION",
+#ifdef EXEC_BACKEND
+	"config_exec_params",
+	"config_exec_params.new",
+#endif
NULL,
};

Hm. Maybe I'm confused, but how is it correct to use a blacklisting
approach here? It's far from absurd to have files inside a tablespacs
when using temp_tablespace that aren't written through the buffer
manager. Like, uh, temp files, when using temp_tablespaces. But even
leaving that aside, there's a few extensions that place additional files
into the tablespace directories (and we don't give them an alternative
solution). I think at the very least you're going to need to pattern
match to relation looking files.

Greetings,

Andres Freund

#6Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#5)
Re: pgsql: Add TAP tests for pg_verify_checksums

On Thu, Oct 11, 2018 at 06:53:19PM -0700, Andres Freund wrote:

On 2018-10-12 10:39:18 +0900, Michael Paquier wrote:

I have been able to reproduce the problem, and that's a bug within
pg_verify_checksums as it fails to consider that config_exec_params is
not a file it should scan when using EXEC_BACKEND. The same can
happen with config_exec_params.new.

Ugh, so it's actively borked on windows right now?

It looks so. Automated testing proves to be useful sometimes.

Hm. Maybe I'm confused, but how is it correct to use a blacklisting
approach here? It's far from absurd to have files inside a tablespacs
when using temp_tablespace that aren't written through the buffer
manager. Like, uh, temp files, when using temp_tablespaces.

pg_verify_checksums assumes that the cluster has been cleanly shut
down so temp files are not an issue, no? However...

But even
leaving that aside, there's a few extensions that place additional files
into the tablespace directories (and we don't give them an alternative
solution).

This is a good argument. I have not played much with such extensions
myself though.

I think at the very least you're going to need to pattern match to
relation looking files.

Yes, it seems so. A whitelist approach would consist in checking for
relfilenodes, VMs and FSMs as files authorized for scan, in a way rather
similar to what looks_like_temp_rel_name() does for temp files...
--
Michael

#7Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#6)
Re: pgsql: Add TAP tests for pg_verify_checksums

Hi,

On 2018-10-12 11:07:58 +0900, Michael Paquier wrote:

On Thu, Oct 11, 2018 at 06:53:19PM -0700, Andres Freund wrote:

Hm. Maybe I'm confused, but how is it correct to use a blacklisting
approach here? It's far from absurd to have files inside a tablespacs
when using temp_tablespace that aren't written through the buffer
manager. Like, uh, temp files, when using temp_tablespaces.

pg_verify_checksums assumes that the cluster has been cleanly shut
down so temp files are not an issue, no? However...

We only remove temp dirs on startup, and I'm pretty sure there at least
not too long ago were a few error cases where we can leak temp files in
individual sessions. And there's talk about allowing
pg_verify_checksums when online. So this seems to be an angle that
needs to be thought about...

Greetings,

Andres Freund

#8Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#7)
Re: pgsql: Add TAP tests for pg_verify_checksums

On Thu, Oct 11, 2018 at 07:41:18PM -0700, Andres Freund wrote:

We only remove temp dirs on startup, and I'm pretty sure there at least
not too long ago were a few error cases where we can leak temp files in
individual sessions. And there's talk about allowing
pg_verify_checksums when online. So this seems to be an angle that
needs to be thought about...

Agreed. I am just working on a patch for v11- which uses a
whitelist-based method instead of what is present now. Reverting the
tests to put the buildfarm to green could be done, but that's not the
root of the problem.
--
Michael

#9Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#8)
Re: pgsql: Add TAP tests for pg_verify_checksums

On Fri, Oct 12, 2018 at 12:11:58PM +0900, Michael Paquier wrote:

Agreed. I am just working on a patch for v11- which uses a
whitelist-based method instead of what is present now. Reverting the
tests to put the buildfarm to green could be done, but that's not the
root of the problem.

So, I have coded this thing, and finish with the attached. The
following file patterns are accepted for the checksums:
<digits>.<segment>
<digits>_<forkname>
<digits>_<forkname>.<segment>
I have added some tests on the way to make sure that all the patterns
get covered. Please note that this skips the temporary files.

Thoughts?
--
Michael

Attachments:

checksum-whitelist.patchtext/x-diff; charset=us-asciiDownload+68-11
#10Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#9)
Re: pgsql: Add TAP tests for pg_verify_checksums

Hi,

On Fri, Oct 12, 2018 at 03:05:43PM +0900, Michael Paquier wrote:

On Fri, Oct 12, 2018 at 12:11:58PM +0900, Michael Paquier wrote:

Agreed. I am just working on a patch for v11- which uses a
whitelist-based method instead of what is present now. Reverting the
tests to put the buildfarm to green could be done, but that's not the
root of the problem.

I think that's the right solution; the online verification patch adds
even more logic to the blacklist so getting rid of it in favor of a
whitelist is +1 with me.

So, I have coded this thing, and finish with the attached. The
following file patterns are accepted for the checksums:
<digits>.<segment>
<digits>_<forkname>
<digits>_<forkname>.<segment>
I have added some tests on the way to make sure that all the patterns
get covered. Please note that this skips the temporary files.

Maybe also add some correct (to be scanned) but non-empty garbage files
later on that it should barf on?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB M�nchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 M�nchengladbach
Gesch�ftsf�hrung: Dr. Michael Meskes, J�rg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

#11Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#1)
Re: pgsql: Add TAP tests for pg_verify_checksums

On 10/11/2018 08:17 PM, Michael Paquier wrote:

Add TAP tests for pg_verify_checksums

All options available in the utility get coverage:
- Tests with disabled page checksums.
- Tests with enabled test checksums.
- Emulation of corruption and broken checksums with a full scan and
single relfilenode scan.

This patch has been contributed mainly by Michael Banck and Magnus
Hagander with things presented on various threads, and I have gathered
all the contents into a single patch.

Author: Michael Banck, Magnus Hagander, Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: /messages/by-id/20181005012645.GE1629@paquier.xyz

We have some failures here on Windows. I suspect we might need to do a
little more surgery like was done in commit
efd7f8e36553cd32e445061cbbc80d32028f4248

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#11)
Re: pgsql: Add TAP tests for pg_verify_checksums

On 10/12/2018 05:14 PM, Andrew Dunstan wrote:

On 10/11/2018 08:17 PM, Michael Paquier wrote:

Add TAP tests for pg_verify_checksums

All options available in the utility get coverage:
- Tests with disabled page checksums.
- Tests with enabled test checksums.
- Emulation of corruption and broken checksums with a full scan and
single relfilenode scan.

This patch has been contributed mainly by Michael Banck and Magnus
Hagander with things presented on various threads, and I have gathered
all the contents into a single patch.

Author: Michael Banck, Magnus Hagander, Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: /messages/by-id/20181005012645.GE1629@paquier.xyz

We have some failures here on Windows. I suspect we might need to do a
little more surgery like was done in commit
efd7f8e36553cd32e445061cbbc80d32028f4248

No, hah, I see what it's doing. It's trying to checksum a non-data file
that only exists in the EXEC_BACKEND case:

pg_verify_checksums: could not read block 0 in file "G:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_verify_checksums/tmp_check/t_002_actions_node_checksum_data/pgdata/global/config_exec_params": read 4843 of 8192

This gadget needs to learn to exclude such files.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#12)
Re: pgsql: Add TAP tests for pg_verify_checksums

On 2018-10-12 17:46:17 -0400, Andrew Dunstan wrote:

On 10/12/2018 05:14 PM, Andrew Dunstan wrote:

On 10/11/2018 08:17 PM, Michael Paquier wrote:

Add TAP tests for pg_verify_checksums

All options available in the utility get coverage:
- Tests with disabled page checksums.
- Tests with enabled test checksums.
- Emulation of corruption and broken checksums with a full scan and
single relfilenode scan.

This patch has been contributed mainly by Michael Banck and Magnus
Hagander with things presented on various threads, and I have gathered
all the contents into a single patch.

Author: Michael Banck, Magnus Hagander, Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: /messages/by-id/20181005012645.GE1629@paquier.xyz

We have some failures here on Windows. I suspect we might need to do a
little more surgery like was done in commit
efd7f8e36553cd32e445061cbbc80d32028f4248

No, hah, I see what it's doing. It's trying to checksum a non-data file that
only exists in the EXEC_BACKEND case:

pg_verify_checksums: could not read block 0 in file "G:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_verify_checksums/tmp_check/t_002_actions_node_checksum_data/pgdata/global/config_exec_params": read 4843 of 8192

This gadget needs to learn to exclude such files.

See conversation around http://archives.postgresql.org/message-id/20181012060543.GE30064%40paquier.xyz

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Andres Freund (#13)
Re: pgsql: Add TAP tests for pg_verify_checksums

On 10/12/2018 05:52 PM, Andres Freund wrote:

On 2018-10-12 17:46:17 -0400, Andrew Dunstan wrote:

On 10/12/2018 05:14 PM, Andrew Dunstan wrote:

On 10/11/2018 08:17 PM, Michael Paquier wrote:

Add TAP tests for pg_verify_checksums

All options available in the utility get coverage:
- Tests with disabled page checksums.
- Tests with enabled test checksums.
- Emulation of corruption and broken checksums with a full scan and
single relfilenode scan.

This patch has been contributed mainly by Michael Banck and Magnus
Hagander with things presented on various threads, and I have gathered
all the contents into a single patch.

Author: Michael Banck, Magnus Hagander, Michael Paquier
Reviewed-by: Peter Eisentraut
Discussion: /messages/by-id/20181005012645.GE1629@paquier.xyz

We have some failures here on Windows. I suspect we might need to do a
little more surgery like was done in commit
efd7f8e36553cd32e445061cbbc80d32028f4248

No, hah, I see what it's doing. It's trying to checksum a non-data file that
only exists in the EXEC_BACKEND case:

pg_verify_checksums: could not read block 0 in file "G:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_verify_checksums/tmp_check/t_002_actions_node_checksum_data/pgdata/global/config_exec_params": read 4843 of 8192

This gadget needs to learn to exclude such files.

See conversation around http://archives.postgresql.org/message-id/20181012060543.GE30064%40paquier.xyz

Ah, thanks, missed that.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#10)
Re: pgsql: Add TAP tests for pg_verify_checksums

On Fri, Oct 12, 2018 at 12:14:48PM +0200, Michael Banck wrote:

On Fri, Oct 12, 2018 at 03:05:43PM +0900, Michael Paquier wrote:

On Fri, Oct 12, 2018 at 12:11:58PM +0900, Michael Paquier wrote:

Agreed. I am just working on a patch for v11- which uses a
whitelist-based method instead of what is present now. Reverting the
tests to put the buildfarm to green could be done, but that's not the
root of the problem.

I think that's the right solution; the online verification patch adds
even more logic to the blacklist so getting rid of it in favor of a
whitelist is +1 with me.

Thanks Michael for the input!

So, I have coded this thing, and finish with the attached. The
following file patterns are accepted for the checksums:
<digits>.<segment>
<digits>_<forkname>
<digits>_<forkname>.<segment>
I have added some tests on the way to make sure that all the patterns
get covered. Please note that this skips the temporary files.

Maybe also add some correct (to be scanned) but non-empty garbage files
later on that it should barf on?

I was not sure about doing that in the main patch so I tweaked manually
the test to make sure that the tool still complained with "could not
read block" as it should. That's easy enough to add, so I'll add them
with multiple file patterns. Those are cheap checks as well if they are
placed in global/.

Another problem that the patch has is that it is not using
forkname_to_number() to scan for all the fork types, and I forgot init
forks in the previous version. Using forkname_to_number() also makes
the tool more bug-proof, it is also not complicated to plug into the
existing patch.

Anyway, I have a bit of a problem here, which prevents me to stay in
front of a computer or to look at a screen for more than a couple of
minutes in a row for a couple of days at least, and I don't like to keep
the buildfarm unhappy for the time being. There are three options:
1) Revert the TAP tests of pg_verify_checksums.
2) Push the patch which adds new entries for EXEC_BACKEND files in the
skip list. That's a short workaround, and that would allow default
deployments of Postgres to use the tool.
3) Finish the patch with the whitelist approach.

I can do 1) or 2) in my condition. 3) requires more work than I can do
now, though the patch to do is getting in shape, so the buildfarm would
stay unhappy. Any preference of the course of action to take?
--
Michael

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#15)
Re: pgsql: Add TAP tests for pg_verify_checksums

On 10/13/2018 04:30 AM, Michael Paquier wrote:

On Fri, Oct 12, 2018 at 12:14:48PM +0200, Michael Banck wrote:

On Fri, Oct 12, 2018 at 03:05:43PM +0900, Michael Paquier wrote:

On Fri, Oct 12, 2018 at 12:11:58PM +0900, Michael Paquier wrote:

Agreed. I am just working on a patch for v11- which uses a
whitelist-based method instead of what is present now. Reverting the
tests to put the buildfarm to green could be done, but that's not the
root of the problem.

I think that's the right solution; the online verification patch adds
even more logic to the blacklist so getting rid of it in favor of a
whitelist is +1 with me.

Thanks Michael for the input!

So, I have coded this thing, and finish with the attached. The
following file patterns are accepted for the checksums:
<digits>.<segment>
<digits>_<forkname>
<digits>_<forkname>.<segment>
I have added some tests on the way to make sure that all the patterns
get covered. Please note that this skips the temporary files.

Maybe also add some correct (to be scanned) but non-empty garbage files
later on that it should barf on?

I was not sure about doing that in the main patch so I tweaked manually
the test to make sure that the tool still complained with "could not
read block" as it should. That's easy enough to add, so I'll add them
with multiple file patterns. Those are cheap checks as well if they are
placed in global/.

Another problem that the patch has is that it is not using
forkname_to_number() to scan for all the fork types, and I forgot init
forks in the previous version. Using forkname_to_number() also makes
the tool more bug-proof, it is also not complicated to plug into the
existing patch.

Anyway, I have a bit of a problem here, which prevents me to stay in
front of a computer or to look at a screen for more than a couple of
minutes in a row for a couple of days at least, and I don't like to keep
the buildfarm unhappy for the time being. There are three options:
1) Revert the TAP tests of pg_verify_checksums.
2) Push the patch which adds new entries for EXEC_BACKEND files in the
skip list. That's a short workaround, and that would allow default
deployments of Postgres to use the tool.
3) Finish the patch with the whitelist approach.

I can do 1) or 2) in my condition. 3) requires more work than I can do
now, though the patch to do is getting in shape, so the buildfarm would
stay unhappy. Any preference of the course of action to take?

I have disabled the test temporarily on my two animals since I want to
make sure they are working OK with other changes, and we know what the
problem is. Andres might want to do that with his animal also just add
"--skip-steps=pg_verify_checksums-check" to the command line.

If you want to throw what you have for 3) over to wall to me I can see
if I can finish it.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#16)
Re: pgsql: Add TAP tests for pg_verify_checksums

On 10/13/2018 10:00 AM, Andrew Dunstan wrote:

On 10/13/2018 04:30 AM, Michael Paquier wrote:

On Fri, Oct 12, 2018 at 12:14:48PM +0200, Michael Banck wrote:

On Fri, Oct 12, 2018 at 03:05:43PM +0900, Michael Paquier wrote:

On Fri, Oct 12, 2018 at 12:11:58PM +0900, Michael Paquier wrote:

Agreed.� I am just working on a patch for v11- which uses a
whitelist-based method instead of what is present now. Reverting the
tests to put the buildfarm to green could be done, but that's not the
root of the problem.

I think that's the right solution; the online verification patch adds
even more logic to the blacklist so getting rid of it in favor of a
whitelist is +1 with me.

Thanks Michael for the input!

So, I have coded this thing, and finish with the attached.� The
following file patterns are accepted for the checksums:
<digits>.<segment>
<digits>_<forkname>
<digits>_<forkname>.<segment>
I have added some tests on the way to make sure that all the patterns
get covered.� Please note that this skips the temporary files.

Maybe also add some correct (to be scanned) but non-empty garbage files
later on that it should barf on?

I was not sure about doing that in the main patch so I tweaked manually
the test to make sure that the tool still complained with "could not
read block" as it should.� That's easy enough to add, so I'll add them
with multiple file patterns.� Those are cheap checks as well if they are
placed in global/.

Another problem that the patch has is that it is not using
forkname_to_number() to scan for all the fork types, and I forgot init
forks in the previous version.� Using forkname_to_number() also makes
the tool more bug-proof, it is also not complicated to plug into the
existing patch.

Anyway, I have a bit of a problem here, which prevents me to stay in
front of a computer or to look at a screen for more than a couple of
minutes in a row for a couple of days at least, and I don't like to keep
the buildfarm unhappy for the time being.� There are three options:
1) Revert the TAP tests of pg_verify_checksums.
2) Push the patch which adds new entries for EXEC_BACKEND files in the
skip list.� That's a short workaround, and that would allow default
deployments of Postgres to use the tool.
3) Finish the patch with the whitelist approach.

I can do 1) or 2) in my condition.� 3) requires more work than I can do
now, though the patch to do is getting in shape, so the buildfarm would
stay unhappy.� Any preference of the course of action to take?

I have disabled the test temporarily on my two animals since I want to
make sure they are working OK with other changes, and we know what the
problem is. Andres might want to do that with his animal also just add
"--skip-steps=pg_verify_checksums-check" to the command line.

If you want to throw what you have for 3) over to wall to me I can see
if I can finish it.

It occurred to me that a pretty simple fix could just be to blacklist
everything that didn't start with a digit. The whitelist approach is
probably preferable� ... depends how urgent we see this as.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#18Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#17)
Re: pgsql: Add TAP tests for pg_verify_checksums

On Sat, Oct 13, 2018 at 05:53:00PM -0400, Andrew Dunstan wrote:

It occurred to me that a pretty simple fix could just be to blacklist
everything that didn't start with a digit. The whitelist approach is
probably preferable... depends how urgent we see this as.

Yeah, possibly, still that's not a correct long-term fix. So attached
is what I think we should do for HEAD and REL_11_STABLE with an approach
using a whitelist. I have added positive and negative tests on top of
the existing TAP tests, as suggested by Michael B, and I made the code
use relpath.h to make sure that we don't miss any fork types.

Any objections to this patch?
--
Michael

Attachments:

checksum-whitelist-v2.patchtext/x-diff; charset=us-asciiDownload+102-12
#19Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#18)
Re: pgsql: Add TAP tests for pg_verify_checksums

On 10/13/2018 09:59 PM, Michael Paquier wrote:

On Sat, Oct 13, 2018 at 05:53:00PM -0400, Andrew Dunstan wrote:

It occurred to me that a pretty simple fix could just be to blacklist
everything that didn't start with a digit. The whitelist approach is
probably preferable... depends how urgent we see this as.

Yeah, possibly, still that's not a correct long-term fix. So attached
is what I think we should do for HEAD and REL_11_STABLE with an approach
using a whitelist. I have added positive and negative tests on top of
the existing TAP tests, as suggested by Michael B, and I made the code
use relpath.h to make sure that we don't miss any fork types.

Any objections to this patch?

This code now seems redundant:

���� if (strcmp(fn, ".") == 0 ||
���� ��� strcmp(fn, "..") == 0)
���� ��� return true;

I would probably reverse the order of these two tests. It might not make
any difference, assuming fn is never an empty string, but it seems more
logical to me.

    +��� /* good to go if only digits */
    +��� if (fn[pos] == '\0')
    +��� ��� return false;
    +��� /* leave if no digits */
    +��� if (pos == 0)
    +��� ��� return true;

It also looks to me like the check for a segment number doesn't ensure there is at least one digit, so "123." might pass, but I could be wrong. In any case, there isn't a test for that, and there probably should be.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#19)
Re: pgsql: Add TAP tests for pg_verify_checksums

On Sun, Oct 14, 2018 at 10:24:55AM -0400, Andrew Dunstan wrote:

[snip]

Thanks a lot for the review, Andrew!

This code now seems redundant:

     if (strcmp(fn, ".") == 0 ||
         strcmp(fn, "..") == 0)
         return true;

Indeed. I have renamed skipfile() to isRelFileName on the way, which
looks better in the context.

I would probably reverse the order of these two tests. It might not make any
difference, assuming fn is never an empty string, but it seems more logical
to me.

+    /* good to go if only digits */
+    if (fn[pos] == '\0')
+        return false;
+    /* leave if no digits */
+    if (pos == 0)
+        return true;

No objections to that. Done.

It also looks to me like the check for a segment number doesn't ensure
there is at least one digit, so "123." might pass, but I could be
wrong. In any case, there isn't a test for that, and there probably
should be.

You are right here. This name pattern has been failing. Fixed in the
attached. I have added "123_" and "123_." as extra patterns to check.

What do you think about the updated version attached?
--
Michael

Attachments:

checksum-whitelist-v3.patchtext/x-diff; charset=us-asciiDownload+116-18
#21Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#20)
#22Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#21)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#22)
#24Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#23)
#25Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#25)
#27Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#26)
#28Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#25)
#29Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#28)
#30Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#29)
#31Michael Banck
michael.banck@credativ.de
In reply to: Stephen Frost (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Banck (#31)
#33Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#33)
#35Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#34)
#36Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#30)
#37Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#36)
#38Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#37)
#39J Chapman Flack
jflack@math.purdue.edu
In reply to: Andres Freund (#36)
#40Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#38)
#41Andres Freund
andres@anarazel.de
In reply to: J Chapman Flack (#39)
#42Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#40)
#43Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#42)
#44Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#43)
#45Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#44)
#46Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#45)
#47Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#46)
#48Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#47)
#49Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#48)
#50Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#49)
#51Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#50)
#52Andrew Dunstan
andrew@dunslane.net
In reply to: Stephen Frost (#49)
#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#52)
#54Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#53)
#55Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#54)
#56Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#54)
#57Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#55)
#58Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#57)
#59Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#57)
#60Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#56)
#61Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#59)
#62Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#60)
#63Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#59)
#64Stephen Frost
sfrost@snowman.net
In reply to: Michael Paquier (#63)
#65Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#64)
#66Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#56)
#67Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#66)
#68Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#65)
#69Michael Paquier
michael@paquier.xyz
In reply to: Stephen Frost (#68)
#70Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#18)
#71Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#70)
#72Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#29)
#73Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#72)
#74Michael Banck
michael.banck@credativ.de
In reply to: Michael Paquier (#73)
#75Michael Paquier
michael@paquier.xyz
In reply to: Michael Banck (#74)
#76Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#75)