pg_verify_checksums and -fno-strict-aliasing
Hi,
I noticed that pg_verify_checksums computes bogus checksums if I compile
it with '-O2 -Wall' but without -fno-strict-aliasing. Also I am getting
a compile warning then:
|src/bin/pg_verify_checksums$ make CFLAGS='-g -O2 -Wall'
[...]
|gcc -g -O2 -Wall -I../../../src/include
| -I/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/include
| -D_GNU_SOURCE -c -o pg_verify_checksums.o
| /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums/pg_verify_checksums.c
| -MMD -MP -MF .deps/pg_verify_checksums.Po
|/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums/pg_verify_checksums.c:
| In function ‘scan_file’:
|/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums/pg_verify_checksums.c:112:3:
| warning: dereferencing type-punned pointer will break strict-aliasing
| rules [-Wstrict-aliasing]
| if (PageIsNew(buf))
| ^~
[...]
|src/bin/pg_verify_checksums$ ./pg_verify_checksums -D ../../test/regress/tmp_check/data
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/global/1233", block 0: calculated checksum FC8A but expected F857
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/global/1233", block 1: calculated checksum F340 but expected A68D
[...]
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/base/12368/2659", block 5: calculated checksum 954D but expected D2ED
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/base/12368/2659", block 6: calculated checksum 361 but expected E7F7
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/base/12368/2659", block 7: calculated checksum 4C0D but expected 4B10
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/base/12368/2659", block 8: calculated checksum 2B9A but expected 71E3
|pg_verify_checksums: checksum verification failed in file "../../test/regress/tmp_check/data/base/12368/2659", block 9: calculated checksum 19CD but expected 541C
|Checksum scan completed
|Data checksum version: 1
|Files scanned: 932
|Blocks scanned: 2789
|Bad checksums: 2789
If I add -fno-strict-aliasing to $CFLAGS, the problem goes away.
Is this something to worry about, or just pilot error cause I am not
using the same $CFLAGS as for the rest of the build? I originally
noticed this problem with my external fork of pg_verify_checksums.
Michael
--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de
credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
I noticed that pg_verify_checksums computes bogus checksums if I compile
it with '-O2 -Wall' but without -fno-strict-aliasing. Also I am getting
a compile warning then:[...]
If I add -fno-strict-aliasing to $CFLAGS, the problem goes away.
Is this something to worry about, or just pilot error cause I am not
using the same $CFLAGS as for the rest of the build? I originally
noticed this problem with my external fork of pg_verify_checksums.
It looks like the code is using aliasing where the standard says it should
not, which breaks compiler optimization, and the added options tells the
compiler to not assume that the code conforms to the standard...
As PostgreSQL source is expected to conform to some C standard (unsure
which one right now, possibly c89 but maybe it is beginning to switch to
c99, a young 19 years old standard), I'd suggest that the right fix is
rather to actually remove the aliasing issue.
--
Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes:
If I add -fno-strict-aliasing to $CFLAGS, the problem goes away.
Is this something to worry about, or just pilot error cause I am not
using the same $CFLAGS as for the rest of the build? I originally
noticed this problem with my external fork of pg_verify_checksums.
It looks like the code is using aliasing where the standard says it should
not, which breaks compiler optimization, and the added options tells the
compiler to not assume that the code conforms to the standard...
Actually, this code is just plain broken:
char buf[BLCKSZ];
PageHeader header = (PageHeader) buf;
There is no guarantee that a local char[] array is aligned on anything
stronger than a byte boundary, and if it isn't, word-sized accesses to
*header are going to fail on machines with strict alignment rules.
I suppose that that is really what Michael's compiler is moaning about.
I rather suspect that this hasn't been tested on anything but Intel
hardware, which is famously misalignment-tolerant. The lack of any
apparent regression test infrastructure for it isn't leaving a warm
feeling about how much the buildfarm is testing it.
(The right fix, of course, is to malloc the work buffer rather than
put it on the stack.)
regards, tom lane
On 2018-Aug-30, Fabien COELHO wrote:
As PostgreSQL source is expected to conform to some C standard (unsure which
one right now, possibly c89 but maybe it is beginning to switch to c99, a
young 19 years old standard), I'd suggest that the right fix is rather to
actually remove the aliasing issue.
Yeah, type aliasing is quite a rampant issue in our code and I don't
think there's an easy fix.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Aug 30, 2018 at 10:39:26AM -0400, Tom Lane wrote:
I rather suspect that this hasn't been tested on anything but Intel
hardware, which is famously misalignment-tolerant. The lack of any
apparent regression test infrastructure for it isn't leaving a warm
feeling about how much the buildfarm is testing it.(The right fix, of course, is to malloc the work buffer rather than
put it on the stack.)
pg_upgrade/file.c is careful about that (5afcd2a), and has a comment on
the matter, as does pg_standby.c.
Now, grepping around for "BLCKSZ]", some garbage may have accumulated?
- entrySplitPage in ginentrypage.c
- rewind_copy_file_range in file.c
- _hash_alloc_buckets in hashpage.c
- pg_prewarm.c
- blinsert.c
- pg_waldump.c
- logtape.c
--
Michael
On Thu, Aug 30, 2018 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
If I add -fno-strict-aliasing to $CFLAGS, the problem goes away.
Is this something to worry about, or just pilot error cause I am not
using the same $CFLAGS as for the rest of the build? I originally
noticed this problem with my external fork of pg_verify_checksums.It looks like the code is using aliasing where the standard says it
should
not, which breaks compiler optimization, and the added options tells the
compiler to not assume that the code conforms to the standard...Actually, this code is just plain broken:
char buf[BLCKSZ];
PageHeader header = (PageHeader) buf;There is no guarantee that a local char[] array is aligned on anything
stronger than a byte boundary, and if it isn't, word-sized accesses to
*header are going to fail on machines with strict alignment rules.
I suppose that that is really what Michael's compiler is moaning about.I rather suspect that this hasn't been tested on anything but Intel
hardware, which is famously misalignment-tolerant. The lack of any
apparent regression test infrastructure for it isn't leaving a warm
feeling about how much the buildfarm is testing it.
I know I certainly didn't test it on non-intel.
We did have that in the online verify checksum patch, but it came out as
part of the revert of that part.
Given that we also recently found bugs in the actual hash index
implementation that would've gotten caught by this, perhaps we should add a
TAP test for this.
Should we make it a separate test in pg_verify_checksums, or should we
piggyback on the pg_basebackup tests (which AFAICT is the only ones that
create a cluster with checksums enabled at all, and thus is the only
codepath that actually uses the backend checksum code at all, which I think
is an even worse thing to have no tests of)
(The right fix, of course, is to malloc the work buffer rather than
put it on the stack.)
So if I get you right, you're saying the attached patch should be all
that's needed?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Attachments:
verify_checksums_buf.patchtext/x-patch; charset=US-ASCII; name=verify_checksums_buf.patchDownload+3-1
On Thu, Aug 30, 2018 at 9:35 PM, Magnus Hagander <magnus@hagander.net>
wrote:
On Thu, Aug 30, 2018 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
If I add -fno-strict-aliasing to $CFLAGS, the problem goes away.
Is this something to worry about, or just pilot error cause I am not
using the same $CFLAGS as for the rest of the build? I originally
noticed this problem with my external fork of pg_verify_checksums.It looks like the code is using aliasing where the standard says it
should
not, which breaks compiler optimization, and the added options tells
the
compiler to not assume that the code conforms to the standard...
Actually, this code is just plain broken:
char buf[BLCKSZ];
PageHeader header = (PageHeader) buf;There is no guarantee that a local char[] array is aligned on anything
stronger than a byte boundary, and if it isn't, word-sized accesses to
*header are going to fail on machines with strict alignment rules.
I suppose that that is really what Michael's compiler is moaning about.I rather suspect that this hasn't been tested on anything but Intel
hardware, which is famously misalignment-tolerant. The lack of any
apparent regression test infrastructure for it isn't leaving a warm
feeling about how much the buildfarm is testing it.I know I certainly didn't test it on non-intel.
We did have that in the online verify checksum patch, but it came out as
part of the revert of that part.Given that we also recently found bugs in the actual hash index
implementation that would've gotten caught by this, perhaps we should add a
TAP test for this.Should we make it a separate test in pg_verify_checksums, or should we
piggyback on the pg_basebackup tests (which AFAICT is the only ones that
create a cluster with checksums enabled at all, and thus is the only
codepath that actually uses the backend checksum code at all, which I think
is an even worse thing to have no tests of)
PFA some *very* basic tests for pg_verify_checksums, which should at least
be enough to catch the kind of errors we had now in the tool itself.
Does not at this point try to solve the fact that the backend checksum code
is not tested.
(The right fix, of course, is to malloc the work buffer rather than
put it on the stack.)
So if I get you right, you're saying the attached patch should be all
that's needed?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Attachments:
verify_checksum_tests.patchtext/x-patch; charset=US-ASCII; name=verify_checksum_tests.patchDownload+39-0
On Thu, Aug 30, 2018 at 09:35:33PM +0200, Magnus Hagander wrote:
Should we make it a separate test in pg_verify_checksums, or should we
piggyback on the pg_basebackup tests (which AFAICT is the only ones that
create a cluster with checksums enabled at all, and thus is the only
codepath that actually uses the backend checksum code at all, which I think
is an even worse thing to have no tests of)
This should be a separate suite. And FWIW, we only use pg_regress to
make sure that an already-initialized data folder has the correct,
secure authentication set. So creating a node with checksums enabled is
just that:
$node->init(extra => ['--data-checksums']);
[... 20 minutes later ...]
Attached is a basic test suite ;)
--
Michael
Attachments:
verify-checksums-tap.patchtext/x-diff; charset=us-asciiDownload+58-0
On Thu, Aug 30, 2018 at 10:02 PM, Michael Paquier <michael@paquier.xyz>
wrote:
On Thu, Aug 30, 2018 at 09:35:33PM +0200, Magnus Hagander wrote:
Should we make it a separate test in pg_verify_checksums, or should we
piggyback on the pg_basebackup tests (which AFAICT is the only ones that
create a cluster with checksums enabled at all, and thus is the only
codepath that actually uses the backend checksum code at all, which Ithink
is an even worse thing to have no tests of)
This should be a separate suite. And FWIW, we only use pg_regress to
make sure that an already-initialized data folder has the correct,
secure authentication set. So creating a node with checksums enabled is
just that:
$node->init(extra => ['--data-checksums']);[... 20 minutes later ...]
Attached is a basic test suite ;)
Haha, nice timing :)
I wonder if your tests that pg_control has picked things up belong more in
the tests of initdb itself?
Do you think there is value in testing against a non-checksum cluster? I
guess there's some point to it. I think testing actual corruption (like my
version of the tests) is more valuable, but perhaps we should just do both?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
On 2018-08-30 10:39:26 -0400, Tom Lane wrote:
char buf[BLCKSZ];
PageHeader header = (PageHeader) buf;
(The right fix, of course, is to malloc the work buffer rather than
put it on the stack.)
Or alternatively, for places where such allocations could be a problem
for performance, using a union can be used to force the required
alignment. I'm fairly certain that just allocating is more than OK
here, to be clear.
Greetings,
Andres Freund
Hi,
Am Donnerstag, den 30.08.2018, 22:02 +0200 schrieb Magnus Hagander:
PFA some *very* basic tests for pg_verify_checksums, which should at
least be enough to catch the kind of errors we had now in the tool
itself.
I proposed something similar for pg_basebackup back then and IIRC Peter
(rightfully) complained that using a system catalog for that wasn't so
great - also, are you sure the relfilenode will always be stable? Then
again, this is obviously a throw-away data directory so it should be no
general issue, just a matter of taste.
https://github.com/credativ/pg_checksums/blob/master/t/001_checksums.pl
has some similar tests at the end but creates a table for corruption,
adds some safeguards for block size and also uses command_checks_all()
to parse the output. It's PGDG licensed so you're welcome to take some
or all of that.
Michael
--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de
credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
Hi,
Am Donnerstag, den 30.08.2018, 21:35 +0200 schrieb Magnus Hagander:
So if I get you right, you're saying the attached patch should be all
that's needed?
I tried to do some similar changes but neither what you proposed nor
what I came up with actually fixes the checksum failures, though they
silence the warning at -Wall level.
Could well be I'm doing something wrong, so it would be cool if somebody
could reproduce this first. In principle, it should be enough to run
'make clean && make CLFAGS=-O2' in the src/bin/pg_verify_checksums
subdirectory in order to get a broken executable.
Michael
--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de
credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
Michael Banck <michael.banck@credativ.de> writes:
Could well be I'm doing something wrong, so it would be cool if somebody
could reproduce this first. In principle, it should be enough to run
'make clean && make CLFAGS=-O2' in the src/bin/pg_verify_checksums
subdirectory in order to get a broken executable.
I can reproduce it on my Fedora 28 machine (gcc 8.1.1) by compiling
pg_verify_checksums with all the normal flags except -fno-strict-aliasing.
I don't see any warning, not even with -Wstrict-aliasing=2 :-(, but the
generated code malfunctions as Michael describes. As best I can tell,
the problem is in the inlined code from checksum_impl.h rather than
in pg_verify_checksums.c itself. I think what is happening is gcc
is concluding that it can optimize away the code that temporarily
sets the page's checksum field to zero.
Although, as Alvaro mentioned upthread, there's basically no hope of
getting away from -fno-strict-aliasing for Postgres-at-large, I think it'd
be a good thing if we could get checksum_impl.h to not depend on it. The
whole point of that header, according to its own self-description, is to
export the checksum calculation for use by external programs --- and we
can't really control what compiler flags will be used by those. The fact
that Michael even ran into this problem seems to be an instance of that.
So, I've been fooling around trying to get it to work without
-fno-strict-aliasing, but with little luck so far.
regards, tom lane
Hi,
On 2018-08-30 17:19:28 -0400, Tom Lane wrote:
So, I've been fooling around trying to get it to work without
-fno-strict-aliasing, but with little luck so far.
The problem presumably is that pg_checksum_block() accesses the relevant
fields as an uint32, whereas pg_checksum_page() accesses it as a
PageHeader. That's an aliasing violation. *One* cast from char* to
either type is fine, it's accessing under both those types that's
problematic.
One way to fix it would be to memcpy in/out the modified PageHeader, or
just do offset math and memcpy to that offset.
Greetings,
Andres Freund
Magnus Hagander <magnus@hagander.net> writes:
On Thu, Aug 30, 2018 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
(The right fix, of course, is to malloc the work buffer rather than
put it on the stack.)
So if I get you right, you're saying the attached patch should be all
that's needed?
Well, that's some of what's needed. I notice this code is also being
sloppy about whether block numbers are signed or unsigned, which means
it'd probably misbehave on relations exceeding 2^31 blocks. I have
a patch in progress to clean all that up, though I'm still working
on the strict-aliasing part.
regards, tom lane
On 2018-08-30 14:46:06 -0700, Andres Freund wrote:
Hi,
On 2018-08-30 17:19:28 -0400, Tom Lane wrote:
So, I've been fooling around trying to get it to work without
-fno-strict-aliasing, but with little luck so far.The problem presumably is that pg_checksum_block() accesses the relevant
fields as an uint32, whereas pg_checksum_page() accesses it as a
PageHeader. That's an aliasing violation. *One* cast from char* to
either type is fine, it's accessing under both those types that's
problematic.One way to fix it would be to memcpy in/out the modified PageHeader, or
just do offset math and memcpy to that offset.
It took me a bit to reproduce the issue (due to sheer stupidity on my
part: no, changing the flags passed to gcc to link pg_verify_checksums
doesn't do the trick), but the above indeed fixes the issue for me.
The attached is just for demonstration that the approach works.
Greetings,
Andres Freund
Attachments:
checksum.difftext/x-diff; charset=us-asciiDownload+14-6
Andres Freund <andres@anarazel.de> writes:
On 2018-08-30 14:46:06 -0700, Andres Freund wrote:
One way to fix it would be to memcpy in/out the modified PageHeader, or
just do offset math and memcpy to that offset.
It took me a bit to reproduce the issue (due to sheer stupidity on my
part: no, changing the flags passed to gcc to link pg_verify_checksums
doesn't do the trick), but the above indeed fixes the issue for me.
I suspect people will complain about the added cost of doing that.
I've been AFK all afternoon, but what I was intending to try next was
the union approach, specifically union'ing PageHeaderData with the uint32
array representation needed by pg_checksum_block(). That might also
end up giving us code less unreadable than this:
uint32 (*dataArr)[N_SUMS] = (uint32 (*)[N_SUMS]) data;
BTW, not to mention the elephant in the room, but: is it *really* OK
that pg_checksum_page scribbles on the page buffer, even temporarily?
It's certainly quite horrid that there aren't large warnings about
that in the function's API comment.
regards, tom lane
On 2018-08-30 18:11:40 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2018-08-30 14:46:06 -0700, Andres Freund wrote:
One way to fix it would be to memcpy in/out the modified PageHeader, or
just do offset math and memcpy to that offset.It took me a bit to reproduce the issue (due to sheer stupidity on my
part: no, changing the flags passed to gcc to link pg_verify_checksums
doesn't do the trick), but the above indeed fixes the issue for me.I suspect people will complain about the added cost of doing that.
I think the compiler will just optimize it away. But if we're concerned
we could write it as
memcpy(&save_checksum, page + offsetof(PageHeaderData, pd_checksum), sizeof(save_checksum));
memset(page + offsetof(PageHeaderData, pd_checksum), 0, sizeof(save_checksum));
checksum = pg_checksum_block(page, BLCKSZ);
memcpy(page + offsetof(PageHeaderData, pd_checksum), &save_checksum, sizeof(save_checksum));
works, but still not exceedingly pretty :/. The code generated looks
reasonable:
194 memcpy(&save_checksum, page + offsetof(PageHeaderData, pd_checksum), sizeof(save_checksum));
0x00000000000035d0 <+0>: push %r12
0x00000000000035d2 <+2>: xor %eax,%eax
0x00000000000035d4 <+4>: movzwl 0x8(%rdi),%r12d
195 memset(page + offsetof(PageHeaderData, pd_checksum), 0, sizeof(save_checksum));
0x00000000000035d9 <+9>: push %rbp
0x00000000000035da <+10>: mov %ax,0x8(%rdi)
(the pushes are just interspersed stuff, yay latency aware instruction
scheduling)
I've been AFK all afternoon, but what I was intending to try next was
the union approach, specifically union'ing PageHeaderData with the uint32
array representation needed by pg_checksum_block(). That might also
end up giving us code less unreadable than this:uint32 (*dataArr)[N_SUMS] = (uint32 (*)[N_SUMS]) data;
Hm.
BTW, not to mention the elephant in the room, but: is it *really* OK
that pg_checksum_page scribbles on the page buffer, even temporarily?
It's certainly quite horrid that there aren't large warnings about
that in the function's API comment.
It certainly should be warned about. Practically I don't think it's a
problem, because we pretty much always operate on a copy of the page
when writing out, as otherwise concurrently set hint bits would be
troublesome.
Greetings,
Andres Freund
On Thu, Aug 30, 2018 at 10:07:38PM +0200, Magnus Hagander wrote:
I wonder if your tests that pg_control has picked things up belong more in
the tests of initdb itself?
For the case where checksums are disabled, moving there the check on
control data makes sense.
Do you think there is value in testing against a non-checksum cluster? I
guess there's some point to it. I think testing actual corruption (like my
version of the tests) is more valuable, but perhaps we should just do both?
Yeah, let's do stuff on a single cluster which has them only enabled,
as initializing a node is one of the most costly operations in TAP
tests. Checking that the server is stopped is definitely a must in my
opinion, and your addition about emulating corrupted blocks is a good
idea. I would personally vote for keeping a control file check within
the tests of pg_verify_checksums as that's cheap.
--
Michael
Andres Freund <andres@anarazel.de> writes:
On 2018-08-30 18:11:40 -0400, Tom Lane wrote:
I suspect people will complain about the added cost of doing that.
I think the compiler will just optimize it away.
Maybe. In any case, the attached version avoids any need for memcpy
and is, I think, cleaner code anyhow. It fixes the problem for me
with Fedora's gcc, though I'm not sure that it'd be enough to get rid
of the warning Michael sees (I wonder what compiler he's using).
BTW, not to mention the elephant in the room, but: is it *really* OK
that pg_checksum_page scribbles on the page buffer, even temporarily?
It's certainly quite horrid that there aren't large warnings about
that in the function's API comment.
It certainly should be warned about. Practically I don't think it's a
problem, because we pretty much always operate on a copy of the page
when writing out, as otherwise concurrently set hint bits would be
troublesome.
The write end of things is not a problem IMO, since presumably the caller
would be about to overwrite the checksum field anyway. The issue is for
reading and/or verifying, where it's much less obvious that clobbering
the page image is safe.
regards, tom lane