"inconsistent page found" with checksum and wal_consistency_checking enabled
Currently, page checksum is not masked by Page masking routines used
by wal_consistency_checking facility. So, when running `make installcheck`
with data checksum enabled and wal_consistency_checking='all', it easily
and consistently FATALs with "inconsistent page found".
If anything needs to be masked on Page to perform / pass wal consistency
checking, definitely checksum is not going to match and hence must be
masked as well. Attaching patch to fix the same, installcheck passes with
checksums enabled and wal_consistency_checking='all' with the fix.
Clubbed to perform the masking with lsn as it sounds logical to have them
together, as lsn is masked is all the cases so far and such is needed for
checksum as well.
Thank You,
Ashwin Agrawal
Attachments:
mask_checksum.difftext/plain; charset=US-ASCII; name=mask_checksum.diffDownload
diff --git a/src/backend/access/brin/brin_xlog.c b/src/backend/access/brin/brin_xlog.c
index dff7198..60daa54 100644
--- a/src/backend/access/brin/brin_xlog.c
+++ b/src/backend/access/brin/brin_xlog.c
@@ -332,7 +332,7 @@ brin_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
- mask_page_lsn(page);
+ mask_page_lsn_and_checksum(page);
mask_page_hint_bits(page);
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 10253d3..d880aef 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -23,15 +23,17 @@
* mask_page_lsn
*
* In consistency checks, the LSN of the two pages compared will likely be
- * different because of concurrent operations when the WAL is generated
- * and the state of the page when WAL is applied.
+ * different because of concurrent operations when the WAL is generated and
+ * the state of the page when WAL is applied. Also, mask out checksum as
+ * masking anything else on page means checksum is not going to match as well.
*/
void
-mask_page_lsn(Page page)
+mask_page_lsn_and_checksum(Page page)
{
PageHeader phdr = (PageHeader) page;
PageXLogRecPtrSet(phdr->pd_lsn, (uint64) MASK_MARKER);
+ phdr->pd_checksum = MASK_MARKER;
}
/*
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e3..92cafe9 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -770,7 +770,7 @@ gin_mask(char *pagedata, BlockNumber blkno)
Page page = (Page) pagedata;
GinPageOpaque opaque;
- mask_page_lsn(page);
+ mask_page_lsn_and_checksum(page);
opaque = GinPageGetOpaque(page);
mask_page_hint_bits(page);
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 4f4fe8f..7fd91ce 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -352,14 +352,14 @@ gist_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
- mask_page_lsn(page);
+ mask_page_lsn_and_checksum(page);
mask_page_hint_bits(page);
mask_unused_space(page);
/*
* NSN is nothing but a special purpose LSN. Hence, mask it for the same
- * reason as mask_page_lsn.
+ * reason as mask_page_lsn_and_checksum.
*/
GistPageSetNSN(page, (uint64) MASK_MARKER);
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index 67a856c..f19f6fd 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -1263,7 +1263,7 @@ hash_mask(char *pagedata, BlockNumber blkno)
HashPageOpaque opaque;
int pagetype;
- mask_page_lsn(page);
+ mask_page_lsn_and_checksum(page);
mask_page_hint_bits(page);
mask_unused_space(page);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d20f038..d03f544 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -9166,7 +9166,7 @@ heap_mask(char *pagedata, BlockNumber blkno)
Page page = (Page) pagedata;
OffsetNumber off;
- mask_page_lsn(page);
+ mask_page_lsn_and_checksum(page);
mask_page_hint_bits(page);
mask_unused_space(page);
diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 4afdf47..82337f8 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -1034,7 +1034,7 @@ btree_mask(char *pagedata, BlockNumber blkno)
Page page = (Page) pagedata;
BTPageOpaque maskopaq;
- mask_page_lsn(page);
+ mask_page_lsn_and_checksum(page);
mask_page_hint_bits(page);
mask_unused_space(page);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index c440d21..87def79 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -1034,7 +1034,7 @@ spg_mask(char *pagedata, BlockNumber blkno)
{
Page page = (Page) pagedata;
- mask_page_lsn(page);
+ mask_page_lsn_and_checksum(page);
mask_page_hint_bits(page);
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index fbc6810..3adbf7b 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -541,7 +541,7 @@ generic_redo(XLogReaderState *record)
void
generic_mask(char *page, BlockNumber blkno)
{
- mask_page_lsn(page);
+ mask_page_lsn_and_checksum(page);
mask_unused_space(page);
}
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 6293712..5c2ce78 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -1941,7 +1941,7 @@ ResetSequenceCaches(void)
void
seq_mask(char *page, BlockNumber blkno)
{
- mask_page_lsn(page);
+ mask_page_lsn_and_checksum(page);
mask_unused_space(page);
}
diff --git a/src/include/access/bufmask.h b/src/include/access/bufmask.h
index 95c6c3a..6a24c94 100644
--- a/src/include/access/bufmask.h
+++ b/src/include/access/bufmask.h
@@ -23,7 +23,7 @@
/* Marker used to mask pages consistently */
#define MASK_MARKER 0
-extern void mask_page_lsn(Page page);
+extern void mask_page_lsn_and_checksum(Page page);
extern void mask_page_hint_bits(Page page);
extern void mask_unused_space(Page page);
extern void mask_lp_flags(Page page);
On Wed, Sep 20, 2017 at 5:23 AM, Ashwin Agrawal <aagrawal@pivotal.io> wrote:
Currently, page checksum is not masked by Page masking routines used by
wal_consistency_checking facility. So, when running `make installcheck` with
data checksum enabled and wal_consistency_checking='all', it easily and
consistently FATALs with "inconsistent page found".
Indeed. This had better be fixed before PG10 is out. I am adding an open item.
If anything needs to be masked on Page to perform / pass wal consistency
checking, definitely checksum is not going to match and hence must be masked
as well. Attaching patch to fix the same, installcheck passes with checksums
enabled and wal_consistency_checking='all' with the fix.Clubbed to perform the masking with lsn as it sounds logical to have them
together, as lsn is masked is all the cases so far and such is needed for
checksum as well.
Agreed.
* In consistency checks, the LSN of the two pages compared will likely be
- * different because of concurrent operations when the WAL is generated
- * and the state of the page when WAL is applied.
+ * different because of concurrent operations when the WAL is generated and
+ * the state of the page when WAL is applied. Also, mask out checksum as
+ * masking anything else on page means checksum is not going to match as well.
*/
Nit: Using "the LSN and the checksum" instead of the "the LSN".
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 10:22 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Wed, Sep 20, 2017 at 5:23 AM, Ashwin Agrawal <aagrawal@pivotal.io> wrote:
Currently, page checksum is not masked by Page masking routines used by
wal_consistency_checking facility. So, when running `make installcheck` with
data checksum enabled and wal_consistency_checking='all', it easily and
consistently FATALs with "inconsistent page found".Indeed. This had better be fixed before PG10 is out. I am adding an open item.
Oops and surprised! How come we missed that previously. If page lsns
are different, checksums will be different as well. Anyhow, nice catch
and thanks for the patch.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 2:26 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
On Wed, Sep 20, 2017 at 10:22 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Wed, Sep 20, 2017 at 5:23 AM, Ashwin Agrawal <aagrawal@pivotal.io> wrote:
Currently, page checksum is not masked by Page masking routines used by
wal_consistency_checking facility. So, when running `make installcheck` with
data checksum enabled and wal_consistency_checking='all', it easily and
consistently FATALs with "inconsistent page found".Indeed. This had better be fixed before PG10 is out. I am adding an open item.
Oops and surprised! How come we missed that previously. If page lsns
are different, checksums will be different as well. Anyhow, nice catch
and thanks for the patch.
That happens. We have really covered maaany points during many rounds
of reviews, still I am not surprised to see one or two things that
fell into the cracks.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 11:05 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Wed, Sep 20, 2017 at 2:26 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:On Wed, Sep 20, 2017 at 10:22 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Wed, Sep 20, 2017 at 5:23 AM, Ashwin Agrawal <aagrawal@pivotal.io> wrote:
Currently, page checksum is not masked by Page masking routines used by
wal_consistency_checking facility. So, when running `make installcheck` with
data checksum enabled and wal_consistency_checking='all', it easily and
consistently FATALs with "inconsistent page found".Indeed. This had better be fixed before PG10 is out. I am adding an open item.
Oops and surprised! How come we missed that previously. If page lsns
are different, checksums will be different as well. Anyhow, nice catch
and thanks for the patch.That happens. We have really covered maaany points during many rounds
of reviews, still I am not surprised to see one or two things that
fell into the cracks.
Yup, that's true. :-)
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 01:52:15PM +0900, Michael Paquier wrote:
On Wed, Sep 20, 2017 at 5:23 AM, Ashwin Agrawal <aagrawal@pivotal.io> wrote:
Currently, page checksum is not masked by Page masking routines used by
wal_consistency_checking facility. So, when running `make installcheck` with
data checksum enabled and wal_consistency_checking='all', it easily and
consistently FATALs with "inconsistent page found".Indeed. This had better be fixed before PG10 is out. I am adding an open item.
[Action required within three days. This is a generic notification.]
The above-described topic is currently a PostgreSQL 10 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1]/messages/by-id/20170404140717.GA2675809@tornado.leadboat.com and send a status update within three calendar days of
this message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10. Consequently, I will appreciate your efforts
toward speedy resolution. Thanks.
[1]: /messages/by-id/20170404140717.GA2675809@tornado.leadboat.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 20, 2017 at 12:52 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Indeed. This had better be fixed before PG10 is out. I am adding an open item.
This seems a little hyperbolic to me. Sure, it's a new bug in v10,
and sure, it's always better to fix bugs sooner rather than later, but
there's nothing particularly serious or urgent about this bug as
compared to any other one.
I've committed the proposed patch to master and REL_10_STABLE.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers