pg_verify_checksums failure with hash indexes

amit.kapila16@gmail.com

over 7 years ago

In reply to: Michael Paquier (#3)

Re: pg_verify_checksums failure with hash indexes

On Tue, Aug 28, 2018 at 6:43 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Aug 28, 2018 at 11:21:34AM +0200, Peter Eisentraut wrote:

The files in question correspond to

hash_i4_index
hash_name_index
hash_txt_index

The hash index code has been largely refactored in v10, so I would
imagine that you can see the problem as well there.

[... digging digging ...]

And indeed I can see the problem in 10 as well with my own pg_checksums,
and I can see hash_f8_index with a problem on top of what Peter has
reported.

Amit?

I will look into it tomorrow, hope that's okay.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Bernd Helmle

mailings@oopsware.de

over 7 years ago

In reply to: Peter Eisentraut (#1)

Re: pg_verify_checksums failure with hash indexes

Am Dienstag, den 28.08.2018, 11:21 +0200 schrieb Peter Eisentraut:

This is reproducible with PG11 and PG12:

initdb -k data
postgres -D data

make installcheck
# shut down postgres with Ctrl-C

I tried to reproduce this and by accident i had a blocksize=4 in my
configure script, and i got immediately failed installcheck results.
They seem hash index related and can easily be reproduced:

SHOW block_size ;
block_size
────────────
4096

CREATE TABLE foo(val text);
INSERT INTO foo VALUES('bernd');

CREATE INDEX ON foo USING hash(val);
ERROR: index "foo_val_idx" contains corrupted page at block 0
HINT: Please REINDEX it.

I have no idea wether this could be related, but i thought it won't
harm to share this here.

Bernd

amit.kapila16@gmail.com

over 7 years ago

In reply to: Peter Eisentraut (#1)

1 attachment(s)

Re: pg_verify_checksums failure with hash indexes

On Tue, Aug 28, 2018 at 2:51 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

This is reproducible with PG11 and PG12:

initdb -k data
postgres -D data

make installcheck
# shut down postgres with Ctrl-C

The files in question correspond to

hash_i4_index
hash_name_index
hash_txt_index

I have looked into this problem and found the cause of it. This
problem is happening for the empty page in the hash index. On a
split, we allocate a new splitpoint's worth of bucket pages wherein we
initialize the last page with zero's, this is all fine, but we forgot
to set the checksum for that last page. Attached patch fixes the
problem for me.

Can someone try and share their findings?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

setchecksum_empty_pages_v1.patchapplication/octet-stream; name=setchecksum_empty_pages_v1.patchDownload

diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 3ec29a5356..b97d7e73cd 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -1038,6 +1038,7 @@ _hash_alloc_buckets(Relation rel, BlockNumber firstblock, uint32 nblocks)
 					true);
 
 	RelationOpenSmgr(rel);
+	PageSetChecksumInplace(page, lastblock);
 	smgrextend(rel->rd_smgr, MAIN_FORKNUM, lastblock, zerobuf, false);
 
 	return true;

nagata@sraoss.co.jp

over 7 years ago

In reply to: Michael Banck (#2)

Re: pg_verify_checksums failure with hash indexes

On Tue, 28 Aug 2018 15:02:56 +0200
Michael Banck <michael.banck@credativ.de> wrote:

Hi,

On Tue, Aug 28, 2018 at 11:21:34AM +0200, Peter Eisentraut wrote:

This is reproducible with PG11 and PG12:

initdb -k data
postgres -D data

make installcheck
# shut down postgres with Ctrl-C

pg_verify_checksums data

pg_verify_checksums: checksum verification failed in file
"data/base/16384/28647", block 65: calculated checksum DC70 but expected 0
pg_verify_checksums: checksum verification failed in file
"data/base/16384/28649", block 65: calculated checksum 89D8 but expected 0
pg_verify_checksums: checksum verification failed in file
"data/base/16384/28648", block 65: calculated checksum 9636 but expected 0
Checksum scan completed
Data checksum version: 1
Files scanned: 2493
Blocks scanned: 13172
Bad checksums: 3

The files in question correspond to

hash_i4_index
hash_name_index
hash_txt_index

Discuss. ;-)

I took a look at hash_name_index, assuming the others are similar.

Page 65 is the last page, pageinspect barfs on it as well:

regression=# SELECT get_raw_page('hash_name_index', 'main', 65);
WARNING: page verification failed, calculated checksum 18066 but expected 0
ERROR: invalid page in block 65 of relation base/16384/28638

The pages before that one from page 35 on are empty:

regression=# SELECT * FROM page_header(get_raw_page('hash_name_index', 'main', 1));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/422D890 | 8807 | 0 | 664 | 5616 | 8176 | 8192 | 4 | 0
(1 Zeile)
[...]
regression=# SELECT * FROM page_header(get_raw_page('hash_name_index', 'main', 34));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/422C690 | 18153 | 0 | 580 | 5952 | 8176 | 8192 | 4 | 0
(1 Zeile)
regression=# SELECT * FROM page_header(get_raw_page('hash_name_index', 'main', 35));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----+----------+-------+-------+-------+---------+----------+---------+-----------
0/0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
[...]
regression=# SELECT * FROM page_header(get_raw_page('hash_name_index', 'main', 64));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----+----------+-------+-------+-------+---------+----------+---------+-----------
0/0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
regression=# SELECT * FROM page_header(get_raw_page('hash_name_index', 'main', 65));
WARNING: page verification failed, calculated checksum 18066 but expected 0
ERROR: invalid page in block 65 of relation base/16384/28638

Running pg_filedump on the last two pages results in (not sure the
"Invalid header information." are legit; neither about the checksum
failure on block 64):

mba@fock:~/[...]postgresql/build/src/test/regress$ ~/tmp/bin/pg_filedump -R 64 65 -k -f tmp_check/data/base/16384/28638

--8<--
*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 10.1
*
* File: tmp_check/data/base/16384/28638
* Options used: -R 64 65 -k -f
*
* Dump created on: Tue Aug 28 14:53:37 2018
*******************************************************************

Block 64 ********************************************************
<Header> -----
Block Offset: 0x00080000 Offsets: Lower 0 (0x0000)
Block: Size 0 Version 0 Upper 0 (0x0000)
LSN: logid 0 recoff 0x00000000 Special 0 (0x0000)
Items: 0 Free Space: 0
Checksum: 0x0000 Prune XID: 0x00000000 Flags: 0x0000 ()
Length (including item array): 24

Error: Invalid header information.

Error: checksum failure: calculated 0xc66a.

0000: 00000000 00000000 00000000 00000000 ................
0010: 00000000 00000000 ........

<Data> ------
Empty block - no items listed

<Special Section> -----
Error: Invalid special section encountered.
Error: Special section points off page. Unable to dump contents.

Block 65 ********************************************************
<Header> -----
Block Offset: 0x00082000 Offsets: Lower 24 (0x0018)
Block: Size 8192 Version 4 Upper 8176 (0x1ff0)
LSN: logid 0 recoff 0x04229c20 Special 8176 (0x1ff0)
Items: 0 Free Space: 8152
Checksum: 0x0000 Prune XID: 0x00000000 Flags: 0x0000 ()
Length (including item array): 24

Error: checksum failure: calculated 0x4692.

0000: 00000000 209c2204 00000000 1800f01f .... .".........
0010: f01f0420 00000000 ... ....

<Data> ------
Empty block - no items listed

<Special Section> -----
Hash Index Section:
Flags: 0x0000 ()
Bucket Number: 0xffffffff
Blocks: Previous (-1) Next (-1)

1ff0: ffffffff ffffffff ffffffff 000080ff ................

*** End of Requested Range Encountered. Last Block Read: 65 ***
--8<--

So it seems there is some data on the last page, which makes the zero
checksum bogus, but I don't know anything about hash indexes. Also maybe
those empty pages are not initialized correctly? Or maybe the "Invalid
special section encountered" error meand pg_filedump cannot handle hash
indexes completely.

I saw the same thing in the hash_i4_index case using pageinspect with
checksum disablbed. The last page (block 65) has some data in its header.

regression=# select * from page_header(get_raw_page('hash_i4_index',65));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/939FE48 | 0 | 0 | 24 | 8176 | 8176 | 8192 | 4 | 0
(1 row)

Looking at the code to check the checksum, each page is checked if this is a
new page by using PageIsNew(), and if so its checksum is not checked because
new pages are assumed to have no checksum. PageIsNew() determines if a
page is new or not from pd_upper. For some reason, the last page has pd_upper
but no checksum, so the checksum verification fails.

It is not clear for me why the last page has a head information, but, after
some code investigation, I think it happend in _hash_alloc_buckets(). When
expanding a hash table, smgrextend() add some blocks to a file. At that time,
it seems that a page that has a header infomation is written to the end of
the file (in mdextend()).

I'm not sure how to fix this for now, but it might be worth to share my
analysis for this issue.

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

nagata@sraoss.co.jp

over 7 years ago

In reply to: Amit Kapila (#6)

Re: pg_verify_checksums failure with hash indexes

On Wed, 29 Aug 2018 14:39:10 +0530
Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 28, 2018 at 2:51 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

This is reproducible with PG11 and PG12:

initdb -k data
postgres -D data

make installcheck
# shut down postgres with Ctrl-C

..

The files in question correspond to

hash_i4_index
hash_name_index
hash_txt_index

I have looked into this problem and found the cause of it. This
problem is happening for the empty page in the hash index. On a
split, we allocate a new splitpoint's worth of bucket pages wherein we
initialize the last page with zero's, this is all fine, but we forgot
to set the checksum for that last page. Attached patch fixes the
problem for me.

Can someone try and share their findings?

I confirmed that this patch fixed the problem by setting a checksum in the last
page in hash indexes, and pg_veviry_checksum is done successfully.

regression=# select * from page_header(get_raw_page('hash_i4_index',65));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/41CACF0 | 18720 | 0 | 24 | 8176 | 8176 | 8192 | 4 | 0
(1 row)

By the way, I think we can fix this also by clearing the header information of the last
page instead of setting a checksum to the unused page although I am not sure which way
is better.

Regards,

--
Yugo Nagata <nagata@sraoss.co.jp>

dilipbalaut@gmail.com

over 7 years ago

In reply to: Bernd Helmle (#5)

Re: pg_verify_checksums failure with hash indexes

On Tue, Aug 28, 2018 at 8:33 PM, Bernd Helmle <mailings@oopsware.de> wrote:

Am Dienstag, den 28.08.2018, 11:21 +0200 schrieb Peter Eisentraut:

This is reproducible with PG11 and PG12:

initdb -k data
postgres -D data

make installcheck
# shut down postgres with Ctrl-C

I tried to reproduce this and by accident i had a blocksize=4 in my
configure script, and i got immediately failed installcheck results.
They seem hash index related and can easily be reproduced:

SHOW block_size ;
block_size
────────────
4096

CREATE TABLE foo(val text);
INSERT INTO foo VALUES('bernd');

CREATE INDEX ON foo USING hash(val);
ERROR: index "foo_val_idx" contains corrupted page at block 0
HINT: Please REINDEX it.

I have no idea wether this could be related, but i thought it won't
harm to share this here.

This issue seems different than the one got fixed in this thread. The
reason for this issue is that the size of the hashm_mapp in
HashMetaPageData is 4096, irrespective of the block size. So when the
block size is big enough (i.e. 8192) then there is no problem, but
when you set it to 4096, in that case, the hashm_mapp of the meta page
is overwriting the special space of the meta page. That's the reason
its showing corrupted page while checking the hash_page.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#10

amit.kapila16@gmail.com

over 7 years ago

In reply to: Yugo Nagata (#8)

Re: pg_verify_checksums failure with hash indexes

On Wed, Aug 29, 2018 at 3:30 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:

On Wed, 29 Aug 2018 14:39:10 +0530
Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 28, 2018 at 2:51 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

This is reproducible with PG11 and PG12:

initdb -k data
postgres -D data

make installcheck
# shut down postgres with Ctrl-C

..

The files in question correspond to

hash_i4_index
hash_name_index
hash_txt_index

I have looked into this problem and found the cause of it. This
problem is happening for the empty page in the hash index. On a
split, we allocate a new splitpoint's worth of bucket pages wherein we
initialize the last page with zero's, this is all fine, but we forgot
to set the checksum for that last page. Attached patch fixes the
problem for me.

Can someone try and share their findings?

I confirmed that this patch fixed the problem by setting a checksum in the last
page in hash indexes, and pg_veviry_checksum is done successfully.

Thanks.

regression=# select * from page_header(get_raw_page('hash_i4_index',65));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/41CACF0 | 18720 | 0 | 24 | 8176 | 8176 | 8192 | 4 | 0
(1 row)

By the way, I think we can fix this also by clearing the header information of the last
page instead of setting a checksum to the unused page although I am not sure which way
is better.

I think that can complicate the WAL logging of this operation which we
are able to deal easily with log_newpage and it sounds quite hacky.
The fix I have posted seems better, but I am open to suggestions.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#11

dilipbalaut@gmail.com

over 7 years ago

In reply to: Dilip Kumar (#9)

Re: pg_verify_checksums failure with hash indexes

On Wed, Aug 29, 2018 at 3:39 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Aug 28, 2018 at 8:33 PM, Bernd Helmle <mailings@oopsware.de> wrote:

Am Dienstag, den 28.08.2018, 11:21 +0200 schrieb Peter Eisentraut:

This is reproducible with PG11 and PG12:

initdb -k data
postgres -D data

make installcheck
# shut down postgres with Ctrl-C

I tried to reproduce this and by accident i had a blocksize=4 in my
configure script, and i got immediately failed installcheck results.
They seem hash index related and can easily be reproduced:

SHOW block_size ;
block_size
────────────
4096

CREATE TABLE foo(val text);
INSERT INTO foo VALUES('bernd');

CREATE INDEX ON foo USING hash(val);
ERROR: index "foo_val_idx" contains corrupted page at block 0
HINT: Please REINDEX it.

I have no idea wether this could be related, but i thought it won't
harm to share this here.

This issue seems different than the one got fixed in this thread. The
reason for this issue is that the size of the hashm_mapp in
HashMetaPageData is 4096, irrespective of the block size. So when the
block size is big enough (i.e. 8192) then there is no problem, but
when you set it to 4096, in that case, the hashm_mapp of the meta page
is overwriting the special space of the meta page. That's the reason
its showing corrupted page while checking the hash_page.

Just to verify this I just hacked it like below and it worked. I
think we need a more thoughtful value for HASH_MAX_BITMAPS.

diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 543d802..9909f69 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -232,7 +232,7 @@ typedef HashScanOpaqueData *HashScanOpaque;
  * needing to fit into the metapage.  (With 8K block size, 1024 bitmaps
  * limit us to 256 GB of overflow space...)
  */
-#define HASH_MAX_BITMAPS                       1024
+#define HASH_MAX_BITMAPS                       Min(BLCKSZ / 8, 1024)

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#12

nagata@sraoss.co.jp

over 7 years ago

In reply to: Amit Kapila (#10)

Re: pg_verify_checksums failure with hash indexes

On Wed, 29 Aug 2018 16:01:53 +0530
Amit Kapila <amit.kapila16@gmail.com> wrote:

By the way, I think we can fix this also by clearing the header information of the last
page instead of setting a checksum to the unused page although I am not sure which way
is better.

I think that can complicate the WAL logging of this operation which we
are able to deal easily with log_newpage and it sounds quite hacky.
The fix I have posted seems better, but I am open to suggestions.

Thank you for your explanation. I understood this way could make the
codes complicated, so I think the way you posted is better.

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

#13

Kyotaro HORIGUCHI

horiguchi.kyotaro@lab.ntt.co.jp

over 7 years ago

In reply to: Yugo Nagata (#12)

Re: pg_verify_checksums failure with hash indexes

At Wed, 29 Aug 2018 20:10:15 +0900, Yugo Nagata <nagata@sraoss.co.jp> wrote in <20180829201015.d9d4fde2748910e86a13c0da@sraoss.co.jp>

On Wed, 29 Aug 2018 16:01:53 +0530
Amit Kapila <amit.kapila16@gmail.com> wrote:

By the way, I think we can fix this also by clearing the header information of the last
page instead of setting a checksum to the unused page although I am not sure which way
is better.

I think that can complicate the WAL logging of this operation which we
are able to deal easily with log_newpage and it sounds quite hacky.
The fix I have posted seems better, but I am open to suggestions.

Thank you for your explanation. I understood this way could make the
codes complicated, so I think the way you posted is better.

FWIW, I confirmed that this is the only place where smgrextend
for non-zero pages is not preceded by checksum calculation.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#14

nagata@sraoss.co.jp

over 7 years ago

In reply to: Kyotaro HORIGUCHI (#13)

Re: pg_verify_checksums failure with hash indexes

On Thu, 30 Aug 2018 15:01:24 +0900 (Tokyo Standard Time)
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

At Wed, 29 Aug 2018 20:10:15 +0900, Yugo Nagata <nagata@sraoss.co.jp> wrote in <20180829201015.d9d4fde2748910e86a13c0da@sraoss.co.jp>

On Wed, 29 Aug 2018 16:01:53 +0530
Amit Kapila <amit.kapila16@gmail.com> wrote:

By the way, I think we can fix this also by clearing the header information of the last
page instead of setting a checksum to the unused page although I am not sure which way
is better.

I think that can complicate the WAL logging of this operation which we
are able to deal easily with log_newpage and it sounds quite hacky.
The fix I have posted seems better, but I am open to suggestions.

Thank you for your explanation. I understood this way could make the
codes complicated, so I think the way you posted is better.

FWIW, I confirmed that this is the only place where smgrextend
for non-zero pages is not preceded by checksum calculation.

I also confirmed this. I didn't know calling PageSetChecksumInplace
before smgrextend for non-zero pages was a typical coding pattern.
Thanks.

Regards,
--
Yugo Nagata <nagata@sraoss.co.jp>

#15

amit.kapila16@gmail.com

over 7 years ago

In reply to: Dilip Kumar (#11)

Re: pg_verify_checksums failure with hash indexes

On Wed, Aug 29, 2018 at 4:05 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Wed, Aug 29, 2018 at 3:39 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

SHOW block_size ;
block_size
────────────
4096

CREATE TABLE foo(val text);
INSERT INTO foo VALUES('bernd');

CREATE INDEX ON foo USING hash(val);
ERROR: index "foo_val_idx" contains corrupted page at block 0
HINT: Please REINDEX it.

I have no idea wether this could be related, but i thought it won't
harm to share this here.

This issue seems different than the one got fixed in this thread. The
reason for this issue is that the size of the hashm_mapp in
HashMetaPageData is 4096, irrespective of the block size. So when the
block size is big enough (i.e. 8192) then there is no problem, but
when you set it to 4096, in that case, the hashm_mapp of the meta page
is overwriting the special space of the meta page. That's the reason
its showing corrupted page while checking the hash_page.

Your analysis appears correct to me.

Just to verify this I just hacked it like below and it worked. I
think we need a more thoughtful value for HASH_MAX_BITMAPS.
diff --git a/src/include/access/hash.h b/src/include/access/hash.h

-#define HASH_MAX_BITMAPS                       1024
+#define HASH_MAX_BITMAPS                       Min(BLCKSZ / 8, 1024)

We have previously changed this define in 620b49a1 with the intent to
allow many non-unique values in hash indexes without worrying to reach
the limit of the number of overflow pages. I think this didn't occur
to us that it won't work for smaller block sizes. As such, I don't
see any problem with the suggested fix. It will allow us the same
limit for the number of overflow pages at 8K block size and a smaller
limit at smaller block size. I am not sure if we can do any better
with the current design. As it will change the metapage, I think we
need to bump HASH_VERSION.

Robert, others, any thoughts?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#16

Robert Haas

robertmhaas@gmail.com

over 7 years ago

In reply to: Amit Kapila (#15)

Re: pg_verify_checksums failure with hash indexes

On Thu, Aug 30, 2018 at 7:27 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

We have previously changed this define in 620b49a1 with the intent to
allow many non-unique values in hash indexes without worrying to reach
the limit of the number of overflow pages. I think this didn't occur
to us that it won't work for smaller block sizes. As such, I don't
see any problem with the suggested fix. It will allow us the same
limit for the number of overflow pages at 8K block size and a smaller
limit at smaller block size. I am not sure if we can do any better
with the current design. As it will change the metapage, I think we
need to bump HASH_VERSION.

I wouldn't bother bumping HASH_VERSION. First, the fix needs to be
back-patched, and you certainly can't back-patch a HASH_VERSION bump.
Second, you should just pick a formula that gives the same answer as
now for the cases where the overrun doesn't occur, and some other
sufficiently-value for the cases where an overrun currently does
occur. If you do that, you're not changing the behavior in any case
that currently works, so there's really no reason for a version bump.
It just becomes a bug fix at that point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17

dilipbalaut@gmail.com

over 7 years ago

In reply to: Robert Haas (#16)

Re: pg_verify_checksums failure with hash indexes

On Sat, Sep 1, 2018 at 8:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Aug 30, 2018 at 7:27 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

We have previously changed this define in 620b49a1 with the intent to
allow many non-unique values in hash indexes without worrying to reach
the limit of the number of overflow pages. I think this didn't occur
to us that it won't work for smaller block sizes. As such, I don't
see any problem with the suggested fix. It will allow us the same
limit for the number of overflow pages at 8K block size and a smaller
limit at smaller block size. I am not sure if we can do any better
with the current design. As it will change the metapage, I think we
need to bump HASH_VERSION.

I wouldn't bother bumping HASH_VERSION. First, the fix needs to be
back-patched, and you certainly can't back-patch a HASH_VERSION bump.
Second, you should just pick a formula that gives the same answer as
now for the cases where the overrun doesn't occur, and some other
sufficiently-value for the cases where an overrun currently does
occur. If you do that, you're not changing the behavior in any case
that currently works, so there's really no reason for a version bump.
It just becomes a bug fix at that point.

I think if we compute with below formula which I suggested upthread

#define HASH_MAX_BITMAPS Min(BLCKSZ / 8, 1024)

then for BLCKSZ 8K and bigger, it will remain the same value where it
does not overrun. And, for the small BLCKSZ, I think it will give
sufficient space for the hash map. If the BLCKSZ is 1K then the sizeof
(HashMetaPageData) + sizeof (HashPageOpaque) = 968 which is very close
to the BLCKSZ.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#18

amit.kapila16@gmail.com

over 7 years ago

In reply to: Dilip Kumar (#17)

Re: pg_verify_checksums failure with hash indexes

On Sat, Sep 1, 2018 at 10:28 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Sep 1, 2018 at 8:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Aug 30, 2018 at 7:27 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I wouldn't bother bumping HASH_VERSION. First, the fix needs to be
back-patched, and you certainly can't back-patch a HASH_VERSION bump.
Second, you should just pick a formula that gives the same answer as
now for the cases where the overrun doesn't occur, and some other
sufficiently-value for the cases where an overrun currently does
occur. If you do that, you're not changing the behavior in any case
that currently works, so there's really no reason for a version bump.
It just becomes a bug fix at that point.

makes sense.

I think if we compute with below formula which I suggested upthread

#define HASH_MAX_BITMAPS Min(BLCKSZ / 8, 1024)

then for BLCKSZ 8K and bigger, it will remain the same value where it
does not overrun. And, for the small BLCKSZ, I think it will give
sufficient space for the hash map. If the BLCKSZ is 1K then the sizeof
(HashMetaPageData) + sizeof (HashPageOpaque) = 968 which is very close
to the BLCKSZ.

Yeah, so at 1K, the value of HASH_MAX_BITMAPS will be 128 as per above
formula which is what it was its value prior to the commit 620b49a1.
I think it will be better if you add a comment in your patch
indicating the importance/advantage of such a formula.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#19

amit.kapila16@gmail.com

over 7 years ago

In reply to: Michael Paquier (#3)

Re: pg_verify_checksums failure with hash indexes

On Tue, Aug 28, 2018 at 6:43 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Aug 28, 2018 at 11:21:34AM +0200, Peter Eisentraut wrote:

The files in question correspond to

hash_i4_index
hash_name_index
hash_txt_index

The hash index code has been largely refactored in v10, so I would
imagine that you can see the problem as well there.

[... digging digging ...]

And indeed I can see the problem in 10 as well with my own pg_checksums,
and I can see hash_f8_index with a problem on top of what Peter has
reported.

AFAICS, this problem exists in 9.6 and prior branches as well,
although, I can't test it. I am not sure whether we need to backpatch
this beyond 10 (where hash indexes are WAL logged) as prior to that
hash-indexes are anyway not-reliable. What's your opinion?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#20

dilipbalaut@gmail.com

over 7 years ago

In reply to: Amit Kapila (#18)

1 attachment(s)

Re: pg_verify_checksums failure with hash indexes

On Mon, Sep 3, 2018 at 8:37 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Sep 1, 2018 at 10:28 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Sat, Sep 1, 2018 at 8:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Aug 30, 2018 at 7:27 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I wouldn't bother bumping HASH_VERSION. First, the fix needs to be
back-patched, and you certainly can't back-patch a HASH_VERSION bump.
Second, you should just pick a formula that gives the same answer as
now for the cases where the overrun doesn't occur, and some other
sufficiently-value for the cases where an overrun currently does
occur. If you do that, you're not changing the behavior in any case
that currently works, so there's really no reason for a version bump.
It just becomes a bug fix at that point.

makes sense.

I think if we compute with below formula which I suggested upthread

#define HASH_MAX_BITMAPS Min(BLCKSZ / 8, 1024)

then for BLCKSZ 8K and bigger, it will remain the same value where it
does not overrun. And, for the small BLCKSZ, I think it will give
sufficient space for the hash map. If the BLCKSZ is 1K then the sizeof
(HashMetaPageData) + sizeof (HashPageOpaque) = 968 which is very close
to the BLCKSZ.

Yeah, so at 1K, the value of HASH_MAX_BITMAPS will be 128 as per above
formula which is what it was its value prior to the commit 620b49a1.
I think it will be better if you add a comment in your patch
indicating the importance/advantage of such a formula.

I have added the comments.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

hash_overflow_fix_v1.patchapplication/octet-stream; name=hash_overflow_fix_v1.patchDownload

diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 543d802..c3fc117 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -230,9 +230,15 @@ typedef HashScanOpaqueData *HashScanOpaque;
  *
  * There is no particular upper limit on the size of mapp[], other than
  * needing to fit into the metapage.  (With 8K block size, 1024 bitmaps
- * limit us to 256 GB of overflow space...)
+ * limit us to 256 GB of overflow space...).  For smaller block size we
+ * can not use 1024 bitmaps otherwise the mapp[] will cross the
+ * block size. However, it is better to use the BLCKSZ to determine the
+ * maximum number of bitmaps. For example with current formula, if BLCKSZ
+ * is 1K then there will be 128 bitmaps. This will make mapp[] size to
+ * 512 bytes, now including the space for page opaque and meta data
+ * header, the total size will be 968 bytes.
  */
-#define HASH_MAX_BITMAPS			1024
+#define HASH_MAX_BITMAPS			Max(BLCKSZ / 8, 1024)
 
 #define HASH_SPLITPOINT_PHASE_BITS	2
 #define HASH_SPLITPOINT_PHASES_PER_GRP	(1 << HASH_SPLITPOINT_PHASE_BITS)

#21

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Amit Kapila (#19)

Re: pg_verify_checksums failure with hash indexes

Amit Kapila <amit.kapila16@gmail.com> writes:

AFAICS, this problem exists in 9.6 and prior branches as well,
although, I can't test it. I am not sure whether we need to backpatch
this beyond 10 (where hash indexes are WAL logged) as prior to that
hash-indexes are anyway not-reliable. What's your opinion?

Presumably, any patch for pre-10 would look completely different
as the hash index code was quite different. I can't see that it's
worth the development time to do something there, especially if
you lack an easy way to test.

regards, tom lane

#22

amit.kapila16@gmail.com

over 7 years ago

In reply to: Tom Lane (#21)

Re: pg_verify_checksums failure with hash indexes

On Mon, Sep 3, 2018 at 7:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kapila <amit.kapila16@gmail.com> writes:

AFAICS, this problem exists in 9.6 and prior branches as well,
although, I can't test it. I am not sure whether we need to backpatch
this beyond 10 (where hash indexes are WAL logged) as prior to that
hash-indexes are anyway not-reliable. What's your opinion?

Presumably, any patch for pre-10 would look completely different
as the hash index code was quite different. I can't see that it's
worth the development time to do something there, especially if
you lack an easy way to test.

The fix might or might not be different, but lack of test is
definitely the reason for not pursuing it. I have pushed the fix and
back-patched it till 10.

Thanks for the input.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#23

amit.kapila16@gmail.com

over 7 years ago

In reply to: Dilip Kumar (#20)

Re: pg_verify_checksums failure with hash indexes

On Mon, Sep 3, 2018 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Sep 3, 2018 at 8:37 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Sep 1, 2018 at 10:28 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I think if we compute with below formula which I suggested upthread

#define HASH_MAX_BITMAPS Min(BLCKSZ / 8, 1024)

then for BLCKSZ 8K and bigger, it will remain the same value where it
does not overrun. And, for the small BLCKSZ, I think it will give
sufficient space for the hash map. If the BLCKSZ is 1K then the sizeof
(HashMetaPageData) + sizeof (HashPageOpaque) = 968 which is very close
to the BLCKSZ.

Yeah, so at 1K, the value of HASH_MAX_BITMAPS will be 128 as per above
formula which is what it was its value prior to the commit 620b49a1.
I think it will be better if you add a comment in your patch
indicating the importance/advantage of such a formula.

I have added the comments.

Thanks, I will look into it. Can you please do some pg_upgrade tests
to ensure that this doesn't impact the upgrade? You can create
hash-index and populate it with some data in version 10 and try
upgrading to 11 after applying this patch. You can also try it with
different block-sizes.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#24

dilipbalaut@gmail.com

over 7 years ago

In reply to: Amit Kapila (#23)

1 attachment(s)

Re: pg_verify_checksums failure with hash indexes

On Tue, Sep 4, 2018 at 10:14 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Sep 3, 2018 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Sep 3, 2018 at 8:37 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Sep 1, 2018 at 10:28 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I think if we compute with below formula which I suggested upthread

#define HASH_MAX_BITMAPS Min(BLCKSZ / 8, 1024)

then for BLCKSZ 8K and bigger, it will remain the same value where it
does not overrun. And, for the small BLCKSZ, I think it will give
sufficient space for the hash map. If the BLCKSZ is 1K then the sizeof
(HashMetaPageData) + sizeof (HashPageOpaque) = 968 which is very close
to the BLCKSZ.

Yeah, so at 1K, the value of HASH_MAX_BITMAPS will be 128 as per above
formula which is what it was its value prior to the commit 620b49a1.
I think it will be better if you add a comment in your patch
indicating the importance/advantage of such a formula.

I have added the comments.

In my previous patch mistakenly I put Max(BLCKSZ / 8, 1024) instead of
Min(BLCKSZ / 8, 1024). I have fixed the same.

Thanks, I will look into it. Can you please do some pg_upgrade tests
to ensure that this doesn't impact the upgrade? You can create
hash-index and populate it with some data in version 10 and try
upgrading to 11 after applying this patch. You can also try it with
different block-sizes.

Ok, I will do that.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

hash_overflow_fix_v2.patchapplication/octet-stream; name=hash_overflow_fix_v2.patchDownload

diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 543d802949..aa0ffb34ac 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -230,9 +230,15 @@ typedef HashScanOpaqueData *HashScanOpaque;
  *
  * There is no particular upper limit on the size of mapp[], other than
  * needing to fit into the metapage.  (With 8K block size, 1024 bitmaps
- * limit us to 256 GB of overflow space...)
+ * limit us to 256 GB of overflow space...).  For smaller block size we
+ * can not use 1024 bitmaps otherwise the mapp[] will cross the
+ * block size. However, it is better to use the BLCKSZ to determine the
+ * maximum number of bitmaps. For example with current formula, if BLCKSZ
+ * is 1K then there will be 128 bitmaps. This will make mapp[] size to
+ * 512 bytes, now including the space for page opaque and meta data
+ * header, the total size will be 968 bytes.
  */
-#define HASH_MAX_BITMAPS			1024
+#define HASH_MAX_BITMAPS			Min(BLCKSZ / 8, 1024)
 
 #define HASH_SPLITPOINT_PHASE_BITS	2
 #define HASH_SPLITPOINT_PHASES_PER_GRP	(1 << HASH_SPLITPOINT_PHASE_BITS)

#25

dilipbalaut@gmail.com

over 7 years ago

In reply to: Dilip Kumar (#24)

Re: pg_verify_checksums failure with hash indexes

On Tue, Sep 4, 2018 at 10:49 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Sep 4, 2018 at 10:14 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Sep 3, 2018 at 2:44 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Mon, Sep 3, 2018 at 8:37 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Sep 1, 2018 at 10:28 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I think if we compute with below formula which I suggested upthread

#define HASH_MAX_BITMAPS Min(BLCKSZ / 8, 1024)

then for BLCKSZ 8K and bigger, it will remain the same value where it
does not overrun. And, for the small BLCKSZ, I think it will give
sufficient space for the hash map. If the BLCKSZ is 1K then the sizeof
(HashMetaPageData) + sizeof (HashPageOpaque) = 968 which is very close
to the BLCKSZ.

Yeah, so at 1K, the value of HASH_MAX_BITMAPS will be 128 as per above
formula which is what it was its value prior to the commit 620b49a1.
I think it will be better if you add a comment in your patch
indicating the importance/advantage of such a formula.

I have added the comments.

In my previous patch mistakenly I put Max(BLCKSZ / 8, 1024) instead of
Min(BLCKSZ / 8, 1024). I have fixed the same.

Thanks, I will look into it. Can you please do some pg_upgrade tests
to ensure that this doesn't impact the upgrade? You can create
hash-index and populate it with some data in version 10 and try
upgrading to 11 after applying this patch. You can also try it with
different block-sizes.

Ok, I will do that.

I have tested pg_upgrade with different block size (1K, 4K, 8K, 32K).
The upgrade is working fine from v10 to v11 and I am able to fetch
data with index scan on the hash index after an upgrade.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#26

amit.kapila16@gmail.com

over 7 years ago

In reply to: Dilip Kumar (#25)

Re: pg_verify_checksums failure with hash indexes

On Tue, Sep 4, 2018 at 1:42 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I have tested pg_upgrade with different block size (1K, 4K, 8K, 32K).
The upgrade is working fine from v10 to v11 and I am able to fetch
data with index scan on the hash index after an upgrade.

Thanks, do you see any way to write a test for this patch? AFAICS,
there is no existing test for a different block size and not sure if
there is an easy way to write one. I feel it is not a bad idea if we
have some tests for different block sizes. Recently, during zheap
development, we found that we have introduced a bug for a non-default
block size and we can't find that because we don't have any test for
it and the same happens here.

Does anybody else have any idea on how can we write a test for
non-default block size or if we already have anything similar?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#27

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Amit Kapila (#26)

Re: pg_verify_checksums failure with hash indexes

Amit Kapila <amit.kapila16@gmail.com> writes:

Does anybody else have any idea on how can we write a test for
non-default block size or if we already have anything similar?

Build with a non-default BLCKSZ and see if the regression tests pass.
There's no way that a build with BLCKSZ x can run any tests for
BLCKSZ y.

Note that you can expect some plan variations from a different BLCKSZ,
so there'd be at least a few "failures" in the regression tests, which'd
require manual inspection. Otherwise this could be delegated to a
buildfarm animal using a nonstandard BLCKSZ.

regards, tom lane

#28