pageinspect and hash indexes

Started by Jeff Janesalmost 9 years ago22 messages
#1Jeff Janes
jeff.janes@gmail.com

While trying to figure out some bloating in the newly logged hash indexes,
I'm looking into the type of each page in the index. But I get an error:

psql -p 9876 -c "select hash_page_type(get_raw_page('foo_index_idx',x))
from generate_series(1650,1650) f(x)"

ERROR: page is not a hash page
DETAIL: Expected 0000ff80, got 00000000.

The contents of the page are:

\xa4000000d8f203bf65c900001800f01ff01f0420...

(where the elided characters at the end are all zero)

What kind of page is that actually? And isn't it unhelpful to have the
pageinspect module throw errors, rather than returning a dummy value to
indicate there was an error?

Thanks,

Jeff

#2Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Jeff Janes (#1)
Re: pageinspect and hash indexes

On Fri, Mar 17, 2017 at 10:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

While trying to figure out some bloating in the newly logged hash indexes,
I'm looking into the type of each page in the index. But I get an error:

psql -p 9876 -c "select hash_page_type(get_raw_page('foo_index_idx',x)) from
generate_series(1650,1650) f(x)"

ERROR: page is not a hash page
DETAIL: Expected 0000ff80, got 00000000.

The contents of the page are:

\xa4000000d8f203bf65c900001800f01ff01f0420...

(where the elided characters at the end are all zero)

What kind of page is that actually?

it is basically either a newly allocated bucket page or a freed overflow page.

You are seeing this error because of following change in the WAL patch
for hash index,

Basically, when we started working on WAL for hash index, we found
that WAL routine 'XLogReadBufferExtended' does not expect a page to be
completely zero page else it returns Invalid Buffer. To fix this, we
started initializing freed overflow page or new bucket pages using
_hash_pageinit() which basically initialises page header portion but
not it's special area where page type information is present. That's
why you are seeing an ERROR saying 'page is not a hash page'. Actually
pageinspect module needs to handle this type of page. Currently it is
just handling zero pages but not an empty pages. I will submit a patch
for this.

And isn't it unhelpful to have the

pageinspect module throw errors, rather than returning a dummy value to
indicate there was an error?

Well, this is something not specific to hash index. So, I have no answer :)

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Jeff Janes (#1)
Re: pageinspect and hash indexes

On 3/17/17 13:24, Jeff Janes wrote:

And isn't it unhelpful to have the pageinspect module throw errors,
rather than returning a dummy value to indicate there was an error?

Even if one agreed with that premise, it would be hard to satisfy
reliably, because the user-facing pageinspect functions might reach into
lower-level code to do some of the decoding, which are free to error out.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ashutosh Sharma (#2)
Re: pageinspect and hash indexes

Ashutosh Sharma <ashu.coek88@gmail.com> writes:

Basically, when we started working on WAL for hash index, we found
that WAL routine 'XLogReadBufferExtended' does not expect a page to be
completely zero page else it returns Invalid Buffer. To fix this, we
started initializing freed overflow page or new bucket pages using
_hash_pageinit() which basically initialises page header portion but
not it's special area where page type information is present. That's
why you are seeing an ERROR saying 'page is not a hash page'. Actually
pageinspect module needs to handle this type of page. Currently it is
just handling zero pages but not an empty pages. I will submit a patch
for this.

That seems like entirely the wrong approach. You should make the special
space valid, instead, so that tools like pg_filedump can make sense of
the page.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#4)
Re: pageinspect and hash indexes

On Sat, Mar 18, 2017 at 1:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ashutosh Sharma <ashu.coek88@gmail.com> writes:

Basically, when we started working on WAL for hash index, we found
that WAL routine 'XLogReadBufferExtended' does not expect a page to be
completely zero page else it returns Invalid Buffer. To fix this, we
started initializing freed overflow page or new bucket pages using
_hash_pageinit() which basically initialises page header portion but
not it's special area where page type information is present. That's
why you are seeing an ERROR saying 'page is not a hash page'. Actually
pageinspect module needs to handle this type of page. Currently it is
just handling zero pages but not an empty pages. I will submit a patch
for this.

That seems like entirely the wrong approach. You should make the special
space valid, instead, so that tools like pg_filedump can make sense of
the page.

We were not aware that external tools like pg_filedump are dependent
on special space of index, but after looking at the code of
pg_filedump, I agree with you that we need to initialize the special
space in this case (free overflow page). I think we can mark such a
page type as LH_UNUSED_PAGE and then initialize the other fields of
special space. Nonetheless, I think we still need modifications in
hashfuncs.c so that it can understand this type of page.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#2)
Re: pageinspect and hash indexes

On Sat, Mar 18, 2017 at 12:12 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Mar 17, 2017 at 10:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

While trying to figure out some bloating in the newly logged hash indexes,
I'm looking into the type of each page in the index. But I get an error:

psql -p 9876 -c "select hash_page_type(get_raw_page('foo_index_idx',x)) from
generate_series(1650,1650) f(x)"

ERROR: page is not a hash page
DETAIL: Expected 0000ff80, got 00000000.

The contents of the page are:

\xa4000000d8f203bf65c900001800f01ff01f0420...

(where the elided characters at the end are all zero)

What kind of page is that actually?

it is basically either a newly allocated bucket page or a freed overflow page.

What makes you think that it can be a newly allocated page?
Basically, we always initialize the special space of newly allocated
page, so not sure what makes you deduce that it can be newly allocated
page.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#6)
Re: pageinspect and hash indexes

On Sat, Mar 18, 2017 at 1:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 18, 2017 at 12:12 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Mar 17, 2017 at 10:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

While trying to figure out some bloating in the newly logged hash indexes,
I'm looking into the type of each page in the index. But I get an error:

psql -p 9876 -c "select hash_page_type(get_raw_page('foo_index_idx',x)) from
generate_series(1650,1650) f(x)"

ERROR: page is not a hash page
DETAIL: Expected 0000ff80, got 00000000.

The contents of the page are:

\xa4000000d8f203bf65c900001800f01ff01f0420...

(where the elided characters at the end are all zero)

What kind of page is that actually?

it is basically either a newly allocated bucket page or a freed overflow page.

What makes you think that it can be a newly allocated page?
Basically, we always initialize the special space of newly allocated
page, so not sure what makes you deduce that it can be newly allocated
page.

I came to know this from the following experiment.

I created a hash index and kept on inserting data in it till the split happens.

When split happened, I could see following values for firstblock and
lastblock in _hash_alloc_buckets()

Breakpoint 1, _hash_alloc_buckets (rel=0x7f6ac951ee30, firstblock=34,
nblocks=32) at hashpage.c:993
(gdb) n
(gdb) p firstblock
$15 = 34
(gdb) p nblocks
$16 = 32
(gdb) n
(gdb) p lastblock
$17 = 65

AFAIU, this bucket split resulted in creation of new bucket pages from
block number 34 to 65.

The contents for metap are as follows,

(gdb) p *metap
$18 = {hashm_magic = 105121344, hashm_version = 2, hashm_ntuples =
2593, hashm_ffactor = 81, hashm_bsize = 8152, hashm_bmsize = 4096,
hashm_bmshift = 15,
hashm_maxbucket = 32, hashm_highmask = 63, hashm_lowmask = 31,
hashm_ovflpoint = 6, hashm_firstfree = 0, hashm_nmaps = 1,
hashm_procid = 450,
hashm_spares = {0, 0, 0, 0, 0, 1, 1, 0 <repeats 25 times>},
hashm_mapp = {33, 0 <repeats 127 times>}}

Now, if i try to check the page type for block number 65, this is what i see,

test=# select * from hash_page_type(get_raw_page('con_hash_index', 65));
ERROR: page is not a hash page
DETAIL: Expected 0000ff80, got 00000000.
test=#

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#7)
Re: pageinspect and hash indexes

On Sat, Mar 18, 2017 at 5:13 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Sat, Mar 18, 2017 at 1:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 18, 2017 at 12:12 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Mar 17, 2017 at 10:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

While trying to figure out some bloating in the newly logged hash indexes,
I'm looking into the type of each page in the index. But I get an error:

psql -p 9876 -c "select hash_page_type(get_raw_page('foo_index_idx',x)) from
generate_series(1650,1650) f(x)"

ERROR: page is not a hash page
DETAIL: Expected 0000ff80, got 00000000.

The contents of the page are:

\xa4000000d8f203bf65c900001800f01ff01f0420...

(where the elided characters at the end are all zero)

What kind of page is that actually?

it is basically either a newly allocated bucket page or a freed overflow page.

What makes you think that it can be a newly allocated page?
Basically, we always initialize the special space of newly allocated
page, so not sure what makes you deduce that it can be newly allocated
page.

I came to know this from the following experiment.

I created a hash index and kept on inserting data in it till the split happens.

When split happened, I could see following values for firstblock and
lastblock in _hash_alloc_buckets()

Breakpoint 1, _hash_alloc_buckets (rel=0x7f6ac951ee30, firstblock=34,
nblocks=32) at hashpage.c:993
(gdb) n
(gdb) p firstblock
$15 = 34
(gdb) p nblocks
$16 = 32
(gdb) n
(gdb) p lastblock
$17 = 65

AFAIU, this bucket split resulted in creation of new bucket pages from
block number 34 to 65.

The contents for metap are as follows,

(gdb) p *metap
$18 = {hashm_magic = 105121344, hashm_version = 2, hashm_ntuples =
2593, hashm_ffactor = 81, hashm_bsize = 8152, hashm_bmsize = 4096,
hashm_bmshift = 15,
hashm_maxbucket = 32, hashm_highmask = 63, hashm_lowmask = 31,
hashm_ovflpoint = 6, hashm_firstfree = 0, hashm_nmaps = 1,
hashm_procid = 450,
hashm_spares = {0, 0, 0, 0, 0, 1, 1, 0 <repeats 25 times>},
hashm_mapp = {33, 0 <repeats 127 times>}}

Now, if i try to check the page type for block number 65, this is what i see,

test=# select * from hash_page_type(get_raw_page('con_hash_index', 65));
ERROR: page is not a hash page
DETAIL: Expected 0000ff80, got 00000000.
test=#

The contents of such a page should be zero and Jeff has reported some
valid-looking contents of the page. If you see this page contents as
zero, then we can conclude what Jeff is seeing was an freed overflow
page.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#8)
2 attachment(s)
Re: pageinspect and hash indexes

On Mon, Mar 20, 2017 at 9:31 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 18, 2017 at 5:13 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Sat, Mar 18, 2017 at 1:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 18, 2017 at 12:12 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Mar 17, 2017 at 10:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

While trying to figure out some bloating in the newly logged hash indexes,
I'm looking into the type of each page in the index. But I get an error:

psql -p 9876 -c "select hash_page_type(get_raw_page('foo_index_idx',x)) from
generate_series(1650,1650) f(x)"

ERROR: page is not a hash page
DETAIL: Expected 0000ff80, got 00000000.

The contents of the page are:

\xa4000000d8f203bf65c900001800f01ff01f0420...

(where the elided characters at the end are all zero)

What kind of page is that actually?

it is basically either a newly allocated bucket page or a freed overflow page.

What makes you think that it can be a newly allocated page?
Basically, we always initialize the special space of newly allocated
page, so not sure what makes you deduce that it can be newly allocated
page.

I came to know this from the following experiment.

I created a hash index and kept on inserting data in it till the split happens.

When split happened, I could see following values for firstblock and
lastblock in _hash_alloc_buckets()

Breakpoint 1, _hash_alloc_buckets (rel=0x7f6ac951ee30, firstblock=34,
nblocks=32) at hashpage.c:993
(gdb) n
(gdb) p firstblock
$15 = 34
(gdb) p nblocks
$16 = 32
(gdb) n
(gdb) p lastblock
$17 = 65

AFAIU, this bucket split resulted in creation of new bucket pages from
block number 34 to 65.

The contents for metap are as follows,

(gdb) p *metap
$18 = {hashm_magic = 105121344, hashm_version = 2, hashm_ntuples =
2593, hashm_ffactor = 81, hashm_bsize = 8152, hashm_bmsize = 4096,
hashm_bmshift = 15,
hashm_maxbucket = 32, hashm_highmask = 63, hashm_lowmask = 31,
hashm_ovflpoint = 6, hashm_firstfree = 0, hashm_nmaps = 1,
hashm_procid = 450,
hashm_spares = {0, 0, 0, 0, 0, 1, 1, 0 <repeats 25 times>},
hashm_mapp = {33, 0 <repeats 127 times>}}

Now, if i try to check the page type for block number 65, this is what i see,

test=# select * from hash_page_type(get_raw_page('con_hash_index', 65));
ERROR: page is not a hash page
DETAIL: Expected 0000ff80, got 00000000.
test=#

The contents of such a page should be zero and Jeff has reported some
valid-looking contents of the page. If you see this page contents as
zero, then we can conclude what Jeff is seeing was an freed overflow
page.

As shown in the mail thread-[1]/messages/by-id/CAE9k0P=N+JjzqnHqrURE7ZQMgySRpv=Bkjafbz=peD4cbCgbhA@mail.gmail.com, the contents of metapage are,

(gdb) p *metap
$18 = {hashm_magic = 105121344, hashm_version = 2, hashm_ntuples
=2593, hashm_ffactor = 81, hashm_bsize = 8152, hashm_bmsize = 4096,
hashm_bmshift = 15, hashm_maxbucket = 32, hashm_highmask = 63,
hashm_lowmask = 31, hashm_ovflpoint = 6, hashm_firstfree = 0,
hashm_nmaps = 1,
hashm_procid = 450, hashm_spares = {0, 0, 0, 0, 0, 1, 1, 0 <repeats
25 times>}, hashm_mapp = {33, 0 <repeats 127 times>}}

postgres=# select spares from
hash_metapage_info(get_raw_page('con_hash_index', 0));
spares
-------------------------------------------------------------------
{0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
(1 row)

Here, if you see the spare page count is just 1 which corresponds to
bitmap page. So, that means there is no overflow page in hash index
table and neither I have ran any DELETE statements in my experiment
that would result in freeing an overflow page.

Also, the page header content for such a page is,

$9 = {pd_lsn = {xlogid = 0, xrecoff = 23638120}, pd_checksum = 0,
pd_flags = 0, pd_lower = 24, pd_upper = 8176, pd_special = 8176,
pd_pagesize_version = 8196, pd_prune_xid = 0, pd_linp = 0x1f3aa88}

From values of pd_lower and pd_upper it is clear that it is an empty page.

The content of this page is,

\x00000000b0308a01000000001800f01ff01f042000.....

I think it is not just happening for freed overflow but also for newly
allocated bucket page. It would be good if we could mark freed
overflow page as UNUSED page rather than just initialising it's header
portion and leaving the page type in special area as it is. Attached
is the patch '0001-mark_freed_ovflpage_as_UNUSED_pagetype.patch' that
marks a freed overflow page as an unused page.

Also, I have now changed pageinspect module to handle unused and empty
hash index page. Attached is the patch
(0002-allow_pageinspect_handle_UNUSED_OR_EMPTY_hash_pages.patch) for
the same.

[1]: /messages/by-id/CAE9k0P=N+JjzqnHqrURE7ZQMgySRpv=Bkjafbz=peD4cbCgbhA@mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Attachments:

0001-mark_freed_ovflpage_as_UNUSED_pagetype.patchapplication/x-download; name=0001-mark_freed_ovflpage_as_UNUSED_pagetype.patchDownload
From 787974ea551285a6c069dbbdacb03a7e955c5dfd Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Mon, 20 Mar 2017 14:44:57 +0530
Subject: [PATCH] mark_freed_ovflpage_as_UNUSED_pagetype.patch

---
 src/backend/access/hash/hashovfl.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -591,9 +591,20 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	/*
 	 * Initialize the freed overflow page.  Just zeroing the page won't work,
 	 * because WAL replay routines expect pages to be initialized. See
-	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended.
+	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended. Also, mark
+	 * the page as UNUSED type and retain it's page id. This allows the tools
+	 * like pageinspect to consider it as a hash page.
 	 */
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+	ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+	ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+	ovflopaque->hasho_bucket = -1;
+	ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+	ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 	MarkBufferDirty(ovflbuf);
 
 	if (BufferIsValid(prevbuf))
-- 
1.8.3.1

0002-allow_pageinspect_handle_UNUSED_OR_EMPTY_hash_pages.patchapplication/x-download; name=0002-allow_pageinspect_handle_UNUSED_OR_EMPTY_hash_pages.patchDownload
diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
index 812a03f..1f063f0 100644
--- a/contrib/pageinspect/hashfuncs.c
+++ b/contrib/pageinspect/hashfuncs.c
@@ -70,6 +70,17 @@ verify_hash_page(bytea *raw_page, int flags)
 				 errmsg("index table contains corrupted page")));
 
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+
+	/* Check if it is an unused hash page. */
+	if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
+		return page;
+
+	/* Check if it is an empty hash page. */
+	if (PageIsEmpty(page))
+		ereport(ERROR,
+				(errcode(ERRCODE_INDEX_CORRUPTED),
+				 errmsg("index table contains empty page")));
+
 	if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
#10Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Ashutosh Sharma (#9)
Re: pageinspect and hash indexes

On Mon, Mar 20, 2017 at 6:53 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Mon, Mar 20, 2017 at 9:31 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 18, 2017 at 5:13 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Sat, Mar 18, 2017 at 1:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Sat, Mar 18, 2017 at 12:12 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Mar 17, 2017 at 10:54 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

While trying to figure out some bloating in the newly logged hash indexes,
I'm looking into the type of each page in the index. But I get an error:

psql -p 9876 -c "select hash_page_type(get_raw_page('foo_index_idx',x)) from
generate_series(1650,1650) f(x)"

ERROR: page is not a hash page
DETAIL: Expected 0000ff80, got 00000000.

The contents of the page are:

\xa4000000d8f203bf65c900001800f01ff01f0420...

(where the elided characters at the end are all zero)

What kind of page is that actually?

it is basically either a newly allocated bucket page or a freed overflow page.

What makes you think that it can be a newly allocated page?
Basically, we always initialize the special space of newly allocated
page, so not sure what makes you deduce that it can be newly allocated
page.

I came to know this from the following experiment.

I created a hash index and kept on inserting data in it till the split happens.

When split happened, I could see following values for firstblock and
lastblock in _hash_alloc_buckets()

Breakpoint 1, _hash_alloc_buckets (rel=0x7f6ac951ee30, firstblock=34,
nblocks=32) at hashpage.c:993
(gdb) n
(gdb) p firstblock
$15 = 34
(gdb) p nblocks
$16 = 32
(gdb) n
(gdb) p lastblock
$17 = 65

AFAIU, this bucket split resulted in creation of new bucket pages from
block number 34 to 65.

The contents for metap are as follows,

(gdb) p *metap
$18 = {hashm_magic = 105121344, hashm_version = 2, hashm_ntuples =
2593, hashm_ffactor = 81, hashm_bsize = 8152, hashm_bmsize = 4096,
hashm_bmshift = 15,
hashm_maxbucket = 32, hashm_highmask = 63, hashm_lowmask = 31,
hashm_ovflpoint = 6, hashm_firstfree = 0, hashm_nmaps = 1,
hashm_procid = 450,
hashm_spares = {0, 0, 0, 0, 0, 1, 1, 0 <repeats 25 times>},
hashm_mapp = {33, 0 <repeats 127 times>}}

Now, if i try to check the page type for block number 65, this is what i see,

test=# select * from hash_page_type(get_raw_page('con_hash_index', 65));
ERROR: page is not a hash page
DETAIL: Expected 0000ff80, got 00000000.
test=#

The contents of such a page should be zero and Jeff has reported some
valid-looking contents of the page. If you see this page contents as
zero, then we can conclude what Jeff is seeing was an freed overflow
page.

As shown in the mail thread-[1], the contents of metapage are,

(gdb) p *metap
$18 = {hashm_magic = 105121344, hashm_version = 2, hashm_ntuples
=2593, hashm_ffactor = 81, hashm_bsize = 8152, hashm_bmsize = 4096,
hashm_bmshift = 15, hashm_maxbucket = 32, hashm_highmask = 63,
hashm_lowmask = 31, hashm_ovflpoint = 6, hashm_firstfree = 0,
hashm_nmaps = 1,
hashm_procid = 450, hashm_spares = {0, 0, 0, 0, 0, 1, 1, 0 <repeats
25 times>}, hashm_mapp = {33, 0 <repeats 127 times>}}

postgres=# select spares from
hash_metapage_info(get_raw_page('con_hash_index', 0));
spares
-------------------------------------------------------------------
{0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
(1 row)

Here, if you see the spare page count is just 1 which corresponds to
bitmap page. So, that means there is no overflow page in hash index
table and neither I have ran any DELETE statements in my experiment
that would result in freeing an overflow page.

Also, the page header content for such a page is,

$9 = {pd_lsn = {xlogid = 0, xrecoff = 23638120}, pd_checksum = 0,
pd_flags = 0, pd_lower = 24, pd_upper = 8176, pd_special = 8176,
pd_pagesize_version = 8196, pd_prune_xid = 0, pd_linp = 0x1f3aa88}

From values of pd_lower and pd_upper it is clear that it is an empty page.

The content of this page is,

\x00000000b0308a01000000001800f01ff01f042000.....

I think it is not just happening for freed overflow but also for newly
allocated bucket page. It would be good if we could mark freed
overflow page as UNUSED page rather than just initialising it's header
portion and leaving the page type in special area as it is. Attached
is the patch '0001-mark_freed_ovflpage_as_UNUSED_pagetype.patch' that
marks a freed overflow page as an unused page.

Also, I have now changed pageinspect module to handle unused and empty
hash index page. Attached is the patch
(0002-allow_pageinspect_handle_UNUSED_OR_EMPTY_hash_pages.patch) for
the same.

[1] - /messages/by-id/CAE9k0P=N+JjzqnHqrURE7ZQMgySRpv=Bkjafbz=peD4cbCgbhA@mail.gmail.com

I think when expanding hash index table we are only initialising the
pages that will be in-use after split or the last block in the index.
The initial few pages where tuples will be moved from old to new pages
has no issues but the last block that is just initialised and is not
marked as hash page needs to be handled along with the freed overflow
page. Please let me know thoughts on this. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#10)
Re: pageinspect and hash indexes

On Tue, Mar 21, 2017 at 10:15 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

I think it is not just happening for freed overflow but also for newly
allocated bucket page. It would be good if we could mark freed
overflow page as UNUSED page rather than just initialising it's header
portion and leaving the page type in special area as it is. Attached
is the patch '0001-mark_freed_ovflpage_as_UNUSED_pagetype.patch' that
marks a freed overflow page as an unused page.

Also, I have now changed pageinspect module to handle unused and empty
hash index page. Attached is the patch
(0002-allow_pageinspect_handle_UNUSED_OR_EMPTY_hash_pages.patch) for
the same.

[1] - /messages/by-id/CAE9k0P=N+JjzqnHqrURE7ZQMgySRpv=Bkjafbz=peD4cbCgbhA@mail.gmail.com

I have yet to review your patches, let's first try to clear other
things before doing so.

I think when expanding hash index table we are only initialising the
pages that will be in-use after split or the last block in the index.
The initial few pages where tuples will be moved from old to new pages
has no issues but the last block that is just initialised and is not
marked as hash page needs to be handled along with the freed overflow
page.

I think PageIsEmpty() check in hashfuncs.c is sufficient for same. I
don't see any value in treating the last page allocated during
_hash_alloc_buckets() any different from other pages which are prior
to that but will get allocated when required.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#9)
Re: pageinspect and hash indexes

On Mon, Mar 20, 2017 at 6:53 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

I think it is not just happening for freed overflow but also for newly
allocated bucket page. It would be good if we could mark freed
overflow page as UNUSED page rather than just initialising it's header
portion and leaving the page type in special area as it is. Attached
is the patch '0001-mark_freed_ovflpage_as_UNUSED_pagetype.patch' that
marks a freed overflow page as an unused page.

  _hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+ ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+ ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+ ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+ ovflopaque->hasho_bucket = -1;
+ ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+ ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+

You need similar initialization in replay routine as well.

Also, I have now changed pageinspect module to handle unused and empty
hash index page. Attached is the patch
(0002-allow_pageinspect_handle_UNUSED_OR_EMPTY_hash_pages.patch) for
the same.

1.
@@ -70,6 +70,17 @@ verify_hash_page(bytea *raw_page, int flags)
  pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+
+ /* Check if it is an unused hash page. */
+ if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
+ return page;

I don't see we need to have a separate check for unused page, why
can't it be checked with other page types in below check.

if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)

2.
+ /* Check if it is an empty hash page. */
+ if (PageIsEmpty(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index table contains empty page")));

Do we want to give a separate message for EMPTY and NEW pages? Isn't
it better that the same error message can be given for both of them as
from user perspective there is not much difference between both the
messages?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#12)
Re: pageinspect and hash indexes

I think it is not just happening for freed overflow but also for newly
allocated bucket page. It would be good if we could mark freed
overflow page as UNUSED page rather than just initialising it's header
portion and leaving the page type in special area as it is. Attached
is the patch '0001-mark_freed_ovflpage_as_UNUSED_pagetype.patch' that
marks a freed overflow page as an unused page.

_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+ ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+ ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+ ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+ ovflopaque->hasho_bucket = -1;
+ ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+ ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+

You need similar initialization in replay routine as well.

I will do that and share the patch shortly. Thanks.

Also, I have now changed pageinspect module to handle unused and empty
hash index page. Attached is the patch
(0002-allow_pageinspect_handle_UNUSED_OR_EMPTY_hash_pages.patch) for
the same.

1.
@@ -70,6 +70,17 @@ verify_hash_page(bytea *raw_page, int flags)
pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+
+ /* Check if it is an unused hash page. */
+ if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
+ return page;

I don't see we need to have a separate check for unused page, why
can't it be checked with other page types in below check.

if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)

That is because UNUSED page is as good as an empty page except that
empty page doesn't have any pagetype. If we add condition for checking
UNUSED page in above if check it will never show unused page as an
unsed page rather it will always show it as an empty page. To avoid
this, at the start of verify_hash_page function itself if we recognise
page as UNUSED page we return immediately.

2.
+ /* Check if it is an empty hash page. */
+ if (PageIsEmpty(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index table contains empty page")));

Do we want to give a separate message for EMPTY and NEW pages? Isn't
it better that the same error message can be given for both of them as
from user perspective there is not much difference between both the
messages?

I think we should show separate message because they are two different
type of pages. In the sense like, one is initialised whereas other is
completely zero.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#13)
Re: pageinspect and hash indexes

On Thu, Mar 23, 2017 at 10:02 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

1.
@@ -70,6 +70,17 @@ verify_hash_page(bytea *raw_page, int flags)
pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+
+ /* Check if it is an unused hash page. */
+ if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
+ return page;

I don't see we need to have a separate check for unused page, why
can't it be checked with other page types in below check.

if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)

That is because UNUSED page is as good as an empty page except that
empty page doesn't have any pagetype. If we add condition for checking
UNUSED page in above if check it will never show unused page as an
unsed page rather it will always show it as an empty page.

Oh, okay, but my main objection was that we should not check hash page
type (hasho_flag) without ensuring whether it is a hash page. Can you
try to adjust the above code so that this check can be moved after
hasho_page_id check?

To avoid
this, at the start of verify_hash_page function itself if we recognise
page as UNUSED page we return immediately.

2.
+ /* Check if it is an empty hash page. */
+ if (PageIsEmpty(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index table contains empty page")));

Do we want to give a separate message for EMPTY and NEW pages? Isn't
it better that the same error message can be given for both of them as
from user perspective there is not much difference between both the
messages?

I think we should show separate message because they are two different
type of pages. In the sense like, one is initialised whereas other is
completely zero.

I understand your point, but not sure if it makes any difference to user.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#14)
2 attachment(s)
Re: pageinspect and hash indexes

Hi Amit,

+
+ /* Check if it is an unused hash page. */
+ if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
+ return page;

I don't see we need to have a separate check for unused page, why
can't it be checked with other page types in below check.

if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)

That is because UNUSED page is as good as an empty page except that
empty page doesn't have any pagetype. If we add condition for checking
UNUSED page in above if check it will never show unused page as an
unsed page rather it will always show it as an empty page.

Oh, okay, but my main objection was that we should not check hash page
type (hasho_flag) without ensuring whether it is a hash page. Can you
try to adjust the above code so that this check can be moved after
hasho_page_id check?

Yes, I got your point. I have done that but then i had to remove the
check for PageIsEmpty(). Anyways, I think PageIsEmpty() condition will
only be true for one page in entire hash index table and can be
ignored. If you wish, I could mention about it in the documentation.

To avoid
this, at the start of verify_hash_page function itself if we recognise
page as UNUSED page we return immediately.

2.
+ /* Check if it is an empty hash page. */
+ if (PageIsEmpty(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index table contains empty page")));

Do we want to give a separate message for EMPTY and NEW pages? Isn't
it better that the same error message can be given for both of them as
from user perspective there is not much difference between both the
messages?

I think we should show separate message because they are two different
type of pages. In the sense like, one is initialised whereas other is
completely zero.

I understand your point, but not sure if it makes any difference to user.

okay, I have now anyways removed the check for PageIsEmpty. Please
refer to the attached '0002
allow_pageinspect_handle_UNUSED_hash_pages.patch'

Also, I have attached
'0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
handles your comment mentioned in [1]/messages/by-id/CAA4eK1+VE_TDRLWpyeOf+7+6if68kgPNwO4guKo060rm_t3O5w@mail.gmail.com.

[1]: /messages/by-id/CAA4eK1+VE_TDRLWpyeOf+7+6if68kgPNwO4guKo060rm_t3O5w@mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Attachments:

0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patchapplication/x-patch; name=0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patchDownload
From 86eb711ea80b495c9d2f5228558682d0e95c41eb Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Thu, 23 Mar 2017 16:47:17 +0530
Subject: [PATCH] Mark freed overflow page as UNUSED pagetype v2

---
 src/backend/access/hash/hash_xlog.c |  9 +++++++++
 src/backend/access/hash/hashovfl.c  | 13 ++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d9ac42c..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -697,11 +697,20 @@ hash_xlog_squeeze_page(XLogReaderState *record)
 	if (XLogReadBufferForRedo(record, 2, &ovflbuf) == BLK_NEEDS_REDO)
 	{
 		Page		ovflpage;
+		HashPageOpaque ovflopaque;
 
 		ovflpage = BufferGetPage(ovflbuf);
 
 		_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 
+		ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+		ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+		ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+		ovflopaque->hasho_bucket = -1;
+		ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+		ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 		PageSetLSN(ovflpage, lsn);
 		MarkBufferDirty(ovflbuf);
 	}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -591,9 +591,20 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	/*
 	 * Initialize the freed overflow page.  Just zeroing the page won't work,
 	 * because WAL replay routines expect pages to be initialized. See
-	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended.
+	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended. Also, mark
+	 * the page as UNUSED type and retain it's page id. This allows the tools
+	 * like pageinspect to consider it as a hash page.
 	 */
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+	ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+	ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+	ovflopaque->hasho_bucket = -1;
+	ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+	ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 	MarkBufferDirty(ovflbuf);
 
 	if (BufferIsValid(prevbuf))
-- 
1.8.3.1

0002-allow_pageinspect_handle_UNUSED_hash_pages.patchapplication/x-patch; name=0002-allow_pageinspect_handle_UNUSED_hash_pages.patchDownload
diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
index 812a03f..6ba8847 100644
--- a/contrib/pageinspect/hashfuncs.c
+++ b/contrib/pageinspect/hashfuncs.c
@@ -70,6 +70,7 @@ verify_hash_page(bytea *raw_page, int flags)
 				 errmsg("index table contains corrupted page")));
 
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+
 	if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -77,6 +78,10 @@ verify_hash_page(bytea *raw_page, int flags)
 				 errdetail("Expected %08x, got %08x.",
 						   HASHO_PAGE_ID, pageopaque->hasho_page_id)));
 
+	/* Check if it is an unused hash page. */
+	if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
+		return page;
+
 	/* Check that page type is sane. */
 	pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
 	if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#15)
Re: pageinspect and hash indexes

On Thu, Mar 23, 2017 at 6:46 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Oh, okay, but my main objection was that we should not check hash page
type (hasho_flag) without ensuring whether it is a hash page. Can you
try to adjust the above code so that this check can be moved after
hasho_page_id check?

Yes, I got your point. I have done that but then i had to remove the
check for PageIsEmpty(). Anyways, I think PageIsEmpty() condition will
only be true for one page in entire hash index table and can be
ignored. If you wish, I could mention about it in the documentation.

Yeah, I think it is worth adding a Note in docs, but we can do that
separately if required.

To avoid
this, at the start of verify_hash_page function itself if we recognise
page as UNUSED page we return immediately.

2.
+ /* Check if it is an empty hash page. */
+ if (PageIsEmpty(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index table contains empty page")));

Do we want to give a separate message for EMPTY and NEW pages? Isn't
it better that the same error message can be given for both of them as
from user perspective there is not much difference between both the
messages?

I think we should show separate message because they are two different
type of pages. In the sense like, one is initialised whereas other is
completely zero.

I understand your point, but not sure if it makes any difference to user.

okay, I have now anyways removed the check for PageIsEmpty. Please
refer to the attached '0002
allow_pageinspect_handle_UNUSED_hash_pages.patch'

+
if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
ereport(ERROR,

spurious white space.

Also, I have attached
'0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
handles your comment mentioned in [1].

In general, we have to initialize prevblock with max_bucket, but here
it is okay, because we anyway initialize it after this page is
allocated as overflow page.

Your patches look good to me except for small white space change.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#16)
2 attachment(s)
Re: pageinspect and hash indexes

On Fri, Mar 24, 2017 at 9:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 23, 2017 at 6:46 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Oh, okay, but my main objection was that we should not check hash page
type (hasho_flag) without ensuring whether it is a hash page. Can you
try to adjust the above code so that this check can be moved after
hasho_page_id check?

Yes, I got your point. I have done that but then i had to remove the
check for PageIsEmpty(). Anyways, I think PageIsEmpty() condition will
only be true for one page in entire hash index table and can be
ignored. If you wish, I could mention about it in the documentation.

Yeah, I think it is worth adding a Note in docs, but we can do that
separately if required.

To avoid
this, at the start of verify_hash_page function itself if we recognise
page as UNUSED page we return immediately.

2.
+ /* Check if it is an empty hash page. */
+ if (PageIsEmpty(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index table contains empty page")));

Do we want to give a separate message for EMPTY and NEW pages? Isn't
it better that the same error message can be given for both of them as
from user perspective there is not much difference between both the
messages?

I think we should show separate message because they are two different
type of pages. In the sense like, one is initialised whereas other is
completely zero.

I understand your point, but not sure if it makes any difference to user.

okay, I have now anyways removed the check for PageIsEmpty. Please
refer to the attached '0002
allow_pageinspect_handle_UNUSED_hash_pages.patch'

+
if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
ereport(ERROR,

spurious white space.

Also, I have attached
'0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
handles your comment mentioned in [1].

In general, we have to initialize prevblock with max_bucket, but here
it is okay, because we anyway initialize it after this page is
allocated as overflow page.

Your patches look good to me except for small white space change.

Thanks for reviewing my patch. I have removed the extra white space.
Attached are both the patches.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Attachments:

0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patchapplication/x-patch; name=0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patchDownload
From 86eb711ea80b495c9d2f5228558682d0e95c41eb Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Thu, 23 Mar 2017 16:47:17 +0530
Subject: [PATCH] Mark freed overflow page as UNUSED pagetype v2

---
 src/backend/access/hash/hash_xlog.c |  9 +++++++++
 src/backend/access/hash/hashovfl.c  | 13 ++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d9ac42c..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -697,11 +697,20 @@ hash_xlog_squeeze_page(XLogReaderState *record)
 	if (XLogReadBufferForRedo(record, 2, &ovflbuf) == BLK_NEEDS_REDO)
 	{
 		Page		ovflpage;
+		HashPageOpaque ovflopaque;
 
 		ovflpage = BufferGetPage(ovflbuf);
 
 		_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 
+		ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+		ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+		ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+		ovflopaque->hasho_bucket = -1;
+		ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+		ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 		PageSetLSN(ovflpage, lsn);
 		MarkBufferDirty(ovflbuf);
 	}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -591,9 +591,20 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	/*
 	 * Initialize the freed overflow page.  Just zeroing the page won't work,
 	 * because WAL replay routines expect pages to be initialized. See
-	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended.
+	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended. Also, mark
+	 * the page as UNUSED type and retain it's page id. This allows the tools
+	 * like pageinspect to consider it as a hash page.
 	 */
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+	ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+	ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+	ovflopaque->hasho_bucket = -1;
+	ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+	ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 	MarkBufferDirty(ovflbuf);
 
 	if (BufferIsValid(prevbuf))
-- 
1.8.3.1

0002-allow_pageinspect_handle_UNUSED_hash_pages.patchapplication/x-patch; name=0002-allow_pageinspect_handle_UNUSED_hash_pages.patchDownload
diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
index 812a03f..6ba8847 100644
--- a/contrib/pageinspect/hashfuncs.c
+++ b/contrib/pageinspect/hashfuncs.c
@@ -70,6 +70,7 @@ verify_hash_page(bytea *raw_page, int flags)
 				 errmsg("index table contains corrupted page")));
 
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+
 	if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -77,6 +78,10 @@ verify_hash_page(bytea *raw_page, int flags)
 				 errdetail("Expected %08x, got %08x.",
 						   HASHO_PAGE_ID, pageopaque->hasho_page_id)));
 
+	/* Check if it is an unused hash page. */
+	if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
+		return page;
+
 	/* Check that page type is sane. */
 	pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
 	if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
#18Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Ashutosh Sharma (#17)
2 attachment(s)
Re: pageinspect and hash indexes

On Fri, Mar 24, 2017 at 9:46 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Mar 24, 2017 at 9:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Mar 23, 2017 at 6:46 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Oh, okay, but my main objection was that we should not check hash page
type (hasho_flag) without ensuring whether it is a hash page. Can you
try to adjust the above code so that this check can be moved after
hasho_page_id check?

Yes, I got your point. I have done that but then i had to remove the
check for PageIsEmpty(). Anyways, I think PageIsEmpty() condition will
only be true for one page in entire hash index table and can be
ignored. If you wish, I could mention about it in the documentation.

Yeah, I think it is worth adding a Note in docs, but we can do that
separately if required.

To avoid
this, at the start of verify_hash_page function itself if we recognise
page as UNUSED page we return immediately.

2.
+ /* Check if it is an empty hash page. */
+ if (PageIsEmpty(page))
+ ereport(ERROR,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("index table contains empty page")));

Do we want to give a separate message for EMPTY and NEW pages? Isn't
it better that the same error message can be given for both of them as
from user perspective there is not much difference between both the
messages?

I think we should show separate message because they are two different
type of pages. In the sense like, one is initialised whereas other is
completely zero.

I understand your point, but not sure if it makes any difference to user.

okay, I have now anyways removed the check for PageIsEmpty. Please
refer to the attached '0002
allow_pageinspect_handle_UNUSED_hash_pages.patch'

+
if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
ereport(ERROR,

spurious white space.

Also, I have attached
'0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
handles your comment mentioned in [1].

In general, we have to initialize prevblock with max_bucket, but here
it is okay, because we anyway initialize it after this page is
allocated as overflow page.

Your patches look good to me except for small white space change.

Thanks for reviewing my patch. I have removed the extra white space.
Attached are both the patches.

Sorry, I have mistakenly attached wrong patch. Here are the correct
set of patches.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Attachments:

0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patchapplication/x-patch; name=0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patchDownload
From 86eb711ea80b495c9d2f5228558682d0e95c41eb Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Thu, 23 Mar 2017 16:47:17 +0530
Subject: [PATCH] Mark freed overflow page as UNUSED pagetype v2

---
 src/backend/access/hash/hash_xlog.c |  9 +++++++++
 src/backend/access/hash/hashovfl.c  | 13 ++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d9ac42c..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -697,11 +697,20 @@ hash_xlog_squeeze_page(XLogReaderState *record)
 	if (XLogReadBufferForRedo(record, 2, &ovflbuf) == BLK_NEEDS_REDO)
 	{
 		Page		ovflpage;
+		HashPageOpaque ovflopaque;
 
 		ovflpage = BufferGetPage(ovflbuf);
 
 		_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 
+		ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+		ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+		ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+		ovflopaque->hasho_bucket = -1;
+		ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+		ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 		PageSetLSN(ovflpage, lsn);
 		MarkBufferDirty(ovflbuf);
 	}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -591,9 +591,20 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	/*
 	 * Initialize the freed overflow page.  Just zeroing the page won't work,
 	 * because WAL replay routines expect pages to be initialized. See
-	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended.
+	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended. Also, mark
+	 * the page as UNUSED type and retain it's page id. This allows the tools
+	 * like pageinspect to consider it as a hash page.
 	 */
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+	ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+	ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+	ovflopaque->hasho_bucket = -1;
+	ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+	ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 	MarkBufferDirty(ovflbuf);
 
 	if (BufferIsValid(prevbuf))
-- 
1.8.3.1

0002-allow_pageinspect_handle_UNUSED_hash_pages_v2.patchapplication/x-patch; name=0002-allow_pageinspect_handle_UNUSED_hash_pages_v2.patchDownload
diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
index 812a03f..bc65c4b 100644
--- a/contrib/pageinspect/hashfuncs.c
+++ b/contrib/pageinspect/hashfuncs.c
@@ -77,6 +77,10 @@ verify_hash_page(bytea *raw_page, int flags)
 				 errdetail("Expected %08x, got %08x.",
 						   HASHO_PAGE_ID, pageopaque->hasho_page_id)));
 
+	/* Check if it is an unused hash page. */
+	if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
+		return page;
+
 	/* Check that page type is sane. */
 	pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
 	if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
#19Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#18)
Re: pageinspect and hash indexes

On Fri, Mar 24, 2017 at 9:50 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Mar 24, 2017 at 9:46 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Thanks for reviewing my patch. I have removed the extra white space.
Attached are both the patches.

Sorry, I have mistakenly attached wrong patch. Here are the correct
set of patches.

The latest version of patches looks fine to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#19)
Re: pageinspect and hash indexes

On Fri, Mar 24, 2017 at 3:54 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Mar 24, 2017 at 9:50 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

On Fri, Mar 24, 2017 at 9:46 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Thanks for reviewing my patch. I have removed the extra white space.
Attached are both the patches.

Sorry, I have mistakenly attached wrong patch. Here are the correct
set of patches.

The latest version of patches looks fine to me.

I don't really like 0002. What about this, instead?

--- a/contrib/pageinspect/hashfuncs.c
+++ b/contrib/pageinspect/hashfuncs.c
@@ -80,7 +80,8 @@ verify_hash_page(bytea *raw_page, int flags)
     /* Check that page type is sane. */
     pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
     if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
-        pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)
+        pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE &&
+        pagetype != LH_UNUSED_PAGE)
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                  errmsg("invalid hash page type %08x", pagetype)));

The advantage of that is (1) it won't get confused if in the future we
have an unused page that has some flag bit not in LH_PAGE_TYPE set,
and (2) if in the future we want to add any other checks to this
function which should apply to unused pages also, they won't get
bypassed by an early return statement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#20)
1 attachment(s)
Re: pageinspect and hash indexes

Thanks for reviewing my patch. I have removed the extra white space.
Attached are both the patches.

Sorry, I have mistakenly attached wrong patch. Here are the correct
set of patches.

The latest version of patches looks fine to me.

I don't really like 0002. What about this, instead?

--- a/contrib/pageinspect/hashfuncs.c
+++ b/contrib/pageinspect/hashfuncs.c
@@ -80,7 +80,8 @@ verify_hash_page(bytea *raw_page, int flags)
/* Check that page type is sane. */
pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
-        pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)
+        pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE &&
+        pagetype != LH_UNUSED_PAGE)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid hash page type %08x", pagetype)));

The advantage of that is (1) it won't get confused if in the future we
have an unused page that has some flag bit not in LH_PAGE_TYPE set,
and (2) if in the future we want to add any other checks to this
function which should apply to unused pages also, they won't get
bypassed by an early return statement.

Agreed. Moreover, previous approach might even change the current
behaviour of functions like hash_page_items() and hash_page_stats()
basically when we pass UNUSED pages to these functions. Attached is
the newer version of patch.

Attachments:

0001-Allow-pageinspect-to-handle-UNUSED-hash-pages-v3.patchapplication/x-patch; name=0001-Allow-pageinspect-to-handle-UNUSED-hash-pages-v3.patchDownload
From 5c54fa58895cd279ff44424a3eb1feafdaca69d5 Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Sat, 25 Mar 2017 01:02:35 +0530
Subject: [PATCH] Allow pageinspect to handle UNUSED hash pages v3

---
 contrib/pageinspect/hashfuncs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
index 812a03f..2632287 100644
--- a/contrib/pageinspect/hashfuncs.c
+++ b/contrib/pageinspect/hashfuncs.c
@@ -80,7 +80,8 @@ verify_hash_page(bytea *raw_page, int flags)
 	/* Check that page type is sane. */
 	pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
 	if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
-		pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)
+		pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE &&
+		pagetype != LH_UNUSED_PAGE)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("invalid hash page type %08x", pagetype)));
-- 
1.8.3.1

#22Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#21)
Re: pageinspect and hash indexes

On Fri, Mar 24, 2017 at 3:44 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Agreed. Moreover, previous approach might even change the current
behaviour of functions like hash_page_items() and hash_page_stats()
basically when we pass UNUSED pages to these functions. Attached is
the newer version of patch.

This patch doesn't completely solve the problem:

pgbench -i -s 40
alter table pgbench_accounts drop constraint pgbench_accounts_pkey;
create index on pgbench_accounts using hash (aid);
insert into pgbench_accounts select * from pgbench_accounts;
select hash_page_type, min(n), max(n), count(*) from (select n,
hash_page_type(get_raw_page('pgbench_accounts_aid_idx'::text, n)) from
generate_series(0,
(pg_relation_size('pgbench_accounts_aid_idx')/8192)::integer - 1) n) x
group by 1;
ERROR: index table contains zero page

This happens because of the sparse allocation forced by
_hash_alloc_buckets. Perhaps the real fix is to have that function
initialize all of the pages instead of just the last one, but unless
and until we decide to do that, we've got to cope with zero pages in
the index. Actually, even if we did fix that I think we'd still need
to do this, because the way relation extension works in general is
that we first write a page of zeroes and then go back and fill in the
data; an intervening crash can leave behind the intermediate state.

A second problem is that the one page initialized by
_hash_alloc_buckets needs to end up with a valid special space;
otherwise, you get the same error Jeff complained about originally
when you try to use hash_page_type() on it.

I fixed those issues and committed this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers