pageinspect: Hash index support

Started by Jesper Pedersenover 9 years ago102 messageshackers
Jump to latest
#1Jesper Pedersen
jesper.pedersen@redhat.com

Hi,

Attached is a patch that adds support for hash indexes in pageinspect.

The functionality (hash_metap, hash_page_stats and hash_page_items)
follows the B-tree functions.

This patch will need an update once Amit's and Mithun's work on
Concurrent Hash Indexes is committed to account for the new meta-page
constants.

I'll create a CommitFest entry for this submission.

Feedback is most welcome.

Best regards,
Jesper

Attachments:

0001-pageinspect-Hash-index-support_v1.patchtext/x-patch; name=0001-pageinspect-Hash-index-support_v1.patchDownload+981-234
#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jesper Pedersen (#1)
Re: pageinspect: Hash index support

Jesper Pedersen wrote:

Hi,

Attached is a patch that adds support for hash indexes in pageinspect.

The functionality (hash_metap, hash_page_stats and hash_page_items) follows
the B-tree functions.

I suggest that pageinspect functions are more convenient to use via the
get_raw_page interface, that is, instead of reading the buffer
themselves, the buffer is handed over from elsewhere and they receive it
as bytea. This enables other use cases such as grabbing a page from one
server and examining it in another one.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#2)
Re: pageinspect: Hash index support

On Wed, Aug 31, 2016 at 2:06 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Jesper Pedersen wrote:

Attached is a patch that adds support for hash indexes in pageinspect.

The functionality (hash_metap, hash_page_stats and hash_page_items) follows
the B-tree functions.

I suggest that pageinspect functions are more convenient to use via the
get_raw_page interface, that is, instead of reading the buffer
themselves, the buffer is handed over from elsewhere and they receive it
as bytea. This enables other use cases such as grabbing a page from one
server and examining it in another one.

+1.
-- 
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Jeff Janes
jeff.janes@gmail.com
In reply to: Alvaro Herrera (#2)
Re: pageinspect: Hash index support

On Tue, Aug 30, 2016 at 10:06 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Jesper Pedersen wrote:

Hi,

Attached is a patch that adds support for hash indexes in pageinspect.

The functionality (hash_metap, hash_page_stats and hash_page_items)

follows

the B-tree functions.

I suggest that pageinspect functions are more convenient to use via the
get_raw_page interface, that is, instead of reading the buffer
themselves, the buffer is handed over from elsewhere and they receive it
as bytea. This enables other use cases such as grabbing a page from one
server and examining it in another one.

I've never needed to do that, but it does sound like it might be useful.
But it is also annoying to have to do that when you want to examine a
current server. Could we use overloading, so that it can be used in either
way?

For hash_page_items, the 'data' is always a 32 bit integer, isn't it?
(unlike other pageinspect features, where the data could be any
user-defined data type). If so, I'd rather see it in plain hex (without
the spaces, without the trailing zeros)

+   /* page type (flags) */
+   if (opaque->hasho_flag & LH_UNUSED_PAGE)
+       stat->type = 'u';

This can never be true, because LH_UNUSED_PAGE is zero. You should make
this be the fall-through case, and have LH_META_PAGE be explicitly tested
for.

Cheers,

Jeff

#5Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Jeff Janes (#4)
Re: pageinspect: Hash index support

On 09/14/2016 04:21 PM, Jeff Janes wrote:

I suggest that pageinspect functions are more convenient to use via the
get_raw_page interface, that is, instead of reading the buffer
themselves, the buffer is handed over from elsewhere and they receive it
as bytea. This enables other use cases such as grabbing a page from one
server and examining it in another one.

I've never needed to do that, but it does sound like it might be useful.
But it is also annoying to have to do that when you want to examine a
current server. Could we use overloading, so that it can be used in either
way?

For hash_page_items, the 'data' is always a 32 bit integer, isn't it?
(unlike other pageinspect features, where the data could be any
user-defined data type). If so, I'd rather see it in plain hex (without
the spaces, without the trailing zeros)

+   /* page type (flags) */
+   if (opaque->hasho_flag & LH_UNUSED_PAGE)
+       stat->type = 'u';

This can never be true, because LH_UNUSED_PAGE is zero. You should make
this be the fall-through case, and have LH_META_PAGE be explicitly tested
for.

This version adds the overloaded get_raw_page based methods. However,
I'm not verifying the page other than page_id, as hasho_flag can contain
multiple flags in Amit's patches. And, I don't see having different page
ids as a benefit - at least not part of this patch.

We should probably just have one of the method sets, so I'll send a v3
with the set voted for.

I kept the 'data' field as is, for now.

Best regards,
Jesper

Attachments:

0001-pageinspect-Hash-index-support_v2.patchtext/x-patch; name=0001-pageinspect-Hash-index-support_v2.patchDownload+1372-234
#6Michael Paquier
michael@paquier.xyz
In reply to: Jesper Pedersen (#5)
Re: pageinspect: Hash index support

On Tue, Sep 20, 2016 at 1:11 AM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:

This version adds the overloaded get_raw_page based methods. However, I'm
not verifying the page other than page_id, as hasho_flag can contain
multiple flags in Amit's patches. And, I don't see having different page ids
as a benefit - at least not part of this patch.

We should probably just have one of the method sets, so I'll send a v3 with
the set voted for.

I kept the 'data' field as is, for now.

$ git diff master --check
contrib/pageinspect/hashfuncs.c:758: space before tab in indent.
+ values);
That's always good to run to check the format of a patch before sending it.

You did not get right the comments from Alvaro upthread. The following
functions are added with this patch:
function hash_metap(text)
function hash_metap_bytea(bytea)
function hash_page_items(text,integer)
function hash_page_items_bytea(bytea)
function hash_page_stats(text,integer)
function hash_page_stats_bytea(bytea,integer)

Now the following set of functions would be sufficient:
function hash_metapage_info(bytea)
function hash_page_items(bytea)
function hash_page_stats(bytea)
The last time pageinspect has been updated, when BRIN functions have
been added, it has been discussed to just use (bytea) as an argument
interface and just rely on get_raw_page() to get the pages wanted, so
I think that we had better stick with that and keep things simple.

Note: the patch checks if a superuser is calling the new functions,
which is a good thing.

I am switching the patch back to "waiting on author". As far as I can
see the patch is in rather good shape, you just need to adapt the docs
and shave all the parts that are not needed for the final result.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Michael Paquier (#6)
Re: pageinspect: Hash index support

On 09/20/2016 03:19 AM, Michael Paquier wrote:

You did not get right the comments from Alvaro upthread. The following
functions are added with this patch:
function hash_metap(text)
function hash_metap_bytea(bytea)
function hash_page_items(text,integer)
function hash_page_items_bytea(bytea)
function hash_page_stats(text,integer)
function hash_page_stats_bytea(bytea,integer)

Now the following set of functions would be sufficient:
function hash_metapage_info(bytea)
function hash_page_items(bytea)
function hash_page_stats(bytea)
The last time pageinspect has been updated, when BRIN functions have
been added, it has been discussed to just use (bytea) as an argument
interface and just rely on get_raw_page() to get the pages wanted, so
I think that we had better stick with that and keep things simple.

Yes, I know, Alvaro and you voted for the bytea methods, and Jeff asked
for both.

Attached is v3 with only the bytea based methods.

Alvaro, Michael and Jeff - Thanks for the review !

Best regards,
Jesper

Attachments:

0001-pageinspect-Hash-index-support_v3.patchtext/x-patch; name=0001-pageinspect-Hash-index-support_v3.patchDownload+917-234
#8Jeff Janes
jeff.janes@gmail.com
In reply to: Jesper Pedersen (#7)
Re: pageinspect: Hash index support

On Tue, Sep 20, 2016 at 5:40 AM, Jesper Pedersen <jesper.pedersen@redhat.com

wrote:

On 09/20/2016 03:19 AM, Michael Paquier wrote:

You did not get right the comments from Alvaro upthread. The following
functions are added with this patch:
function hash_metap(text)
function hash_metap_bytea(bytea)
function hash_page_items(text,integer)
function hash_page_items_bytea(bytea)
function hash_page_stats(text,integer)
function hash_page_stats_bytea(bytea,integer)

Now the following set of functions would be sufficient:
function hash_metapage_info(bytea)
function hash_page_items(bytea)
function hash_page_stats(bytea)
The last time pageinspect has been updated, when BRIN functions have
been added, it has been discussed to just use (bytea) as an argument
interface and just rely on get_raw_page() to get the pages wanted, so
I think that we had better stick with that and keep things simple.

Yes, I know, Alvaro and you voted for the bytea methods, and Jeff asked
for both.

Attached is v3 with only the bytea based methods.

Alvaro, Michael and Jeff - Thanks for the review !

Is the 2nd "1" in this call needed?

SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1)

As far as I can tell, that argument is only used to stuff into the output
field "blkno", it is not used to instruct the interpretation of the raw
page itself. It doesn't seem worthwhile to have the parameter that only
echos back to the user what the user already put in (twice). The only
existing funtions which take the blkno argument are those that don't use
the get_raw_page style.

Also, should we document what the single letter values mean in the
hash_page_stats.type column? It is not obvious that 'i' means bitmap, for
example.

Cheers,

Jeff

#9Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Jeff Janes (#8)
Re: pageinspect: Hash index support

Hi,

I am getting a fatal error along with some warnings when trying to
apply the v3 patch using "git apply" command.

[ashu@localhost postgresql]$ git apply
0001-pageinspect-Hash-index-support_v3.patch
0001-pageinspect-Hash-index-support_v3.patch:29: trailing whitespace.
brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES)
0001-pageinspect-Hash-index-support_v3.patch:36: trailing whitespace.
DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \
0001-pageinspect-Hash-index-support_v3.patch:37: trailing whitespace.
pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
0001-pageinspect-Hash-index-support_v3.patch:38: trailing whitespace.
pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
0001-pageinspect-Hash-index-support_v3.patch:39: trailing whitespace.
pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql
fatal: git apply: bad git-diff - expected /dev/null on line 46

Also, i think the USAGE for hash_metap() is refering to
hash_metap_bytea(). Please correct it. I have just started reviewing
the patch, will keep on posting my comments upon further review.
Thanks.

With Regards,
Ashutosh Sharma.
EnterpriseDB: http://www.enterprisedb.com

On Tue, Sep 20, 2016 at 10:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Tue, Sep 20, 2016 at 5:40 AM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:

On 09/20/2016 03:19 AM, Michael Paquier wrote:

You did not get right the comments from Alvaro upthread. The following
functions are added with this patch:
function hash_metap(text)
function hash_metap_bytea(bytea)
function hash_page_items(text,integer)
function hash_page_items_bytea(bytea)
function hash_page_stats(text,integer)
function hash_page_stats_bytea(bytea,integer)

Now the following set of functions would be sufficient:
function hash_metapage_info(bytea)
function hash_page_items(bytea)
function hash_page_stats(bytea)
The last time pageinspect has been updated, when BRIN functions have
been added, it has been discussed to just use (bytea) as an argument
interface and just rely on get_raw_page() to get the pages wanted, so
I think that we had better stick with that and keep things simple.

Yes, I know, Alvaro and you voted for the bytea methods, and Jeff asked
for both.

Attached is v3 with only the bytea based methods.

Alvaro, Michael and Jeff - Thanks for the review !

Is the 2nd "1" in this call needed?

SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1)

As far as I can tell, that argument is only used to stuff into the output
field "blkno", it is not used to instruct the interpretation of the raw page
itself. It doesn't seem worthwhile to have the parameter that only echos
back to the user what the user already put in (twice). The only existing
funtions which take the blkno argument are those that don't use the
get_raw_page style.

Also, should we document what the single letter values mean in the
hash_page_stats.type column? It is not obvious that 'i' means bitmap, for
example.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Jeff Janes (#8)
Re: pageinspect: Hash index support

On 09/20/2016 12:45 PM, Jeff Janes wrote:

Is the 2nd "1" in this call needed?

SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1)

As far as I can tell, that argument is only used to stuff into the output
field "blkno", it is not used to instruct the interpretation of the raw
page itself. It doesn't seem worthwhile to have the parameter that only
echos back to the user what the user already put in (twice). The only
existing funtions which take the blkno argument are those that don't use
the get_raw_page style.

Also, should we document what the single letter values mean in the
hash_page_stats.type column? It is not obvious that 'i' means bitmap, for
example.

Adjusted in v4. Code/doc will need an update once the CHI patch goes in.

Best regards,
Jesper

Attachments:

0001-pageinspect-Hash-index-support_v4.patchtext/x-patch; name=0001-pageinspect-Hash-index-support_v4.patchDownload+914-234
#11Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Ashutosh Sharma (#9)
Re: pageinspect: Hash index support

Hi,

On 09/20/2016 01:46 PM, Ashutosh Sharma wrote:

I am getting a fatal error along with some warnings when trying to
apply the v3 patch using "git apply" command.

[ashu@localhost postgresql]$ git apply
0001-pageinspect-Hash-index-support_v3.patch
0001-pageinspect-Hash-index-support_v3.patch:29: trailing whitespace.
brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES)
0001-pageinspect-Hash-index-support_v3.patch:36: trailing whitespace.
DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \
0001-pageinspect-Hash-index-support_v3.patch:37: trailing whitespace.
pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \
0001-pageinspect-Hash-index-support_v3.patch:38: trailing whitespace.
pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
0001-pageinspect-Hash-index-support_v3.patch:39: trailing whitespace.
pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql
fatal: git apply: bad git-diff - expected /dev/null on line 46

Which platform are you on ?

The development branch is @
https://github.com/jesperpedersen/postgres/tree/pageinspect_hash

Also, i think the USAGE for hash_metap() is refering to
hash_metap_bytea(). Please correct it. I have just started reviewing
the patch, will keep on posting my comments upon further review.

Fixed in v4.

Thanks for the review.

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Michael Paquier
michael@paquier.xyz
In reply to: Jesper Pedersen (#10)
Re: pageinspect: Hash index support

On Wed, Sep 21, 2016 at 3:25 AM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:

On 09/20/2016 12:45 PM, Jeff Janes wrote:

Is the 2nd "1" in this call needed?

SELECT * FROM hash_page_stats(get_raw_page('mytab_index', 1), 1)

As far as I can tell, that argument is only used to stuff into the output
field "blkno", it is not used to instruct the interpretation of the raw
page itself. It doesn't seem worthwhile to have the parameter that only
echos back to the user what the user already put in (twice). The only
existing funtions which take the blkno argument are those that don't use
the get_raw_page style.

Also, should we document what the single letter values mean in the
hash_page_stats.type column? It is not obvious that 'i' means bitmap, for
example.

Adjusted in v4.

Thanks for the new version.

Code/doc will need an update once the CHI patch goes in.

If you are referring to the WAL support, I guess that this patch or
the other patch could just rebase and adjust as needed.

hash_page_items and hash_page_stats share a lot of common points with
their btree equivalents, still doing a refactoring would just impact
the visibility of the code, and it is wanted as educational in this
module, so let's go with things the way you suggest.

+     <para>
+      The type information will be '<literal>m</literal>' for a metadata page,
+      '<literal>v</literal>' for an overflow page,
'<literal>b</literal>' for a bucket page,
+      '<literal>i</literal>' for a bitmap page, and
'<literal>u</literal>' for an unused page.
+     </para>
Other functions don't go into this level of details, so I would just
rip out this paragraph.

The patch looks in pretty good shape to me, so I am switching it as
ready for committer.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Jesper Pedersen (#11)
Re: pageinspect: Hash index support

Hi,

Which platform are you on ?

The development branch is @
https://github.com/jesperpedersen/postgres/tree/pageinspect_hash

I am working on centOS 7. I am still facing the issue when applying
your patch using "git apply" command but if i use 'patch' command it
works fine however i am seeing some warning messages with 'patch'
command as well.

I continued reviewing your v4 patch and here are some more comments:

1). Got below error if i pass meta page to hash_page_items(). Does
hash_page_items() accept only bucket or bitmap page as input?

postgres=# SELECT * FROM hash_page_items(get_raw_page('hash_i4_index', 0));
ERROR: invalid memory alloc request size 18446744073709551593

2). Server crashed when below query was executed on a code that was
build with cassert-enabled.

postgres=# SELECT * FROM hash_page_stats(get_raw_page('hash_i4_index', 0));
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

The callstack is as follows:

(gdb) bt
#0 0x00007fef794f15f7 in raise () from /lib64/libc.so.6
#1 0x00007fef794f2ce8 in abort () from /lib64/libc.so.6
#2 0x0000000000967381 in ExceptionalCondition
(conditionName=0x7fef699c942d "!(((id)->lp_len != 0))",
errorType=0x7fef699c92d4 "FailedAssertion",
fileName=0x7fef699c9397 "hashfuncs.c", lineNumber=123) at assert.c:54
#3 0x00007fef699c6ed6 in GetHashPageStatistics (page=0x1060c2c "",
stat=0x7ffd76846230) at hashfuncs.c:123
#4 0x00007fef699c70a4 in hash_page_stats (fcinfo=0x7ffd768463e0) at
hashfuncs.c:169
#5 0x0000000000682e6b in ExecMakeTableFunctionResult
(funcexpr=0x10457f0, econtext=0x1044e28, argContext=0x104ee38,
expectedDesc=0x1047640,
randomAccess=0 '\000') at execQual.c:2216
#6 0x00000000006a88e2 in FunctionNext (node=0x1044d10) at nodeFunctionscan.c:94
#7 0x000000000068a3d9 in ExecScanFetch (node=0x1044d10,
accessMtd=0x6a882b <FunctionNext>, recheckMtd=0x6a8c10
<FunctionRecheck>) at execScan.c:95
#8 0x000000000068a448 in ExecScan (node=0x1044d10, accessMtd=0x6a882b
<FunctionNext>, recheckMtd=0x6a8c10 <FunctionRecheck>) at
execScan.c:145
#9 0x00000000006a8c45 in ExecFunctionScan (node=0x1044d10) at
nodeFunctionscan.c:268
#10 0x000000000067f0cf in ExecProcNode (node=0x1044d10) at execProcnode.c:449
#11 0x000000000067b40b in ExecutePlan (estate=0x1044bf8,
planstate=0x1044d10, use_parallel_mode=0 '\000', operation=CMD_SELECT,
sendTuples=1 '\001',
numberTuples=0, direction=ForwardScanDirection, dest=0x10444c8) at
execMain.c:1567
#12 0x0000000000679527 in standard_ExecutorRun (queryDesc=0xfac778,
direction=ForwardScanDirection, count=0) at execMain.c:338
#13 0x00000000006793ab in ExecutorRun (queryDesc=0xfac778,
direction=ForwardScanDirection, count=0) at execMain.c:286
#14 0x0000000000816b3e in PortalRunSelect (portal=0xfa49e8, forward=1
'\001', count=0, dest=0x10444c8) at pquery.c:948
#15 0x00000000008167d1 in PortalRun (portal=0xfa49e8,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x10444c8,
altdest=0x10444c8,
completionTag=0x7ffd76846c60 "") at pquery.c:789
#16 0x0000000000810a27 in exec_simple_query (query_string=0x1007018
"SELECT * FROM hash_page_stats(get_raw_page('hash_i4_index', 0));") at
postgres.c:1094
#17 0x0000000000814b5e in PostgresMain (argc=1, argv=0xfb1d08,
dbname=0xf873d8 "postgres", username=0xfb1b70 "edb") at
postgres.c:4070
#18 0x000000000078990c in BackendRun (port=0xfacb50) at postmaster.c:4260

3). Could you please let me know, what does the hard coded values '*5'
and '+1' represents in the below lines of code. You have used them
when allocating memory for storing spare pages allocated before
certain split point and block number of bitmap pages inside
hash_metap().

spares = palloc0(HASH_MAX_SPLITPOINTS * 5 + 1);

mapp = palloc0(HASH_MAX_BITMAPS * 5 + 1);

I guess it would be better to add some comments if using any hard coded values.

4). Inside hash_page_stats(), you are first calling verify_hash_page()
to verify if it is a hash page or not and then calling
GetHashPageStatistics() to get the page stats wherein you are trying
to check what is the type of hash page. Below is the code snippet for
this,

+   /* page type (flags) */
+   if (opaque->hasho_flag & LH_META_PAGE)
+       stat->type = 'm';
+   else if (opaque->hasho_flag & LH_OVERFLOW_PAGE)
+       stat->type = 'v';
+   else if (opaque->hasho_flag & LH_BUCKET_PAGE)
+       stat->type = 'b';
+   else if (opaque->hasho_flag & LH_BITMAP_PAGE)
+       stat->type = 'i';
+   else
+       stat->type = 'u';

My question is, can we have a hash page that does not fall under the
category of Metapage, overflow page, bitmap page, bucket page? I guess
we can't have such type of hash page and incase if we found such type
of undefined page i guess we should error out instead of reading the
page because it is quite possible that the page would be corrupted.
Please let me know your thoughts on this.

5). I think we have added enough functions to show the page level
statistics but not the index level statistics like the total number of
overflow pages , bucket pages, number of free overflow pages, number
of bitmap pages etc. in the hash index. How about adding a function
that would store the index level statistics?

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Ashutosh Sharma (#13)
Re: pageinspect: Hash index support

Hi,

On 09/21/2016 07:24 AM, Ashutosh Sharma wrote:

The development branch is @
https://github.com/jesperpedersen/postgres/tree/pageinspect_hash

I am working on centOS 7. I am still facing the issue when applying
your patch using "git apply" command but if i use 'patch' command it
works fine however i am seeing some warning messages with 'patch'
command as well.

git apply w/ v4 works here, so you will have to investigate what happens
on your side.

I continued reviewing your v4 patch and here are some more comments:

1). Got below error if i pass meta page to hash_page_items(). Does
hash_page_items() accept only bucket or bitmap page as input?

postgres=# SELECT * FROM hash_page_items(get_raw_page('hash_i4_index', 0));
ERROR: invalid memory alloc request size 18446744073709551593

page_stats / page_items should not be used on the metadata page.

As these functions are marked as superuser only it is expected that
people provides the correct input, especially since the raw page
structure is used as the input.

2). Server crashed when below query was executed on a code that was
build with cassert-enabled.

postgres=# SELECT * FROM hash_page_stats(get_raw_page('hash_i4_index', 0));
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

The callstack is as follows:

(gdb) bt
#0 0x00007fef794f15f7 in raise () from /lib64/libc.so.6
#1 0x00007fef794f2ce8 in abort () from /lib64/libc.so.6
#2 0x0000000000967381 in ExceptionalCondition
(conditionName=0x7fef699c942d "!(((id)->lp_len != 0))",
errorType=0x7fef699c92d4 "FailedAssertion",
fileName=0x7fef699c9397 "hashfuncs.c", lineNumber=123) at assert.c:54
#3 0x00007fef699c6ed6 in GetHashPageStatistics (page=0x1060c2c "",
stat=0x7ffd76846230) at hashfuncs.c:123
#4 0x00007fef699c70a4 in hash_page_stats (fcinfo=0x7ffd768463e0) at
hashfuncs.c:169
#5 0x0000000000682e6b in ExecMakeTableFunctionResult
(funcexpr=0x10457f0, econtext=0x1044e28, argContext=0x104ee38,
expectedDesc=0x1047640,
randomAccess=0 '\000') at execQual.c:2216
#6 0x00000000006a88e2 in FunctionNext (node=0x1044d10) at nodeFunctionscan.c:94
#7 0x000000000068a3d9 in ExecScanFetch (node=0x1044d10,
accessMtd=0x6a882b <FunctionNext>, recheckMtd=0x6a8c10
<FunctionRecheck>) at execScan.c:95
#8 0x000000000068a448 in ExecScan (node=0x1044d10, accessMtd=0x6a882b
<FunctionNext>, recheckMtd=0x6a8c10 <FunctionRecheck>) at
execScan.c:145
#9 0x00000000006a8c45 in ExecFunctionScan (node=0x1044d10) at
nodeFunctionscan.c:268
#10 0x000000000067f0cf in ExecProcNode (node=0x1044d10) at execProcnode.c:449
#11 0x000000000067b40b in ExecutePlan (estate=0x1044bf8,
planstate=0x1044d10, use_parallel_mode=0 '\000', operation=CMD_SELECT,
sendTuples=1 '\001',
numberTuples=0, direction=ForwardScanDirection, dest=0x10444c8) at
execMain.c:1567
#12 0x0000000000679527 in standard_ExecutorRun (queryDesc=0xfac778,
direction=ForwardScanDirection, count=0) at execMain.c:338
#13 0x00000000006793ab in ExecutorRun (queryDesc=0xfac778,
direction=ForwardScanDirection, count=0) at execMain.c:286
#14 0x0000000000816b3e in PortalRunSelect (portal=0xfa49e8, forward=1
'\001', count=0, dest=0x10444c8) at pquery.c:948
#15 0x00000000008167d1 in PortalRun (portal=0xfa49e8,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x10444c8,
altdest=0x10444c8,
completionTag=0x7ffd76846c60 "") at pquery.c:789
#16 0x0000000000810a27 in exec_simple_query (query_string=0x1007018
"SELECT * FROM hash_page_stats(get_raw_page('hash_i4_index', 0));") at
postgres.c:1094
#17 0x0000000000814b5e in PostgresMain (argc=1, argv=0xfb1d08,
dbname=0xf873d8 "postgres", username=0xfb1b70 "edb") at
postgres.c:4070
#18 0x000000000078990c in BackendRun (port=0xfacb50) at postmaster.c:4260

Same deal here.

3). Could you please let me know, what does the hard coded values '*5'
and '+1' represents in the below lines of code. You have used them
when allocating memory for storing spare pages allocated before
certain split point and block number of bitmap pages inside
hash_metap().

spares = palloc0(HASH_MAX_SPLITPOINTS * 5 + 1);

mapp = palloc0(HASH_MAX_BITMAPS * 5 + 1);

I guess it would be better to add some comments if using any hard coded values.

It is the space needed to output the values.

4). Inside hash_page_stats(), you are first calling verify_hash_page()
to verify if it is a hash page or not and

No, we assume that the page is a valid hash page structure, since there
is an explicit cast w/o any further checks.

then calling
GetHashPageStatistics() to get the page stats wherein you are trying
to check what is the type of hash page. Below is the code snippet for
this,

+   /* page type (flags) */
+   if (opaque->hasho_flag & LH_META_PAGE)
+       stat->type = 'm';
+   else if (opaque->hasho_flag & LH_OVERFLOW_PAGE)
+       stat->type = 'v';
+   else if (opaque->hasho_flag & LH_BUCKET_PAGE)
+       stat->type = 'b';
+   else if (opaque->hasho_flag & LH_BITMAP_PAGE)
+       stat->type = 'i';
+   else
+       stat->type = 'u';

My question is, can we have a hash page that does not fall under the
category of Metapage, overflow page, bitmap page, bucket page? I guess
we can't have such type of hash page and incase if we found such type
of undefined page i guess we should error out instead of reading the
page because it is quite possible that the page would be corrupted.
Please let me know your thoughts on this.

The "if" statement will need updating once the CHI patch goes in, as it
defines new constants for page types.

However, as the other pageinspect function it assume that the operator
is passing valid data.

5). I think we have added enough functions to show the page level
statistics but not the index level statistics like the total number of
overflow pages , bucket pages, number of free overflow pages, number
of bitmap pages etc. in the hash index. How about adding a function
that would store the index level statistics?

Sure, additional functions could be of benefit later on. Feel free to
investigate that possibility for a future CommitFest.

Thanks for your feedback !

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Michael Paquier (#12)
Re: pageinspect: Hash index support

On 09/21/2016 02:14 AM, Michael Paquier wrote:

Adjusted in v4.

Thanks for the new version.

Code/doc will need an update once the CHI patch goes in.

If you are referring to the WAL support, I guess that this patch or
the other patch could just rebase and adjust as needed.

It is the main patch [1] that defines the new constants for page type.
But I'll submit an update for pageinspect when needed.

hash_page_items and hash_page_stats share a lot of common points with
their btree equivalents, still doing a refactoring would just impact
the visibility of the code, and it is wanted as educational in this
module, so let's go with things the way you suggest.

Ok.

+     <para>
+      The type information will be '<literal>m</literal>' for a metadata page,
+      '<literal>v</literal>' for an overflow page,
'<literal>b</literal>' for a bucket page,
+      '<literal>i</literal>' for a bitmap page, and
'<literal>u</literal>' for an unused page.
+     </para>
Other functions don't go into this level of details, so I would just
rip out this paragraph.

I'll add an annotation for this part, and leave it for the committer to
decide, since Jeff wanted documentation for the 'type' information.

The patch looks in pretty good shape to me, so I am switching it as
ready for committer.

Thanks for your feedback !

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Michael Paquier
michael@paquier.xyz
In reply to: Jesper Pedersen (#14)
Re: pageinspect: Hash index support

On Wed, Sep 21, 2016 at 9:21 PM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:

On 09/21/2016 07:24 AM, Ashutosh Sharma wrote:
git apply w/ v4 works here, so you will have to investigate what happens on
your side.

Works here as well.

I continued reviewing your v4 patch and here are some more comments:

1). Got below error if i pass meta page to hash_page_items(). Does
hash_page_items() accept only bucket or bitmap page as input?

postgres=# SELECT * FROM hash_page_items(get_raw_page('hash_i4_index',
0));
ERROR: invalid memory alloc request size 18446744073709551593

page_stats / page_items should not be used on the metadata page.

As these functions are marked as superuser only it is expected that people
provides the correct input, especially since the raw page structure is used
as the input.

btree functions use the block number to do some sanity checks. You
cannot do that here as only bytea functions are available, but you
could do it in verify_hash_page by looking at the opaque data and look
at LH_META_PAGE. Then add a boolean argument into verify_hash_page to
see if the caller expects a meta page or not and just issue an error.
Actually it would be a good idea to put in those safeguards, even if I
agree with you that calling those functions is at the risk of the
user... Could you update the patch in this sense?

I had fun doing the same tests, aka running the items and stats
functions on a meta page, and the meta function on a non-meta page,
but at my surprise I did not see a crash, so perhaps I was lucky and
perhaps that was because of OSX.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Jesper Pedersen (#14)
Re: pageinspect: Hash index support

Hi,

git apply w/ v4 works here, so you will have to investigate what happens on
your side.

Thanks, It works with v4 patch.

As these functions are marked as superuser only it is expected that people
provides the correct input, especially since the raw page structure is used
as the input.

Well, page_stats / page_items does accept bitmap page as input but
these function are not defined to read bitmap page as bitmap page
doesnot have a standard page layout. Infact if we pass bitmap page as
an input to page_stats / page_items, the output is always same. I
think we need to have a separete function to read bitmap page. And i
am not sure why should a server crash if i pass meta page to
hash_page_stats or hash_page_items.

The "if" statement will need updating once the CHI patch goes in, as it
defines new constants for page types.

Not sure if CHI patch is adding a new type of hash page other than
holding an information about split in progress. Basically my point was
can we have hash page of types other than meta page, bucket page,
overflow and bitmap page. If pageinspect tool finds a page that
doesnot fall under any of these category shouldn't it error out.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Michael Paquier (#16)
Re: pageinspect: Hash index support

On 09/21/2016 08:43 AM, Michael Paquier wrote:

page_stats / page_items should not be used on the metadata page.

As these functions are marked as superuser only it is expected that people
provides the correct input, especially since the raw page structure is used
as the input.

btree functions use the block number to do some sanity checks. You
cannot do that here as only bytea functions are available, but you
could do it in verify_hash_page by looking at the opaque data and look
at LH_META_PAGE. Then add a boolean argument into verify_hash_page to
see if the caller expects a meta page or not and just issue an error.
Actually it would be a good idea to put in those safeguards, even if I
agree with you that calling those functions is at the risk of the
user... Could you update the patch in this sense?

I had fun doing the same tests, aka running the items and stats
functions on a meta page, and the meta function on a non-meta page,
but at my surprise I did not see a crash, so perhaps I was lucky and
perhaps that was because of OSX.

Attached is v5, which add basic page verification.

Thanks for the feedback !

Best regards,
Jesper

Attachments:

0001-pageinspect-Hash-index-support_v5.patchtext/x-patch; name=0001-pageinspect-Hash-index-support_v5.patchDownload+933-234
#19Jeff Janes
jeff.janes@gmail.com
In reply to: Michael Paquier (#12)
Re: pageinspect: Hash index support

On Tue, Sep 20, 2016 at 11:14 PM, Michael Paquier <michael.paquier@gmail.com

wrote:

+     <para>
+      The type information will be '<literal>m</literal>' for a metadata
page,
+      '<literal>v</literal>' for an overflow page,
'<literal>b</literal>' for a bucket page,
+      '<literal>i</literal>' for a bitmap page, and
'<literal>u</literal>' for an unused page.
+     </para>

Other functions don't go into this level of details, so I would just
rip out this paragraph.

I'd argue that the other functions should go into that level detail in some
places. Pageinspect is needlessly hard to use; not all precedent is good
precedent. Some of them do refer you to header files or README files,
which can be useful. But the abbreviations used here are not explained in
any header file or README file, so I think the right place to explain them
is the documentation in that case. Or change from the single-letter strings
to full name strings so they are self-documenting.

Cheers,

Jeff

#20Peter Eisentraut
peter_e@gmx.net
In reply to: Jesper Pedersen (#18)
Re: pageinspect: Hash index support

On 9/21/16 9:30 AM, Jesper Pedersen wrote:

Attached is v5, which add basic page verification.

There are still some things that I found that appear to follow the old
style (btree) conventions instead the new style (brin, gin) conventions.

- Rename hash_metap to hash_metapage_info.

- Don't use BuildTupleFromCStrings(). (There is nothing inherently
wrong with it, but why convert numbers to strings and back again when it
can be done more directly.)

- Documentation for hash_page_stats still has blkno argument.

- In hash_metap, the magic field could be type bytea, so that it
displays in hex. Or text like the brin functions.

Some other comments:

- hash_metap result fields spares and mapp should be arrays of integer.

(Incidentally, the comment in hash.h talks about bitmaps[] but I think
it means hashm_mapp[].)

- As of very recently, we don't need to move pageinspect--1.5.sql to
pageinspect--1.6.sql anymore. Just add pageinspect--1.5--1.6.sql.

- In the documentation for hash_page_stats(), the "+" sign is misaligned.

- In hash_page_items, the fields itemlen, nulls, vars are always 16,
false, false. So perhaps there is no need for them. Similarly, the
hash_page_stats in hash_page_stats is always 16.

- The data field could be of type bytea.

- Add a pointer in the documentation to the relevant header file.

Bonus:

- Add subsections in the documentation (general functions, B-tree
functions, etc.)

- Add tests.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#20)
#22Peter Eisentraut
peter_e@gmx.net
In reply to: Amit Kapila (#21)
#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#22)
#24Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Peter Eisentraut (#20)
#25Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Amit Kapila (#21)
#26Jeff Janes
jeff.janes@gmail.com
In reply to: Jesper Pedersen (#24)
#27Jeff Janes
jeff.janes@gmail.com
In reply to: Michael Paquier (#6)
#28Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeff Janes (#27)
#29Peter Eisentraut
peter_e@gmx.net
In reply to: Jesper Pedersen (#24)
#30Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#29)
#31Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Peter Eisentraut (#29)
#32Peter Eisentraut
peter_e@gmx.net
In reply to: Jesper Pedersen (#31)
#33Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Peter Eisentraut (#32)
#34Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#20)
#35Peter Eisentraut
peter_e@gmx.net
In reply to: Jesper Pedersen (#33)
#36Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#35)
#37Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Peter Eisentraut (#36)
#38Michael Paquier
michael@paquier.xyz
In reply to: Jesper Pedersen (#37)
#39Peter Eisentraut
peter_e@gmx.net
In reply to: Jesper Pedersen (#37)
#40Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Peter Eisentraut (#39)
#41Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#39)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#41)
#43Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#43)
#45Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Tom Lane (#44)
#46Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Ashutosh Sharma (#45)
#47Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Jesper Pedersen (#46)
#48Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Ashutosh Sharma (#45)
#49Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Jesper Pedersen (#48)
#50Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Ashutosh Sharma (#49)
#51Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Jesper Pedersen (#50)
#52Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#51)
#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#52)
#54Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Tom Lane (#53)
#55Robert Haas
robertmhaas@gmail.com
In reply to: Jesper Pedersen (#54)
#56Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#55)
#57Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jesper Pedersen (#54)
#58Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#55)
#59Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Alvaro Herrera (#57)
#60Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Ashutosh Sharma (#58)
#61Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Jesper Pedersen (#60)
#62Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Mithun Cy (#61)
#63Amit Kapila
amit.kapila16@gmail.com
In reply to: Mithun Cy (#62)
#64Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Amit Kapila (#63)
#65Amit Kapila
amit.kapila16@gmail.com
In reply to: Jesper Pedersen (#60)
#66Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#65)
#67Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Ashutosh Sharma (#66)
#68Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Jesper Pedersen (#67)
#69Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#66)
#70Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#69)
#71Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#70)
#72Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Ashutosh Sharma (#68)
#73Robert Haas
robertmhaas@gmail.com
In reply to: Mithun Cy (#72)
#74Peter Eisentraut
peter_e@gmx.net
In reply to: Jesper Pedersen (#67)
#75Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#73)
#76Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#73)
#77Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Peter Eisentraut (#74)
#78Michael Paquier
michael@paquier.xyz
In reply to: Ashutosh Sharma (#76)
#79Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#76)
#80Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#79)
#81Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#80)
#82Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Robert Haas (#81)
#83Michael Paquier
michael@paquier.xyz
In reply to: Jesper Pedersen (#82)
#84Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#81)
#85Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Jesper Pedersen (#82)
#86Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jesper Pedersen (#85)
#87Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#86)
#88Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Robert Haas (#87)
#89Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Jesper Pedersen (#88)
#90Robert Haas
robertmhaas@gmail.com
In reply to: Jesper Pedersen (#89)
#91Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Ashutosh Sharma (#75)
#92Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#90)
#93Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#92)
#94Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#93)
#95Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#94)
#96Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Jesper Pedersen (#1)
#97Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#96)
#98Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Robert Haas (#97)
#99Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#98)
#100Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Robert Haas (#99)
#101Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Mithun Cy (#100)
#102Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Sharma (#101)