Cache Hash Index meta page.

Started by Mithun Cyalmost 10 years ago45 messageshackers

mithun.cy@enterprisedb.com

almost 10 years ago

I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try to
read meta page buffer twice just to make sure bucket is not split after we
found bucket page. With this patch meta page buffer read is not done, if
the bucket is not split after caching the meta page.

Idea is to cache the Meta page data in rd_amcache and store maxbucket
number in hasho_prevblkno of bucket primary page (which will always be NULL
other wise, so reusing it here for this cause!!!). So when we try to do
hash lookup for bucket page if locally cached maxbucket number is greater
than or equal to bucket page's maxbucket number then we can say given
bucket is not split after we have cached the meta page. Hence avoid reading
meta page buffer.

I have attached the benchmark results and perf stats (refer
hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2:
Benchmark results). There we can see improvements at higher clients, as
lwlock contentions due to buffer read are more at higher clients. If I
apply the same patch on Amit's concurrent hash index patch [1]Concurrent Hash Indexes <https://commitfest.postgresql.org/10/647/> we can see
improvements at lower clients also. Amit's patch has removed a heavy weight
page lock which was the bottle neck at lower clients.

[1]: Concurrent Hash Indexes <https://commitfest.postgresql.org/10/647/>

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Jeff Janes

jeff.janes@gmail.com

almost 10 years ago

In reply to: Mithun Cy (#1)

Re: Cache Hash Index meta page.

On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try to
read meta page buffer twice just to make sure bucket is not split after we
found bucket page. With this patch meta page buffer read is not done, if the
bucket is not split after caching the meta page.

Idea is to cache the Meta page data in rd_amcache and store maxbucket number
in hasho_prevblkno of bucket primary page (which will always be NULL other
wise, so reusing it here for this cause!!!).

If it is otherwise unused, shouldn't we rename the field to reflect
what it is now used for?

What happens on a system which has gone through pg_upgrade? Are we
sure that those on-disk representations will always have
InvalidBlockNumber in that fields? If not, then it seems we can't
support pg_upgrade at all. If so, I don't see a provision for
properly dealing with pages which still have InvalidBlockNumber in
them. Unless I am missing something, the code below will always think
it found the right bucket in such cases, won't it?

if (opaque->hasho_prevblkno <= metap->hashm_maxbucket)

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

almost 10 years ago

In reply to: Jeff Janes (#2)

Re: Cache Hash Index meta page.

Jeff Janes <jeff.janes@gmail.com> writes:

On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try to
read meta page buffer twice just to make sure bucket is not split after we
found bucket page. With this patch meta page buffer read is not done, if the
bucket is not split after caching the meta page.

Is this really safe? The metapage caching in btree is all right because
the algorithm is guaranteed to work even if it starts with a stale idea of
where the root page is. I do not think the hash code is equally robust
about stale data in its metapage.

Idea is to cache the Meta page data in rd_amcache and store maxbucket number
in hasho_prevblkno of bucket primary page (which will always be NULL other
wise, so reusing it here for this cause!!!).

If it is otherwise unused, shouldn't we rename the field to reflect
what it is now used for?

No, because on other pages that still means what it used to. Nonetheless,
I agree that's a particularly ugly wart, and probably a dangerous one.

What happens on a system which has gone through pg_upgrade?

That being one reason why. It might be okay if we add another hasho_flag
bit saying that hasho_prevblkno really contains a maxbucket number, and
then add tests for that bit everyplace that hasho_prevblkno is referenced.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Kapila

amit.kapila16@gmail.com

almost 10 years ago

In reply to: Tom Lane (#3)

Re: Cache Hash Index meta page.

On Thu, Aug 4, 2016 at 3:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try to
read meta page buffer twice just to make sure bucket is not split after we
found bucket page. With this patch meta page buffer read is not done, if the
bucket is not split after caching the meta page.

Is this really safe? The metapage caching in btree is all right because
the algorithm is guaranteed to work even if it starts with a stale idea of
where the root page is. I do not think the hash code is equally robust
about stale data in its metapage.

I think stale data in metapage could only cause problem if it leads to
a wrong calculation of bucket based on hashkey. I think that
shouldn't happen. It seems to me that the safety comes from the fact
that required fields (lowmask/highmask) to calculate the bucket won't
be changed more than once without splitting the current bucket (which
we are going to scan). Do you see a problem in hashkey to bucket
mapping (_hash_hashkey2bucket), if the lowmask/highmask are changed by
one additional table half or do you have something else in mind?

What happens on a system which has gone through pg_upgrade?

That being one reason why. It might be okay if we add another hasho_flag
bit saying that hasho_prevblkno really contains a maxbucket number, and
then add tests for that bit everyplace that hasho_prevblkno is referenced.

Good idea.

- if (retry)
+ if (opaque->hasho_prevblkno <=  metap->hashm_maxbucket)

This code seems to be problematic with respect to upgrades, because
hasho_prevblkno will be initialized to 0xFFFFFFFF without the patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jesper Pedersen

jesper.pedersen@redhat.com

over 9 years ago

In reply to: Mithun Cy (#1)

Re: Cache Hash Index meta page.

On 07/22/2016 06:02 AM, Mithun Cy wrote:

I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try to
read meta page buffer twice just to make sure bucket is not split after we
found bucket page. With this patch meta page buffer read is not done, if
the bucket is not split after caching the meta page.

Idea is to cache the Meta page data in rd_amcache and store maxbucket
number in hasho_prevblkno of bucket primary page (which will always be NULL
other wise, so reusing it here for this cause!!!). So when we try to do
hash lookup for bucket page if locally cached maxbucket number is greater
than or equal to bucket page's maxbucket number then we can say given
bucket is not split after we have cached the meta page. Hence avoid reading
meta page buffer.

I have attached the benchmark results and perf stats (refer
hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2:
Benchmark results). There we can see improvements at higher clients, as
lwlock contentions due to buffer read are more at higher clients. If I
apply the same patch on Amit's concurrent hash index patch [1] we can see
improvements at lower clients also. Amit's patch has removed a heavy weight
page lock which was the bottle neck at lower clients.

Could you provide a rebased patch based on Amit's v5 ?

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Mithun Cy

mithun.cy@enterprisedb.com

over 9 years ago

In reply to: Jesper Pedersen (#5)

Re: Cache Hash Index meta page.

On Sep 2, 2016 7:38 PM, "Jesper Pedersen" <jesper.pedersen@redhat.com>
wrote:

Could you provide a rebased patch based on Amit's v5 ?

Please find the the patch, based on Amit's V5.

I have fixed following things

1. now in "_hash_first" we check if (opaque->hasho_prevblkno ==
InvalidBlockNumber) to see if bucket is from older version hashindex and
has been upgraded. Since as of now InvalidBlockNumber is one value greater
than maximum value the variable "metap->hashm_maxbucket" can be set (see
_hash_expandtable). We can distinguish it from rest. I tested the upgrade
issue reported by amit. It is fixed now.

2. One case which buckets hasho_prevblkno is used is where we do backward
scan. So now before testing for previous block number I test whether
current page is bucket page if so we end the bucket scan (see changes in
_hash_readprev). On other places where hasho_prevblkno is used it is not
for bucket page, so I have not put any extra check to verify if is a bucket
page.

Amit Kapila

amit.kapila16@gmail.com

over 9 years ago

In reply to: Mithun Cy (#6)

Re: Cache Hash Index meta page.

On Tue, Sep 6, 2016 at 12:20 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Sep 2, 2016 7:38 PM, "Jesper Pedersen" <jesper.pedersen@redhat.com>
wrote:

Could you provide a rebased patch based on Amit's v5 ?

Please find the the patch, based on Amit's V5.

I think you want to say based on patch in the below mail:
/messages/by-id/CAA4eK1J6b8O4PcEPqRxNYbLVbfToNMJEEm+qn0jZX31-obXrJw@mail.gmail.com

It is better if we can provide the link for a patch on which the
current patch is based on, that will help people to easily identify
the dependent patches.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jesper Pedersen

jesper.pedersen@redhat.com

over 9 years ago

In reply to: Mithun Cy (#6)

Re: Cache Hash Index meta page.

On 09/05/2016 02:50 PM, Mithun Cy wrote:

On Sep 2, 2016 7:38 PM, "Jesper Pedersen" <jesper.pedersen@redhat.com>
wrote:

Could you provide a rebased patch based on Amit's v5 ?

Please find the the patch, based on Amit's V5.

I have fixed following things

1. now in "_hash_first" we check if (opaque->hasho_prevblkno ==
InvalidBlockNumber) to see if bucket is from older version hashindex and
has been upgraded. Since as of now InvalidBlockNumber is one value greater
than maximum value the variable "metap->hashm_maxbucket" can be set (see
_hash_expandtable). We can distinguish it from rest. I tested the upgrade
issue reported by amit. It is fixed now.

2. One case which buckets hasho_prevblkno is used is where we do backward
scan. So now before testing for previous block number I test whether
current page is bucket page if so we end the bucket scan (see changes in
_hash_readprev). On other places where hasho_prevblkno is used it is not
for bucket page, so I have not put any extra check to verify if is a bucket
page.

I think that the

+ pageopaque->hasho_prevblkno = metap->hashm_maxbucket;

trick should be documented in the README, as hashm_maxbucket is defined
as uint32 where as hasho_prevblkno is a BlockNumber.

(All bucket variables should probably use the Bucket definition instead
of uint32).

For the archives, this patch conflicts with the WAL patch [1]/messages/by-id/CAA4eK1JS+SiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-=0w@mail.gmail.com.

[1]: /messages/by-id/CAA4eK1JS+SiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-=0w@mail.gmail.com
/messages/by-id/CAA4eK1JS+SiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-=0w@mail.gmail.com

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Mithun Cy

mithun.cy@enterprisedb.com

over 9 years ago

In reply to: Jesper Pedersen (#8)

Re: Cache Hash Index meta page.

On Thu, Sep 8, 2016 at 11:21 PM, Jesper Pedersen <jesper.pedersen@redhat.com

wrote:

For the archives, this patch conflicts with the WAL patch [1].

[1] /messages/by-id/CAA4eK1JS+SiRSQBzEFp

nsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com

Updated the patch it applies over Amit's concurrent hash index[1]Concurrent Hash index. </messages/by-id/CAA4eK1J6b8O4PcEPqRxNYbLVbfToNMJEEm+qn0jZX31-obXrJw@mail.gmail.com> and
Amit's wal for hash index patch[2]Wal for hash index. </messages/by-id/CAA4eK1JS+SiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-=0w@mail.gmail.com> -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com together.

[1]: Concurrent Hash index. </messages/by-id/CAA4eK1J6b8O4PcEPqRxNYbLVbfToNMJEEm+qn0jZX31-obXrJw@mail.gmail.com>
</messages/by-id/CAA4eK1J6b8O4PcEPqRxNYbLVbfToNMJEEm+qn0jZX31-obXrJw@mail.gmail.com>
[2]: Wal for hash index. </messages/by-id/CAA4eK1JS+SiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-=0w@mail.gmail.com> -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com
</messages/by-id/CAA4eK1JS+SiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-=0w@mail.gmail.com>
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#10

Jeff Janes

jeff.janes@gmail.com

over 9 years ago

In reply to: Mithun Cy (#9)

Re: Cache Hash Index meta page.

On Tue, Sep 13, 2016 at 12:55 PM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

On Thu, Sep 8, 2016 at 11:21 PM, Jesper Pedersen <
jesper.pedersen@redhat.com> wrote:

For the archives, this patch conflicts with the WAL patch [1].

[1] /messages/by-id/CAA4eK1JS+SiRSQBzEFp

nsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com

Updated the patch it applies over Amit's concurrent hash index[1] and
Amit's wal for hash index patch[2] together.

I think that this needs to be updated again for v8 of concurrent and v5 of
wal

Thanks,

Jeff

#11

Mithun Cy

mithun.cy@enterprisedb.com

over 9 years ago

In reply to: Jeff Janes (#10)

Re: Cache Hash Index meta page.

On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think that this needs to be updated again for v8 of concurrent and v5

of wal

Adding the rebased patch over [1]Concurrent Hash index. </messages/by-id/CAA4eK1+X=8sUd1UCZDZnE3D9CGi9kw+kjxp2Tnw7SX5w8pLBNw@mail.gmail.com> + [2]Wal for hash index. </messages/by-id/CAA4eK1KE=+kkowyYD0vmch=ph4ND3H1tViAB+0cWTHqjZDDfqg@mail.gmail.com>

[1]: Concurrent Hash index. </messages/by-id/CAA4eK1+X=8sUd1UCZDZnE3D9CGi9kw+kjxp2Tnw7SX5w8pLBNw@mail.gmail.com>
</messages/by-id/CAA4eK1+X=8sUd1UCZDZnE3D9CGi9kw+kjxp2Tnw7SX5w8pLBNw@mail.gmail.com>
[2]: Wal for hash index. </messages/by-id/CAA4eK1KE=+kkowyYD0vmch=ph4ND3H1tViAB+0cWTHqjZDDfqg@mail.gmail.com>
</messages/by-id/CAA4eK1KE=+kkowyYD0vmch=ph4ND3H1tViAB+0cWTHqjZDDfqg@mail.gmail.com>

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#12

Michael Paquier

michael@paquier.xyz

over 9 years ago

In reply to: Mithun Cy (#11)

Re: Cache Hash Index meta page.

On Thu, Sep 29, 2016 at 12:55 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think that this needs to be updated again for v8 of concurrent and v5

of wal

Adding the rebased patch over [1] + [2]

[1] Concurrent Hash index.
[2] Wal for hash index.

Moved to next CF.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Jeff Janes

jeff.janes@gmail.com

over 9 years ago

In reply to: Mithun Cy (#1)

Re: Cache Hash Index meta page.

On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try
to read meta page buffer twice just to make sure bucket is not split after
we found bucket page. With this patch meta page buffer read is not done, if
the bucket is not split after caching the meta page.

Idea is to cache the Meta page data in rd_amcache and store maxbucket
number in hasho_prevblkno of bucket primary page (which will always be NULL
other wise, so reusing it here for this cause!!!). So when we try to do
hash lookup for bucket page if locally cached maxbucket number is greater
than or equal to bucket page's maxbucket number then we can say given
bucket is not split after we have cached the meta page. Hence avoid reading
meta page buffer.

I have attached the benchmark results and perf stats (refer
hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2:
Benchmark results). There we can see improvements at higher clients, as
lwlock contentions due to buffer read are more at higher clients. If I
apply the same patch on Amit's concurrent hash index patch [1] we can see
improvements at lower clients also. Amit's patch has removed a heavy weight
page lock which was the bottle neck at lower clients.

[1] Concurrent Hash Indexes <https://commitfest.postgresql.org/10/647/>

Hi Mithun,

Can you describe your benchmarking machine? Your benchmarking data went up
to 128 clients. But how many cores does the machine have? Are you testing
how well it can use the resources it has, or how well it can deal with
oversubscription of the resources?

Also, was the file supposed to be named .ods? I didn't find it to be
openable as an .odc file.

Cheers,

Jeff

#14

Mithun Cy

mithun.cy@enterprisedb.com

over 9 years ago

In reply to: Jeff Janes (#13)

Re: Cache Hash Index meta page.

On Tue, Oct 4, 2016 at 11:55 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Can you describe your benchmarking machine? Your benchmarking data went

up to 128 clients. But how many cores does the machine have? Are

you testing how well it can use the resources it has, or how well it can

deal with oversubscription of the resources?

It is a power2 machine with 192 hyperthreads.

Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
NUMA node(s): 4
Model: IBM,8286-42A
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191

Also, was the file supposed to be named .ods? I didn't find it to be

openable as an .odc file.

Yes .ods right it is a spreadsheet in ODF.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#15

Jesper Pedersen

jesper.pedersen@redhat.com

over 9 years ago

In reply to: Mithun Cy (#11)

Re: Cache Hash Index meta page.

On 09/28/2016 11:55 AM, Mithun Cy wrote:

On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think that this needs to be updated again for v8 of concurrent and v5

of wal

Adding the rebased patch over [1] + [2]

As the concurrent hash index patch was committed in 6d46f4 this patch
needs a rebase.

I have moved this submission to the next CF.

Thanks for working on this !

Best regards,
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Mithun Cy

mithun.cy@enterprisedb.com

over 9 years ago

In reply to: Jesper Pedersen (#15)

Re: Cache Hash Index meta page.

On Thu, Dec 1, 2016 at 8:10 PM, Jesper Pedersen <jesper.pedersen@redhat.com>
wrote:

As the concurrent hash index patch was committed in 6d46f4 this patch

needs a rebase.

Thanks Jesper,

Adding the rebased patch.

I have re-run the pgbench readonly tests with below modification.

"alter table pgbench_accounts drop constraint pgbench_accounts_pkey"
postgres
"create index pgbench_accounts_pkey on pgbench_accounts using hash(aid)"
postgres

*Postgres Server settings:*
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9

*pgbench settings:*
scale_factor = 300 (so database fits in shared_buffer)
Mode = -M prepared -S (prepared readonly mode).

Machine used:
power2 with sufficient ram for above shared_buffer.

#############lscpu
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
NUMA node(s): 4
Model: IBM,8286-42A

*Clients *

*Cache Meta Page patch *

*Base code with amits changes*

* %imp*

17062.513102

17218.353817

-0.9050848685

138525.808342

128149.381759

8.0971335488

212278.44762

205870.456661

3.1126326054

369453.224112

360423.566937

2.5052904425

*64*

*576090.293018*

*510665.044842*

*12.8117733604*

*96*

*686813.187117*

*504950.885867*

*36.0158396272*

*104*

*688932.67516*

*498365.55841*

*38.2384202789*

*128*

*730728.526322*

*409011.008553*

*78.6574226711*

Appears there is a good improvement at higher clients.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#17

Mithun Cy

mithun.cy@enterprisedb.com

over 9 years ago

In reply to: Mithun Cy (#16)

Re: Cache Hash Index meta page.

On Tue, Dec 6, 2016 at 1:28 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

*Clients *

*Cache Meta Page patch *

*Base code with amits changes*

* %imp*

1

17062.513102

17218.353817

-0.9050848685

8

138525.808342

128149.381759

8.0971335488

16

212278.44762

205870.456661

3.1126326054

32

369453.224112

360423.566937

2.5052904425

*64*

*576090.293018*

*510665.044842*

*12.8117733604*

*96*

*686813.187117*

*504950.885867*

*36.0158396272*

*104*

*688932.67516*

*498365.55841*

*38.2384202789*

*128*

*730728.526322*

*409011.008553*

*78.6574226711*

All the above readings are median of 3 runs.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#18

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Mithun Cy (#16)

Re: Cache Hash Index meta page.

On Mon, Dec 5, 2016 at 2:58 PM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

On Thu, Dec 1, 2016 at 8:10 PM, Jesper Pedersen <
jesper.pedersen@redhat.com> wrote:

As the concurrent hash index patch was committed in 6d46f4 this patch

needs a rebase.

Thanks Jesper,

Adding the rebased patch.

-        bucket = _hash_hashkey2bucket(hashkey,
-                                      metap->hashm_maxbucket,
+        bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
                                       metap->hashm_highmask,
                                       metap->hashm_lowmask);

This hunk appears useless.

+ metap = (HashMetaPage)rel->rd_amcache;

Whitespace.

+ /* Cache the metapage data for next time*/

Whitespace.

+ /* Check if this bucket is split after we have cached the metapage.

Whitespace.

Shouldn't _hash_doinsert() be using the cache, too?

I think it's probably not a good idea to cache the entire metapage. The
only things that you are "really" trying to cache, I think, are
hashm_maxbucket, hashm_lowmask, and hashm_highmask. The entire
HashPageMetaData structure is 696 bytes on my machine, and it doesn't
really make sense to copy the whole thing into memory if you only need 16
bytes of it. It could even be dangerous -- if somebody tries to rely on
the cache for some other bit of data and we're not really guaranteeing that
it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members
hmc_maxbucket, hmc_lowmask, and hmc_highmask. The structure can have a
comment explaining that we only care about having the data be fresh enough
to test whether the bucket mapping we computed for a tuple is still
correct, and that for that to be the case we only need to know whether a
bucket has suffered a new split since we last refreshed the cache.

The comments in this patch need some work, e.g.:

-
+       oopaque->hasho_prevblkno = maxbucket;

No comment?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19

Mithun Cy

mithun.cy@enterprisedb.com

over 9 years ago

In reply to: Robert Haas (#18)

Re: Cache Hash Index meta page.

Thanks Robert, I have tried to address all of the comments,
On Tue, Dec 6, 2016 at 2:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:

+ bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
metap->hashm_highmask,
metap->hashm_lowmask);

This hunk appears useless.

+ metap = (HashMetaPage)rel->rd_amcache;

Whitespace.

Fixed.

+ /* Cache the metapage data for next time*/

Whitespace.

Fixed.

+ /* Check if this bucket is split after we have cached the
metapage.

Whitespace.

Fixed.

Shouldn't _hash_doinsert() be using the cache, too?

Yes, we have an opportunity there, added same in code. But one difference
is at the end at-least once we need to read the meta page to split and/or
write. Performance improvement might not be as much as read-only.

I did some pgbench simple-update tests for same, with below changes.

-               "alter table pgbench_branches add primary key (bid)",
-               "alter table pgbench_tellers add primary key (tid)",
-               "alter table pgbench_accounts add primary key (aid)"
+               "create index pgbench_branches_bid on pgbench_branches
using hash (bid)",
+               "create index pgbench_tellers_tid on pgbench_tellers using
hash (tid)",
+               "create index pgbench_accounts_aid on pgbench_accounts
using hash (aid)"

And, removed all reference keys. But I see no improvements; I will further
do benchmarking for copy command and report same.

Clients

After Meta page cache

Base Code

%imp

2276.151633

2304.253611

-1.2195696631

36816.596513

36439.552652

1.0347104549

50943.763133

51005.236973

-0.120524565

128

49156.980457

48458.275106

1.4418700407
Above result is median of three runs, and each run is for 30mins.

*Postgres Server settings:*
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9

*pgbench settings:*
scale_factor = 300 (so database fits in shared_buffer)
Mode = -M prepared -N (prepared simple-update).

*Machine used:*
power2 same as described as above.

I think it's probably not a good idea to cache the entire metapage. The
only things that you are "really" trying to cache, I think, are
hashm_maxbucket, hashm_lowmask, and hashm_highmask. The entire
HashPageMetaData structure is 696 bytes on my machine, and it doesn't
really make sense to copy the whole thing into memory if you only need 16
bytes of it. It could even be dangerous -- if somebody tries to rely on
the cache for some other bit of data and we're not really guaranteeing that
it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members
hmc_maxbucket, hmc_lowmask, and hmc_highmask. The structure can have a
comment explaining that we only care about having the data be fresh enough
to test whether the bucket mapping we computed for a tuple is still
correct, and that for that to be the case we only need to know whether a
bucket has suffered a new split since we last refreshed the cache.

It is not only hashm_maxbucket, hashm_lowmask, and hashm_highmask (3
uint32s) but we also need

*uint32 hashm_spares[HASH_MAX_SPLITPOINTS],* for bucket number to
block mapping in "BUCKET_TO_BLKNO(metap, bucket)".

Note : #define HASH_MAX_SPLITPOINTS 32, so it is (3*uint32 + 32*uint32) =
35*4 = 140 bytes.

The comments in this patch need some work, e.g.:
-
+       oopaque->hasho_prevblkno = maxbucket;
No comment?

I have tried to improve commenting part in the new patch.

Apart from this, there seems to be some base bug in _hash_doinsert().
+ * XXX this is useless code if we are only storing hash keys.

+ */

+ if (itemsz > HashMaxItemSize((Page) metap))

+ ereport(ERROR,

+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),

+ errmsg("index row size %zu exceeds hash maximum %zu",

+ itemsz, HashMaxItemSize((Page) metap)),

+ errhint("Values larger than a buffer page cannot be
indexed.")));

"metap" (HashMetaPage) and Page are different data structure their member
types are not in sync, so should not typecast blindly as above. I think we
should remove this part of the code as we only store hash keys. So I have
removed same but kept the assert below as it is.
Also, there was a bug in the previous patch. I was not releasing the bucket
page lock if cached metadata is old, now same is fixed.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#20

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Mithun Cy (#19)

Re: Cache Hash Index meta page.

On Fri, Dec 16, 2016 at 5:16 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

Shouldn't _hash_doinsert() be using the cache, too

Yes, we have an opportunity there, added same in code. But one difference
is at the end at-least once we need to read the meta page to split and/or
write. Performance improvement might not be as much as read-only.

Why would you need to read it at least once in one case but not the other?

I think it's probably not a good idea to cache the entire metapage. The
only things that you are "really" trying to cache, I think, are
hashm_maxbucket, hashm_lowmask, and hashm_highmask. The entire
HashPageMetaData structure is 696 bytes on my machine, and it doesn't
really make sense to copy the whole thing into memory if you only need 16
bytes of it. It could even be dangerous -- if somebody tries to rely on
the cache for some other bit of data and we're not really guaranteeing that
it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members
hmc_maxbucket, hmc_lowmask, and hmc_highmask. The structure can have a
comment explaining that we only care about having the data be fresh enough
to test whether the bucket mapping we computed for a tuple is still
correct, and that for that to be the case we only need to know whether a
bucket has suffered a new split since we last refreshed the cache.

It is not only hashm_maxbucket, hashm_lowmask, and hashm_highmask (3
uint32s) but we also need

*uint32 hashm_spares[HASH_MAX_SPLITPOINTS],* for bucket number to
block mapping in "BUCKET_TO_BLKNO(metap, bucket)".

Note : #define HASH_MAX_SPLITPOINTS 32, so it is (3*uint32 + 32*uint32) =
35*4 = 140 bytes.

Well, I guess that makes it more appealing to cache the whole page at least
in terms of raw number of bytes, but I suppose my real complaint here is
that there don't seem to be any clear rules for where, whether, and to what
extent we can rely on the cache to be valid. Without that, I think this
patch is creating an extremely serious maintenance hazard for anyone who
wants to try to modify this code in the future. A future patch author
needs to be able to tell what data they can use and what data they can't
use, and why.

Apart from this, there seems to be some base bug in _hash_doinsert().

+ * XXX this is useless code if we are only storing hash keys.

+ */

+ if (itemsz > HashMaxItemSize((Page) metap))

+ ereport(ERROR,

+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),

+ errmsg("index row size %zu exceeds hash maximum %zu",

+ itemsz, HashMaxItemSize((Page) metap)),

+ errhint("Values larger than a buffer page cannot be
indexed.")));

"metap" (HashMetaPage) and Page are different data structure their member
types are not in sync, so should not typecast blindly as above. I think we
should remove this part of the code as we only store hash keys. So I have
removed same but kept the assert below as it is.