Cache Hash Index meta page.

Started by Mithun Cy over 9 years ago, 45 messages
#1 Mithun Cy
mithun.cy@enterprisedb.com
3 attachment(s)

I have created a patch to cache the meta page of the hash index in
backend-private memory. This saves reading the meta page buffer every
time we want to find the bucket page. In the "_hash_first" call, we
read the meta page buffer twice just to make sure the bucket was not
split after we found the bucket page. With this patch, the meta page
buffer is not read at all if the bucket has not been split since we
cached the meta page.

The idea is to cache the meta page data in rd_amcache and to store the
maxbucket number in hasho_prevblkno of the bucket's primary page (which
is otherwise always unused, so we can reuse it for this purpose). When
we do a hash lookup for a bucket page, if the locally cached maxbucket
number is greater than or equal to the bucket page's stored maxbucket
number, the given bucket has not been split since we cached the meta
page, and we can avoid reading the meta page buffer. A sketch of this
lookup path follows.
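Below is a minimal sketch of that lookup path, written against hash AM
internals of that era (_hash_getbuf, _hash_hashkey2bucket,
BUCKET_TO_BLKNO, and the rd_amcache field); it is illustrative only,
and the actual patch code may differ:

#include "postgres.h"
#include "access/hash.h"
#include "utils/rel.h"

/* Read the real meta page once and (re)fill the backend-private cache. */
static void
hash_cache_metapage(Relation rel)
{
    Buffer      metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ,
                                       LH_META_PAGE);

    if (rel->rd_amcache == NULL)
        rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
                                             sizeof(HashMetaPageData));
    memcpy(rel->rd_amcache, HashPageGetMeta(BufferGetPage(metabuf)),
           sizeof(HashMetaPageData));
    _hash_relbuf(rel, metabuf);
}

/*
 * Find the bucket page for a hash key using the cached meta page,
 * touching the real meta page only when the cache turns out stale.
 */
static Buffer
hash_getbucketbuf_cached(Relation rel, uint32 hashkey)
{
    HashMetaPage metap;

    if (rel->rd_amcache == NULL)
        hash_cache_metapage(rel);
    metap = (HashMetaPage) rel->rd_amcache;

    for (;;)
    {
        Bucket      bucket;
        Buffer      buf;
        HashPageOpaque opaque;

        /* Compute the target bucket from the (possibly stale) cache. */
        bucket = _hash_hashkey2bucket(hashkey,
                                      metap->hashm_maxbucket,
                                      metap->hashm_highmask,
                                      metap->hashm_lowmask);
        buf = _hash_getbuf(rel, BUCKET_TO_BLKNO(metap, bucket),
                           HASH_READ, LH_BUCKET_PAGE);
        opaque = (HashPageOpaque) PageGetSpecialPointer(BufferGetPage(buf));

        /*
         * hasho_prevblkno on a primary bucket page is stamped with the
         * maxbucket value as of the bucket's last split.  If our cached
         * maxbucket is >= that value, the bucket has not split since we
         * cached the meta page, so no meta page read is needed.
         */
        if (opaque->hasho_prevblkno <= metap->hashm_maxbucket)
            return buf;

        /* Stale cache: release, refresh from the meta page, retry. */
        _hash_relbuf(rel, buf);
        hash_cache_metapage(rel);
    }
}

In the common case (no split since caching), the lookup thus touches
only the bucket page itself, which is what removes the meta page buffer
reads from the hot path.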

I have attached the benchmark results and perf stats (see
hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2:
benchmark results]). We see improvements at higher client counts, since
LWLock contention due to the buffer reads grows with the number of
clients. If I apply the same patch on top of Amit's concurrent hash
index patch [1], we see improvements at lower client counts as well;
Amit's patch removed a heavyweight page lock which was the bottleneck at
lower client counts.

[1]: Concurrent Hash Indexes <https://commitfest.postgresql.org/10/647/>

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

hash_index_perf_stat_and_benchmarking.odc (application/vnd.oasis.opendocument.spreadsheet)
��Q��������'���u[�,��D;T\�1�i.@I5�),����yd=��G�~�p��5|��D3�+-���X~�J��������k�K�7T�����25�\���fU�Y�jV�����f�U�P��W�Y5jV��U�f��Y
jV������f����5+�f��Y���Y��j�5k��U����Y�&���%�^����Oc$Y�u{1�1�����v��Awu�\N������3�y��O����a���5@w�����y�q;�r\F(�|����n'e�I�9	h�+1�4ch��X�B_#������g��F�gz	z�
x�������K[�?�J���2	-��	(�,c~*I��er��T^Y&�T�
:���H@�d���JRjV�,oW�)Y�%�)�9|/f�������mc
�����*D���������"�7����n!��+�~d>�E4o���8��B4��<}�Jz�|����n!3�{�Hw!�9�W(���0l������<���p�����(��8,�����Z�Qb����rs�m)���e8�y�/q��*�8(9��B�|&�6J�xh�St��e�_�
p0�2~��:���O9����0��8��D�Sx�5��v�D�)���S���������J�|���d� �g"Hw`��4�9Q��O��n#�Z?�c#[3�������X�.g'J��8��O����p��"�]
I�KK�PlB1r�!���:�3?Dn�k��i����2Q�y����V�]N�(�V���#$��V����|<Z@v�����}@�f�����	������X�����S;N~������p]�V�T �O�x�H�{��Z�!9p
gx�]��!'h���3�f��PKI�C��	|PKO�HMETA-INF/manifest.xml�UMO!��+6����Y��AS���=dg�>V�����nkLMW�3<�{�&K��98/�)��^����4MI^������F�Y��b3��:���$���r/}a�_�(l��"h0X|�k�.�10$��l�WKy\�V[t��[����}$���J�W-������c������0��I}��W~�����<��p�<������cEj�K��T< ���4�s3�Y��E,����c�j@~tR�+���&G]U�Au�[���Z6��)��q!@A�c"8���O��[��Ih�T�2�|{������:�uC������T�w=�/�'h�H<�9��2O�������d*L�^��j�_k�4����������CM#�+������d�	PK���-��PKO�H�l9�..mimetypePKO�H�g�SH�H�TThumbnails/thumbnail.pngPKO�H{�#�{a(��settings.xmlPKO�H����al$��content.xmlPKO�H���M!�meta.xmlPKO�H����WC!
��styles.xmlPKO�H��h����manifest.rdfPKO�H��Configurations2/popupmenu/PKO�H��Configurations2/statusbar/PKO�H6�Configurations2/toolbar/PKO�Hl�Configurations2/menubar/PKO�H��Configurations2/floater/PKO�H'��Configurations2/accelerator/current.xmlPKO�H/�Configurations2/toolpanel/PKO�Hg�Configurations2/progressbar/PKO�H��Configurations2/images/Bitmaps/PKO�H���I�
�E��Object 1/content.xmlPKO�H�����y��Object 1/styles.xmlPKO�HTnw"J��Object 1/meta.xmlPKO�HI�C��	|UObjectReplacements/Object 1PKO�H���-��'
META-INF/manifest.xmlPK{�
cache_hash_index_metapage_base_01 (application/octet-stream)
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 178463f..6d3ad78 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -454,7 +454,7 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -776,12 +776,13 @@ _hash_splitbucket(Relation rel,
 	obuf = _hash_getbuf(rel, start_oblkno, HASH_WRITE, LH_BUCKET_PAGE);
 	opage = BufferGetPage(obuf);
 	oopaque = (HashPageOpaque) PageGetSpecialPointer(opage);
+	oopaque->hasho_prevblkno = maxbucket;
 
 	npage = BufferGetPage(nbuf);
 
 	/* initialize the new bucket's primary page */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE;
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 4825558..0ccc53e 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -125,10 +125,8 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	uint32		hashkey;
 	Bucket		bucket;
 	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
+	Buffer		metabuf = InvalidBuffer;
 	Page		page;
 	HashPageOpaque opaque;
 	HashMetaPage metap;
@@ -186,10 +184,26 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
+	if (rel->rd_amcache != NULL)
+	{
+		metap = (HashMetaPage)rel->rd_amcache;
+	}
+	else
+	{
+		/* Read the metapage */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+
+		/*  Cache the metapage data for next time*/
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaPageData));
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
+	}
 
 	/*
 	 * Loop until we get a lock on the correct target bucket.
@@ -205,46 +219,43 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 									  metap->hashm_lowmask);
 
 		blkno = BUCKET_TO_BLKNO(metap, bucket);
+		_hash_getlock(rel, blkno, HASH_SHARE);
 
-		/* Release metapage lock, but keep pin. */
-		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
 
-		/*
-		 * If the previous iteration of this loop locked what is still the
-		 * correct target bucket, we are done.  Otherwise, drop any old lock
-		 * and lock what now appears to be the correct bucket.
-		 */
-		if (retry)
+		if (opaque->hasho_prevblkno <=  metap->hashm_maxbucket)
 		{
-			if (oldblkno == blkno)
-				break;
-			_hash_droplock(rel, oldblkno, HASH_SHARE);
+			/* Ok now we have the right bucket proceed to search in it. */
+			break;
 		}
-		_hash_getlock(rel, blkno, HASH_SHARE);
 
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-		oldblkno = blkno;
-		retry = true;
+		_hash_relbuf(rel, buf);
+		_hash_droplock(rel, blkno, HASH_SHARE);
+
+		/* Meta page cache is old try again updating it. */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
+
+		/* Release Meta page buffer lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
+	/* Done with the metapage */
+	if (!BufferIsInvalid(metabuf))
+		_hash_dropbuf(rel, metabuf);
 
 	/* Update scan opaque state to show we have lock on the bucket */
 	so->hashso_bucket = bucket;
 	so->hashso_bucket_valid = true;
 	so->hashso_bucket_blkno = blkno;
 
-	/* Fetch the primary bucket page for the bucket */
-	buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
-
 	/* If a backwards scan is requested, move to the end of the chain */
 	if (ScanDirectionIsBackward(dir))
 	{
cache_hash_index_metapage_onAmit_v3.patch (application/octet-stream)
commit 0aa39891b73aa13cfa8499758984263bd75ecc84
Author: mithun <mithun@localhost.localdomain>
Date:   Fri Jul 22 11:35:37 2016 +0530

    Commit cache metapage.

diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 6dfd411..4491602 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -447,7 +447,7 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -865,7 +865,7 @@ _hash_splitbucket(Relation rel,
 	 * vacuum will clear page_has_garbage flag after deleting such tuples.
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_PAGE_HAS_GARBAGE | LH_BUCKET_OLD_PAGE_SPLIT;
-
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -873,7 +873,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_NEW_PAGE_SPLIT;
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index b0cb638..2fbdd5a 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -146,10 +146,8 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	uint32		hashkey;
 	Bucket		bucket;
 	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
+	Buffer		metabuf = InvalidBuffer;
 	Page		page;
 	HashPageOpaque opaque;
 	HashMetaPage metap;
@@ -207,101 +205,74 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
-
-	/*
-	 * Conditionally get the lock on primary bucket page for search while
-	 * holding lock on meta page. If we have to wait, then release the meta
-	 * page lock and retry it in a hard way.
-	 */
-	bucket = _hash_hashkey2bucket(hashkey,
-								  metap->hashm_maxbucket,
-								  metap->hashm_highmask,
-								  metap->hashm_lowmask);
-
-	blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-	/* Fetch the primary bucket page for the bucket */
-	buf = ReadBuffer(rel, blkno);
-	if (!ConditionalLockBufferShared(buf))
+	if (rel->rd_amcache != NULL)
 	{
-		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
-		LockBuffer(buf, HASH_READ);
-		_hash_checkpage(rel, buf, LH_BUCKET_PAGE);
-		_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-		oldblkno = blkno;
-		retry = true;
+		metap = (HashMetaPage)rel->rd_amcache;
 	}
 	else
 	{
-		_hash_checkpage(rel, buf, LH_BUCKET_PAGE);
+		/* Read the metapage */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+
+		/*  Cache the metapage data for next time*/
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaPageData));
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
 		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
-	if (retry)
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
 	{
 		/*
-		 * Loop until we get a lock on the correct target bucket.  We get the
-		 * lock on primary bucket page and retain the pin on it during read
-		 * operation to prevent the concurrent splits.  Retaining pin on a
-		 * primary bucket page ensures that split can't happen as it needs to
-		 * acquire the cleanup lock on primary bucket page.  Acquiring lock on
-		 * primary bucket and rechecking if it is a target bucket is mandatory
-		 * as otherwise a concurrent split followed by vacuum could remove
-		 * tuples from the selected bucket which otherwise would have been
-		 * visible.
+		 * Compute the target bucket number, and convert to block number.
 		 */
-		for (;;)
-		{
-			/*
-			 * Compute the target bucket number, and convert to block number.
-			 */
-			bucket = _hash_hashkey2bucket(hashkey,
-										  metap->hashm_maxbucket,
-										  metap->hashm_highmask,
-										  metap->hashm_lowmask);
-
-			blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-			/* Release metapage lock, but keep pin. */
-			_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
-
-			/*
-			 * If the previous iteration of this loop locked what is still the
-			 * correct target bucket, we are done.  Otherwise, drop any old
-			 * lock and lock what now appears to be the correct bucket.
-			 */
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
+
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
 
-			/* Fetch the primary bucket page for the bucket */
-			buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
 
-			/*
-			 * Reacquire metapage lock and check that no bucket split has
-			 * taken place while we were awaiting the bucket lock.
-			 */
-			_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-			oldblkno = blkno;
+		if (opaque->hasho_prevblkno <=  metap->hashm_maxbucket)
+		{
+			/* Ok now we have the right bucket proceed to search in it. */
+			break;
 		}
+
+		_hash_relbuf(rel, buf);
+
+		/* Meta page cache is old try again updating it. */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
+
+		/* Release Meta page buffer lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
+	/* Done with the metapage */
+	if (!BufferIsInvalid(metabuf))
+		_hash_dropbuf(rel, metabuf);
 
 	/* Update scan opaque state to show we have lock on the bucket */
 	so->hashso_bucket = bucket;
 	so->hashso_bucket_valid = true;
-
-
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
-
 	so->hashso_bucket_buf = buf;
 
 	/*
#2Jeff Janes
jeff.janes@gmail.com
In reply to: Mithun Cy (#1)
Re: Cache Hash Index meta page.

On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try to
read meta page buffer twice just to make sure bucket is not split after we
found bucket page. With this patch meta page buffer read is not done, if the
bucket is not split after caching the meta page.

Idea is to cache the Meta page data in rd_amcache and store maxbucket number
in hasho_prevblkno of bucket primary page (which will always be NULL other
wise, so reusing it here for this cause!!!).

If it is otherwise unused, shouldn't we rename the field to reflect
what it is now used for?

What happens on a system which has gone through pg_upgrade? Are we
sure that those on-disk representations will always have
InvalidBlockNumber in that field? If not, then it seems we can't
support pg_upgrade at all. If so, I don't see a provision for
properly dealing with pages which still have InvalidBlockNumber in
them. Unless I am missing something, the code below will always think
it found the right bucket in such cases, won't it?

if (opaque->hasho_prevblkno <= metap->hashm_maxbucket)

Cheers,

Jeff


#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#2)
Re: Cache Hash Index meta page.

Jeff Janes <jeff.janes@gmail.com> writes:

On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try to
read meta page buffer twice just to make sure bucket is not split after we
found bucket page. With this patch meta page buffer read is not done, if the
bucket is not split after caching the meta page.

Is this really safe? The metapage caching in btree is all right because
the algorithm is guaranteed to work even if it starts with a stale idea of
where the root page is. I do not think the hash code is equally robust
about stale data in its metapage.

Idea is to cache the Meta page data in rd_amcache and store maxbucket number
in hasho_prevblkno of bucket primary page (which will always be NULL other
wise, so reusing it here for this cause!!!).

If it is otherwise unused, shouldn't we rename the field to reflect
what it is now used for?

No, because on other pages that still means what it used to. Nonetheless,
I agree that's a particularly ugly wart, and probably a dangerous one.

What happens on a system which has gone through pg_upgrade?

That being one reason why. It might be okay if we add another hasho_flag
bit saying that hasho_prevblkno really contains a maxbucket number, and
then add tests for that bit everyplace that hasho_prevblkno is referenced.
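
A hedged sketch of what that could look like (the flag name and bit
value here are purely illustrative, not from any patch):

    /* hypothetical flag bit: set when hasho_prevblkno holds a maxbucket
     * number rather than a block number */
    #define LH_BUCKET_PAGE_HAS_MAXBUCKET   (1 << 6)

    /* only trust hasho_prevblkno as a maxbucket if the bit is set;
     * otherwise treat the page as pre-upgrade and reread the metapage */
    if ((opaque->hasho_flag & LH_BUCKET_PAGE_HAS_MAXBUCKET) &&
        opaque->hasho_prevblkno <= metap->hashm_maxbucket)
        break;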

regards, tom lane


#4Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#3)
Re: Cache Hash Index meta page.

On Thu, Aug 4, 2016 at 3:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try to
read meta page buffer twice just to make sure bucket is not split after we
found bucket page. With this patch meta page buffer read is not done, if the
bucket is not split after caching the meta page.

Is this really safe? The metapage caching in btree is all right because
the algorithm is guaranteed to work even if it starts with a stale idea of
where the root page is. I do not think the hash code is equally robust
about stale data in its metapage.

I think stale data in the metapage could only cause a problem if it
leads to a wrong calculation of the bucket based on the hashkey. I
think that shouldn't happen. It seems to me that the safety comes from
the fact that the fields required to calculate the bucket
(lowmask/highmask) won't be changed more than once without splitting
the current bucket (which we are going to scan). Do you see a problem
in the hashkey-to-bucket mapping (_hash_hashkey2bucket) if the
lowmask/highmask are changed by one additional table half, or do you
have something else in mind?
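
For reference, the mapping in question is essentially the following
(a trimmed copy of _hash_hashkey2bucket from hashutil.c):

    Bucket
    _hash_hashkey2bucket(uint32 hashkey, uint32 maxbucket,
                         uint32 highmask, uint32 lowmask)
    {
        Bucket  bucket;

        bucket = hashkey & highmask;
        if (bucket > maxbucket)
            bucket = bucket & lowmask;

        return bucket;
    }

The question above is whether this can land on a wrong bucket when
maxbucket/highmask/lowmask are one split behind the on-disk metapage.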

What happens on a system which has gone through pg_upgrade?

That being one reason why. It might be okay if we add another hasho_flag
bit saying that hasho_prevblkno really contains a maxbucket number, and
then add tests for that bit everyplace that hasho_prevblkno is referenced.

Good idea.

- if (retry)
+ if (opaque->hasho_prevblkno <=  metap->hashm_maxbucket)

This code seems to be problematic with respect to upgrades, because
hasho_prevblkno will be initialized to 0xFFFFFFFF without the patch.
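
To spell out the arithmetic (a minimal standalone sketch with
simplified typedefs; the real definitions live in block.h and hash.h,
and hashm_maxbucket tops out one below InvalidBlockNumber, per
_hash_expandtable):

    #include <stdint.h>
    #include <stdbool.h>

    typedef uint32_t BlockNumber;
    #define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

    /* On a pg_upgraded bucket page, hasho_prevblkno is InvalidBlockNumber,
     * while hashm_maxbucket can be at most 0xFFFFFFFE, so this unsigned
     * comparison is always false for such pages. */
    static bool
    cache_is_fresh(BlockNumber hasho_prevblkno, uint32_t hashm_maxbucket)
    {
        return hasho_prevblkno <= hashm_maxbucket;
    }

So upgraded pages need an explicit InvalidBlockNumber test (or a flag
bit, as suggested above) before this comparison can be trusted.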

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#5Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Mithun Cy (#1)
Re: Cache Hash Index meta page.

On 07/22/2016 06:02 AM, Mithun Cy wrote:

I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try to
read meta page buffer twice just to make sure bucket is not split after we
found bucket page. With this patch meta page buffer read is not done, if
the bucket is not split after caching the meta page.

Idea is to cache the Meta page data in rd_amcache and store maxbucket
number in hasho_prevblkno of bucket primary page (which will always be NULL
other wise, so reusing it here for this cause!!!). So when we try to do
hash lookup for bucket page if locally cached maxbucket number is greater
than or equal to bucket page's maxbucket number then we can say given
bucket is not split after we have cached the meta page. Hence avoid reading
meta page buffer.

I have attached the benchmark results and perf stats (refer
hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2:
Benchmark results). There we can see improvements at higher clients, as
lwlock contentions due to buffer read are more at higher clients. If I
apply the same patch on Amit's concurrent hash index patch [1] we can see
improvements at lower clients also. Amit's patch has removed a heavy weight
page lock which was the bottle neck at lower clients.

Could you provide a rebased patch based on Amit's v5 ?

Best regards,
Jesper


#6Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Jesper Pedersen (#5)
1 attachment(s)
Re: Cache Hash Index meta page.

On Sep 2, 2016 7:38 PM, "Jesper Pedersen" <jesper.pedersen@redhat.com>
wrote:

Could you provide a rebased patch based on Amit's v5 ?

Please find the patch, based on Amit's V5.

I have fixed the following things:

1. Now in "_hash_first" we check if (opaque->hasho_prevblkno ==
InvalidBlockNumber) to see whether the bucket page comes from an
older-version hash index that has been upgraded. Since, as of now,
InvalidBlockNumber is one greater than the maximum value the variable
"metap->hashm_maxbucket" can be set to (see _hash_expandtable), we can
distinguish such pages from the rest (see the condensed sketch after
this list). I tested the upgrade issue reported by Amit; it is fixed
now.

2. One case in which a bucket page's hasho_prevblkno is used is the
backward scan. So now, before testing the previous block number, I
test whether the current page is a bucket page; if so, we end the
bucket scan (see changes in _hash_readprev). In the other places where
hasho_prevblkno is used, the page is never a bucket page, so I have
not added any extra check there.
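
A condensed sketch of the resulting check, as it appears in the
attached patch:

    if (opaque->hasho_prevblkno == InvalidBlockNumber ||
        opaque->hasho_prevblkno <= metap->hashm_maxbucket)
    {
        /* Ok now we have the right bucket proceed to search in it. */
        break;
    }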

Attachments:

cache_hash_index_metapage_onAmit_v5_01.patch (application/octet-stream)
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index f51c313..a8978dc 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -474,7 +474,7 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -892,7 +892,7 @@ _hash_splitbucket(Relation rel,
 	 * vacuum will clear page_has_garbage flag after deleting such tuples.
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_PAGE_HAS_GARBAGE | LH_BUCKET_OLD_PAGE_SPLIT;
-
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -900,7 +900,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_NEW_PAGE_SPLIT;
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index e3a99cf..6e3fd13 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -112,13 +112,20 @@ _hash_readprev(Relation rel,
 	 * comments in _hash_readnext to know the reason of retaining pin.
 	 */
 	if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+	{
 		_hash_chgbufaccess(rel, *bufp, HASH_READ, HASH_NOLOCK);
+
+		/* If it is a bucket page there will not be a prevblkno. */
+		*bufp = InvalidBuffer;
+		return;
+	}
 	else
 		_hash_relbuf(rel, *bufp);
 
 	*bufp = InvalidBuffer;
 	/* check for interrupts while we're not holding any buffer lock */
 	CHECK_FOR_INTERRUPTS();
+
 	if (BlockNumberIsValid(blkno))
 	{
 		*bufp = _hash_getbuf(rel, blkno, HASH_READ,
@@ -153,10 +160,8 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	uint32		hashkey;
 	Bucket		bucket;
 	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
+	Buffer		metabuf = InvalidBuffer;
 	Page		page;
 	HashPageOpaque opaque;
 	HashMetaPage metap;
@@ -214,101 +219,86 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
-
-	/*
-	 * Conditionally get the lock on primary bucket page for search while
-	 * holding lock on meta page. If we have to wait, then release the meta
-	 * page lock and retry it in a hard way.
-	 */
-	bucket = _hash_hashkey2bucket(hashkey,
-								  metap->hashm_maxbucket,
-								  metap->hashm_highmask,
-								  metap->hashm_lowmask);
-
-	blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-	/* Fetch the primary bucket page for the bucket */
-	buf = ReadBuffer(rel, blkno);
-	if (!ConditionalLockBufferShared(buf))
+	if (rel->rd_amcache != NULL)
 	{
-		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
-		LockBuffer(buf, HASH_READ);
-		_hash_checkpage(rel, buf, LH_BUCKET_PAGE);
-		_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-		oldblkno = blkno;
-		retry = true;
+		metap = (HashMetaPage)rel->rd_amcache;
 	}
 	else
 	{
-		_hash_checkpage(rel, buf, LH_BUCKET_PAGE);
+		/* Read the metapage */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+
+		/*  Cache the metapage data for next time*/
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaPageData));
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
 		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
-	if (retry)
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
 	{
 		/*
-		 * Loop until we get a lock on the correct target bucket.  We get the
-		 * lock on primary bucket page and retain the pin on it during read
-		 * operation to prevent the concurrent splits.  Retaining pin on a
-		 * primary bucket page ensures that split can't happen as it needs to
-		 * acquire the cleanup lock on primary bucket page.  Acquiring lock on
-		 * primary bucket and rechecking if it is a target bucket is mandatory
-		 * as otherwise a concurrent split followed by vacuum could remove
-		 * tuples from the selected bucket which otherwise would have been
-		 * visible.
+		 * Compute the target bucket number, and convert to block number.
 		 */
-		for (;;)
-		{
-			/*
-			 * Compute the target bucket number, and convert to block number.
-			 */
-			bucket = _hash_hashkey2bucket(hashkey,
-										  metap->hashm_maxbucket,
-										  metap->hashm_highmask,
-										  metap->hashm_lowmask);
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
 
-			blkno = BUCKET_TO_BLKNO(metap, bucket);
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
 
-			/* Release metapage lock, but keep pin. */
-			_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
+		/* Check if this bucket is split after we have cached the metapage.
+		 * To do this we need to check whether cached maxbucket number is less
+		 * than or equal to maxbucket number stored in bucket page, which was
+		 * set with that times maxbucket number during bucket page splits.
+		 * In case of upgrade hasho_prevblkno of old bucket page will be set
+		 * with InvalidBlockNumber. And as of now maximum value the
+		 * hashm_maxbucket can take is 1 less than InvalidBlockNumber
+		 * (see _hash_expandtable). So an explicit check for InvalidBlockNumber
+		 * in hasho_prevblkno will tell whether current bucket has been split
+		 * after metapage was cached.
+		 */
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <=  metap->hashm_maxbucket)
+		{
+			/* Ok now we have the right bucket proceed to search in it. */
+			break;
+		}
 
-			/*
-			 * If the previous iteration of this loop locked what is still the
-			 * correct target bucket, we are done.  Otherwise, drop any old
-			 * lock and lock what now appears to be the correct bucket.
-			 */
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
+		_hash_relbuf(rel, buf);
 
-			/* Fetch the primary bucket page for the bucket */
-			buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		/* Meta page cache is old try again updating it. */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
 
-			/*
-			 * Reacquire metapage lock and check that no bucket split has
-			 * taken place while we were awaiting the bucket lock.
-			 */
-			_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-			oldblkno = blkno;
-		}
+		/* Release Meta page buffer lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
+	/* Done with the metapage */
+	if (!BufferIsInvalid(metabuf))
+		_hash_dropbuf(rel, metabuf);
 
 	/* Update scan opaque state to show we have lock on the bucket */
 	so->hashso_bucket = bucket;
 	so->hashso_bucket_valid = true;
-
-
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
-
 	so->hashso_bucket_buf = buf;
 
 	/*
#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Mithun Cy (#6)
Re: Cache Hash Index meta page.

On Tue, Sep 6, 2016 at 12:20 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Sep 2, 2016 7:38 PM, "Jesper Pedersen" <jesper.pedersen@redhat.com>
wrote:

Could you provide a rebased patch based on Amit's v5 ?

Please find the the patch, based on Amit's V5.

I think you want to say based on patch in the below mail:
/messages/by-id/CAA4eK1J6b8O4PcEPqRxNYbLVbfToNMJEEm+qn0jZX31-obXrJw@mail.gmail.com

It is better if we can provide the link for a patch on which the
current patch is based on, that will help people to easily identify
the dependent patches.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#8Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Mithun Cy (#6)
Re: Cache Hash Index meta page.

On 09/05/2016 02:50 PM, Mithun Cy wrote:

On Sep 2, 2016 7:38 PM, "Jesper Pedersen" <jesper.pedersen@redhat.com>
wrote:

Could you provide a rebased patch based on Amit's v5 ?

Please find the the patch, based on Amit's V5.

I have fixed following things

1. now in "_hash_first" we check if (opaque->hasho_prevblkno ==
InvalidBlockNumber) to see if bucket is from older version hashindex and
has been upgraded. Since as of now InvalidBlockNumber is one value greater
than maximum value the variable "metap->hashm_maxbucket" can be set (see
_hash_expandtable). We can distinguish it from rest. I tested the upgrade
issue reported by amit. It is fixed now.

2. One case which buckets hasho_prevblkno is used is where we do backward
scan. So now before testing for previous block number I test whether
current page is bucket page if so we end the bucket scan (see changes in
_hash_readprev). On other places where hasho_prevblkno is used it is not
for bucket page, so I have not put any extra check to verify if is a bucket
page.

I think that the

+ pageopaque->hasho_prevblkno = metap->hashm_maxbucket;

trick should be documented in the README, as hashm_maxbucket is defined
as uint32 whereas hasho_prevblkno is a BlockNumber.

(All bucket variables should probably use the Bucket definition instead
of uint32).
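
For context, both types are 32-bit unsigned, so the store itself is
bit-compatible; the concern is about documenting intent rather than
width:

    typedef uint32 BlockNumber;    /* src/include/storage/block.h */
    typedef uint32 Bucket;         /* src/include/access/hash.h */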

For the archives, this patch conflicts with the WAL patch [1].

[1]: /messages/by-id/CAA4eK1JS+SiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-=0w@mail.gmail.com

Best regards,
Jesper


#9Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Jesper Pedersen (#8)
1 attachment(s)
Re: Cache Hash Index meta page.

On Thu, Sep 8, 2016 at 11:21 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:

For the archives, this patch conflicts with the WAL patch [1].

[1] /messages/by-id/CAA4eK1JS+SiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com

Updated the patch; it applies over Amit's concurrent hash index patch [1]
and Amit's WAL for hash index patch [2] together.

[1] Concurrent Hash index: /messages/by-id/CAA4eK1J6b8O4PcEPqRxNYbLVbfToNMJEEm+qn0jZX31-obXrJw@mail.gmail.com
[2] WAL for hash index: /messages/by-id/CAA4eK1JS+SiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-=0w@mail.gmail.com
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_hash_index_metapage_onAmit_05_02_with_wall.patch (application/octet-stream)
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index 84cb339..095fb7e 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -527,8 +527,12 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	 * primary bucket.  We don't need to aqcuire buffer lock to fix the
 	 * primary bucket or if the previous bucket is same as write bucket, as we
 	 * already have lock on those buckets.
+	 * If page is Bucket primary page, then prevblkno will be set with the
+	 * value of hashm_maxbucket when it was split/created. So we explicitly check
+	 * for LH_BUCKET_PAGE.
 	 */
-	if (BlockNumberIsValid(prevblkno))
+	if (BlockNumberIsValid(prevblkno) &&
+		!(ovflopaque->hasho_flag & LH_BUCKET_PAGE))
 	{
 		if (prevblkno == bucketblkno)
 			prevbuf = bucketbuf;
@@ -601,7 +605,8 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 	MarkBufferDirty(ovflbuf);
 
-	if (BufferIsValid(prevbuf))
+	if (BufferIsValid(prevbuf) &&
+		!(ovflopaque->hasho_flag & LH_BUCKET_PAGE))
 	{
 		Page		prevpage = BufferGetPage(prevbuf);
 		HashPageOpaque prevopaque = (HashPageOpaque) PageGetSpecialPointer(prevpage);
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 716b037..091532c 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -439,6 +439,7 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
 	for (i = 0; i < num_buckets; i++)
 	{
 		BlockNumber blkno;
+		HashPageOpaque pageopaque;
 
 		/* Allow interrupts, in case N is huge */
 		CHECK_FOR_INTERRUPTS();
@@ -446,6 +447,9 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
 		blkno = BUCKET_TO_BLKNO(metap, i);
 		buf = _hash_getnewbuf(rel, blkno, forkNum);
 		_hash_initbuf(buf, i, LH_BUCKET_PAGE, false);
+		pageopaque =
+			(HashPageOpaque) PageGetSpecialPointer(BufferGetPage(buf));
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		MarkBufferDirty(buf);
 
 		log_newpage(&rel->rd_node,
@@ -886,6 +890,8 @@ restart_expand:
 	 * Okay to proceed with split.  Update the metapage bucket mapping info.
 	 */
 	metap->hashm_maxbucket = new_bucket;
+	nopaque->hasho_prevblkno = metap->hashm_maxbucket;
+	oopaque->hasho_prevblkno = metap->hashm_maxbucket;
 
 	if (new_bucket > metap->hashm_highmask)
 	{
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 3d8b464..409ad6d 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -112,7 +112,13 @@ _hash_readprev(Relation rel,
 	 * comments in _hash_readnext to know the reason of retaining pin.
 	 */
 	if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+	{
 		_hash_chgbufaccess(rel, *bufp, HASH_READ, HASH_NOLOCK);
+
+		/* If it is a bucket page there will not be a prevblkno. */
+		*bufp = InvalidBuffer;
+		return;
+	}
 	else
 		_hash_relbuf(rel, *bufp);
 
@@ -153,10 +159,8 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	uint32		hashkey;
 	Bucket		bucket;
 	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
+	Buffer		metabuf = InvalidBuffer;
 	Page		page;
 	HashPageOpaque opaque;
 	HashMetaPage metap;
@@ -213,103 +217,87 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 										   cur->sk_subtype);
 
 	so->hashso_sk_hash = hashkey;
-
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
-
-	/*
-	 * Conditionally get the lock on primary bucket page for search while
-	 * holding lock on meta page. If we have to wait, then release the meta
-	 * page lock and retry it in a hard way.
-	 */
-	bucket = _hash_hashkey2bucket(hashkey,
-								  metap->hashm_maxbucket,
-								  metap->hashm_highmask,
-								  metap->hashm_lowmask);
-
-	blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-	/* Fetch the primary bucket page for the bucket */
-	buf = ReadBuffer(rel, blkno);
-	if (!ConditionalLockBufferShared(buf))
+	if (rel->rd_amcache != NULL)
 	{
-		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
-		LockBuffer(buf, HASH_READ);
-		_hash_checkpage(rel, buf, LH_BUCKET_PAGE);
-		_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-		oldblkno = blkno;
-		retry = true;
+		metap = (HashMetaPage)rel->rd_amcache;
 	}
 	else
 	{
-		_hash_checkpage(rel, buf, LH_BUCKET_PAGE);
+		/* Read the metapage */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+
+		/*  Cache the metapage data for next time*/
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaPageData));
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
 		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
-	if (retry)
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
 	{
 		/*
-		 * Loop until we get a lock on the correct target bucket.  We get the
-		 * lock on primary bucket page and retain the pin on it during read
-		 * operation to prevent the concurrent splits.  Retaining pin on a
-		 * primary bucket page ensures that split can't happen as it needs to
-		 * acquire the cleanup lock on primary bucket page.  Acquiring lock on
-		 * primary bucket and rechecking if it is a target bucket is mandatory
-		 * as otherwise a concurrent split followed by vacuum could remove
-		 * tuples from the selected bucket which otherwise would have been
-		 * visible.
+		 * Compute the target bucket number, and convert to block number.
 		 */
-		for (;;)
-		{
-			/*
-			 * Compute the target bucket number, and convert to block number.
-			 */
-			bucket = _hash_hashkey2bucket(hashkey,
-										  metap->hashm_maxbucket,
-										  metap->hashm_highmask,
-										  metap->hashm_lowmask);
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
 
-			blkno = BUCKET_TO_BLKNO(metap, bucket);
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		TestForOldSnapshot(scan->xs_snapshot, rel, page);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
 
-			/* Release metapage lock, but keep pin. */
-			_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
+		/* Check if this bucket is split after we have cached the metapage.
+		 * To do this we need to check whether cached maxbucket number is less
+		 * than or equal to maxbucket number stored in bucket page, which was
+		 * set with that times maxbucket number during bucket page splits.
+		 * In case of upgrade hasho_prevblkno of old bucket page will be set
+		 * with InvalidBlockNumber. And as of now maximum value the
+		 * hashm_maxbucket can take is 1 less than InvalidBlockNumber
+		 * (see _hash_expandtable). So an explicit check for InvalidBlockNumber
+		 * in hasho_prevblkno will tell whether current bucket has been split
+		 * after metapage was cached.
+		 */
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <=  metap->hashm_maxbucket)
+		{
+			/* Ok now we have the right bucket proceed to search in it. */
+			break;
+		}
 
-			/*
-			 * If the previous iteration of this loop locked what is still the
-			 * correct target bucket, we are done.  Otherwise, drop any old
-			 * lock and lock what now appears to be the correct bucket.
-			 */
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
+		_hash_relbuf(rel, buf);
 
-			/* Fetch the primary bucket page for the bucket */
-			buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		/* Meta page cache is old try again updating it. */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
 
-			/*
-			 * Reacquire metapage lock and check that no bucket split has
-			 * taken place while we were awaiting the bucket lock.
-			 */
-			_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-			oldblkno = blkno;
-		}
+		/* Release Meta page buffer lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
+	/* Done with the metapage */
+	if (!BufferIsInvalid(metabuf))
+		_hash_dropbuf(rel, metabuf);
 
 	/* Update scan opaque state to show we have lock on the bucket */
 	so->hashso_bucket = bucket;
 	so->hashso_bucket_valid = true;
-
-
-	page = BufferGetPage(buf);
-	TestForOldSnapshot(scan->xs_snapshot, rel, page);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
-
 	so->hashso_bucket_buf = buf;
 
 	/*
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index d1d30bc..a4396a8 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -60,7 +60,14 @@ typedef uint32 Bucket;
 
 typedef struct HashPageOpaqueData
 {
-	BlockNumber hasho_prevblkno;	/* previous ovfl (or bucket) blkno */
+	/*
+	 * hasho_prevblkno stores previous ovfl (or bucket) blkno. And, there is a
+	 * special case if given page is bucket primary page then hasho_prevblkno
+	 * will store value of max bucket number during time of that bucket
+	 * creation/split. This will be used to verify if whether cached metapage
+	 * can be used or has to be reread.
+	 */
+	BlockNumber hasho_prevblkno;
 	BlockNumber hasho_nextblkno;	/* next ovfl blkno */
 	Bucket		hasho_bucket;	/* bucket number this pg belongs to */
 	uint16		hasho_flag;		/* page type code, see above */
#10Jeff Janes
jeff.janes@gmail.com
In reply to: Mithun Cy (#9)
Re: Cache Hash Index meta page.

On Tue, Sep 13, 2016 at 12:55 PM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

On Thu, Sep 8, 2016 at 11:21 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:

For the archives, this patch conflicts with the WAL patch [1].

[1] /messages/by-id/CAA4eK1JS+SiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com

Updated the patch it applies over Amit's concurrent hash index[1] and
Amit's wal for hash index patch[2] together.

I think that this needs to be updated again for v8 of concurrent and v5 of
wal

Thanks,

Jeff

#11Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Jeff Janes (#10)
1 attachment(s)
Re: Cache Hash Index meta page.

Attachments:

cache_hash_index_metapage_onAmit_05_03_with_wall.patch (application/octet-stream)
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index 6292e6c..be8d93c 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -528,8 +528,12 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	 * primary bucket.  We don't need to aqcuire buffer lock to fix the
 	 * primary bucket or if the previous bucket is same as write bucket, as we
 	 * already have lock on those buckets.
+	 * If page is Bucket primary page, then prevblkno will be set with the
+	 * value of hashm_maxbucket when it was split/created. So we explicitly check
+	 * for LH_BUCKET_PAGE.
 	 */
-	if (BlockNumberIsValid(prevblkno))
+	if (BlockNumberIsValid(prevblkno) &&
+		!(ovflopaque->hasho_flag & LH_BUCKET_PAGE))
 	{
 		if (prevblkno == bucketblkno)
 			prevbuf = bucketbuf;
@@ -602,7 +606,8 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 	MarkBufferDirty(ovflbuf);
 
-	if (BufferIsValid(prevbuf))
+	if (BufferIsValid(prevbuf) &&
+		!(ovflopaque->hasho_flag & LH_BUCKET_PAGE))
 	{
 		Page		prevpage = BufferGetPage(prevbuf);
 		HashPageOpaque prevopaque = (HashPageOpaque) PageGetSpecialPointer(prevpage);
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 8e00d34..4b5c27d 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -440,6 +440,7 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
 	for (i = 0; i < num_buckets; i++)
 	{
 		BlockNumber blkno;
+		HashPageOpaque pageopaque;
 
 		/* Allow interrupts, in case N is huge */
 		CHECK_FOR_INTERRUPTS();
@@ -447,6 +448,9 @@ _hash_init(Relation rel, double num_tuples, ForkNumber forkNum)
 		blkno = BUCKET_TO_BLKNO(metap, i);
 		buf = _hash_getnewbuf(rel, blkno, forkNum);
 		_hash_initbuf(buf, i, LH_BUCKET_PAGE, false);
+		pageopaque =
+			(HashPageOpaque) PageGetSpecialPointer(BufferGetPage(buf));
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		MarkBufferDirty(buf);
 
 		log_newpage(&rel->rd_node,
@@ -881,6 +885,8 @@ restart_expand:
 	 * Okay to proceed with split.  Update the metapage bucket mapping info.
 	 */
 	metap->hashm_maxbucket = new_bucket;
+	nopaque->hasho_prevblkno = metap->hashm_maxbucket;
+	oopaque->hasho_prevblkno = metap->hashm_maxbucket;
 
 	if (new_bucket > metap->hashm_highmask)
 	{
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 0df64a8..41a3cf0 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -112,7 +112,13 @@ _hash_readprev(Relation rel,
 	 * comments in _hash_readnext to know the reason of retaining pin.
 	 */
 	if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+	{
 		_hash_chgbufaccess(rel, *bufp, HASH_READ, HASH_NOLOCK);
+
+		/* If it is a bucket page there will not be a prevblkno. */
+		*bufp = InvalidBuffer;
+		return;
+	}
 	else
 		_hash_relbuf(rel, *bufp);
 
@@ -153,10 +159,8 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	uint32		hashkey;
 	Bucket		bucket;
 	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
+	Buffer		metabuf = InvalidBuffer;
 	Page		page;
 	HashPageOpaque opaque;
 	HashMetaPage metap;
@@ -214,96 +218,82 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
-
-	/*
-	 * Conditionally get the lock on primary bucket page for search while
-	 * holding lock on meta page. If we have to wait, then release the meta
-	 * page lock and retry it in a hard way.
-	 */
-	bucket = _hash_hashkey2bucket(hashkey,
-								  metap->hashm_maxbucket,
-								  metap->hashm_highmask,
-								  metap->hashm_lowmask);
-
-	blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-	/* Fetch the primary bucket page for the bucket */
-	buf = ReadBuffer(rel, blkno);
-	if (!ConditionalLockBufferShared(buf))
+	if (rel->rd_amcache != NULL)
 	{
-		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
-		LockBuffer(buf, HASH_READ);
-		_hash_checkpage(rel, buf, LH_BUCKET_PAGE);
-		_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-		oldblkno = blkno;
-		retry = true;
+		metap = (HashMetaPage)rel->rd_amcache;
 	}
 	else
 	{
-		_hash_checkpage(rel, buf, LH_BUCKET_PAGE);
+		/* Read the metapage */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+
+		/*  Cache the metapage data for next time*/
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaPageData));
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
 		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
-	if (retry)
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
 	{
 		/*
-		 * Loop until we get a lock on the correct target bucket.  We get the
-		 * lock on primary bucket page and retain the pin on it during read
-		 * operation to prevent the concurrent splits.  Retaining pin on a
-		 * primary bucket page ensures that split can't happen as it needs to
-		 * acquire the cleanup lock on primary bucket page.  Acquiring lock on
-		 * primary bucket and rechecking if it is a target bucket is mandatory
-		 * as otherwise a concurrent split followed by vacuum could remove
-		 * tuples from the selected bucket which otherwise would have been
-		 * visible.
+		 * Compute the target bucket number, and convert to block number.
 		 */
-		for (;;)
-		{
-			/*
-			 * Compute the target bucket number, and convert to block number.
-			 */
-			bucket = _hash_hashkey2bucket(hashkey,
-										  metap->hashm_maxbucket,
-										  metap->hashm_highmask,
-										  metap->hashm_lowmask);
-
-			blkno = BUCKET_TO_BLKNO(metap, bucket);
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		TestForOldSnapshot(scan->xs_snapshot, rel, page);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
 
-			/* Release metapage lock, but keep pin. */
-			_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
+		/* Check if this bucket is split after we have cached the metapage.
+		 * To do this we need to check whether cached maxbucket number is less
+		 * than or equal to maxbucket number stored in bucket page, which was
+		 * set with that time's maxbucket number during bucket page splits.
+		 * In case of upgrade, hasho_prevblkno of old bucket page will be set
+		 * with InvalidBlockNumber. And as of now maximum value the
+		 * hashm_maxbucket can take is 1 less than InvalidBlockNumber
+		 * (see _hash_expandtable). So an explicit check for InvalidBlockNumber
+		 * in hasho_prevblkno will tell whether current bucket has been split
+		 * after metapage was cached.
+		 */
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <=  metap->hashm_maxbucket)
+		{
+			/* Ok now we have the right bucket proceed to search in it. */
+			break;
+		}
 
-			/*
-			 * If the previous iteration of this loop locked what is still the
-			 * correct target bucket, we are done.  Otherwise, drop any old
-			 * lock and lock what now appears to be the correct bucket.
-			 */
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
+		_hash_relbuf(rel, buf);
 
-			/* Fetch the primary bucket page for the bucket */
-			buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		/* Meta page cache is old try again updating it. */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
 
-			/*
-			 * Reacquire metapage lock and check that no bucket split has
-			 * taken place while we were awaiting the bucket lock.
-			 */
-			_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-			oldblkno = blkno;
-		}
+		/* Release Meta page buffer lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
-
-	page = BufferGetPage(buf);
-	TestForOldSnapshot(scan->xs_snapshot, rel, page);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
+	/* Done with the metapage */
+	if (!BufferIsInvalid(metabuf))
+		_hash_dropbuf(rel, metabuf);
 
 	so->hashso_bucket_buf = buf;
 
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index c0434f5..8dd8130 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -60,7 +60,14 @@ typedef uint32 Bucket;
 
 typedef struct HashPageOpaqueData
 {
-	BlockNumber hasho_prevblkno;	/* previous ovfl (or bucket) blkno */
+	/*
+	 * hasho_prevblkno stores previous ovfl (or bucket) blkno. And, there is a
+	 * special case if given page is bucket primary page then hasho_prevblkno
+	 * will store value of max bucket number during time of that bucket
+	 * creation/split. This will be used to verify whether the cached metapage
+	 * can be used or has to be reread.
+	 */
+	BlockNumber hasho_prevblkno;
 	BlockNumber hasho_nextblkno;	/* next ovfl blkno */
 	Bucket		hasho_bucket;	/* bucket number this pg belongs to */
 	uint16		hasho_flag;		/* page type code, see above */
#12Michael Paquier
michael.paquier@gmail.com
In reply to: Mithun Cy (#11)
Re: Cache Hash Index meta page.

On Thu, Sep 29, 2016 at 12:55 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think that this needs to be updated again for v8 of concurrent and v5 of wal.

Adding the rebased patch over [1] + [2]

[1] Concurrent Hash index.
[2] Wal for hash index.

Moved to next CF.
--
Michael


#13Jeff Janes
jeff.janes@gmail.com
In reply to: Mithun Cy (#1)
Re: Cache Hash Index meta page.

On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try
to read meta page buffer twice just to make sure bucket is not split after
we found bucket page. With this patch meta page buffer read is not done, if
the bucket is not split after caching the meta page.

Idea is to cache the Meta page data in rd_amcache and store maxbucket
number in hasho_prevblkno of bucket primary page (which will always be NULL
other wise, so reusing it here for this cause!!!). So when we try to do
hash lookup for bucket page if locally cached maxbucket number is greater
than or equal to bucket page's maxbucket number then we can say given
bucket is not split after we have cached the meta page. Hence avoid reading
meta page buffer.

I have attached the benchmark results and perf stats (refer
hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2:
Benchmark results). There we can see improvements at higher clients, as
lwlock contentions due to buffer read are more at higher clients. If I
apply the same patch on Amit's concurrent hash index patch [1] we can see
improvements at lower clients also. Amit's patch has removed a heavy weight
page lock which was the bottle neck at lower clients.

[1] Concurrent Hash Indexes <https://commitfest.postgresql.org/10/647/&gt;

Hi Mithun,

Can you describe your benchmarking machine? Your benchmarking data went up
to 128 clients. But how many cores does the machine have? Are you testing
how well it can use the resources it has, or how well it can deal with
oversubscription of the resources?

Also, was the file supposed to be named .ods? I didn't find it to be
openable as an .odc file.

Cheers,

Jeff

#14Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Jeff Janes (#13)
Re: Cache Hash Index meta page.

On Tue, Oct 4, 2016 at 11:55 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Can you describe your benchmarking machine? Your benchmarking data went up to 128 clients. But how many cores does the machine have? Are you testing how well it can use the resources it has, or how well it can deal with oversubscription of the resources?

It is a power2 machine with 192 hyperthreads.

Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
NUMA node(s): 4
Model: IBM,8286-42A
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191

Also, was the file supposed to be named .ods? I didn't find it to be openable as an .odc file.

Yes, .ods is right; it is a spreadsheet in ODF.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#15Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Mithun Cy (#11)
Re: Cache Hash Index meta page.

On 09/28/2016 11:55 AM, Mithun Cy wrote:

On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

I think that this needs to be updated again for v8 of concurrent and v5 of wal.

Adding the rebased patch over [1] + [2]

As the concurrent hash index patch was committed in 6d46f4, this patch
needs a rebase.

I have moved this submission to the next CF.

Thanks for working on this !

Best regards,
Jesper


#16Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Jesper Pedersen (#15)
1 attachment(s)
Re: Cache Hash Index meta page.

On Thu, Dec 1, 2016 at 8:10 PM, Jesper Pedersen <jesper.pedersen@redhat.com>
wrote:

As the concurrent hash index patch was committed in 6d46f4, this patch needs a rebase.

Thanks Jesper,

Adding the rebased patch.

I have re-run the pgbench read-only tests with the modifications below:

"alter table pgbench_accounts drop constraint pgbench_accounts_pkey"
postgres
"create index pgbench_accounts_pkey on pgbench_accounts using hash(aid)"
postgres

Postgres Server settings:
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c checkpoint_completion_target=0.9

pgbench settings:
scale_factor = 300 (so the database fits in shared_buffers)
Mode = -M prepared -S (prepared read-only mode).
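
For reference, each read-only run was invoked along these lines (the -j and -T values here are my assumption, not recorded settings):

./pgbench -M prepared -S -c <clients> -j <clients> -T 1800 postgres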

Machine used:
power2, with sufficient RAM for the above shared_buffers setting.

lscpu:
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
NUMA node(s): 4
Model: IBM,8286-42A

Clients    Cache Meta Page patch    Base code with Amit's changes    %imp
      1             17062.513102                     17218.353817    -0.9050848685
      8            138525.808342                    128149.381759     8.0971335488
     16             212278.44762                    205870.456661     3.1126326054
     32            369453.224112                    360423.566937     2.5052904425
     64            576090.293018                    510665.044842    12.8117733604
     96            686813.187117                    504950.885867    36.0158396272
    104             688932.67516                     498365.55841    38.2384202789
    128            730728.526322                    409011.008553    78.6574226711

There appears to be a good improvement at higher client counts.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_hash_index_meta_page_06.patch (application/octet-stream) [Download]
commit 91f2000b3ddf3e8ff136db0d9ad21721fd2f7e47
Author: mithun <mithun@localhost.localdomain>
Date:   Tue Dec 6 01:03:24 2016 +0530

    meta page cache rebase

diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 44332e7..0c9c48a 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -478,7 +478,7 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -885,7 +885,7 @@ _hash_splitbucket(Relation rel,
 	 * operation end, we clear split-in-progress flag.
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_BEING_SPLIT;
-
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -893,7 +893,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_BEING_POPULATED;
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 8d43b38..8323f60 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
 		_hash_relbuf(rel, *bufp);
 
 	*bufp = InvalidBuffer;
+
+	/* If it is a bucket page there will not be a prevblkno. */
+	if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+		return;
+
 	/* check for interrupts while we're not holding any buffer lock */
 	CHECK_FOR_INTERRUPTS();
 	if (BlockNumberIsValid(blkno))
@@ -216,10 +221,8 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	uint32		hashkey;
 	Bucket		bucket;
 	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
+	Buffer		metabuf = InvalidBuffer;
 	Page		page;
 	HashPageOpaque opaque;
 	HashMetaPage metap;
@@ -277,10 +280,26 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
+	if (rel->rd_amcache != NULL)
+	{
+		metap = (HashMetaPage)rel->rd_amcache;
+	}
+	else
+	{
+		/* Read the metapage */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+
+		/*  Cache the metapage data for next time*/
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaPageData));
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
+	}
 
 	/*
 	 * Loop until we get a lock on the correct target bucket.
@@ -290,46 +309,50 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 		/*
 		 * Compute the target bucket number, and convert to block number.
 		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
+		bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
 									  metap->hashm_highmask,
 									  metap->hashm_lowmask);
 
 		blkno = BUCKET_TO_BLKNO(metap, bucket);
 
-		/* Release metapage lock, but keep pin. */
-		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
 
-		/*
-		 * If the previous iteration of this loop locked what is still the
-		 * correct target bucket, we are done.  Otherwise, drop any old lock
-		 * and lock what now appears to be the correct bucket.
+		/* Check if this bucket is split after we have cached the metapage.
+		 * To do this we need to check whether cached maxbucket number is less
+		 * than or equal to maxbucket number stored in bucket page, which was
+		 * set with that time's maxbucket number during bucket page splits.
+		 * In case of upgrade, hasho_prevblkno of old bucket page will be set
+		 * with InvalidBlockNumber. And as of now maximum value the
+		 * hashm_maxbucket can take is 1 less than InvalidBlockNumber
+		 * (see _hash_expandtable). So an explicit check for InvalidBlockNumber
+		 * in hasho_prevblkno will tell whether current bucket has been split
+		 * after metapage was cached.
 		 */
-		if (retry)
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <=  metap->hashm_maxbucket)
 		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
+			/* Ok now we have the right bucket proceed to search in it. */
+			break;
 		}
 
-		/* Fetch the primary bucket page for the bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		/* Meta page cache is old try again updating it. */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage)rel->rd_amcache;
 
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-		oldblkno = blkno;
-		retry = true;
+		/* Release Meta page buffer lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
 	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
-
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
+	if (!BufferIsInvalid(metabuf))
+		_hash_dropbuf(rel, metabuf);
 
 	so->hashso_bucket_buf = buf;
 
#17Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Mithun Cy (#16)
Re: Cache Hash Index meta page.

On Tue, Dec 6, 2016 at 1:28 AM, Mithun Cy <mithun.cy@enterprisedb.com>
wrote:

Clients    Cache Meta Page patch    Base code with Amit's changes    %imp
      1             17062.513102                     17218.353817    -0.9050848685
      8            138525.808342                    128149.381759     8.0971335488
     16             212278.44762                    205870.456661     3.1126326054
     32            369453.224112                    360423.566937     2.5052904425
     64            576090.293018                    510665.044842    12.8117733604
     96            686813.187117                    504950.885867    36.0158396272
    104             688932.67516                     498365.55841    38.2384202789
    128            730728.526322                    409011.008553    78.6574226711

All the above readings are the median of 3 runs.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#18Robert Haas
robertmhaas@gmail.com
In reply to: Mithun Cy (#16)
Re: Cache Hash Index meta page.

On Mon, Dec 5, 2016 at 2:58 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Thu, Dec 1, 2016 at 8:10 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:

As the concurrent hash index patch was committed in 6d46f4, this patch needs a rebase.

Thanks Jesper,

Adding the rebased patch.

-        bucket = _hash_hashkey2bucket(hashkey,
-                                      metap->hashm_maxbucket,
+        bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
                                       metap->hashm_highmask,
                                       metap->hashm_lowmask);

This hunk appears useless.

+ metap = (HashMetaPage)rel->rd_amcache;

Whitespace.

+ /* Cache the metapage data for next time*/

Whitespace.

+ /* Check if this bucket is split after we have cached the metapage.

Whitespace.

Shouldn't _hash_doinsert() be using the cache, too?

I think it's probably not a good idea to cache the entire metapage. The
only things that you are "really" trying to cache, I think, are
hashm_maxbucket, hashm_lowmask, and hashm_highmask. The entire
HashMetaPageData structure is 696 bytes on my machine, and it doesn't
really make sense to copy the whole thing into memory if you only need 16
bytes of it. It could even be dangerous -- if somebody tries to rely on
the cache for some other bit of data and we're not really guaranteeing that
it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members
hmc_maxbucket, hmc_lowmask, and hmc_highmask. The structure can have a
comment explaining that we only care about having the data be fresh enough
to test whether the bucket mapping we computed for a tuple is still
correct, and that for that to be the case we only need to know whether a
bucket has suffered a new split since we last refreshed the cache.
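
For concreteness, a minimal sketch of the structure I have in mind (the comment wording is illustrative only):

	/*
	 * Sketch: cache of only those metapage fields needed to check whether
	 * a previously computed tuple-to-bucket mapping is still correct.
	 * Fresh only in that narrow sense; nothing else should be read from it.
	 */
	typedef struct HashMetaDataCache
	{
		uint32		hmc_maxbucket;	/* ID of maximum bucket in use */
		uint32		hmc_lowmask;	/* mask to modulo into lower half of table */
		uint32		hmc_highmask;	/* mask to modulo into entire table */
	} HashMetaDataCache;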

The comments in this patch need some work, e.g.:

-
+       oopaque->hasho_prevblkno = maxbucket;

No comment?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Robert Haas (#18)
1 attachment(s)
Re: Cache Hash Index meta page.

Thanks Robert, I have tried to address all of the comments,
On Tue, Dec 6, 2016 at 2:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:

+ bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
metap->hashm_highmask,
metap->hashm_lowmask);

This hunk appears useless.

+ metap = (HashMetaPage)rel->rd_amcache;

Whitespace.

Fixed.

+ /* Cache the metapage data for next time*/

Whitespace.

Fixed.

+ /* Check if this bucket is split after we have cached the
metapage.

Whitespace.

Fixed.

Shouldn't _hash_doinsert() be using the cache, too?

Yes, we have an opportunity there; I have added the same in the code. But one difference is that at the end we need to read the meta page at least once, to split and/or write. The performance improvement might not be as much as in the read-only case.

I did some pgbench simple-update tests for this, with the changes below.

-               "alter table pgbench_branches add primary key (bid)",
-               "alter table pgbench_tellers add primary key (tid)",
-               "alter table pgbench_accounts add primary key (aid)"
+               "create index pgbench_branches_bid on pgbench_branches
using hash (bid)",
+               "create index pgbench_tellers_tid on pgbench_tellers using
hash (tid)",
+               "create index pgbench_accounts_aid on pgbench_accounts
using hash (aid)"

And I removed all reference (foreign) keys. But I see no improvement; I will do further benchmarking with the COPY command and report the results.

Clients    After Meta page cache    Base Code       %imp
      1              2276.151633    2304.253611    -1.2195696631
     32             36816.596513    36439.552652    1.0347104549
     64             50943.763133    51005.236973    -0.120524565
    128             49156.980457    48458.275106    1.4418700407

Each result above is the median of three runs, and each run is 30 minutes long.

Postgres Server settings:
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c checkpoint_completion_target=0.9

pgbench settings:
scale_factor = 300 (so the database fits in shared_buffers)
Mode = -M prepared -N (prepared simple-update).

Machine used:
power2, same as described above.

I think it's probably not a good idea to cache the entire metapage. The
only things that you are "really" trying to cache, I think, are
hashm_maxbucket, hashm_lowmask, and hashm_highmask. The entire
HashMetaPageData structure is 696 bytes on my machine, and it doesn't
really make sense to copy the whole thing into memory if you only need 16
bytes of it. It could even be dangerous -- if somebody tries to rely on
the cache for some other bit of data and we're not really guaranteeing that
it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members
hmc_maxbucket, hmc_lowmask, and hmc_highmask. The structure can have a
comment explaining that we only care about having the data be fresh enough
to test whether the bucket mapping we computed for a tuple is still
correct, and that for that to be the case we only need to know whether a
bucket has suffered a new split since we last refreshed the cache.

It is not only hashm_maxbucket, hashm_lowmask, and hashm_highmask (3 uint32s); we also need

uint32 hashm_spares[HASH_MAX_SPLITPOINTS]

for the bucket-number-to-block mapping in "BUCKET_TO_BLKNO(metap, bucket)".

Note: #define HASH_MAX_SPLITPOINTS 32, so it is (3*uint32 + 32*uint32) = 35*4 = 140 bytes.

The comments in this patch need some work, e.g.:

-
+       oopaque->hasho_prevblkno = maxbucket;

No comment?

I have tried to improve the comments in the new patch.

Apart from this, there seems to be a pre-existing bug in _hash_doinsert():

+	/*
+	 * XXX this is useless code if we are only storing hash keys.
+	 */
+	if (itemsz > HashMaxItemSize((Page) metap))
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("index row size %zu exceeds hash maximum %zu",
+						itemsz, HashMaxItemSize((Page) metap)),
+			errhint("Values larger than a buffer page cannot be indexed.")));

"metap" (HashMetaPage) and Page are different data structures whose members are not in sync, so we should not blindly cast one to the other as above. I think we should remove this part of the code, since we only store hash keys; so I have removed it but kept the assert below it as is.
Also, there was a bug in the previous patch: I was not releasing the bucket page lock when the cached metadata was old. That is now fixed.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_hash_index_meta_page_07.patch (application/octet-stream) [Download]
commit 77422f67e58947bd20bee1c4977817512c928cbb
Author: mithun <mithun@localhost.localdomain>
Date:   Fri Dec 16 11:29:18 2016 +0530

    cache meta page

diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 6806e32..9161e2e 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -541,7 +541,7 @@ loop_top:
 		bool		split_cleanup = false;
 
 		/* Get address of bucket's start page */
-		bucket_blkno = BUCKET_TO_BLKNO(&local_metapage, cur_bucket);
+		bucket_blkno = BUCKET_TO_BLKNO(local_metapage.hashm_spares, cur_bucket);
 
 		blkno = bucket_blkno;
 
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 572146a..886ab2e 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -31,19 +31,16 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	Buffer		buf = InvalidBuffer;
 	Buffer		bucket_buf;
 	Buffer		metabuf;
+	HashMetaCache metac;
 	HashMetaPage metap;
 	BlockNumber blkno;
-	BlockNumber oldblkno;
-	bool		retry;
 	Page		page;
 	HashPageOpaque pageopaque;
 	Size		itemsz;
 	bool		do_expand;
 	uint32		hashkey;
 	Bucket		bucket;
-	uint32		maxbucket;
-	uint32		highmask;
-	uint32		lowmask;
+	uint32		bufcount;
 
 	/*
 	 * Get the hash key for the item (it's stored in the index tuple itself).
@@ -54,28 +51,34 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	itemsz = IndexTupleDSize(*itup);
 	itemsz = MAXALIGN(itemsz);	/* be safe, PageAddItem will do this but we
 								 * need to be consistent */
-
 restart_insert:
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
+	metabuf = InvalidBuffer;
+	metap = NULL;
 
-	/*
-	 * Check whether the item can fit on a hash page at all. (Eventually, we
-	 * ought to try to apply TOAST methods if not.)  Note that at this point,
-	 * itemsz doesn't include the ItemId.
-	 *
-	 * XXX this is useless code if we are only storing hash keys.
-	 */
-	if (itemsz > HashMaxItemSize((Page) metap))
-		ereport(ERROR,
-				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-				 errmsg("index row size %zu exceeds hash maximum %zu",
-						itemsz, HashMaxItemSize((Page) metap)),
-			errhint("Values larger than a buffer page cannot be indexed.")));
+	if (rel->rd_amcache != NULL)
+	{
+		metac = (HashMetaCache) rel->rd_amcache;
+	}
+	else
+	{
+		/* Read the metapage */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+
+		/* Cache the metapage data for next time. */
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaDataCache));
+		metac = (HashMetaCache)rel->rd_amcache;
+		metac->hmc_maxbucket = metap->hashm_maxbucket;
+		metac->hmc_highmask = metap->hashm_highmask;
+		metac->hmc_lowmask = metap->hashm_lowmask;
+		memcpy(metac->hmc_spares, metap->hashm_spares,
+			   sizeof(uint32)*HASH_MAX_SPLITPOINTS);
 
-	oldblkno = InvalidBlockNumber;
-	retry = false;
+		/* Release metapage lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
+	}
 
 	/*
 	 * Loop until we get a lock on the correct target bucket.
@@ -86,54 +89,73 @@ restart_insert:
 		 * Compute the target bucket number, and convert to block number.
 		 */
 		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
+									  metac->hmc_maxbucket,
+									  metac->hmc_highmask,
+									  metac->hmc_lowmask);
 
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
+		blkno = BUCKET_TO_BLKNO(metac->hmc_spares, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(pageopaque->hasho_bucket == bucket);
 
 		/*
-		 * Copy bucket mapping info now; refer the comment in
-		 * _hash_expandtable where we copy this information before calling
-		 * _hash_splitbucket to see why this is okay.
+		 * Check if this bucket is split after we have cached the hash meta
+		 * data. To do this we need to check whether cached maxbucket number is
+		 * less than or equal to maxbucket number stored in bucket page, which
+		 * was set with that time's maxbucket number during bucket page splits.
+		 * In case of upgrade, hasho_prevblkno of old bucket page will be set
+		 * with InvalidBlockNumber. And as of now maximum value the
+		 * hashm_maxbucket can take is 1 less than InvalidBlockNumber
+		 * (see _hash_expandtable). So an explicit check for InvalidBlockNumber
+		 * in hasho_prevblkno will tell whether current bucket has been split
+		 * after caching hash meta data.
 		 */
-		maxbucket = metap->hashm_maxbucket;
-		highmask = metap->hashm_highmask;
-		lowmask = metap->hashm_lowmask;
+		if (pageopaque->hasho_prevblkno == InvalidBlockNumber ||
+			pageopaque->hasho_prevblkno <=  metac->hmc_maxbucket)
+		{
+			/* Ok now we have the right bucket proceed to search in it. */
+			break;
+		}
 
-		/* Release metapage lock, but keep pin. */
-		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
+		/* First drop any locks held on bucket buffers. */
+		_hash_relbuf(rel, buf);
 
-		/*
-		 * If the previous iteration of this loop locked the primary page of
-		 * what is still the correct target bucket, we are done.  Otherwise,
-		 * drop any old lock before acquiring the new one.
-		 */
-		if (retry)
+		/* Cached meta data is old try again updating it. */
+		if (BufferIsInvalid(metabuf))
 		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
+			metabuf =
+				_hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+			metap = HashPageGetMeta(BufferGetPage(metabuf));
 		}
+		else
+			_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
 
-		/* Fetch and lock the primary bucket page for the target bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
+		metac = (HashMetaCache)rel->rd_amcache;
+		metac->hmc_maxbucket = metap->hashm_maxbucket;
+		metac->hmc_highmask = metap->hashm_highmask;
+		metac->hmc_lowmask = metap->hashm_lowmask;
+		memcpy(metac->hmc_spares, metap->hashm_spares,
+			   sizeof(uint32)*HASH_MAX_SPLITPOINTS);
 
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-		oldblkno = blkno;
-		retry = true;
+		/* Release Meta page buffer lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
 	/* remember the primary bucket buffer to release the pin on it at end. */
 	bucket_buf = buf;
 
-	page = BufferGetPage(buf);
-	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(pageopaque->hasho_bucket == bucket);
+	/*
+	 * Below we need the metabuf for all cases. If we have not read the Meta
+	 * Page yet just read it once in HASH_NOLOCK mode and hold the pin.
+	 */
+	if (BufferIsInvalid(metabuf))
+	{
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+		metap = HashPageGetMeta(BufferGetPage(metabuf));
+	}
 
 	/*
 	 * If this bucket is in the process of being split, try to finish the
@@ -150,7 +172,8 @@ restart_insert:
 		_hash_chgbufaccess(rel, buf, HASH_READ, HASH_NOLOCK);
 
 		_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-						   maxbucket, highmask, lowmask);
+						   metac->hmc_maxbucket, metac->hmc_highmask,
+						   metac->hmc_lowmask);
 
 		/* release the pin on old and meta buffer.  retry for insert. */
 		_hash_dropbuf(rel, buf);
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 44332e7..ba35402 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -475,10 +475,17 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		/* Allow interrupts, in case N is huge */
 		CHECK_FOR_INTERRUPTS();
 
-		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
+		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap->hashm_spares, i),
+							  forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+		/*
+		 * Setting hasho_prevblkno of bucket page with latest maxbucket number
+		 * to indicate bucket has been initialized and need to reconstruct
+		 * HashMetaCache if it is older.
+		 */
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -594,7 +601,7 @@ restart_expand:
 
 	old_bucket = (new_bucket & metap->hashm_lowmask);
 
-	start_oblkno = BUCKET_TO_BLKNO(metap, old_bucket);
+	start_oblkno = BUCKET_TO_BLKNO(metap->hashm_spares, old_bucket);
 
 	buf_oblkno = _hash_getbuf_with_condlock_cleanup(rel, start_oblkno, LH_BUCKET_PAGE);
 	if (!buf_oblkno)
@@ -682,7 +689,7 @@ restart_expand:
 	 * the current value of hashm_spares[hashm_ovflpoint] correctly shows
 	 * where we are going to put a new splitpoint's worth of buckets.
 	 */
-	start_nblkno = BUCKET_TO_BLKNO(metap, new_bucket);
+	start_nblkno = BUCKET_TO_BLKNO(metap->hashm_spares, new_bucket);
 
 	/*
 	 * If the split point is increasing (hashm_maxbucket's log base 2
@@ -886,6 +893,12 @@ _hash_splitbucket(Relation rel,
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_BEING_SPLIT;
 
+	/*
+	 * Setting hasho_prevblkno of bucket page with latest maxbucket number
+	 * to indicate bucket has been split and need to reconstruct HashMetaCache.
+	 * Below same is done for new bucket page.
+	 */
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -893,7 +906,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_BEING_POPULATED;
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 8d43b38..2b6cf31 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
 		_hash_relbuf(rel, *bufp);
 
 	*bufp = InvalidBuffer;
+
+	/* If it is a bucket page there will not be a prevblkno. */
+	if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+		return;
+
 	/* check for interrupts while we're not holding any buffer lock */
 	CHECK_FOR_INTERRUPTS();
 	if (BlockNumberIsValid(blkno))
@@ -216,13 +221,11 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	uint32		hashkey;
 	Bucket		bucket;
 	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
+	Buffer		metabuf = InvalidBuffer;
 	Page		page;
 	HashPageOpaque opaque;
-	HashMetaPage metap;
+	HashMetaCache metac;
 	IndexTuple	itup;
 	ItemPointer current;
 	OffsetNumber offnum;
@@ -277,59 +280,102 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
+	if (rel->rd_amcache != NULL)
+	{
+		metac = (HashMetaCache) rel->rd_amcache;
+	}
+	else
+	{
+		HashMetaPage	metap;
+
+		/* Read the metapage */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+
+		/* Cache the metapage data for next time. */
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaDataCache));
+		metac = (HashMetaCache)rel->rd_amcache;
+		metac->hmc_maxbucket = metap->hashm_maxbucket;
+		metac->hmc_highmask = metap->hashm_highmask;
+		metac->hmc_lowmask = metap->hashm_lowmask;
+		memcpy(metac->hmc_spares, metap->hashm_spares,
+			   sizeof(uint32)*HASH_MAX_SPLITPOINTS);
+
+		/* Release metapage lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
+	}
 
 	/*
 	 * Loop until we get a lock on the correct target bucket.
 	 */
 	for (;;)
 	{
+		HashMetaPage	metap;
+
 		/*
 		 * Compute the target bucket number, and convert to block number.
 		 */
 		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
+									  metac->hmc_maxbucket,
+									  metac->hmc_highmask,
+									  metac->hmc_lowmask);
 
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
+		blkno = BUCKET_TO_BLKNO(metac->hmc_spares, bucket);
 
-		/* Release metapage lock, but keep pin. */
-		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
 
 		/*
-		 * If the previous iteration of this loop locked what is still the
-		 * correct target bucket, we are done.  Otherwise, drop any old lock
-		 * and lock what now appears to be the correct bucket.
+		 * Check if this bucket is split after we have cached the hash meta
+		 * data. To do this we need to check whether cached maxbucket number is
+		 * less than or equal to maxbucket number stored in bucket page, which
+		 * was set with that time's maxbucket number during bucket page splits.
+		 * In case of upgrade, hasho_prevblkno of old bucket page will be set
+		 * with InvalidBlockNumber. And as of now maximum value the
+		 * hashm_maxbucket can take is 1 less than InvalidBlockNumber
+		 * (see _hash_expandtable). So an explicit check for InvalidBlockNumber
+		 * in hasho_prevblkno will tell whether current bucket has been split
+		 * after caching hash meta data.
 		 */
-		if (retry)
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <=  metac->hmc_maxbucket)
 		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
+			/* Ok now we have the right bucket proceed to search in it. */
+			break;
 		}
 
-		/* Fetch the primary bucket page for the bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
+		/* First drop any locks held on bucket buffers. */
+		_hash_relbuf(rel, buf);
 
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
-		oldblkno = blkno;
-		retry = true;
+		/* Cached meta data is old try again updating it. */
+		if (BufferIsInvalid(metabuf))
+		{
+			metabuf =
+				_hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+			metap = HashPageGetMeta(BufferGetPage(metabuf));
+		}
+		else
+			_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
+
+		metac = (HashMetaCache)rel->rd_amcache;
+		metac->hmc_maxbucket = metap->hashm_maxbucket;
+		metac->hmc_highmask = metap->hashm_highmask;
+		metac->hmc_lowmask = metap->hashm_lowmask;
+		memcpy(metac->hmc_spares, metap->hashm_spares,
+			   sizeof(uint32)*HASH_MAX_SPLITPOINTS);
+
+		/* Release Meta page buffer lock, but keep pin. */
+		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);
 	}
 
 	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
-
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
+	if (!BufferIsInvalid(metabuf))
+		_hash_dropbuf(rel, metabuf);
 
 	so->hashso_bucket_buf = buf;
 
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index fa9cbdc..b719ffc 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -382,7 +382,7 @@ _hash_get_oldblock_from_newbucket(Relation rel, Bucket new_bucket)
 	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
 	metap = HashPageGetMeta(BufferGetPage(metabuf));
 
-	blkno = BUCKET_TO_BLKNO(metap, old_bucket);
+	blkno = BUCKET_TO_BLKNO(metap->hashm_spares, old_bucket);
 
 	_hash_relbuf(rel, metabuf);
 
@@ -412,7 +412,7 @@ _hash_get_newblock_from_oldbucket(Relation rel, Bucket old_bucket)
 	new_bucket = _hash_get_newbucket_from_oldbucket(rel, old_bucket,
 													metap->hashm_lowmask,
 													metap->hashm_maxbucket);
-	blkno = BUCKET_TO_BLKNO(metap, new_bucket);
+	blkno = BUCKET_TO_BLKNO(metap->hashm_spares, new_bucket);
 
 	_hash_relbuf(rel, metabuf);
 
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 6dfc41f..2c2c59f 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -35,8 +35,8 @@ typedef uint32 Bucket;
 
 #define InvalidBucket	((Bucket) 0xFFFFFFFF)
 
-#define BUCKET_TO_BLKNO(metap,B) \
-		((BlockNumber) ((B) + ((B) ? (metap)->hashm_spares[_hash_log2((B)+1)-1] : 0)) + 1)
+#define BUCKET_TO_BLKNO(spares,B) \
+		((BlockNumber) ((B) + ((B) ? spares[_hash_log2((B)+1)-1] : 0)) + 1)
 
 /*
  * Special space for hash index pages.
@@ -60,7 +60,15 @@ typedef uint32 Bucket;
 
 typedef struct HashPageOpaqueData
 {
-	BlockNumber hasho_prevblkno;	/* previous ovfl (or bucket) blkno */
+	/*
+	 * If this is an ovfl page this stores previous ovfl (or bucket) blkno.
+	 * Else if this is a bucket page we use this for a special purpose. We
+	 * store hashm_maxbucket value, whenever this page is initialized or split.
+	 * So this helps us to know whether the bucket has been split after caching
+	 * some of the meta page data. See _hash_doinsert(), _hash_first() to
+	 * know how to use same.
+	 */
+	BlockNumber hasho_prevblkno;
 	BlockNumber hasho_nextblkno;	/* next ovfl blkno */
 	Bucket		hasho_bucket;	/* bucket number this pg belongs to */
 	uint16		hasho_flag;		/* page type code, see above */
@@ -183,6 +191,28 @@ typedef struct HashMetaPageData
 typedef HashMetaPageData *HashMetaPage;
 
 /*
+ * This structure caches minimal hash index metadata, which is sufficient to
+ * say whether a bucket is split after the meta data is cached. By caching
+ * same, we can avoid a buffer read of HASH_METAPAGE, whenever we need to map
+ * the key to a bucket. See _hash_first(), _hash_doinsert().
+ *
+ * NOTE : Unfortunately we cannot put this data structure inside
+ * HashMetaPageData and avoid duplication. Because HashMetaPageData is stored
+ * inside a page, so we will break backward compatibility if we change that
+ * structure.
+ */
+typedef struct HashMetaDataCache
+{
+	uint32		hmc_maxbucket;	/* ID of maximum bucket in use */
+	uint32		hmc_highmask;	/* mask to modulo into entire table */
+	uint32		hmc_lowmask;	/* mask to modulo into lower half of table */
+	uint32		hmc_spares[HASH_MAX_SPLITPOINTS];		/* spare pages before
+														 * each splitpoint */
+} HashMetaDataCache;
+
+typedef HashMetaDataCache *HashMetaCache;
+
+/*
  * Maximum size of a hash index item (it's okay to have only one per page)
  */
 #define HashMaxItemSize(page) \
#20Robert Haas
robertmhaas@gmail.com
In reply to: Mithun Cy (#19)
Re: Cache Hash Index meta page.

On Fri, Dec 16, 2016 at 5:16 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Shouldn't _hash_doinsert() be using the cache, too?

Yes, we have an opportunity there; I have added the same in the code. But one difference is that at the end we need to read the meta page at least once, to split and/or write. The performance improvement might not be as much as in the read-only case.

Why would you need to read it at least once in one case but not the other?

I think it's probably not a good idea to cache the entire metapage. The
only things that you are "really" trying to cache, I think, are
hashm_maxbucket, hashm_lowmask, and hashm_highmask. The entire
HashMetaPageData structure is 696 bytes on my machine, and it doesn't
really make sense to copy the whole thing into memory if you only need 16
bytes of it. It could even be dangerous -- if somebody tries to rely on
the cache for some other bit of data and we're not really guaranteeing that
it's fresh enough for that.

I'd suggest defining a new structure HashMetaDataCache with members
hmc_maxbucket, hmc_lowmask, and hmc_highmask. The structure can have a
comment explaining that we only care about having the data be fresh enough
to test whether the bucket mapping we computed for a tuple is still
correct, and that for that to be the case we only need to know whether a
bucket has suffered a new split since we last refreshed the cache.

It is not only hashm_maxbucket, hashm_lowmask, and hashm_highmask (3 uint32s); we also need

uint32 hashm_spares[HASH_MAX_SPLITPOINTS]

for the bucket-number-to-block mapping in "BUCKET_TO_BLKNO(metap, bucket)".

Note: #define HASH_MAX_SPLITPOINTS 32, so it is (3*uint32 + 32*uint32) = 35*4 = 140 bytes.

Well, I guess that makes it more appealing to cache the whole page at least
in terms of raw number of bytes, but I suppose my real complaint here is
that there don't seem to be any clear rules for where, whether, and to what
extent we can rely on the cache to be valid. Without that, I think this
patch is creating an extremely serious maintenance hazard for anyone who
wants to try to modify this code in the future. A future patch author
needs to be able to tell what data they can use and what data they can't
use, and why.

Apart from this, there seems to be a pre-existing bug in _hash_doinsert():

+	/*
+	 * XXX this is useless code if we are only storing hash keys.
+	 */
+	if (itemsz > HashMaxItemSize((Page) metap))
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("index row size %zu exceeds hash maximum %zu",
+						itemsz, HashMaxItemSize((Page) metap)),
+			errhint("Values larger than a buffer page cannot be indexed.")));

"metap" (HashMetaPage) and Page are different data structures whose members are not in sync, so we should not blindly cast one to the other as above. I think we should remove this part of the code, since we only store hash keys; so I have removed it but kept the assert below it as is.

Any existing bugs should be the subject of a separate patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Robert Haas (#20)
Re: Cache Hash Index meta page.

On Tue, Dec 20, 2016 at 3:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Dec 16, 2016 at 5:16 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Shouldn't _hash_doinsert() be using the cache, too?

Yes, we have an opportunity there; I have added the same in the code. But one difference is that at the end we need to read the meta page at least once, to split and/or write. The performance improvement might not be as much as in the read-only case.

Why would you need to read it at least once in one case but not the other?

For read-only: in _hash_first, if the target bucket has not been split after caching the meta page contents, we never need to read the metapage. But for insert: in _hash_doinsert, at the end we modify the meta page to store the number of tuples.

+	/*
+	 * Write-lock the metapage so we can increment the tuple count. After
+	 * incrementing it, check to see if it's time for a split.
+	 */
+	_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
+
+	metap->hashm_ntuples += 1;

Well, I guess that makes it more appealing to cache the whole page at least in terms of raw number of bytes, but I suppose my real complaint here is that there don't seem to be any clear rules for where, whether, and to what extent we can rely on the cache to be valid.

Okay, I will revert back to caching the entire meta page.

Without that, I think this patch is creating an extremely serious
maintenance hazard for anyone who wants to try to modify this code in the
future. A future patch author needs to be able to tell what data they can
use and what data they can't use, and why.

I think, if it is okay, I can document for each member of HashMetaPageData whether it should be read from the cached meta page or directly from the current meta page. Below I have briefly commented on each member. If you suggest going with that approach, I will produce a neat patch for it.

typedef struct HashMetaPageData
{

1. uint32 hashm_magic;		/* magic no. for hash tables */
-- This should remain the same, so it can be read from the cached meta page.

2. uint32 hashm_version;	/* version ID */
-- This is initialized once and never changed afterwards, so it can be read from the cached meta page.

3. double hashm_ntuples;	/* number of tuples stored in the table */
-- This changes on every insert, so it should not be read from the cached data.

4. uint16 hashm_ffactor;	/* target fill factor (tuples/bucket) */
-- Initialized once and never changed afterwards, so it can be read from the cached meta page.

5. uint16 hashm_bsize;		/* index page size (bytes) */
-- Initialized once and never changed afterwards, so it can be read from the cached meta page.

6. uint16 hashm_bmsize;		/* bitmap array size (bytes) - must be a power of 2 */
-- Initialized once and never changed afterwards, so it can be read from the cached meta page.

7. uint16 hashm_bmshift;	/* log2(bitmap array size in BITS) */
-- Initialized once and never changed afterwards, so it can be read from the cached meta page.

8. uint32 hashm_maxbucket;	/* ID of maximum bucket in use */
   uint32 hashm_highmask;	/* mask to modulo into entire table */
   uint32 hashm_lowmask;	/* mask to modulo into lower half of table */
-- If hashm_maxbucket, hashm_highmask and hashm_lowmask are all read and cached at the same time, while the metapage is locked, then the key-to-bucket-number mapping function _hash_hashkey2bucket() will always produce the same output. If a bucket is split after these elements are cached (which we can detect because old bucket pages never move once allocated, and we mark each bucket page's hasho_prevblkno with the incremented hashm_maxbucket), we can invalidate them and re-read them from the meta page. If the intention is not to save a metapage read while mapping a key to its bucket page, then do not read them from the cached meta page.

9. uint32 hashm_ovflpoint;	/* splitpoint from which ovflpgs being allocated */
-- Since this is used for allocation of overflow pages, the latest value should be taken directly from the meta page.

10. uint32 hashm_firstfree;	/* lowest-number free ovflpage (bit#) */
-- Should always be read from the meta page directly.

11. uint32 hashm_nmaps;		/* number of bitmap pages */
-- Should always be read from the meta page directly.

12. RegProcedure hashm_procid;	/* hash procedure id from pg_proc */
-- Never used so far; when we do use it, we will need to decide on a policy for it.

13. uint32 hashm_spares[HASH_MAX_SPLITPOINTS];	/* spare pages before each splitpoint */
-- Spares before a given splitpoint never change, and bucket pages never move. So when combined with cached hashm_maxbucket, hashm_highmask and hashm_lowmask, all read at the same time under the lock, BUCKET_TO_BLKNO will always produce the same output, pointing to the right physical block. Should only be used to save a meta page read when we want to map a key to its bucket block, as said above (see the sketch after this list).

14. BlockNumber hashm_mapp[HASH_MAX_BITMAPS];	/* blknos of ovfl bitmaps */
-- Always read from the meta page directly.

} HashMetaPageData;
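
As a sketch, the only computation the cached fields in points 8 and 13 are meant to serve is the key-to-block mapping below ("cachedmetap" is a hypothetical name for the rd_amcache copy):

	/*
	 * All four inputs were copied under a single metapage lock, so this
	 * mapping is internally consistent; a bucket page's hasho_prevblkno
	 * tells us if it has become stale since then.
	 */
	bucket = _hash_hashkey2bucket(hashkey,
								  cachedmetap->hashm_maxbucket,
								  cachedmetap->hashm_highmask,
								  cachedmetap->hashm_lowmask);
	blkno = BUCKET_TO_BLKNO(cachedmetap, bucket);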

Any existing bugs should be the subject of a separate patch.

Sorry, I will make a new patch for that separately.


--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#22Robert Haas
robertmhaas@gmail.com
In reply to: Mithun Cy (#21)
Re: Cache Hash Index meta page.

On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

-- I think if it is okay, I can document same for each member of HashMetaPageData whether to read from cached from meta page or directly from current meta page. Below briefly I have commented for each member. If you suggest I can go with that approach, I will produce a neat patch for same.

Plain text emails are preferred on this list.

I don't have any confidence in this approach. I'm not sure exactly
what needs to be changed here, but what you're doing right now is just
too error-prone. There's a cached metapage available, and you've got
code accessing it directly, and that's OK except when it's not, and maybe
we can add some comments to explain, but I don't think that's going to
be good enough to really make it clear and maintainable. We need some
kind of more substantive safeguard to prevent the cached metapage data
from being used in unsafe ways -- and while we're at it, we should try
to use it in as many of the places where it *is* safe as possible. My
suggestion for a separate structure was one idea; another might be
providing some kind of API that's always used to access the metapage
cache. Or maybe there's a third option. But this, the way it is
right now, is just too ad-hoc, even with more comments. IMHO, anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#22)
Re: Cache Hash Index meta page.

On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

-- I think if it is okay, I can document same for each member of HashMetaPageData whether to read from cached from meta page or directly from current meta page. Below briefly I have commented for each member. If you suggest I can go with that approach, I will produce a neat patch for same.

Plain text emails are preferred on this list.

I don't have any confidence in this approach. I'm not sure exactly
what needs to be changed here, but what you're doing right now is just
too error-prone. There's a cached metapage available, and you've got
code accessing directly, and that's OK except when it's not, and maybe
we can add some comments to explain, but I don't think that's going to
be good enough to really make it clear and maintainable. We need some
kind of more substantive safeguard to prevent the cached metapage data
from being used in unsafe ways -- and while we're at it, we should try
to use it in as many of the places where it *is* safe as possible. My
suggestion for a separate structure was one idea; another might be
providing some kind of API that's always used to access the metapage
cache. Or maybe there's a third option.

This metapage cache can be validated only when we have a bucket page in which we have stored the maxbucket value. I think what we can do to localize the use of the metapage cache is to write a new API which will return a bucket page locked in the specified mode, based on the hashkey. Something like Buffer _hash_get_buc_buffer_from_hashkey(hashkey, lockmode). I think this will make metapage cache access somewhat similar to what we have in btree, where we use the cache to access the root page. Will something like that address your concern?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#24Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Amit Kapila (#23)
1 attachment(s)
Re: Cache Hash Index meta page.

On Thu, Dec 22, 2016 at 12:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

-- I think if it is okay, I can document same for each member of HashMetaPageData whether to read from cached from meta page or directly from current meta page. Below briefly I have commented for each member. If you suggest I can go with that approach, I will produce a neat patch for same.

Plain text emails are preferred on this list.

Sorry, I have set the mail to plain text mode now.

I think this will make metapage cache access somewhat similar to what we have in btree, where we use the cache to access the root page. Will something like that address your concern?

Thanks; just like _bt_getroot, I am introducing a new function _hash_getbucketbuf_from_hashkey(), which will give the target bucket buffer for the given hashkey. It uses the cached metapage contents instead of reading the meta page buffer when the cached data is valid. There are two places which can use this service: 1. _hash_first and 2. _hash_doinsert. A sketch of its shape follows.
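
For reference, its intended shape (the argument order is taken from the call sites in the attached patch; the step summary is mine):

	Buffer
	_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel,
									int access, HashMetaPage *cachedmetap)

	1. If rel->rd_amcache is not set, read the metapage once and cache it.
	2. Compute the bucket and block number from the cached copy
	   (_hash_hashkey2bucket + BUCKET_TO_BLKNO).
	3. Lock the primary bucket page in the given access mode (HASH_READ or
	   HASH_WRITE).
	4. If the bucket page's hasho_prevblkno shows a split newer than the
	   cache, refresh the cache from the real metapage and retry.
	5. Return the locked bucket buffer, with *cachedmetap pointing to the
	   metapage data actually used.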

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_hash_index_meta_page_07.patch (text/x-patch; charset=US-ASCII) [Download]
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 46df589..c45b3f0 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -32,19 +32,13 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	Buffer		bucket_buf;
 	Buffer		metabuf;
 	HashMetaPage metap;
-	BlockNumber blkno;
-	BlockNumber oldblkno;
-	bool		retry;
+	HashMetaPage usedmetap;
 	Page		metapage;
 	Page		page;
 	HashPageOpaque pageopaque;
 	Size		itemsz;
 	bool		do_expand;
 	uint32		hashkey;
-	Bucket		bucket;
-	uint32		maxbucket;
-	uint32		highmask;
-	uint32		lowmask;
 
 	/*
 	 * Get the hash key for the item (it's stored in the index tuple itself).
@@ -57,10 +51,15 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 								 * need to be consistent */
 
 restart_insert:
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+
+	/*
+	 * Load the metapage. No need to lock as of now because we only access
+	 * page header element pd_pagesize_version in HashMaxItemSize(), this
+	 * element is constant and will not move while accessing. But we hold the
+	 * pin so we can use the metabuf while writing into to it below.
+	 */
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
 	metapage = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(metapage);
 
 	/*
 	 * Check whether the item can fit on a hash page at all. (Eventually, we
@@ -76,59 +75,7 @@ restart_insert:
 						itemsz, HashMaxItemSize(metapage)),
 			errhint("Values larger than a buffer page cannot be indexed.")));
 
-	oldblkno = InvalidBlockNumber;
-	retry = false;
-
-	/*
-	 * Loop until we get a lock on the correct target bucket.
-	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/*
-		 * Copy bucket mapping info now; refer the comment in
-		 * _hash_expandtable where we copy this information before calling
-		 * _hash_splitbucket to see why this is okay.
-		 */
-		maxbucket = metap->hashm_maxbucket;
-		highmask = metap->hashm_highmask;
-		lowmask = metap->hashm_lowmask;
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked the primary page of
-		 * what is still the correct target bucket, we are done.  Otherwise,
-		 * drop any old lock before acquiring the new one.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch and lock the primary bucket page for the target bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
+	buf = _hash_getbucketbuf_from_hashkey(hashkey, rel, HASH_WRITE, &usedmetap);
 
 	/* remember the primary bucket buffer to release the pin on it at end. */
 	bucket_buf = buf;
@@ -152,7 +99,9 @@ restart_insert:
 		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
 		_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-						   maxbucket, highmask, lowmask);
+						   usedmetap->hashm_maxbucket,
+						   usedmetap->hashm_highmask,
+						   usedmetap->hashm_lowmask);
 
 		/* release the pin on old and meta buffer.  retry for insert. */
 		_hash_dropbuf(rel, buf);
@@ -225,6 +174,7 @@ restart_insert:
 	 */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 
+	metap = HashPageGetMeta(metapage);
 	metap->hashm_ntuples += 1;
 
 	/* Make sure this stays in sync with _hash_expandtable() */
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 45e184c..c726909 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -434,7 +434,13 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+		/*
+		 * Setting hasho_prevblkno of bucket page with latest maxbucket number
+		 * to indicate bucket has been initialized and need to reconstruct
+		 * HashMetaCache if it is older.
+		 */
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -845,6 +851,12 @@ _hash_splitbucket(Relation rel,
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_BEING_SPLIT;
 
+	/*
+	 * Setting hasho_prevblkno of bucket page with latest maxbucket number to
+	 * indicate bucket has been split and need to reconstruct HashMetaCache.
+	 * Below same is done for new bucket page.
+	 */
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -852,7 +864,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_BEING_POPULATED;
@@ -1191,3 +1203,124 @@ _hash_finish_split(Relation rel, Buffer metabuf, Buffer obuf, Bucket obucket,
 	LockBuffer(obuf, BUFFER_LOCK_UNLOCK);
 	hash_destroy(tidhtab);
 }
+
+
+/*
+ *	_hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given
+ *										 hashkey.
+ *
+ *	Bucket Pages do not move or get removed once they are allocated. This give
+ *	us an opportunity to use the previously saved metapage contents to reach
+ *	the target bucket buffer, instead of every time reading from the metapage
+ *	buffer. This saves one buffer access everytime we want to reach the target
+ *	bucket buffer, which is very helpful savings in bufmgr traffic and
+ *	contention.
+ *
+ *	The access type parameter (HASH_READ or HASH_WRITE) indicates whether the
+ *	bucket buffer has to be locked for reading or writing.
+ *
+ *	The out parameter cachedmetap is set with metapage contents used for
+ *	hashkey to bucket buffer mapping. Some callers need this info to reach the
+ *	old bucket in case of bucket split, see @_hash_doinsert().
+ */
+Buffer
+_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel, int access,
+								HashMetaPage *cachedmetap)
+{
+	HashMetaPage metap;
+	Buffer		buf;
+	Buffer		metabuf = InvalidBuffer;
+	Page		page;
+	Bucket		bucket;
+	BlockNumber blkno;
+	HashPageOpaque opaque;
+
+	if (rel->rd_amcache != NULL)
+	{
+		metap = (HashMetaPage) rel->rd_amcache;
+	}
+	else
+	{
+		/* Read the metapage */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+
+		/* Cache the metapage data for next time. */
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaPageData));
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage) rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
+		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
+	{
+		/*
+		 * Compute the target bucket number, and convert to block number.
+		 */
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
+
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, access, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
+
+		/*
+		 * Check if this bucket is split after we have cached the hash meta
+		 * data. To do this we need to check whether cached maxbucket number
+		 * is less than or equal to maxbucket number stored in bucket page,
+		 * which was set with that times maxbucket number during bucket page
+		 * splits. In case of upgrade hashno_prevblkno of old bucket page will
+		 * be set with InvalidBlockNumber. And as of now maximum value the
+		 * hashm_maxbucket can take is 1 less than InvalidBlockNumber (see
+		 * _hash_expandtable). So an explicit check for InvalidBlockNumber in
+		 * hasho_prevblkno will tell whether current bucket has been split
+		 * after caching hash meta data.
+		 */
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <= metap->hashm_maxbucket)
+		{
+			/* Ok now we have the right bucket proceed to search in it. */
+			break;
+		}
+
+		/* First drop any locks held on bucket buffers. */
+		_hash_relbuf(rel, buf);
+
+		/* Cached meta data is old try again updating it. */
+		if (BufferIsInvalid(metabuf))
+			metabuf =
+				_hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		else
+			LockBuffer(metabuf, BUFFER_LOCK_SHARE);
+
+		metap = HashPageGetMeta(BufferGetPage(metabuf));
+
+		/* Cache the metapage data for next time. */
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage) rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
+		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/* done with the metapage */
+	if (!BufferIsInvalid(metabuf))
+		_hash_dropbuf(rel, metabuf);
+
+	if (cachedmetap)
+		*cachedmetap = metap;
+	return buf;
+}
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 913b87c..cdc54e1 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
 		_hash_relbuf(rel, *bufp);
 
 	*bufp = InvalidBuffer;
+
+	/* If it is a bucket page there will not be a prevblkno. */
+	if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+		return;
+
 	/* check for interrupts while we're not holding any buffer lock */
 	CHECK_FOR_INTERRUPTS();
 	if (BlockNumberIsValid(blkno))
@@ -216,13 +221,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	uint32		hashkey;
 	Bucket		bucket;
 	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
 	Page		page;
 	HashPageOpaque opaque;
-	HashMetaPage metap;
 	IndexTuple	itup;
 	ItemPointer current;
 	OffsetNumber offnum;
@@ -277,56 +278,7 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
-
-	/*
-	 * Loop until we get a lock on the correct target bucket.
-	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked what is still the
-		 * correct target bucket, we are done.  Otherwise, drop any old lock
-		 * and lock what now appears to be the correct bucket.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch the primary bucket page for the bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
-
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
-
+	buf = _hash_getbucketbuf_from_hashkey(hashkey, rel, HASH_READ, NULL);
 	page = BufferGetPage(buf);
 	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 	Assert(opaque->hasho_bucket == bucket);
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index bc08f81..fe98157 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -60,6 +60,14 @@ typedef uint32 Bucket;
 
 typedef struct HashPageOpaqueData
 {
+	/*
+	 * If this is an ovfl page this stores previous ovfl (or bucket) blkno.
+	 * Else if this is a bucket page we use this for a special purpose. We
+	 * store hashm_maxbucket value, whenever this page is initialized or
+	 * split. So this helps us to know whether the bucket has been split after
+	 * caching the some of the meta page data. See _hash_doinsert(),
+	 * _hash_first() to know how to use same.
+	 */
 	BlockNumber hasho_prevblkno;	/* previous ovfl (or bucket) blkno */
 	BlockNumber hasho_nextblkno;	/* next ovfl blkno */
 	Bucket		hasho_bucket;	/* bucket number this pg belongs to */
@@ -327,6 +335,10 @@ extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno,
 			 int access, int flags);
 extern Buffer _hash_getbuf_with_condlock_cleanup(Relation rel,
 								   BlockNumber blkno, int flags);
+extern Buffer
+_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel,
+								int access,
+								HashMetaPage *cachedmetap);
 extern Buffer _hash_getinitbuf(Relation rel, BlockNumber blkno);
 extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno,
 				ForkNumber forkNum);
#25Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Mithun Cy (#24)
1 attachment(s)
Re: Cache Hash Index meta page.

On Tue, Dec 27, 2016 at 1:36 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
Oops, the patch number should be 08; re-attaching the same after renaming.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_hash_index_meta_page_08.patchtext/x-patch; charset=US-ASCII; name=cache_hash_index_meta_page_08.patchDownload
(Identical to cache_hash_index_meta_page_07.patch above; re-attached under the new file name.)
#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Mithun Cy (#24)
Re: Cache Hash Index meta page.

On Tue, Dec 27, 2016 at 1:36 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Thu, Dec 22, 2016 at 12:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

-- I think if it is okay, I can document for each member of HashMetaPageData whether to read it from the cached meta page or directly from the current meta page. Below I have briefly commented on each member. If you suggest, I can go with that approach and produce a neat patch for the same.

Plain text emails are preferred on this list.

Sorry, I have set the mail to plain text mode now.

I think this will make metapage cache access somewhat
similar to what we have in btree, where we use a cache to access the
root page. Will something like that address your concern?

Thanks, just like _bt_getroot I am introducing a new function
_hash_getbucketbuf_from_hashkey() which will give the target bucket
buffer for the given hashkey. This will use the cached metapage
contents instead of reading meta page buffer if cached data is valid.
There are 2 places which can use this service 1. _hash_first and 2.
_hash_doinsert.

This version of the patch looks much better than the previous version.
I have few review comments:

1.
+	 * pin so we can use the metabuf while writing into to it below.
+	 */
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);

The usage "into to .." in the above comment seems wrong.

2.
-	pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+	/*
+	 * Setting hasho_prevblkno of bucket page with latest maxbucket number
+	 * to indicate bucket has been initialized and need to reconstruct
+	 * HashMetaCache if it is older.
+	 */
+	pageopaque->hasho_prevblkno = metap->hashm_maxbucket;

In the above comment, the reference to HashMetaCache is confusing; are
you referring to some structure here? If you change this, consider
changing the similar usage at other places in the patch as well.

Also, it is not clear what you mean by ".. to indicate bucket
has been initialized .."; assigning maxbucket as a special value to
prevblkno is not related to initializing a bucket page.

3.
 typedef struct HashPageOpaqueData
 {
+	/*
+	 * If this is an ovfl page this stores previous ovfl (or bucket) blkno.
+	 * Else if this is a bucket page we use this for a special purpose. We
+	 * store hashm_maxbucket value, whenever this page is initialized or
+	 * split. So this helps us to know whether the bucket has been split after
+	 * caching the some of the meta page data. See _hash_doinsert(),
+	 * _hash_first() to know how to use same.
+	 */

In the above comment, the phrase ".. caching the some of the meta page
data .." is slightly confusing, because it appears to me that you are
caching the whole of the metapage, not some part of it.

4.
+_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel,

Generally, for _hash_* APIs, we use rel as the first parameter, so I
think it is better to maintain the same with this API as well.

5.
  _hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-   maxbucket, highmask, lowmask);
+   usedmetap->hashm_maxbucket,
+   usedmetap->hashm_highmask,
+   usedmetap->hashm_lowmask);

I think we should add an Assert for the validity of usedmetap before using it.

6. Getting few compilation errors in assert-enabled build.

1>src/backend/access/hash/hashinsert.c(85): error C2065: 'bucket' :
undeclared identifier
1>src/backend/access/hash/hashinsert.c(155): error C2065: 'bucket' :
undeclared identifier

1>src/backend/access/hash/hashsearch.c(223): warning C4101: 'blkno' :
unreferenced local variable
1> hashutil.c
1>\src\backend\access\hash\hashsearch.c(284): warning C4700:
uninitialized local variable 'bucket' used

7.
+ /*
+ * Check if this bucket is split after we have cached the hash meta
+ * data. To do this we need to check whether cached maxbucket number
+ * is less than or equal to maxbucket number stored in bucket page,
+ * which was set with that times maxbucket number during bucket page
+ * splits. In case of upgrade hashno_prevblkno of old bucket page will
+ * be set with InvalidBlockNumber. And as of now maximum value the
+ * hashm_maxbucket can take is 1 less than InvalidBlockNumber (see
+ * _hash_expandtable). So an explicit check for InvalidBlockNumber in
+ * hasho_prevblkno will tell whether current bucket has been split
+ * after caching hash meta data.
+ */

I can understand what you want to say in the above comment, but I think
you can write it in a somewhat shorter form. There is no need to
explain the exact check.

8.
@@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
_hash_relbuf(rel, *bufp);

  *bufp = InvalidBuffer;
+
+ /* If it is a bucket page there will not be a prevblkno. */
+ if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+ return;
+

I don't think the above check is right. Even if it is a bucket page, we
might need to scan a bucket being populated; refer to the check else if
(so->hashso_buc_populated && so->hashso_buc_split). Apart from that,
you can't access a bucket page after releasing the lock on it. Why have
you added such a check?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Amit Kapila (#26)
1 attachment(s)
Re: Cache Hash Index meta page.

Thanks, Amit, for the detailed review and for pointing out various
issues in the patch. I have tried to address all of your comments as below.

On Mon, Jan 2, 2017 at 11:29 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

1.
The usage "into to .." in the above comment seems wrong.

-- Fixed.

2.
In the above comment, the reference to HashMetaCache is confusing; are
you referring to some structure here? If you change this, consider
changing the similar usage at other places in the patch as well.

-- Fixed. Removed HashMetaCache everywhere in the code. Wherever
needed, I added HashMetaPageData instead.

Also, it is not clear what you mean by ".. to indicate bucket
has been initialized .."; assigning maxbucket as a special value to
prevblkno is not related to initializing a bucket page.

-- To be consistent with our definition of prevblkno, I am setting
prevblkno to the current hashm_maxbucket when we initialize the bucket
page. I have tried to correct the comments accordingly.

3.
In the above comment, the phrase ".. caching the some of the meta page
data .." is slightly confusing, because it appears to me that you are
caching the whole of the metapage, not some part of it.

-- Fixed. Changed to caching the HashMetaPageData.

4.
+_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel,

Generally, for _hash_* APIs, we use rel as the first parameter, so I
think it is better to maintain the same with this API as well.

-- Fixed.

5.
_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-   maxbucket, highmask, lowmask);
+   usedmetap->hashm_maxbucket,
+   usedmetap->hashm_highmask,
+   usedmetap->hashm_lowmask);

I think we should add an Assert for the validity of usedmetap before using it.

-- Fixed. Added Assert(usedmetap != NULL);

6. Getting few compilation errors in assert-enabled build.

-- Fixed. Sorry, I missed handling the bucket number, which is needed
in the code below. I have fixed the same now.

7.
I can understand what you want to say in the above comment, but I think
you can write it in a somewhat shorter form. There is no need to
explain the exact check.

-- Fixed. I have tried to compress it into a few lines.

8.
@@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
_hash_relbuf(rel, *bufp);

*bufp = InvalidBuffer;
+
+ /* If it is a bucket page there will not be a prevblkno. */
+ if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+ return;
+

I don't think the above check is right. Even if it is a bucket page, we
might need to scan a bucket being populated; refer to the check else if
(so->hashso_buc_populated && so->hashso_buc_split). Apart from that,
you can't access a bucket page after releasing the lock on it. Why have
you added such a check?

-- Fixed. That was a mistake; now I have fixed it. Actually, if a
bucket page is passed to _hash_readprev then there will not be a
prevblkno. But from this patch onwards the prevblkno of a bucket page
will store hashm_maxbucket, so the check BlockNumberIsValid(blkno) will
no longer be valid. I have fixed it by adding the following:
+ /* If it is a bucket page there will not be a prevblkno. */
+ if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))
  {
+ Assert(BlockNumberIsValid(blkno));

There are two other places which do the same: _hash_freeovflpage and
_hash_squeezebucket. But those will only be called for overflow pages,
so I did not make changes there. Still, I think we should also change
them to make it consistent, as sketched below.
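
For illustration, the consistent form at those two call sites could be
just assertions (a sketch, not part of the attached patch):

	/*
	 * These paths only ever see overflow pages, where hasho_prevblkno
	 * is a real block number rather than a stamped maxbucket value.
	 */
	Assert(!(opaque->hasho_flag & LH_BUCKET_PAGE));
	Assert(BlockNumberIsValid(opaque->hasho_prevblkno));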

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_hash_index_meta_page_09.patchapplication/octet-stream; name=cache_hash_index_meta_page_09.patchDownload
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 46df589..dd20337 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -32,9 +32,7 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	Buffer		bucket_buf;
 	Buffer		metabuf;
 	HashMetaPage metap;
-	BlockNumber blkno;
-	BlockNumber oldblkno;
-	bool		retry;
+	HashMetaPage usedmetap = NULL;
 	Page		metapage;
 	Page		page;
 	HashPageOpaque pageopaque;
@@ -42,9 +40,6 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	bool		do_expand;
 	uint32		hashkey;
 	Bucket		bucket;
-	uint32		maxbucket;
-	uint32		highmask;
-	uint32		lowmask;
 
 	/*
 	 * Get the hash key for the item (it's stored in the index tuple itself).
@@ -57,10 +52,15 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 								 * need to be consistent */
 
 restart_insert:
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+
+	/*
+	 * Load the metapage. No need to lock as of now because we only access
+	 * page header element pd_pagesize_version in HashMaxItemSize(), this
+	 * element is constant and will not move while accessing. But we hold the
+	 * pin so we can use the metabuf while writing into it below.
+	 */
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
 	metapage = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(metapage);
 
 	/*
 	 * Check whether the item can fit on a hash page at all. (Eventually, we
@@ -76,66 +76,21 @@ restart_insert:
 						itemsz, HashMaxItemSize(metapage)),
 			errhint("Values larger than a buffer page cannot be indexed.")));
 
-	oldblkno = InvalidBlockNumber;
-	retry = false;
-
-	/*
-	 * Loop until we get a lock on the correct target bucket.
-	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/*
-		 * Copy bucket mapping info now; refer the comment in
-		 * _hash_expandtable where we copy this information before calling
-		 * _hash_splitbucket to see why this is okay.
-		 */
-		maxbucket = metap->hashm_maxbucket;
-		highmask = metap->hashm_highmask;
-		lowmask = metap->hashm_lowmask;
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked the primary page of
-		 * what is still the correct target bucket, we are done.  Otherwise,
-		 * drop any old lock before acquiring the new one.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch and lock the primary bucket page for the target bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_WRITE,
+										  &usedmetap);
+	Assert(usedmetap != NULL);
 
 	/* remember the primary bucket buffer to release the pin on it at end. */
 	bucket_buf = buf;
 
 	page = BufferGetPage(buf);
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(pageopaque->hasho_bucket == bucket);
+
+	/*
+	 * @_hash_getbucketbuf_from_hashkey we have verified the hasho_bucket.
+	 * Should be safe to use further.
+	 */
+	bucket = pageopaque->hasho_bucket;
 
 	/*
 	 * If this bucket is in the process of being split, try to finish the
@@ -152,7 +107,9 @@ restart_insert:
 		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
 		_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-						   maxbucket, highmask, lowmask);
+						   usedmetap->hashm_maxbucket,
+						   usedmetap->hashm_highmask,
+						   usedmetap->hashm_lowmask);
 
 		/* release the pin on old and meta buffer.  retry for insert. */
 		_hash_dropbuf(rel, buf);
@@ -225,6 +182,7 @@ restart_insert:
 	 */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 
+	metap = HashPageGetMeta(metapage);
 	metap->hashm_ntuples += 1;
 
 	/* Make sure this stays in sync with _hash_expandtable() */
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 45e184c..8e64e88 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -434,7 +434,13 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+		/*
+		 * Set hasho_prevblkno with current hashm_maxbucket. This value will
+		 * be used to validate cached HashMetaPageData. See
+		 * @_hash_getbucketbuf_from_hashkey().
+		 */
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -845,6 +851,12 @@ _hash_splitbucket(Relation rel,
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_BEING_SPLIT;
 
+	/*
+	 * Setting hasho_prevblkno of bucket page with latest maxbucket number to
+	 * indicate bucket has been split and need to reconstruct
+	 * HashMetaPageData. Below same is done for new bucket page.
+	 */
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -852,7 +864,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_BEING_POPULATED;
@@ -1191,3 +1203,122 @@ _hash_finish_split(Relation rel, Buffer metabuf, Buffer obuf, Bucket obucket,
 	LockBuffer(obuf, BUFFER_LOCK_UNLOCK);
 	hash_destroy(tidhtab);
 }
+
+
+/*
+ *	_hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given
+ *										 hashkey.
+ *
+ *	Bucket Pages do not move or get removed once they are allocated. This give
+ *	us an opportunity to use the previously saved metapage contents to reach
+ *	the target bucket buffer, instead of every time reading from the metapage
+ *	buffer. This saves one buffer access everytime we want to reach the target
+ *	bucket buffer, which is very helpful savings in bufmgr traffic and
+ *	contention.
+ *
+ *	The access type parameter (HASH_READ or HASH_WRITE) indicates whether the
+ *	bucket buffer has to be locked for reading or writing.
+ *
+ *	The out parameter cachedmetap is set with metapage contents used for
+ *	hashkey to bucket buffer mapping. Some callers need this info to reach the
+ *	old bucket in case of bucket split, see @_hash_doinsert().
+ */
+Buffer
+_hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access,
+								HashMetaPage *cachedmetap)
+{
+	HashMetaPage metap;
+	Buffer		buf;
+	Buffer		metabuf = InvalidBuffer;
+	Page		page;
+	Bucket		bucket;
+	BlockNumber blkno;
+	HashPageOpaque opaque;
+
+	/* We read from target bucket buffer, hence locking is must. */
+	Assert(access == HASH_READ || access == HASH_WRITE);
+
+	if (rel->rd_amcache != NULL)
+	{
+		metap = (HashMetaPage) rel->rd_amcache;
+	}
+	else
+	{
+		/* Read the metapage */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+
+		/* Cache the metapage data for next time. */
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaPageData));
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage) rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
+		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
+	{
+		/*
+		 * Compute the target bucket number, and convert to block number.
+		 */
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
+
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, access, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
+
+		/*
+		 * Check if this bucket is split after we have cached the
+		 * HashMetaPageData by comparing their respective hashm_maxbucket. If
+		 * so we need to read the metapage and recompute the bucket number
+		 * again.
+		 */
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <= metap->hashm_maxbucket)
+		{
+			/* Ok now we have the right bucket proceed to search in it. */
+			break;
+		}
+
+		/* First drop any locks held on bucket buffers. */
+		_hash_relbuf(rel, buf);
+
+		/* Cached meta data is old try again updating it. */
+		if (BufferIsInvalid(metabuf))
+			metabuf =
+				_hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		else
+			LockBuffer(metabuf, BUFFER_LOCK_SHARE);
+
+		metap = HashPageGetMeta(BufferGetPage(metabuf));
+
+		/* Cache the metapage data for next time. */
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage) rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
+		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/* done with the metapage */
+	if (!BufferIsInvalid(metabuf))
+		_hash_dropbuf(rel, metabuf);
+
+	if (cachedmetap)
+		*cachedmetap = metap;
+
+	return buf;
+}
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 913b87c..71a040f 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -154,8 +154,11 @@ _hash_readprev(IndexScanDesc scan,
 	*bufp = InvalidBuffer;
 	/* check for interrupts while we're not holding any buffer lock */
 	CHECK_FOR_INTERRUPTS();
-	if (BlockNumberIsValid(blkno))
+
+	/* If it is a bucket page there will not be a prevblkno. */
+	if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))
 	{
+		Assert(BlockNumberIsValid(blkno));
 		*bufp = _hash_getbuf(rel, blkno, HASH_READ,
 							 LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
 		*pagep = BufferGetPage(*bufp);
@@ -215,14 +218,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	ScanKey		cur;
 	uint32		hashkey;
 	Bucket		bucket;
-	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
 	Page		page;
 	HashPageOpaque opaque;
-	HashMetaPage metap;
 	IndexTuple	itup;
 	ItemPointer current;
 	OffsetNumber offnum;
@@ -277,59 +275,15 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_READ, NULL);
+	page = BufferGetPage(buf);
+	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 
 	/*
-	 * Loop until we get a lock on the correct target bucket.
+	 * @_hash_getbucketbuf_from_hashkey we have verified the hasho_bucket.
+	 * Should be safe to use further.
 	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked what is still the
-		 * correct target bucket, we are done.  Otherwise, drop any old lock
-		 * and lock what now appears to be the correct bucket.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch the primary bucket page for the bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
-
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
-
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
+	bucket = opaque->hasho_bucket;
 
 	so->hashso_bucket_buf = buf;
 
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index bc08f81..c285a3a 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -60,6 +60,13 @@ typedef uint32 Bucket;
 
 typedef struct HashPageOpaqueData
 {
+	/*
+	 * If this is an ovfl page this stores previous ovfl (or bucket) blkno.
+	 * Else if this is a bucket page we use this for a special purpose. We
+	 * store hashm_maxbucket value, whenever this page is initialized or
+	 * split. So this helps us to know whether the bucket has been split after
+	 * caching the HashMetaPageData. See _hash_getbucketbuf_from_hashkey().
+	 */
 	BlockNumber hasho_prevblkno;	/* previous ovfl (or bucket) blkno */
 	BlockNumber hasho_nextblkno;	/* next ovfl blkno */
 	Bucket		hasho_bucket;	/* bucket number this pg belongs to */
@@ -327,6 +334,9 @@ extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno,
 			 int access, int flags);
 extern Buffer _hash_getbuf_with_condlock_cleanup(Relation rel,
 								   BlockNumber blkno, int flags);
+extern Buffer _hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey,
+								int access,
+								HashMetaPage *cachedmetap);
 extern Buffer _hash_getinitbuf(Relation rel, BlockNumber blkno);
 extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno,
 				ForkNumber forkNum);
#28Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Mithun Cy (#27)
1 attachment(s)
Re: Cache Hash Index meta page.

On Tue, Jan 3, 2017 at 12:05 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

As part of a performance/space analysis of hash index on the varchar
data type with this patch, I have run some tests with a modified
pgbench. The modifications include:
1. Changed the aid column of the pgbench_accounts table from int to varchar(x).
2. To generate unique values as before, inserted the stringified
integer repeatedly to fill x.

I have mainly tested varchar(30) and varchar(90). The attached document
has the detailed report on our captured data. The new hash indexes show
~25% improved performance over btree and, as expected, are very space
efficient.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

hash_index_sizes_experiment_01.odsapplication/vnd.oasis.opendocument.spreadsheet; name=hash_index_sizes_experiment_01.odsDownload
#29Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Mithun Cy (#28)
Re: Cache Hash Index meta page.

On Wed, Jan 4, 2017 at 4:19 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

As part of a performance/space analysis of hash index on the varchar
data type with this patch, I have run some tests with a modified pgbench.

I forgot to mention that all these tests were run on the power2
machine, which has 192 hyperthreads.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Mithun Cy (#29)
1 attachment(s)
Re: Cache Hash Index meta page.

On Wed, Jan 4, 2017 at 5:21 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
I have rebased the patch to fix one compilation warning in
_hash_doinsert, where the variable bucket was used only in an Assert
but was not marked for that purpose.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_hash_index_meta_page_10.patchapplication/octet-stream; name=cache_hash_index_meta_page_10.patchDownload
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 39c70d3..6991707 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -32,19 +32,14 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	Buffer		bucket_buf;
 	Buffer		metabuf;
 	HashMetaPage metap;
-	BlockNumber blkno;
-	BlockNumber oldblkno;
-	bool		retry;
+	HashMetaPage usedmetap = NULL;
 	Page		metapage;
 	Page		page;
 	HashPageOpaque pageopaque;
 	Size		itemsz;
 	bool		do_expand;
 	uint32		hashkey;
-	Bucket		bucket;
-	uint32		maxbucket;
-	uint32		highmask;
-	uint32		lowmask;
+	Bucket		bucket PG_USED_FOR_ASSERTS_ONLY;
 
 	/*
 	 * Get the hash key for the item (it's stored in the index tuple itself).
@@ -57,10 +52,15 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 								 * need to be consistent */
 
 restart_insert:
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+
+	/*
+	 * Load the metapage. No need to lock as of now because we only access
+	 * page header element pd_pagesize_version in HashMaxItemSize(), this
+	 * element is constant and will not move while accessing. But we hold the
+	 * pin so we can use the metabuf while writing into it below.
+	 */
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
 	metapage = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(metapage);
 
 	/*
 	 * Check whether the item can fit on a hash page at all. (Eventually, we
@@ -76,66 +76,21 @@ restart_insert:
 						itemsz, HashMaxItemSize(metapage)),
 			errhint("Values larger than a buffer page cannot be indexed.")));
 
-	oldblkno = InvalidBlockNumber;
-	retry = false;
-
-	/*
-	 * Loop until we get a lock on the correct target bucket.
-	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/*
-		 * Copy bucket mapping info now; refer the comment in
-		 * _hash_expandtable where we copy this information before calling
-		 * _hash_splitbucket to see why this is okay.
-		 */
-		maxbucket = metap->hashm_maxbucket;
-		highmask = metap->hashm_highmask;
-		lowmask = metap->hashm_lowmask;
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked the primary page of
-		 * what is still the correct target bucket, we are done.  Otherwise,
-		 * drop any old lock before acquiring the new one.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch and lock the primary bucket page for the target bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_WRITE,
+										  &usedmetap);
+	Assert(usedmetap != NULL);
 
 	/* remember the primary bucket buffer to release the pin on it at end. */
 	bucket_buf = buf;
 
 	page = BufferGetPage(buf);
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(pageopaque->hasho_bucket == bucket);
+
+	/*
+	 * @_hash_getbucketbuf_from_hashkey we have verified the hasho_bucket.
+	 * Should be safe to use further.
+	 */
+	bucket = pageopaque->hasho_bucket;
 
 	/*
 	 * If this bucket is in the process of being split, try to finish the
@@ -152,7 +107,9 @@ restart_insert:
 		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
 		_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-						   maxbucket, highmask, lowmask);
+						   usedmetap->hashm_maxbucket,
+						   usedmetap->hashm_highmask,
+						   usedmetap->hashm_lowmask);
 
 		/* release the pin on old and meta buffer.  retry for insert. */
 		_hash_dropbuf(rel, buf);
@@ -225,6 +182,7 @@ restart_insert:
 	 */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 
+	metap = HashPageGetMeta(metapage);
 	metap->hashm_ntuples += 1;
 
 	/* Make sure this stays in sync with _hash_expandtable() */
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 9430794..348c85d 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -434,7 +434,13 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+		/*
+		 * Set hasho_prevblkno with current hashm_maxbucket. This value will
+		 * be used to validate cached HashMetaPageData. See
+		 * @_hash_getbucketbuf_from_hashkey().
+		 */
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -845,6 +851,12 @@ _hash_splitbucket(Relation rel,
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_BEING_SPLIT;
 
+	/*
+	 * Set hasho_prevblkno of the bucket page to the latest maxbucket number
+	 * to indicate the bucket has been split and the cached HashMetaPageData
+	 * needs to be rebuilt. The same is done for the new bucket page below.
+	 */
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -852,7 +864,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_BEING_POPULATED;
@@ -1191,3 +1203,122 @@ _hash_finish_split(Relation rel, Buffer metabuf, Buffer obuf, Bucket obucket,
 	LockBuffer(obuf, BUFFER_LOCK_UNLOCK);
 	hash_destroy(tidhtab);
 }
+
+
+/*
+ *	_hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given
+ *										 hashkey.
+ *
+ *	Bucket pages do not move or get removed once they are allocated. This
+ *	gives us an opportunity to use the previously saved metapage contents to
+ *	reach the target bucket buffer, instead of reading from the metapage
+ *	buffer every time. This saves one buffer access every time we want to
+ *	reach the target bucket buffer, which is a very helpful saving in bufmgr
+ *	traffic and contention.
+ *
+ *	The access type parameter (HASH_READ or HASH_WRITE) indicates whether the
+ *	bucket buffer has to be locked for reading or writing.
+ *
+ *	The out parameter cachedmetap is set with metapage contents used for
+ *	hashkey to bucket buffer mapping. Some callers need this info to reach the
+ *	old bucket in case of bucket split, see @_hash_doinsert().
+ */
+Buffer
+_hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access,
+								HashMetaPage *cachedmetap)
+{
+	HashMetaPage metap;
+	Buffer		buf;
+	Buffer		metabuf = InvalidBuffer;
+	Page		page;
+	Bucket		bucket;
+	BlockNumber blkno;
+	HashPageOpaque opaque;
+
+	/* We read from the target bucket buffer, hence locking is a must. */
+	Assert(access == HASH_READ || access == HASH_WRITE);
+
+	if (rel->rd_amcache != NULL)
+	{
+		metap = (HashMetaPage) rel->rd_amcache;
+	}
+	else
+	{
+		/* Read the metapage */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		metap = HashPageGetMeta(page);
+
+		/* Cache the metapage data for next time. */
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaPageData));
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage) rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
+		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
+	{
+		/*
+		 * Compute the target bucket number, and convert to block number.
+		 */
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
+
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, access, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
+
+		/*
+		 * Check if this bucket was split after we cached the
+		 * HashMetaPageData, by comparing the respective hashm_maxbucket
+		 * values. If so, we need to read the metapage and recompute the
+		 * bucket number again.
+		 */
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <= metap->hashm_maxbucket)
+		{
+			/* Ok, now we have the right bucket; proceed to search in it. */
+			break;
+		}
+
+		/* First drop any locks held on bucket buffers. */
+		_hash_relbuf(rel, buf);
+
+		/* Cached meta data is old; try updating it. */
+		if (BufferIsInvalid(metabuf))
+			metabuf =
+				_hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		else
+			LockBuffer(metabuf, BUFFER_LOCK_SHARE);
+
+		metap = HashPageGetMeta(BufferGetPage(metabuf));
+
+		/* Cache the metapage data for next time. */
+		memcpy(rel->rd_amcache, metap, sizeof(HashMetaPageData));
+		metap = (HashMetaPage) rel->rd_amcache;
+
+		/* Release metapage lock, but keep pin. */
+		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	}
+
+	/* done with the metapage */
+	if (!BufferIsInvalid(metabuf))
+		_hash_dropbuf(rel, metabuf);
+
+	if (cachedmetap)
+		*cachedmetap = metap;
+
+	return buf;
+}
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index c0bdfe6..e11b6b5 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -154,8 +154,11 @@ _hash_readprev(IndexScanDesc scan,
 	*bufp = InvalidBuffer;
 	/* check for interrupts while we're not holding any buffer lock */
 	CHECK_FOR_INTERRUPTS();
-	if (BlockNumberIsValid(blkno))
+
+	/* If it is a bucket page there will not be a prevblkno. */
+	if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))
 	{
+		Assert(BlockNumberIsValid(blkno));
 		*bufp = _hash_getbuf(rel, blkno, HASH_READ,
 							 LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
 		*pagep = BufferGetPage(*bufp);
@@ -215,14 +218,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	ScanKey		cur;
 	uint32		hashkey;
 	Bucket		bucket;
-	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
 	Page		page;
 	HashPageOpaque opaque;
-	HashMetaPage metap;
 	IndexTuple	itup;
 	ItemPointer current;
 	OffsetNumber offnum;
@@ -277,59 +275,15 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_READ, NULL);
+	page = BufferGetPage(buf);
+	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 
 	/*
-	 * Loop until we get a lock on the correct target bucket.
+	 * @_hash_getbucketbuf_from_hashkey we have verified the hasho_bucket.
+	 * Should be safe to use further.
 	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked what is still the
-		 * correct target bucket, we are done.  Otherwise, drop any old lock
-		 * and lock what now appears to be the correct bucket.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch the primary bucket page for the bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
-
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
-
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
+	bucket = opaque->hasho_bucket;
 
 	so->hashso_bucket_buf = buf;
 
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index b0a1131..b3fceaa 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -60,6 +60,13 @@ typedef uint32 Bucket;
 
 typedef struct HashPageOpaqueData
 {
+	/*
+	 * If this is an ovfl page, this stores the previous ovfl (or bucket)
+	 * blkno. Else, if this is a bucket page, we use it for a special purpose:
+	 * we store the hashm_maxbucket value whenever the page is initialized or
+	 * split. This helps us know whether the bucket has been split after
+	 * caching the HashMetaPageData. See _hash_getbucketbuf_from_hashkey().
+	 */
 	BlockNumber hasho_prevblkno;	/* previous ovfl (or bucket) blkno */
 	BlockNumber hasho_nextblkno;	/* next ovfl blkno */
 	Bucket		hasho_bucket;	/* bucket number this pg belongs to */
@@ -327,6 +334,9 @@ extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno,
 			 int access, int flags);
 extern Buffer _hash_getbuf_with_condlock_cleanup(Relation rel,
 								   BlockNumber blkno, int flags);
+extern Buffer _hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey,
+								int access,
+								HashMetaPage *cachedmetap);
 extern Buffer _hash_getinitbuf(Relation rel, BlockNumber blkno);
 extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno,
 				ForkNumber forkNum);
#31Amit Kapila
amit.kapila16@gmail.com
In reply to: Mithun Cy (#30)
Re: Cache Hash Index meta page.

On Thu, Jan 5, 2017 at 11:38 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Wed, Jan 4, 2017 at 5:21 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
I have re-based the patch to fix one compilation warning in
_hash_doinsert, where the variable bucket was only used for asserting
but was not declared as such (it is now marked PG_USED_FOR_ASSERTS_ONLY).

Few more comments:
1.
 }
+
+
+/*
+ * _hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given

no need for two extra lines; one is sufficient and matches the nearby
coding pattern.

2.
+ * old bucket in case of bucket split, see @_hash_doinsert().

Do you see anywhere else in the code the pattern of using the @ symbol
in comments before a function name? In general, there is no harm in
using it, but maybe it is better to be consistent with the usage at
other places.

3.
+ /*
+ * @_hash_getbucketbuf_from_hashkey we have verified the hasho_bucket.
+ * Should be safe to use further.
+ */
+ bucket = pageopaque->hasho_bucket;

/*
* If this bucket is in the process of being split, try to finish the
@@ -152,7 +107,9 @@ restart_insert:
LockBuffer(buf, BUFFER_LOCK_UNLOCK);

  _hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-   maxbucket, highmask, lowmask);
+   usedmetap->hashm_maxbucket,

after this change, I think you can directly use bucket in
_hash_finish_split instead of pageopaque->hasho_bucket.

4.
@@ -154,8 +154,11 @@ _hash_readprev(IndexScanDesc scan,
  *bufp = InvalidBuffer;
  /* check for interrupts while we're not holding any buffer lock */
  CHECK_FOR_INTERRUPTS();
- if (BlockNumberIsValid(blkno))
+
+ /* If it is a bucket page there will not be a prevblkno. */
+ if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))

Won't a check similar to the existing check (if (*bufp ==
so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)) just
above this code suffice? If so, then you can check it
once and use it in both places.
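
A condensed sketch of that restructuring, taken from the fix that
appears in patch 11 later in the thread (the haveprevblk flag comes
from that patch):

    bool        haveprevblk = true;

    if (*bufp == so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)
    {
        LockBuffer(*bufp, BUFFER_LOCK_UNLOCK);
        haveprevblk = false;    /* bucket page: prevblkno is reused, not a link */
    }
    else
        _hash_relbuf(rel, *bufp);
    ...
    if (haveprevblk)
    {
        Assert(BlockNumberIsValid(blkno));
        *bufp = _hash_getbuf(rel, blkno, HASH_READ,
                             LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
    }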

5. The reader and insertion algorithms need to be updated in the README.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32Robert Haas
robertmhaas@gmail.com
In reply to: Mithun Cy (#24)
Re: Cache Hash Index meta page.

On Tue, Dec 27, 2016 at 3:06 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

Thanks, just like _bt_getroot I am introducing a new function,
_hash_getbucketbuf_from_hashkey(), which will give the target bucket
buffer for the given hashkey. This will use the cached metapage
contents instead of reading the meta page buffer if the cached data is
valid. There are 2 places which can use this service: 1. _hash_first
and 2. _hash_doinsert.

Can we adapt the ad-hoc caching logic in hashbulkdelete() to work with
this new logic? Or at least update the comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#33Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Amit Kapila (#31)
1 attachment(s)
Re: Cache Hash Index meta page.

On Fri, Jan 6, 2017 at 11:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Few more comments:
1.
no need to two extra lines, one is sufficient and matches the nearby
coding pattern.

-- Fixed.

2.
Do you see anywhere else in the code the pattern of using @ symbol in
comments before function name?

-- Fixed.

3.

after this change, i think you can directly use bucket in
_hash_finish_split instead of pageopaque->hasho_bucket

-- Fixed.

4.

Won't the check similar to the existing check (if (*bufp ==
so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)) just
above this code will suffice the need? If so, then you can check it
once and use it in both places.

-- Fixed.

5. The reader and insertion algorithm needs to be updated in README.

-- Added info in README.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_hash_index_meta_page_11.patchapplication/octet-stream; name=cache_hash_index_meta_page_11.patchDownload
diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 01ea115..a9aeee2 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -188,17 +188,9 @@ track of available overflow pages.
 
 The reader algorithm is:
 
-	pin meta page and take buffer content lock in shared mode
-	loop:
-		compute bucket number for target hash key
-		release meta page buffer content lock
-		if (correct bucket page is already locked)
-			break
-		release any existing bucket page buffer content lock (if a concurrent
-         split happened)
-		take the buffer content lock on bucket page in shared mode
-		retake meta page buffer content lock in shared mode
-	release pin on metapage
+	Given a hashkey get the target bucket page with read lock, using cached
+	metapage. The getbucketbuf_from_hashkey method below explains the same.
+
 	if the target bucket is still being populated by a split:
 		release the buffer content lock on current bucket page
 		pin and acquire the buffer content lock on old bucket in shared mode
@@ -238,17 +230,9 @@ which this bucket is formed by split.
 
 The insertion algorithm is rather similar:
 
-	pin meta page and take buffer content lock in shared mode
-	loop:
-		compute bucket number for target hash key
-		release meta page buffer content lock
-		if (correct bucket page is already locked)
-			break
-		release any existing bucket page buffer content lock (if a concurrent
-         split happened)
-		take the buffer content lock on bucket page in exclusive mode
-		retake meta page buffer content lock in shared mode
-	release pin on metapage
+	Given a hashkey get the target bucket page with write lock, using cached
+	metapage. The getbucketbuf_from_hashkey method below explains the same.
+
 -- (so far same as reader, except for acquisition of buffer content lock in
 	exclusive mode on primary bucket page)
 	if the bucket-being-split flag is set for a bucket and pin count on it is
@@ -290,6 +274,21 @@ When an inserter cannot find space in any existing page of a bucket, it
 must obtain an overflow page and add that page to the bucket's chain.
 Details of that part of the algorithm appear later.
 
+The getbucketbuf_from_hashkey method used in the reader and insert algorithms
+is implemented as below. Using this method saves us reading the metapage
+every time we execute the reader or insert algorithm.
+
+	If metapage cache is not set, read the meta page data and set the cache.
+
+	Loop:
+		Get target bucket from the hashkey using metapage cache. Lock the
+		bucket page as requested by reader/insert algorithm.
+		If target bucket is split before metapage data was cached then we are
+		done.
+		Else first release the bucket page and then update the metapage cache
+		with latest metapage data.
+		Loop again to reach the new target bucket.
+
 The page split algorithm is entered whenever an inserter observes that the
 index is overfull (has a higher-than-wanted ratio of tuples to buckets).
 The algorithm attempts, but does not necessarily succeed, to split one
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 0cbf6b0..5ccf717 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -503,28 +503,19 @@ hashbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	Bucket		orig_maxbucket;
 	Bucket		cur_maxbucket;
 	Bucket		cur_bucket;
-	Buffer		metabuf;
+	Buffer		metabuf = InvalidBuffer;
 	HashMetaPage metap;
-	HashMetaPageData local_metapage;
+	HashMetaPage cachedmetap;
 
 	tuples_removed = 0;
 	num_index_tuples = 0;
 
 	/*
-	 * Read the metapage to fetch original bucket and tuple counts.  Also, we
-	 * keep a copy of the last-seen metapage so that we can use its
-	 * hashm_spares[] values to compute bucket page addresses.  This is a bit
-	 * hokey but perfectly safe, since the interesting entries in the spares
-	 * array cannot change under us; and it beats rereading the metapage for
-	 * each bucket.
+	 * update and get the metapage cache data.
 	 */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
-	orig_maxbucket = metap->hashm_maxbucket;
-	orig_ntuples = metap->hashm_ntuples;
-	memcpy(&local_metapage, metap, sizeof(local_metapage));
-	/* release the lock, but keep pin */
-	LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	cachedmetap = _hash_getcachedmetap(rel, true);
+	orig_maxbucket = cachedmetap->hashm_maxbucket;
+	orig_ntuples = cachedmetap->hashm_ntuples;
 
 	/* Scan the buckets that we know exist */
 	cur_bucket = 0;
@@ -542,7 +533,7 @@ loop_top:
 		bool		split_cleanup = false;
 
 		/* Get address of bucket's start page */
-		bucket_blkno = BUCKET_TO_BLKNO(&local_metapage, cur_bucket);
+		bucket_blkno = BUCKET_TO_BLKNO(cachedmetap, cur_bucket);
 
 		blkno = bucket_blkno;
 
@@ -574,19 +565,19 @@ loop_top:
 			 * tuples left behind by the most recent split.  To prevent that,
 			 * now that the primary page of the target bucket has been locked
 			 * (and thus can't be further split), update our cached metapage
-			 * data.
+			 * data in that case.
 			 */
-			LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-			memcpy(&local_metapage, metap, sizeof(local_metapage));
-			LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+			if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
+				bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+				cachedmetap = _hash_getcachedmetap(rel, true);
 		}
 
 		bucket_buf = buf;
 
 		hashbucketcleanup(rel, cur_bucket, bucket_buf, blkno, info->strategy,
-						  local_metapage.hashm_maxbucket,
-						  local_metapage.hashm_highmask,
-						  local_metapage.hashm_lowmask, &tuples_removed,
+						  cachedmetap->hashm_maxbucket,
+						  cachedmetap->hashm_highmask,
+						  cachedmetap->hashm_lowmask, &tuples_removed,
 						  &num_index_tuples, split_cleanup,
 						  callback, callback_state);
 
@@ -597,15 +588,18 @@ loop_top:
 	}
 
 	/* Write-lock metapage and check for split since we started */
-	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
+	if (BufferIsInvalid(metabuf))
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_WRITE, LH_META_PAGE);
+	else
+		LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 	metap = HashPageGetMeta(BufferGetPage(metabuf));
 
 	if (cur_maxbucket != metap->hashm_maxbucket)
 	{
 		/* There's been a split, so process the additional bucket(s) */
-		cur_maxbucket = metap->hashm_maxbucket;
-		memcpy(&local_metapage, metap, sizeof(local_metapage));
 		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+		cachedmetap = _hash_getcachedmetap(rel, true);
+		cur_maxbucket = cachedmetap->hashm_maxbucket;
 		goto loop_top;
 	}
 
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 39c70d3..bec5ef3 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -32,9 +32,7 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	Buffer		bucket_buf;
 	Buffer		metabuf;
 	HashMetaPage metap;
-	BlockNumber blkno;
-	BlockNumber oldblkno;
-	bool		retry;
+	HashMetaPage usedmetap = NULL;
 	Page		metapage;
 	Page		page;
 	HashPageOpaque pageopaque;
@@ -42,9 +40,6 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	bool		do_expand;
 	uint32		hashkey;
 	Bucket		bucket;
-	uint32		maxbucket;
-	uint32		highmask;
-	uint32		lowmask;
 
 	/*
 	 * Get the hash key for the item (it's stored in the index tuple itself).
@@ -57,10 +52,15 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 								 * need to be consistent */
 
 restart_insert:
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+
+	/*
+	 * Load the metapage. No lock is needed as of now, because we only access
+	 * the page header element pd_pagesize_version in HashMaxItemSize(); this
+	 * element is constant and will not move while we access it. But we hold
+	 * the pin so we can use the metabuf while writing into it below.
+	 */
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
 	metapage = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(metapage);
 
 	/*
 	 * Check whether the item can fit on a hash page at all. (Eventually, we
@@ -76,66 +76,21 @@ restart_insert:
 						itemsz, HashMaxItemSize(metapage)),
 			errhint("Values larger than a buffer page cannot be indexed.")));
 
-	oldblkno = InvalidBlockNumber;
-	retry = false;
-
-	/*
-	 * Loop until we get a lock on the correct target bucket.
-	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/*
-		 * Copy bucket mapping info now; refer the comment in
-		 * _hash_expandtable where we copy this information before calling
-		 * _hash_splitbucket to see why this is okay.
-		 */
-		maxbucket = metap->hashm_maxbucket;
-		highmask = metap->hashm_highmask;
-		lowmask = metap->hashm_lowmask;
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked the primary page of
-		 * what is still the correct target bucket, we are done.  Otherwise,
-		 * drop any old lock before acquiring the new one.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch and lock the primary bucket page for the target bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_WRITE,
+										  &usedmetap);
+	Assert(usedmetap != NULL);
 
 	/* remember the primary bucket buffer to release the pin on it at end. */
 	bucket_buf = buf;
 
 	page = BufferGetPage(buf);
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(pageopaque->hasho_bucket == bucket);
+
+	/*
+	 * In _hash_getbucketbuf_from_hashkey we have verified the hasho_bucket,
+	 * so it should be safe to use it from here on.
+	 */
+	bucket = pageopaque->hasho_bucket;
 
 	/*
 	 * If this bucket is in the process of being split, try to finish the
@@ -151,8 +106,10 @@ restart_insert:
 		/* release the lock on bucket buffer, before completing the split. */
 		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-						   maxbucket, highmask, lowmask);
+		_hash_finish_split(rel, metabuf, buf, bucket,
+						   usedmetap->hashm_maxbucket,
+						   usedmetap->hashm_highmask,
+						   usedmetap->hashm_lowmask);
 
 		/* release the pin on old and meta buffer.  retry for insert. */
 		_hash_dropbuf(rel, buf);
@@ -225,6 +182,7 @@ restart_insert:
 	 */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 
+	metap = HashPageGetMeta(metapage);
 	metap->hashm_ntuples += 1;
 
 	/* Make sure this stays in sync with _hash_expandtable() */
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 9430794..9461336 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -434,7 +434,13 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+		/*
+		 * Set hasho_prevblkno with current hashm_maxbucket. This value will
+		 * be used to validate cached HashMetaPageData. See
+		 * _hash_getbucketbuf_from_hashkey().
+		 */
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -845,6 +851,12 @@ _hash_splitbucket(Relation rel,
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_BEING_SPLIT;
 
+	/*
+	 * Set hasho_prevblkno of the bucket page to the latest maxbucket number
+	 * to indicate the bucket has been split and the cached HashMetaPageData
+	 * needs to be rebuilt. The same is done for the new bucket page below.
+	 */
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -852,7 +864,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_BEING_POPULATED;
@@ -1191,3 +1203,115 @@ _hash_finish_split(Relation rel, Buffer metabuf, Buffer obuf, Bucket obucket,
 	LockBuffer(obuf, BUFFER_LOCK_UNLOCK);
 	hash_destroy(tidhtab);
 }
+
+/*
+ *	_hash_getcachedmetap() -- Returns cached metapage data.
+ *
+ * 	updatecache: if true, update the cache with the latest metapage data
+ * 	before returning it.
+ */
+HashMetaPage
+_hash_getcachedmetap(Relation rel, bool updatecache)
+{
+	Buffer	metabuf;
+	Page	page;
+
+	if (updatecache || rel->rd_amcache == NULL)
+	{
+		if (rel->rd_amcache == NULL)
+			rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+												 sizeof(HashMetaPageData));
+
+		/* Read the metapage. */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		memcpy(rel->rd_amcache, HashPageGetMeta(page),
+			   sizeof(HashMetaPageData));
+
+		/* Release metapage. */
+		_hash_relbuf(rel, metabuf);
+	}
+
+	return (HashMetaPage) rel->rd_amcache;
+}
+
+/*
+ *	_hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given
+ *										 hashkey.
+ *
+ *	Bucket pages do not move or get removed once they are allocated. This
+ *	gives us an opportunity to use the previously saved metapage contents to
+ *	reach the target bucket buffer, instead of reading from the metapage
+ *	buffer every time. This saves one buffer access every time we want to
+ *	reach the target bucket buffer, which is a very helpful saving in bufmgr
+ *	traffic and contention.
+ *
+ *	The access type parameter (HASH_READ or HASH_WRITE) indicates whether the
+ *	bucket buffer has to be locked for reading or writing.
+ *
+ *	The out parameter cachedmetap is set with metapage contents used for
+ *	hashkey to bucket buffer mapping. Some callers need this info to reach the
+ *	old bucket in case of bucket split, see _hash_doinsert().
+ */
+Buffer
+_hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access,
+								HashMetaPage *cachedmetap)
+{
+	HashMetaPage metap;
+	Buffer		buf;
+	Page		page;
+	Bucket		bucket;
+	BlockNumber blkno;
+	HashPageOpaque opaque;
+
+	/* We read from the target bucket buffer, hence locking is a must. */
+	Assert(access == HASH_READ || access == HASH_WRITE);
+
+	metap = _hash_getcachedmetap(rel, false);
+
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
+	{
+		/*
+		 * Compute the target bucket number, and convert to block number.
+		 */
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
+
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, access, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
+
+		/*
+		 * Check if this bucket was split after we cached the
+		 * HashMetaPageData, by comparing the respective hashm_maxbucket
+		 * values. If so, we need to read the metapage and recompute the
+		 * bucket number again.
+		 */
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <= metap->hashm_maxbucket)
+		{
+			/* Ok, now we have the right bucket; proceed to search in it. */
+			break;
+		}
+
+		/* First drop any locks held on bucket buffers. */
+		_hash_relbuf(rel, buf);
+
+		/* Update the cached meta page data. */
+		metap = _hash_getcachedmetap(rel, true);
+	}
+
+	if (cachedmetap)
+		*cachedmetap = metap;
+
+	return buf;
+}
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index c0bdfe6..922143d 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -139,6 +139,7 @@ _hash_readprev(IndexScanDesc scan,
 	BlockNumber blkno;
 	Relation	rel = scan->indexRelation;
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
+	bool		haveprevblk = true;
 
 	blkno = (*opaquep)->hasho_prevblkno;
 
@@ -147,15 +148,20 @@ _hash_readprev(IndexScanDesc scan,
 	 * comments in _hash_first to know the reason of retaining pin.
 	 */
 	if (*bufp == so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)
+	{
 		LockBuffer(*bufp, BUFFER_LOCK_UNLOCK);
+		haveprevblk = false;
+	}
 	else
 		_hash_relbuf(rel, *bufp);
 
 	*bufp = InvalidBuffer;
 	/* check for interrupts while we're not holding any buffer lock */
 	CHECK_FOR_INTERRUPTS();
-	if (BlockNumberIsValid(blkno))
+
+	if (haveprevblk)
 	{
+		Assert(BlockNumberIsValid(blkno));
 		*bufp = _hash_getbuf(rel, blkno, HASH_READ,
 							 LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
 		*pagep = BufferGetPage(*bufp);
@@ -215,14 +221,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	ScanKey		cur;
 	uint32		hashkey;
 	Bucket		bucket;
-	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
 	Page		page;
 	HashPageOpaque opaque;
-	HashMetaPage metap;
 	IndexTuple	itup;
 	ItemPointer current;
 	OffsetNumber offnum;
@@ -277,59 +278,15 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_READ, NULL);
+	page = BufferGetPage(buf);
+	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 
 	/*
-	 * Loop until we get a lock on the correct target bucket.
+	 * In _hash_getbucketbuf_from_hashkey we have verified the hasho_bucket,
+	 * so it should be safe to use it from here on.
 	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked what is still the
-		 * correct target bucket, we are done.  Otherwise, drop any old lock
-		 * and lock what now appears to be the correct bucket.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch the primary bucket page for the bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
-
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
-
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
+	bucket = opaque->hasho_bucket;
 
 	so->hashso_bucket_buf = buf;
 
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index b0a1131..74dbb15 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -60,6 +60,13 @@ typedef uint32 Bucket;
 
 typedef struct HashPageOpaqueData
 {
+	/*
+	 * If this is an ovfl page, this stores the previous ovfl (or bucket)
+	 * blkno. Else, if this is a bucket page, we use it for a special purpose:
+	 * we store the hashm_maxbucket value whenever the page is initialized or
+	 * split. This helps us know whether the bucket has been split after
+	 * caching the HashMetaPageData. See _hash_getbucketbuf_from_hashkey().
+	 */
 	BlockNumber hasho_prevblkno;	/* previous ovfl (or bucket) blkno */
 	BlockNumber hasho_nextblkno;	/* next ovfl blkno */
 	Bucket		hasho_bucket;	/* bucket number this pg belongs to */
@@ -327,6 +334,10 @@ extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno,
 			 int access, int flags);
 extern Buffer _hash_getbuf_with_condlock_cleanup(Relation rel,
 								   BlockNumber blkno, int flags);
+extern HashMetaPage _hash_getcachedmetap(Relation rel, bool updatecache);
+extern Buffer _hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey,
+								int access,
+								HashMetaPage *cachedmetap);
 extern Buffer _hash_getinitbuf(Relation rel, BlockNumber blkno);
 extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno,
 				ForkNumber forkNum);
#34Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Robert Haas (#32)
Re: Cache Hash Index meta page.

On Wed, Jan 11, 2017 at 12:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Can we adapt the ad-hoc caching logic in hashbulkdelete() to work with
this new logic? Or at least update the comments?

I have introduced a new function _hash_getcachedmetap in patch 11 [1]; with
this, hashbulkdelete() can use the metapage cache instead of saving it locally.

[1]: cache_hash_index_meta_page_11.patch
</messages/by-id/CAD__OuguwKqKeGFXLqs6D3fshTR83Zo6FrKd79DGVR17gJY+Tg@mail.gmail.com>
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

#35Amit Kapila
amit.kapila16@gmail.com
In reply to: Mithun Cy (#33)
Re: Cache Hash Index meta page.

On Fri, Jan 13, 2017 at 9:58 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Fri, Jan 6, 2017 at 11:43 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Below are review comments on the latest version of the patch.

1.
  /*
- * Read the metapage to fetch original bucket and tuple counts.  Also, we
- * keep a copy of the last-seen metapage so that we can use its
- * hashm_spares[] values to compute bucket page addresses.  This is a bit
- * hokey but perfectly safe, since the interesting entries in the spares
- * array cannot change under us; and it beats rereading the metapage for
- * each bucket.
+ * update and get the metapage cache data.
  */
- metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
- metap = HashPageGetMeta(BufferGetPage(metabuf));
- orig_maxbucket = metap->hashm_maxbucket;
- orig_ntuples = metap->hashm_ntuples;
- memcpy(&local_metapage, metap, sizeof(local_metapage));
- /* release the lock, but keep pin */
- LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+ cachedmetap = _hash_getcachedmetap(rel, true);
+ orig_maxbucket = cachedmetap->hashm_maxbucket;
+ orig_ntuples = cachedmetap->hashm_ntuples;

(a) I think you can retain the previous comment or modify it slightly.
Just removing the whole comment and replacing it with a single line
seems like a step backward.
(b) Another somewhat bigger problem is that with this new change it
won't retain the pin on the meta page till the end, which means we might
need to perform an I/O again during the operation to fetch the meta page.
AFAICS, you have just changed it so that you can call the new API
_hash_getcachedmetap; if that's true, then I think you have to find
some other way of doing it. BTW, why can't you design your new API
such that it always takes a pinned metapage? You can always release the
pin in the caller if required. I understand that you don't always
need a metapage in that API, but I think the current usage of that API
is also not that good.
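
For illustration, the suggested shape might look like the following
sketch (the signature and details here are hypothetical; patch 12 below
settles on a variant that additionally accepts InvalidBuffer to mean
"just return the existing cache"):

    /*
     * Sketch only: the caller always supplies a pinned (but unlocked)
     * metapage buffer; the cache is refreshed from it under a brief
     * shared lock, and the pin is left for the caller to release.
     */
    HashMetaPage
    _hash_getcachedmetap(Relation rel, Buffer metabuf)
    {
        if (rel->rd_amcache == NULL)
            rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
                                                 sizeof(HashMetaPageData));

        LockBuffer(metabuf, BUFFER_LOCK_SHARE);
        memcpy(rel->rd_amcache, HashPageGetMeta(BufferGetPage(metabuf)),
               sizeof(HashMetaPageData));
        LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);    /* keep the pin */

        return (HashMetaPage) rel->rd_amcache;
    }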

2.
+ if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
+ bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+ cachedmetap = _hash_getcachedmetap(rel, true);

I don't understand the meaning of the above if check. It seems like you
will update the metapage when the previous block number is not a valid
block number, which will be true at the first split. How will you
ensure that there has been a re-split and the cached metapage is not
relevant? I think if there is && in the above condition, then we can
ensure it.
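
To spell out the difference (an illustration using the patch's own
variable names, not code from the patch itself):

    /*
     * With ||, the first arm makes the check true for any bucket page
     * whose prevblkno holds a valid block number -- which, under this
     * patch, is every bucket page initialized or split by the new code --
     * so the cache would be refreshed regardless of any re-split.
     */
    if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
        bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
        cachedmetap = _hash_getcachedmetap(rel, true);

    /*
     * With &&, the check is true only when the page records a maxbucket
     * value newer than the cached one, i.e. a split happened after caching.
     */
    if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber &&
        bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
        cachedmetap = _hash_getcachedmetap(rel, true);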

3.
+ Given a hashkey get the target bucket page with read lock, using cached
+ metapage. The getbucketbuf_from_hashkey method below explains the same.
+

All the sentences in the algorithm start with small letters, so why do
you need an exception for this sentence? I think you don't need to
add an empty line. Also, I think the usage of
getbucketbuf_from_hashkey seems out of place. How about writing it
as:

The usage of cached metapage is explained later.

4.
+ If target bucket is split before metapage data was cached then we are
+ done.
+ Else first release the bucket page and then update the metapage cache
+ with latest metapage data.

I think it is better to retain the original text of the README and add
the part about the meta page update.

5.
+ Loop:
..
..
+ Loop again to reach the new target bucket.

No need to write "Loop again ..", that is implicit.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Amit Kapila (#35)
1 attachment(s)
Re: Cache Hash Index meta page.

On Tue, Jan 17, 2017 at 10:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

1.
(a) I think you can retain the previous comment or modify it slightly.
Just removing the whole comment and replacing it with a single line
seems like a step backward.

-- Fixed; just modified the comment slightly.

(b) Another somewhat bigger problem is that with this new change it
won't retain the pin on meta page till the end which means we might
need to perform an I/O again during operation to fetch the meta page.
AFAICS, you have just changed it so that you can call new API
_hash_getcachedmetap, if that's true, then I think you have to find
some other way of doing it. BTW, why can't you design your new API
such that it always take pinned metapage? You can always release the
pin in the caller if required. I understand that you don't always
need a metapage in that API, but I think the current usage of that API
is also not that good.

-- Yes, what you say is right. I wanted to make the _hash_getcachedmetap
API self-sufficient. But any two consecutive reads of the metapage are
connected, as we keep the page pinned in the buffer to avoid I/O. I have
now redesigned the API so that the caller passes a pinned meta page when
we want to set the metapage cache; this gives us the flexibility to use
the cached meta page data in different places.
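
Condensed from patch 12 below, the two calling modes of the redesigned
API look like this:

    /* Return whatever is cached already (NULL if nothing is cached yet): */
    metap = _hash_getcachedmetap(rel, InvalidBuffer);

    /* Refresh the cache from a pinned (unlocked) metapage buffer: */
    metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
    metap = _hash_getcachedmetap(rel, metabuf);
    ...
    _hash_dropbuf(rel, metabuf);    /* caller releases the pin */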

2.
+ if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
+ bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+ cachedmetap = _hash_getcachedmetap(rel, true);

I don't understand the meaning of above if check. It seems like you
will update the metapage when previous block number is not a valid
block number which will be true at the first split. How will you
ensure that there is a re-split and cached metapage is not relevant.
I think if there is && in the above condition, then we can ensure it.

-- Oops, that was a mistake; corrected as you stated.

3.
+ Given a hashkey get the target bucket page with read lock, using cached
+ metapage. The getbucketbuf_from_hashkey method below explains the same.
+

All the sentences in algorithm start with small letters, then why do
you need an exception for this sentence. I think you don't need to
add an empty line. Also, I think the usage of
getbucketbuf_from_hashkey seems out of place. How about writing it
as:

The usage of cached metapage is explained later.

-- Fixed as you have asked.

4.
+ If target bucket is split before metapage data was cached then we are
+ done.
+ Else first release the bucket page and then update the metapage cache
+ with latest metapage data.

I think it is better to retain original text of readme and add about
meta page update.

-- Fixed. Now, wherever it is meaningful, I have kept the original
wording. But the way we get to the right target buffer after the latest
split is slightly different from what the original code did, so there is
a slight modification to show that we use the metapage cache.

5.
+ Loop:
..
..
+ Loop again to reach the new target bucket.

No need to write "Loop again ..", that is implicit.

-- Fixed as you have asked.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_hash_index_meta_page_12.patchapplication/octet-stream; name=cache_hash_index_meta_page_12.patchDownload
diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 01ea115..e4c5bd0 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -188,17 +188,8 @@ track of available overflow pages.
 
 The reader algorithm is:
 
-	pin meta page and take buffer content lock in shared mode
-	loop:
-		compute bucket number for target hash key
-		release meta page buffer content lock
-		if (correct bucket page is already locked)
-			break
-		release any existing bucket page buffer content lock (if a concurrent
-         split happened)
-		take the buffer content lock on bucket page in shared mode
-		retake meta page buffer content lock in shared mode
-	release pin on metapage
+	given a hashkey get the target bucket page with the read lock, using
+	cached metapage; the usage of cached metapage is explained later.
 	if the target bucket is still being populated by a split:
 		release the buffer content lock on current bucket page
 		pin and acquire the buffer content lock on old bucket in shared mode
@@ -238,17 +229,8 @@ which this bucket is formed by split.
 
 The insertion algorithm is rather similar:
 
-	pin meta page and take buffer content lock in shared mode
-	loop:
-		compute bucket number for target hash key
-		release meta page buffer content lock
-		if (correct bucket page is already locked)
-			break
-		release any existing bucket page buffer content lock (if a concurrent
-         split happened)
-		take the buffer content lock on bucket page in exclusive mode
-		retake meta page buffer content lock in shared mode
-	release pin on metapage
+	given a hashkey get the target bucket page with the write lock, using
+	cached metapage; the usage of cached metapage is explained later.
 -- (so far same as reader, except for acquisition of buffer content lock in
 	exclusive mode on primary bucket page)
 	if the bucket-being-split flag is set for a bucket and pin count on it is
@@ -290,6 +272,20 @@ When an inserter cannot find space in any existing page of a bucket, it
 must obtain an overflow page and add that page to the bucket's chain.
 Details of that part of the algorithm appear later.
 
+The usage of the cached metapage is explained below.
+
+	if metapage cache is not set, read the meta page data and set the cache;
+	hold meta page pin.
+	Loop:
+		compute bucket number for target hash key; take the buffer content
+		lock on bucket page in the read/write mode as requested by
+		reader/insert algorithm.
+		if (target bucket is split before metapage data was cached)
+			break;
+		release any existing bucket page buffer content lock; update the
+		metapage cache with latest metapage data.
+	release the pin on metapage, if any
+
 The page split algorithm is entered whenever an inserter observes that the
 index is overfull (has a higher-than-wanted ratio of tuples to buckets).
 The algorithm attempts, but does not necessarily succeed, to split one
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 0cbf6b0..96b417a 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -505,26 +505,22 @@ hashbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	Bucket		cur_bucket;
 	Buffer		metabuf;
 	HashMetaPage metap;
-	HashMetaPageData local_metapage;
+	HashMetaPage cachedmetap;
 
 	tuples_removed = 0;
 	num_index_tuples = 0;
 
 	/*
-	 * Read the metapage to fetch original bucket and tuple counts.  Also, we
-	 * keep a copy of the last-seen metapage so that we can use its
-	 * hashm_spares[] values to compute bucket page addresses.  This is a bit
-	 * hokey but perfectly safe, since the interesting entries in the spares
-	 * array cannot change under us; and it beats rereading the metapage for
-	 * each bucket.
+	 * Read the metapage to fetch original bucket and tuple counts. We use the
+	 * cached meta page data so that we can use its hashm_spares[] values to
+	 * compute bucket page addresses.  This is a bit hokey but perfectly safe,
+	 * since the interesting entries in the spares array cannot change under
+	 * us; and it beats rereading the metapage for each bucket.
 	 */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
-	orig_maxbucket = metap->hashm_maxbucket;
-	orig_ntuples = metap->hashm_ntuples;
-	memcpy(&local_metapage, metap, sizeof(local_metapage));
-	/* release the lock, but keep pin */
-	LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+	cachedmetap = _hash_getcachedmetap(rel, metabuf);
+	orig_maxbucket = cachedmetap->hashm_maxbucket;
+	orig_ntuples = cachedmetap->hashm_ntuples;
 
 	/* Scan the buckets that we know exist */
 	cur_bucket = 0;
@@ -542,7 +538,7 @@ loop_top:
 		bool		split_cleanup = false;
 
 		/* Get address of bucket's start page */
-		bucket_blkno = BUCKET_TO_BLKNO(&local_metapage, cur_bucket);
+		bucket_blkno = BUCKET_TO_BLKNO(cachedmetap, cur_bucket);
 
 		blkno = bucket_blkno;
 
@@ -574,19 +570,19 @@ loop_top:
 			 * tuples left behind by the most recent split.  To prevent that,
 			 * now that the primary page of the target bucket has been locked
 			 * (and thus can't be further split), update our cached metapage
-			 * data.
+			 * data in that case.
 			 */
-			LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-			memcpy(&local_metapage, metap, sizeof(local_metapage));
-			LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+			if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber &&
+				bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+				cachedmetap = _hash_getcachedmetap(rel, metabuf);
 		}
 
 		bucket_buf = buf;
 
 		hashbucketcleanup(rel, cur_bucket, bucket_buf, blkno, info->strategy,
-						  local_metapage.hashm_maxbucket,
-						  local_metapage.hashm_highmask,
-						  local_metapage.hashm_lowmask, &tuples_removed,
+						  cachedmetap->hashm_maxbucket,
+						  cachedmetap->hashm_highmask,
+						  cachedmetap->hashm_lowmask, &tuples_removed,
 						  &num_index_tuples, split_cleanup,
 						  callback, callback_state);
 
@@ -603,9 +599,9 @@ loop_top:
 	if (cur_maxbucket != metap->hashm_maxbucket)
 	{
 		/* There's been a split, so process the additional bucket(s) */
-		cur_maxbucket = metap->hashm_maxbucket;
-		memcpy(&local_metapage, metap, sizeof(local_metapage));
 		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+		cachedmetap = _hash_getcachedmetap(rel, metabuf);
+		cur_maxbucket = cachedmetap->hashm_maxbucket;
 		goto loop_top;
 	}
 
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 39c70d3..bec5ef3 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -32,9 +32,7 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	Buffer		bucket_buf;
 	Buffer		metabuf;
 	HashMetaPage metap;
-	BlockNumber blkno;
-	BlockNumber oldblkno;
-	bool		retry;
+	HashMetaPage usedmetap = NULL;
 	Page		metapage;
 	Page		page;
 	HashPageOpaque pageopaque;
@@ -42,9 +40,6 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	bool		do_expand;
 	uint32		hashkey;
 	Bucket		bucket;
-	uint32		maxbucket;
-	uint32		highmask;
-	uint32		lowmask;
 
 	/*
 	 * Get the hash key for the item (it's stored in the index tuple itself).
@@ -57,10 +52,15 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 								 * need to be consistent */
 
 restart_insert:
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+
+	/*
+	 * Load the metapage. No lock is needed as of now, because we only access
+	 * the page header element pd_pagesize_version in HashMaxItemSize(); this
+	 * element is constant and will not move while we access it. But we hold
+	 * the pin so we can use the metabuf while writing into it below.
+	 */
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
 	metapage = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(metapage);
 
 	/*
 	 * Check whether the item can fit on a hash page at all. (Eventually, we
@@ -76,66 +76,21 @@ restart_insert:
 						itemsz, HashMaxItemSize(metapage)),
 			errhint("Values larger than a buffer page cannot be indexed.")));
 
-	oldblkno = InvalidBlockNumber;
-	retry = false;
-
-	/*
-	 * Loop until we get a lock on the correct target bucket.
-	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/*
-		 * Copy bucket mapping info now; refer the comment in
-		 * _hash_expandtable where we copy this information before calling
-		 * _hash_splitbucket to see why this is okay.
-		 */
-		maxbucket = metap->hashm_maxbucket;
-		highmask = metap->hashm_highmask;
-		lowmask = metap->hashm_lowmask;
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked the primary page of
-		 * what is still the correct target bucket, we are done.  Otherwise,
-		 * drop any old lock before acquiring the new one.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch and lock the primary bucket page for the target bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_WRITE,
+										  &usedmetap);
+	Assert(usedmetap != NULL);
 
 	/* remember the primary bucket buffer to release the pin on it at end. */
 	bucket_buf = buf;
 
 	page = BufferGetPage(buf);
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(pageopaque->hasho_bucket == bucket);
+
+	/*
+	 * In _hash_getbucketbuf_from_hashkey we have verified the hasho_bucket,
+	 * so it should be safe to use it from here on.
+	 */
+	bucket = pageopaque->hasho_bucket;
 
 	/*
 	 * If this bucket is in the process of being split, try to finish the
@@ -151,8 +106,10 @@ restart_insert:
 		/* release the lock on bucket buffer, before completing the split. */
 		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-						   maxbucket, highmask, lowmask);
+		_hash_finish_split(rel, metabuf, buf, bucket,
+						   usedmetap->hashm_maxbucket,
+						   usedmetap->hashm_highmask,
+						   usedmetap->hashm_lowmask);
 
 		/* release the pin on old and meta buffer.  retry for insert. */
 		_hash_dropbuf(rel, buf);
@@ -225,6 +182,7 @@ restart_insert:
 	 */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 
+	metap = HashPageGetMeta(metapage);
 	metap->hashm_ntuples += 1;
 
 	/* Make sure this stays in sync with _hash_expandtable() */
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 9430794..27b3285 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -434,7 +434,13 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+		/*
+		 * Set hasho_prevblkno with current hashm_maxbucket. This value will
+		 * be used to validate cached HashMetaPageData. See
+		 * _hash_getbucketbuf_from_hashkey().
+		 */
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -845,6 +851,12 @@ _hash_splitbucket(Relation rel,
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_BEING_SPLIT;
 
+	/*
+	 * Set hasho_prevblkno of the bucket page to the latest maxbucket number
+	 * to indicate the bucket has been split and the cached HashMetaPageData
+	 * needs to be rebuilt. The same is done for the new bucket page below.
+	 */
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -852,7 +864,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_BEING_POPULATED;
@@ -1191,3 +1203,126 @@ _hash_finish_split(Relation rel, Buffer metabuf, Buffer obuf, Bucket obucket,
 	LockBuffer(obuf, BUFFER_LOCK_UNLOCK);
 	hash_destroy(tidhtab);
 }
+
+/*
+ *	_hash_getcachedmetap() -- Returns cached metapage data.
+ *
+ * 	metabuf: if valid, the caller must hold a pin, but no lock, on the
+ * 	metapage; read from metabuf and set rd_amcache.
+ *
+ */
+HashMetaPage
+_hash_getcachedmetap(Relation rel, Buffer metabuf)
+{
+	Page	page;
+
+	if (BufferIsInvalid(metabuf))
+		return (HashMetaPage) rel->rd_amcache;
+
+	if (rel->rd_amcache == NULL)
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaPageData));
+
+	/* Read the metapage. */
+	LockBuffer(metabuf, BUFFER_LOCK_SHARE);
+	page = BufferGetPage(metabuf);
+	memcpy(rel->rd_amcache, HashPageGetMeta(page), sizeof(HashMetaPageData));
+
+	/* Release metapage lock. Keep the pin. */
+	LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	return (HashMetaPage) rel->rd_amcache;
+}
+
+/*
+ *	_hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given
+ *										 hashkey.
+ *
+ *	Bucket pages do not move or get removed once they are allocated. This
+ *	gives us an opportunity to use the previously saved metapage contents
+ *	to reach the target bucket buffer, instead of reading from the metapage
+ *	buffer every time. This saves one buffer access each time we want to
+ *	reach the target bucket buffer, which is a considerable saving in
+ *	bufmgr traffic and contention.
+ *
+ *	The access type parameter (HASH_READ or HASH_WRITE) indicates whether the
+ *	bucket buffer has to be locked for reading or writing.
+ *
+ *	The out parameter cachedmetap is set with metapage contents used for
+ *	hashkey to bucket buffer mapping. Some callers need this info to reach the
+ *	old bucket in case of bucket split, see _hash_doinsert().
+ */
+Buffer
+_hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access,
+								HashMetaPage *cachedmetap)
+{
+	HashMetaPage metap;
+	Buffer		buf;
+	Buffer		metabuf = InvalidBuffer;
+	Page		page;
+	Bucket		bucket;
+	BlockNumber blkno;
+	HashPageOpaque opaque;
+
+	/* We read from the target bucket buffer, hence locking it is a must. */
+	Assert(access == HASH_READ || access == HASH_WRITE);
+
+	if (!(metap = _hash_getcachedmetap(rel, InvalidBuffer)))
+	{
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+		metap = _hash_getcachedmetap(rel, metabuf);
+		Assert(metap != NULL);
+	}
+
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
+	{
+		/*
+		 * Compute the target bucket number, and convert to block number.
+		 */
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
+
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, access, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
+
+		/*
+		 * Check whether this bucket has been split since we cached the
+		 * HashMetaPageData, by comparing the respective hashm_maxbucket
+		 * values. If so, we need to reread the metapage and recompute the
+		 * bucket number.
+		 */
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <= metap->hashm_maxbucket)
+		{
+			/* OK, we have the right bucket; proceed to search in it. */
+			break;
+		}
+
+		/* First drop any locks held on bucket buffers. */
+		_hash_relbuf(rel, buf);
+
+		/* Update the cached meta page data. */
+		if (BufferIsInvalid(metabuf))
+			metabuf =
+				_hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+		metap = _hash_getcachedmetap(rel, metabuf);
+		Assert(metap != NULL);
+	}
+
+	if (BufferIsValid(metabuf))
+		_hash_dropbuf(rel, metabuf);
+
+	if (cachedmetap)
+		*cachedmetap = metap;
+
+	return buf;
+}
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index c0bdfe6..922143d 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -139,6 +139,7 @@ _hash_readprev(IndexScanDesc scan,
 	BlockNumber blkno;
 	Relation	rel = scan->indexRelation;
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
+	bool		haveprevblk = true;
 
 	blkno = (*opaquep)->hasho_prevblkno;
 
@@ -147,15 +148,20 @@ _hash_readprev(IndexScanDesc scan,
 	 * comments in _hash_first to know the reason of retaining pin.
 	 */
 	if (*bufp == so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)
+	{
 		LockBuffer(*bufp, BUFFER_LOCK_UNLOCK);
+		haveprevblk = false;
+	}
 	else
 		_hash_relbuf(rel, *bufp);
 
 	*bufp = InvalidBuffer;
 	/* check for interrupts while we're not holding any buffer lock */
 	CHECK_FOR_INTERRUPTS();
-	if (BlockNumberIsValid(blkno))
+
+	if (haveprevblk)
 	{
+		Assert(BlockNumberIsValid(blkno));
 		*bufp = _hash_getbuf(rel, blkno, HASH_READ,
 							 LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
 		*pagep = BufferGetPage(*bufp);
@@ -215,14 +221,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	ScanKey		cur;
 	uint32		hashkey;
 	Bucket		bucket;
-	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
 	Page		page;
 	HashPageOpaque opaque;
-	HashMetaPage metap;
 	IndexTuple	itup;
 	ItemPointer current;
 	OffsetNumber offnum;
@@ -277,59 +278,15 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_READ, NULL);
+	page = BufferGetPage(buf);
+	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 
 	/*
-	 * Loop until we get a lock on the correct target bucket.
+	 * _hash_getbucketbuf_from_hashkey has already verified hasho_bucket,
+	 * so it is safe to use it from here on.
 	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked what is still the
-		 * correct target bucket, we are done.  Otherwise, drop any old lock
-		 * and lock what now appears to be the correct bucket.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch the primary bucket page for the bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
-
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
-
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
+	bucket = opaque->hasho_bucket;
 
 	so->hashso_bucket_buf = buf;
 
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index b0a1131..2e1a6c5 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -60,6 +60,13 @@ typedef uint32 Bucket;
 
 typedef struct HashPageOpaqueData
 {
+	/*
+	 * If this is an ovfl page, this stores the previous ovfl (or bucket)
+	 * blkno. If this is a bucket page, we reuse the field for a special
+	 * purpose: it holds the hashm_maxbucket value as of the time the page
+	 * was initialized or last split. This lets us detect whether the bucket
+	 * has been split after caching the HashMetaPageData. See
+	 * _hash_getbucketbuf_from_hashkey().
+	 */
 	BlockNumber hasho_prevblkno;	/* previous ovfl (or bucket) blkno */
 	BlockNumber hasho_nextblkno;	/* next ovfl blkno */
 	Bucket		hasho_bucket;	/* bucket number this pg belongs to */
@@ -327,6 +334,10 @@ extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno,
 			 int access, int flags);
 extern Buffer _hash_getbuf_with_condlock_cleanup(Relation rel,
 								   BlockNumber blkno, int flags);
+extern HashMetaPage _hash_getcachedmetap(Relation rel, Buffer metabuf);
+extern Buffer _hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey,
+								int access,
+								HashMetaPage *cachedmetap);
 extern Buffer _hash_getinitbuf(Relation rel, BlockNumber blkno);
 extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno,
 				ForkNumber forkNum);
#37Amit Kapila
amit.kapila16@gmail.com
In reply to: Mithun Cy (#36)
Re: Cache Hash Index meta page.

On Wed, Jan 18, 2017 at 11:51 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Tue, Jan 17, 2017 at 10:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

(b) Another somewhat bigger problem is that with this new change it
won't retain the pin on meta page till the end which means we might
need to perform an I/O again during operation to fetch the meta page.
AFAICS, you have just changed it so that you can call the new API
_hash_getcachedmetap; if that's true, then I think you have to find
some other way of doing it. BTW, why can't you design your new API
such that it always takes a pinned metapage? You can always release the
pin in the caller if required. I understand that you don't always
need a metapage in that API, but I think the current usage of that API
is also not that good.

-- Yes, what you say is right. I wanted to make the _hash_getcachedmetap
API self-sufficient, but any two consecutive reads of the metapage are
connected, since we keep the page pinned in the buffer to avoid I/O. I
have now redesigned the API so that the caller passes a pinned meta page
when it wants to set the metapage cache; this gives us the flexibility
to use the cached meta page data in different places.
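
To make the calling convention concrete, here is a minimal sketch of a
caller under the redesigned API (it mirrors the pattern used in
_hash_getbucketbuf_from_hashkey in the attached patch):

HashMetaPage metap;
Buffer       metabuf = InvalidBuffer;

/* Cache lookup only; with InvalidBuffer this never touches the disk. */
metap = _hash_getcachedmetap(rel, InvalidBuffer);
if (metap == NULL)
{
    /* Pin (but do not lock) the metapage and rebuild the cache from it. */
    metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
    metap = _hash_getcachedmetap(rel, metabuf);
}

/* ... use metap; later refreshes can reuse the same pinned metabuf ... */

if (BufferIsValid(metabuf))
    _hash_dropbuf(rel, metabuf);    /* the caller drops its own pin */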

1.
@@ -505,26 +505,22 @@ hashbulkdelete(IndexVacuumInfo *info,
IndexBulkDeleteResult *stats,
..
..

+ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+ cachedmetap = _hash_getcachedmetap(rel, metabuf);

In the above flow, do we really need an updated metapage? Can't we use
the cached one? We are already taking care of bucket splits further down
in that function.

2.
+HashMetaPage
+_hash_getcachedmetap(Relation rel, Buffer metabuf)
+{
..
..
+ if (BufferIsInvalid(metabuf))
+ return (HashMetaPage) rel->rd_amcache;
..
+_hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access,
+ HashMetaPage *cachedmetap)
{
..
+ if (!(metap = _hash_getcachedmetap(rel, InvalidBuffer)))
+ {
+ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+ metap = _hash_getcachedmetap(rel, metabuf);
+ Assert(metap != NULL);
+ }
..
}

The above two chunks of code look worse as compared to your previous
patch. I think what we can do is keep the patch ready with both
versions of the _hash_getcachedmetap API (as you have in _v11 and _v12)
and let the committer take the final call.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Amit Kapila (#37)
2 attachment(s)
Re: Cache Hash Index meta page.

On Tue, Jan 24, 2017 at 3:10 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

1.
@@ -505,26 +505,22 @@ hashbulkdelete(IndexVacuumInfo *info,

In the above flow, do we really need an updated metapage? Can't we use
the cached one? We are already taking care of bucket splits further down
in that function.

Yes, we can use the old cached metap entry; the only reason I decided
to use the latest metapage content is that the old code used to do
that. Also, the cached metap is used to avoid ad-hoc local copies of
the same data, and hence to unify the cached-metap API. I did not
intend to save a metapage read here, which I thought would not be of
much use anyway: if new buckets are added, we need to read the metapage
at the end regardless. I have taken your comments; now I only read the
metap cache which is already set.
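
With that change, the top of hashbulkdelete now just reads the cache,
falling back to a metapage read only when the cache is not set yet
(quoting the donotholdpin patch below):

cachedmetap = _hash_getcachedmetap(rel, false);
orig_maxbucket = cachedmetap->hashm_maxbucket;
orig_ntuples = cachedmetap->hashm_ntuples;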

2.
The above two chunks of code look worse as compared to your previous
patch. I think what we can do is keep the patch ready with both
versions of the _hash_getcachedmetap API (as you have in _v11 and _v12)
and let the committer take the final call.

The _v11 API was self-contained, but it does not hold a pin on the
metapage buffer, whereas in _v12 we hold the pin across two consecutive
reads of the metapage. I have taken your advice and am producing two
different patches.
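
For quick reference, the two API shapes in the attached patches are:

/* cache_hash_index_meta_page_13_donotholdpin.patch: self-contained,
 * pins and releases the metapage internally on a cache refresh */
extern HashMetaPage _hash_getcachedmetap(Relation rel, bool updatecache);

/* cache_hash_index_meta_page_13_holdpin.patch: the caller supplies a
 * pinned metapage buffer when the cache has to be (re)built */
extern HashMetaPage _hash_getcachedmetap(Relation rel, Buffer metabuf);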

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_hash_index_meta_page_13_donotholdpin.patch (application/octet-stream)
commit c49c72e047a2efc9a6b918dd3dc56fba72388d96
Author: mithun <mithun@localhost.localdomain>
Date:   Thu Jan 26 23:58:18 2017 +0530

    Hash Index : Cache meta page Patch
    ----------------------------------
    Patch Name Cache_hash_index_meta_page_13_donotholdpin.patch

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 01ea115..e4c5bd0 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -188,17 +188,8 @@ track of available overflow pages.
 
 The reader algorithm is:
 
-	pin meta page and take buffer content lock in shared mode
-	loop:
-		compute bucket number for target hash key
-		release meta page buffer content lock
-		if (correct bucket page is already locked)
-			break
-		release any existing bucket page buffer content lock (if a concurrent
-         split happened)
-		take the buffer content lock on bucket page in shared mode
-		retake meta page buffer content lock in shared mode
-	release pin on metapage
+	given a hashkey, get the target bucket page with the read lock, using
+	the cached metapage; usage of the cached metapage is explained later.
 	if the target bucket is still being populated by a split:
 		release the buffer content lock on current bucket page
 		pin and acquire the buffer content lock on old bucket in shared mode
@@ -238,17 +229,8 @@ which this bucket is formed by split.
 
 The insertion algorithm is rather similar:
 
-	pin meta page and take buffer content lock in shared mode
-	loop:
-		compute bucket number for target hash key
-		release meta page buffer content lock
-		if (correct bucket page is already locked)
-			break
-		release any existing bucket page buffer content lock (if a concurrent
-         split happened)
-		take the buffer content lock on bucket page in exclusive mode
-		retake meta page buffer content lock in shared mode
-	release pin on metapage
+	given a hashkey, get the target bucket page with the write lock, using
+	the cached metapage; usage of the cached metapage is explained later.
 -- (so far same as reader, except for acquisition of buffer content lock in
 	exclusive mode on primary bucket page)
 	if the bucket-being-split flag is set for a bucket and pin count on it is
@@ -290,6 +272,20 @@ When an inserter cannot find space in any existing page of a bucket, it
 must obtain an overflow page and add that page to the bucket's chain.
 Details of that part of the algorithm appear later.
 
+The usage of the cached metapage is explained below:
+
+	if the metapage cache is not set, read the metapage data and set the
+	cache; hold the metapage pin.
+	Loop:
+		compute bucket number for target hash key; take the buffer content
+		lock on bucket page in the read/write mode as requested by
+		reader/insert algorithm.
+		if (target bucket is split before metapage data was cached)
+			break;
+		release any existing bucket page buffer content lock; update the
+		metapage cache with latest metapage data.
+	release the pin on the metapage, if any
+
 The page split algorithm is entered whenever an inserter observes that the
 index is overfull (has a higher-than-wanted ratio of tuples to buckets).
 The algorithm attempts, but does not necessarily succeed, to split one
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index ec8ed33..a39c911 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -507,28 +507,23 @@ hashbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	Bucket		orig_maxbucket;
 	Bucket		cur_maxbucket;
 	Bucket		cur_bucket;
-	Buffer		metabuf;
+	Buffer		metabuf = InvalidBuffer;
 	HashMetaPage metap;
-	HashMetaPageData local_metapage;
+	HashMetaPage cachedmetap;
 
 	tuples_removed = 0;
 	num_index_tuples = 0;
 
 	/*
-	 * Read the metapage to fetch original bucket and tuple counts.  Also, we
-	 * keep a copy of the last-seen metapage so that we can use its
-	 * hashm_spares[] values to compute bucket page addresses.  This is a bit
-	 * hokey but perfectly safe, since the interesting entries in the spares
-	 * array cannot change under us; and it beats rereading the metapage for
-	 * each bucket.
+	 * Read the metapage to fetch original bucket and tuple counts. We use the
+	 * cached metapage data, so that its hashm_spares[] values can be used to
+	 * compute bucket page addresses.  This is a bit hokey but perfectly safe,
+	 * since the interesting entries in the spares array cannot change under
+	 * us; and it beats rereading the metapage for each bucket.
 	 */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
-	orig_maxbucket = metap->hashm_maxbucket;
-	orig_ntuples = metap->hashm_ntuples;
-	memcpy(&local_metapage, metap, sizeof(local_metapage));
-	/* release the lock, but keep pin */
-	LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	cachedmetap = _hash_getcachedmetap(rel, false);
+	orig_maxbucket = cachedmetap->hashm_maxbucket;
+	orig_ntuples = cachedmetap->hashm_ntuples;
 
 	/* Scan the buckets that we know exist */
 	cur_bucket = 0;
@@ -546,7 +541,7 @@ loop_top:
 		bool		split_cleanup = false;
 
 		/* Get address of bucket's start page */
-		bucket_blkno = BUCKET_TO_BLKNO(&local_metapage, cur_bucket);
+		bucket_blkno = BUCKET_TO_BLKNO(cachedmetap, cur_bucket);
 
 		blkno = bucket_blkno;
 
@@ -578,19 +573,19 @@ loop_top:
 			 * tuples left behind by the most recent split.  To prevent that,
 			 * now that the primary page of the target bucket has been locked
 			 * (and thus can't be further split), update our cached metapage
-			 * data.
+			 * data in that case.
 			 */
-			LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-			memcpy(&local_metapage, metap, sizeof(local_metapage));
-			LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+			if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber &&
+				bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+				cachedmetap = _hash_getcachedmetap(rel, true);
 		}
 
 		bucket_buf = buf;
 
 		hashbucketcleanup(rel, cur_bucket, bucket_buf, blkno, info->strategy,
-						  local_metapage.hashm_maxbucket,
-						  local_metapage.hashm_highmask,
-						  local_metapage.hashm_lowmask, &tuples_removed,
+						  cachedmetap->hashm_maxbucket,
+						  cachedmetap->hashm_highmask,
+						  cachedmetap->hashm_lowmask, &tuples_removed,
 						  &num_index_tuples, split_cleanup,
 						  callback, callback_state);
 
@@ -600,6 +595,9 @@ loop_top:
 		cur_bucket++;
 	}
 
+	if (BufferIsInvalid(metabuf))
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+
 	/* Write-lock metapage and check for split since we started */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 	metap = HashPageGetMeta(BufferGetPage(metabuf));
@@ -607,9 +605,9 @@ loop_top:
 	if (cur_maxbucket != metap->hashm_maxbucket)
 	{
 		/* There's been a split, so process the additional bucket(s) */
-		cur_maxbucket = metap->hashm_maxbucket;
-		memcpy(&local_metapage, metap, sizeof(local_metapage));
 		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+		cachedmetap = _hash_getcachedmetap(rel, true);
+		cur_maxbucket = cachedmetap->hashm_maxbucket;
 		goto loop_top;
 	}
 
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 39c70d3..bec5ef3 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -32,9 +32,7 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	Buffer		bucket_buf;
 	Buffer		metabuf;
 	HashMetaPage metap;
-	BlockNumber blkno;
-	BlockNumber oldblkno;
-	bool		retry;
+	HashMetaPage usedmetap = NULL;
 	Page		metapage;
 	Page		page;
 	HashPageOpaque pageopaque;
@@ -42,9 +40,6 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	bool		do_expand;
 	uint32		hashkey;
 	Bucket		bucket;
-	uint32		maxbucket;
-	uint32		highmask;
-	uint32		lowmask;
 
 	/*
 	 * Get the hash key for the item (it's stored in the index tuple itself).
@@ -57,10 +52,15 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 								 * need to be consistent */
 
 restart_insert:
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+
+	/*
+	 * Load the metapage. No lock is needed yet, because we only access the
+	 * page header element pd_pagesize_version via HashMaxItemSize(), and
+	 * that element is constant and cannot move. But we hold the pin so that
+	 * we can reuse the metabuf when writing to the metapage below.
+	 */
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
 	metapage = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(metapage);
 
 	/*
 	 * Check whether the item can fit on a hash page at all. (Eventually, we
@@ -76,66 +76,21 @@ restart_insert:
 						itemsz, HashMaxItemSize(metapage)),
 			errhint("Values larger than a buffer page cannot be indexed.")));
 
-	oldblkno = InvalidBlockNumber;
-	retry = false;
-
-	/*
-	 * Loop until we get a lock on the correct target bucket.
-	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/*
-		 * Copy bucket mapping info now; refer the comment in
-		 * _hash_expandtable where we copy this information before calling
-		 * _hash_splitbucket to see why this is okay.
-		 */
-		maxbucket = metap->hashm_maxbucket;
-		highmask = metap->hashm_highmask;
-		lowmask = metap->hashm_lowmask;
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked the primary page of
-		 * what is still the correct target bucket, we are done.  Otherwise,
-		 * drop any old lock before acquiring the new one.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch and lock the primary bucket page for the target bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_WRITE,
+										  &usedmetap);
+	Assert(usedmetap != NULL);
 
 	/* remember the primary bucket buffer to release the pin on it at end. */
 	bucket_buf = buf;
 
 	page = BufferGetPage(buf);
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(pageopaque->hasho_bucket == bucket);
+
+	/*
+	 * _hash_getbucketbuf_from_hashkey has already verified hasho_bucket,
+	 * so it is safe to use it from here on.
+	 */
+	bucket = pageopaque->hasho_bucket;
 
 	/*
 	 * If this bucket is in the process of being split, try to finish the
@@ -151,8 +106,10 @@ restart_insert:
 		/* release the lock on bucket buffer, before completing the split. */
 		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-						   maxbucket, highmask, lowmask);
+		_hash_finish_split(rel, metabuf, buf, bucket,
+						   usedmetap->hashm_maxbucket,
+						   usedmetap->hashm_highmask,
+						   usedmetap->hashm_lowmask);
 
 		/* release the pin on old and meta buffer.  retry for insert. */
 		_hash_dropbuf(rel, buf);
@@ -225,6 +182,7 @@ restart_insert:
 	 */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 
+	metap = HashPageGetMeta(metapage);
 	metap->hashm_ntuples += 1;
 
 	/* Make sure this stays in sync with _hash_expandtable() */
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 9430794..5368292 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -434,7 +434,13 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+		/*
+		 * Set hasho_prevblkno to the current hashm_maxbucket. This value will
+		 * be used to validate cached HashMetaPageData. See
+		 * _hash_getbucketbuf_from_hashkey().
+		 */
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -845,6 +851,12 @@ _hash_splitbucket(Relation rel,
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_BEING_SPLIT;
 
+	/*
+	 * Set hasho_prevblkno of the bucket page to the latest maxbucket
+	 * number, to indicate that the bucket has been split and any cached
+	 * HashMetaPageData needs to be reconstructed. The same is done for the
+	 * new bucket page below.
+	 */
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -852,7 +864,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_BEING_POPULATED;
@@ -1191,3 +1203,115 @@ _hash_finish_split(Relation rel, Buffer metabuf, Buffer obuf, Bucket obucket,
 	LockBuffer(obuf, BUFFER_LOCK_UNLOCK);
 	hash_destroy(tidhtab);
 }
+
+/*
+ *	_hash_getcachedmetap() -- Returns cached metapage data.
+ *
+ *	updatecache: if true, refresh the cache from the latest metapage data
+ *	before returning it.
+ */
+HashMetaPage
+_hash_getcachedmetap(Relation rel, bool updatecache)
+{
+	Buffer		metabuf;
+	Page		page;
+
+	if (updatecache || rel->rd_amcache == NULL)
+	{
+		if (rel->rd_amcache == NULL)
+			rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+												 sizeof(HashMetaPageData));
+
+		/* Read the metapage. */
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+		page = BufferGetPage(metabuf);
+		memcpy(rel->rd_amcache, HashPageGetMeta(page),
+			   sizeof(HashMetaPageData));
+
+		/* Release metapage. */
+		_hash_relbuf(rel, metabuf);
+	}
+
+	return (HashMetaPage) rel->rd_amcache;
+}
+
+/*
+ *	_hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given
+ *										 hashkey.
+ *
+ *	Bucket pages do not move or get removed once they are allocated. This
+ *	gives us an opportunity to use the previously saved metapage contents
+ *	to reach the target bucket buffer, instead of reading from the metapage
+ *	buffer every time. This saves one buffer access each time we want to
+ *	reach the target bucket buffer, which is a considerable saving in
+ *	bufmgr traffic and contention.
+ *
+ *	The access type parameter (HASH_READ or HASH_WRITE) indicates whether the
+ *	bucket buffer has to be locked for reading or writing.
+ *
+ *	The out parameter cachedmetap is set with metapage contents used for
+ *	hashkey to bucket buffer mapping. Some callers need this info to reach the
+ *	old bucket in case of bucket split, see _hash_doinsert().
+ */
+Buffer
+_hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access,
+								HashMetaPage *cachedmetap)
+{
+	HashMetaPage metap;
+	Buffer		buf;
+	Page		page;
+	Bucket		bucket;
+	BlockNumber blkno;
+	HashPageOpaque opaque;
+
+	/* We read from the target bucket buffer, hence locking it is a must. */
+	Assert(access == HASH_READ || access == HASH_WRITE);
+
+	metap = _hash_getcachedmetap(rel, false);
+
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
+	{
+		/*
+		 * Compute the target bucket number, and convert to block number.
+		 */
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
+
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, access, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
+
+		/*
+		 * Check whether this bucket has been split since we cached the
+		 * HashMetaPageData, by comparing the respective hashm_maxbucket
+		 * values. If so, we need to reread the metapage and recompute the
+		 * bucket number.
+		 */
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <= metap->hashm_maxbucket)
+		{
+			/* OK, we have the right bucket; proceed to search in it. */
+			break;
+		}
+
+		/* First drop any locks held on bucket buffers. */
+		_hash_relbuf(rel, buf);
+
+		/* Update the cached meta page data. */
+		metap = _hash_getcachedmetap(rel, true);
+	}
+
+	if (cachedmetap)
+		*cachedmetap = metap;
+
+	return buf;
+}
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index a59ad6f..af504b3 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -139,6 +139,7 @@ _hash_readprev(IndexScanDesc scan,
 	BlockNumber blkno;
 	Relation	rel = scan->indexRelation;
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
+	bool		haveprevblk = true;
 
 	blkno = (*opaquep)->hasho_prevblkno;
 
@@ -147,15 +148,20 @@ _hash_readprev(IndexScanDesc scan,
 	 * comments in _hash_first to know the reason of retaining pin.
 	 */
 	if (*bufp == so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)
+	{
 		LockBuffer(*bufp, BUFFER_LOCK_UNLOCK);
+		haveprevblk = false;
+	}
 	else
 		_hash_relbuf(rel, *bufp);
 
 	*bufp = InvalidBuffer;
 	/* check for interrupts while we're not holding any buffer lock */
 	CHECK_FOR_INTERRUPTS();
-	if (BlockNumberIsValid(blkno))
+
+	if (haveprevblk)
 	{
+		Assert(BlockNumberIsValid(blkno));
 		*bufp = _hash_getbuf(rel, blkno, HASH_READ,
 							 LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
 		*pagep = BufferGetPage(*bufp);
@@ -215,14 +221,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	ScanKey		cur;
 	uint32		hashkey;
 	Bucket		bucket;
-	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
 	Page		page;
 	HashPageOpaque opaque;
-	HashMetaPage metap;
 	IndexTuple	itup;
 	ItemPointer current;
 	OffsetNumber offnum;
@@ -277,59 +278,15 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_READ, NULL);
+	page = BufferGetPage(buf);
+	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 
 	/*
-	 * Loop until we get a lock on the correct target bucket.
+	 * _hash_getbucketbuf_from_hashkey has already verified hasho_bucket,
+	 * so it is safe to use it from here on.
 	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked what is still the
-		 * correct target bucket, we are done.  Otherwise, drop any old lock
-		 * and lock what now appears to be the correct bucket.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch the primary bucket page for the bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
-
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
-
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
+	bucket = opaque->hasho_bucket;
 
 	so->hashso_bucket_buf = buf;
 
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 69a3873..6eed415 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -60,6 +60,13 @@ typedef uint32 Bucket;
 
 typedef struct HashPageOpaqueData
 {
+	/*
+	 * If this is an ovfl page, this stores the previous ovfl (or bucket)
+	 * blkno. If this is a bucket page, we reuse the field for a special
+	 * purpose: it holds the hashm_maxbucket value as of the time the page
+	 * was initialized or last split. This lets us detect whether the bucket
+	 * has been split after caching the HashMetaPageData. See
+	 * _hash_getbucketbuf_from_hashkey().
+	 */
 	BlockNumber hasho_prevblkno;	/* previous ovfl (or bucket) blkno */
 	BlockNumber hasho_nextblkno;	/* next ovfl blkno */
 	Bucket		hasho_bucket;	/* bucket number this pg belongs to */
@@ -305,6 +312,10 @@ extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno,
 			 int access, int flags);
 extern Buffer _hash_getbuf_with_condlock_cleanup(Relation rel,
 								   BlockNumber blkno, int flags);
+extern HashMetaPage _hash_getcachedmetap(Relation rel, bool updatecache);
+extern Buffer _hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey,
+								int access,
+								HashMetaPage *cachedmetap);
 extern Buffer _hash_getinitbuf(Relation rel, BlockNumber blkno);
 extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno,
 				ForkNumber forkNum);
cache_hash_index_meta_page_13_holdpin.patch (application/octet-stream)
commit a63b9e753947a3b159e4b527ee0297459a48639a
Author: mithun <mithun@localhost.localdomain>
Date:   Fri Jan 27 00:03:36 2017 +0530

    cache_hash_index_meta_page_13_holdpin.patch

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 01ea115..e4c5bd0 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -188,17 +188,8 @@ track of available overflow pages.
 
 The reader algorithm is:
 
-	pin meta page and take buffer content lock in shared mode
-	loop:
-		compute bucket number for target hash key
-		release meta page buffer content lock
-		if (correct bucket page is already locked)
-			break
-		release any existing bucket page buffer content lock (if a concurrent
-         split happened)
-		take the buffer content lock on bucket page in shared mode
-		retake meta page buffer content lock in shared mode
-	release pin on metapage
+	given a hashkey, get the target bucket page with the read lock, using
+	the cached metapage; usage of the cached metapage is explained later.
 	if the target bucket is still being populated by a split:
 		release the buffer content lock on current bucket page
 		pin and acquire the buffer content lock on old bucket in shared mode
@@ -238,17 +229,8 @@ which this bucket is formed by split.
 
 The insertion algorithm is rather similar:
 
-	pin meta page and take buffer content lock in shared mode
-	loop:
-		compute bucket number for target hash key
-		release meta page buffer content lock
-		if (correct bucket page is already locked)
-			break
-		release any existing bucket page buffer content lock (if a concurrent
-         split happened)
-		take the buffer content lock on bucket page in exclusive mode
-		retake meta page buffer content lock in shared mode
-	release pin on metapage
+	given a hashkey, get the target bucket page with the write lock, using
+	the cached metapage; usage of the cached metapage is explained later.
 -- (so far same as reader, except for acquisition of buffer content lock in
 	exclusive mode on primary bucket page)
 	if the bucket-being-split flag is set for a bucket and pin count on it is
@@ -290,6 +272,20 @@ When an inserter cannot find space in any existing page of a bucket, it
 must obtain an overflow page and add that page to the bucket's chain.
 Details of that part of the algorithm appear later.
 
+The usage of the cached metapage is explained below:
+
+	if the metapage cache is not set, read the metapage data and set the
+	cache; hold the metapage pin.
+	Loop:
+		compute bucket number for target hash key; take the buffer content
+		lock on bucket page in the read/write mode as requested by
+		reader/insert algorithm.
+		if (target bucket is split before metapage data was cached)
+			break;
+		release any existing bucket page buffer content lock; update the
+		metapage cache with latest metapage data.
+	release the pin on the metapage, if any
+
 The page split algorithm is entered whenever an inserter observes that the
 index is overfull (has a higher-than-wanted ratio of tuples to buckets).
 The algorithm attempts, but does not necessarily succeed, to split one
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index ec8ed33..7994d16 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -507,28 +507,31 @@ hashbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	Bucket		orig_maxbucket;
 	Bucket		cur_maxbucket;
 	Bucket		cur_bucket;
-	Buffer		metabuf;
+	Buffer		metabuf = InvalidBuffer;
 	HashMetaPage metap;
-	HashMetaPageData local_metapage;
+	HashMetaPage cachedmetap;
 
 	tuples_removed = 0;
 	num_index_tuples = 0;
 
 	/*
-	 * Read the metapage to fetch original bucket and tuple counts.  Also, we
-	 * keep a copy of the last-seen metapage so that we can use its
-	 * hashm_spares[] values to compute bucket page addresses.  This is a bit
-	 * hokey but perfectly safe, since the interesting entries in the spares
-	 * array cannot change under us; and it beats rereading the metapage for
-	 * each bucket.
+	 * Read the metapage to fetch original bucket and tuple counts. We use the
+	 * cached metapage data, so that its hashm_spares[] values can be used to
+	 * compute bucket page addresses.  This is a bit hokey but perfectly safe,
+	 * since the interesting entries in the spares array cannot change under
+	 * us; and it beats rereading the metapage for each bucket.
 	 */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
-	orig_maxbucket = metap->hashm_maxbucket;
-	orig_ntuples = metap->hashm_ntuples;
-	memcpy(&local_metapage, metap, sizeof(local_metapage));
-	/* release the lock, but keep pin */
-	LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	cachedmetap = _hash_getcachedmetap(rel, InvalidBuffer);
+
+	if (!cachedmetap)
+	{
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+		cachedmetap = _hash_getcachedmetap(rel, metabuf);
+		Assert(cachedmetap != NULL);
+	}
+
+	orig_maxbucket = cachedmetap->hashm_maxbucket;
+	orig_ntuples = cachedmetap->hashm_ntuples;
 
 	/* Scan the buckets that we know exist */
 	cur_bucket = 0;
@@ -546,7 +549,7 @@ loop_top:
 		bool		split_cleanup = false;
 
 		/* Get address of bucket's start page */
-		bucket_blkno = BUCKET_TO_BLKNO(&local_metapage, cur_bucket);
+		bucket_blkno = BUCKET_TO_BLKNO(cachedmetap, cur_bucket);
 
 		blkno = bucket_blkno;
 
@@ -578,19 +581,24 @@ loop_top:
 			 * tuples left behind by the most recent split.  To prevent that,
 			 * now that the primary page of the target bucket has been locked
 			 * (and thus can't be further split), update our cached metapage
-			 * data.
+			 * data in that case.
 			 */
-			LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-			memcpy(&local_metapage, metap, sizeof(local_metapage));
-			LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+			if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber &&
+				bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+			{
+				if (BufferIsInvalid(metabuf))
+					metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK,
+										   LH_META_PAGE);
+				cachedmetap = _hash_getcachedmetap(rel, metabuf);
+			}
 		}
 
 		bucket_buf = buf;
 
 		hashbucketcleanup(rel, cur_bucket, bucket_buf, blkno, info->strategy,
-						  local_metapage.hashm_maxbucket,
-						  local_metapage.hashm_highmask,
-						  local_metapage.hashm_lowmask, &tuples_removed,
+						  cachedmetap->hashm_maxbucket,
+						  cachedmetap->hashm_highmask,
+						  cachedmetap->hashm_lowmask, &tuples_removed,
 						  &num_index_tuples, split_cleanup,
 						  callback, callback_state);
 
@@ -600,6 +608,9 @@ loop_top:
 		cur_bucket++;
 	}
 
+	if (BufferIsInvalid(metabuf))
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+
 	/* Write-lock metapage and check for split since we started */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 	metap = HashPageGetMeta(BufferGetPage(metabuf));
@@ -607,9 +618,9 @@ loop_top:
 	if (cur_maxbucket != metap->hashm_maxbucket)
 	{
 		/* There's been a split, so process the additional bucket(s) */
-		cur_maxbucket = metap->hashm_maxbucket;
-		memcpy(&local_metapage, metap, sizeof(local_metapage));
 		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+		cachedmetap = _hash_getcachedmetap(rel, metabuf);
+		cur_maxbucket = cachedmetap->hashm_maxbucket;
 		goto loop_top;
 	}
 
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 39c70d3..bec5ef3 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -32,9 +32,7 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	Buffer		bucket_buf;
 	Buffer		metabuf;
 	HashMetaPage metap;
-	BlockNumber blkno;
-	BlockNumber oldblkno;
-	bool		retry;
+	HashMetaPage usedmetap = NULL;
 	Page		metapage;
 	Page		page;
 	HashPageOpaque pageopaque;
@@ -42,9 +40,6 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	bool		do_expand;
 	uint32		hashkey;
 	Bucket		bucket;
-	uint32		maxbucket;
-	uint32		highmask;
-	uint32		lowmask;
 
 	/*
 	 * Get the hash key for the item (it's stored in the index tuple itself).
@@ -57,10 +52,15 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 								 * need to be consistent */
 
 restart_insert:
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+
+	/*
+	 * Load the metapage. No lock is needed yet, because we only access the
+	 * page header element pd_pagesize_version via HashMaxItemSize(), and
+	 * that element is constant and cannot move. But we hold the pin so that
+	 * we can reuse the metabuf when writing to the metapage below.
+	 */
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
 	metapage = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(metapage);
 
 	/*
 	 * Check whether the item can fit on a hash page at all. (Eventually, we
@@ -76,66 +76,21 @@ restart_insert:
 						itemsz, HashMaxItemSize(metapage)),
 			errhint("Values larger than a buffer page cannot be indexed.")));
 
-	oldblkno = InvalidBlockNumber;
-	retry = false;
-
-	/*
-	 * Loop until we get a lock on the correct target bucket.
-	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/*
-		 * Copy bucket mapping info now; refer the comment in
-		 * _hash_expandtable where we copy this information before calling
-		 * _hash_splitbucket to see why this is okay.
-		 */
-		maxbucket = metap->hashm_maxbucket;
-		highmask = metap->hashm_highmask;
-		lowmask = metap->hashm_lowmask;
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked the primary page of
-		 * what is still the correct target bucket, we are done.  Otherwise,
-		 * drop any old lock before acquiring the new one.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch and lock the primary bucket page for the target bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_WRITE,
+										  &usedmetap);
+	Assert(usedmetap != NULL);
 
 	/* remember the primary bucket buffer to release the pin on it at end. */
 	bucket_buf = buf;
 
 	page = BufferGetPage(buf);
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(pageopaque->hasho_bucket == bucket);
+
+	/*
+	 * _hash_getbucketbuf_from_hashkey has already verified hasho_bucket,
+	 * so it is safe to use it from here on.
+	 */
+	bucket = pageopaque->hasho_bucket;
 
 	/*
 	 * If this bucket is in the process of being split, try to finish the
@@ -151,8 +106,10 @@ restart_insert:
 		/* release the lock on bucket buffer, before completing the split. */
 		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-						   maxbucket, highmask, lowmask);
+		_hash_finish_split(rel, metabuf, buf, bucket,
+						   usedmetap->hashm_maxbucket,
+						   usedmetap->hashm_highmask,
+						   usedmetap->hashm_lowmask);
 
 		/* release the pin on old and meta buffer.  retry for insert. */
 		_hash_dropbuf(rel, buf);
@@ -225,6 +182,7 @@ restart_insert:
 	 */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 
+	metap = HashPageGetMeta(metapage);
 	metap->hashm_ntuples += 1;
 
 	/* Make sure this stays in sync with _hash_expandtable() */
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 9430794..6c66c5d 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -434,7 +434,13 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+		/*
+		 * Set hasho_prevblkno to the current hashm_maxbucket. This value will
+		 * be used to validate cached HashMetaPageData. See
+		 * _hash_getbucketbuf_from_hashkey().
+		 */
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -845,6 +851,12 @@ _hash_splitbucket(Relation rel,
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_BEING_SPLIT;
 
+	/*
+	 * Set hasho_prevblkno of the bucket page to the latest maxbucket
+	 * number, to indicate that the bucket has been split and any cached
+	 * HashMetaPageData needs to be reconstructed. The same is done for the
+	 * new bucket page below.
+	 */
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -852,7 +864,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_BEING_POPULATED;
@@ -1191,3 +1203,127 @@ _hash_finish_split(Relation rel, Buffer metabuf, Buffer obuf, Bucket obucket,
 	LockBuffer(obuf, BUFFER_LOCK_UNLOCK);
 	hash_destroy(tidhtab);
 }
+
+/*
+ *	_hash_getcachedmetap() -- Returns cached metapage data.
+ *
+ *	metabuf: if valid, the caller must hold a pin, but no lock, on the
+ *	metapage; the cache (rd_amcache) is then rebuilt from its contents.
+ */
+HashMetaPage
+_hash_getcachedmetap(Relation rel, Buffer metabuf)
+{
+	Page		page;
+
+	if (BufferIsInvalid(metabuf))
+		return (HashMetaPage) rel->rd_amcache;
+
+	if (rel->rd_amcache == NULL)
+		rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+											 sizeof(HashMetaPageData));
+
+	/* Read the metapage. */
+	LockBuffer(metabuf, BUFFER_LOCK_SHARE);
+	page = BufferGetPage(metabuf);
+	memcpy(rel->rd_amcache, HashPageGetMeta(page), sizeof(HashMetaPageData));
+
+	/* Release metapage lock. Keep the pin. */
+	LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	return (HashMetaPage) rel->rd_amcache;
+}
+
+/*
+ *	_hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given
+ *										 hashkey.
+ *
+ *	Bucket pages do not move or get removed once they are allocated. This
+ *	gives us an opportunity to use the previously saved metapage contents
+ *	to reach the target bucket buffer, instead of reading from the metapage
+ *	buffer every time. This saves one buffer access each time we want to
+ *	reach the target bucket buffer, which is a considerable saving in
+ *	bufmgr traffic and contention.
+ *
+ *	The access type parameter (HASH_READ or HASH_WRITE) indicates whether the
+ *	bucket buffer has to be locked for reading or writing.
+ *
+ *	The out parameter cachedmetap is set with metapage contents used for
+ *	hashkey to bucket buffer mapping. Some callers need this info to reach the
+ *	old bucket in case of bucket split, see _hash_doinsert().
+ */
+Buffer
+_hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access,
+								HashMetaPage *cachedmetap)
+{
+	HashMetaPage metap;
+	Buffer		buf;
+	Buffer		metabuf = InvalidBuffer;
+	Page		page;
+	Bucket		bucket;
+	BlockNumber blkno;
+	HashPageOpaque opaque;
+
+	/* We read from the target bucket buffer, hence locking it is a must. */
+	Assert(access == HASH_READ || access == HASH_WRITE);
+
+	metap = _hash_getcachedmetap(rel, InvalidBuffer);
+	if (!metap)
+	{
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+		metap = _hash_getcachedmetap(rel, metabuf);
+		Assert(metap != NULL);
+	}
+
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
+	{
+		/*
+		 * Compute the target bucket number, and convert to block number.
+		 */
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
+
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, access, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
+
+		/*
+		 * Check whether this bucket has been split since we cached the
+		 * HashMetaPageData, by comparing the respective hashm_maxbucket
+		 * values. If so, we need to reread the metapage and recompute the
+		 * bucket number.
+		 */
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <= metap->hashm_maxbucket)
+		{
+			/* OK, we have the right bucket; proceed to search in it. */
+			break;
+		}
+
+		/* First drop any locks held on bucket buffers. */
+		_hash_relbuf(rel, buf);
+
+		/* Update the cached meta page data. */
+		if (BufferIsInvalid(metabuf))
+			metabuf =
+				_hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+		metap = _hash_getcachedmetap(rel, metabuf);
+		Assert(metap != NULL);
+	}
+
+	if (BufferIsValid(metabuf))
+		_hash_dropbuf(rel, metabuf);
+
+	if (cachedmetap)
+		*cachedmetap = metap;
+
+	return buf;
+}
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index a59ad6f..af504b3 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -139,6 +139,7 @@ _hash_readprev(IndexScanDesc scan,
 	BlockNumber blkno;
 	Relation	rel = scan->indexRelation;
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
+	bool		haveprevblk = true;
 
 	blkno = (*opaquep)->hasho_prevblkno;
 
@@ -147,15 +148,20 @@ _hash_readprev(IndexScanDesc scan,
 	 * comments in _hash_first to know the reason of retaining pin.
 	 */
 	if (*bufp == so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)
+	{
 		LockBuffer(*bufp, BUFFER_LOCK_UNLOCK);
+		haveprevblk = false;
+	}
 	else
 		_hash_relbuf(rel, *bufp);
 
 	*bufp = InvalidBuffer;
 	/* check for interrupts while we're not holding any buffer lock */
 	CHECK_FOR_INTERRUPTS();
-	if (BlockNumberIsValid(blkno))
+
+	if (haveprevblk)
 	{
+		Assert(BlockNumberIsValid(blkno));
 		*bufp = _hash_getbuf(rel, blkno, HASH_READ,
 							 LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
 		*pagep = BufferGetPage(*bufp);
@@ -215,14 +221,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	ScanKey		cur;
 	uint32		hashkey;
 	Bucket		bucket;
-	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
 	Page		page;
 	HashPageOpaque opaque;
-	HashMetaPage metap;
 	IndexTuple	itup;
 	ItemPointer current;
 	OffsetNumber offnum;
@@ -277,59 +278,15 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_READ, NULL);
+	page = BufferGetPage(buf);
+	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 
 	/*
-	 * Loop until we get a lock on the correct target bucket.
+	 * _hash_getbucketbuf_from_hashkey has already verified hasho_bucket,
+	 * so it is safe to use it from here on.
 	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked what is still the
-		 * correct target bucket, we are done.  Otherwise, drop any old lock
-		 * and lock what now appears to be the correct bucket.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch the primary bucket page for the bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
-
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
-
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
+	bucket = opaque->hasho_bucket;
 
 	so->hashso_bucket_buf = buf;
 
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 69a3873..5a39d14 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -60,6 +60,13 @@ typedef uint32 Bucket;
 
 typedef struct HashPageOpaqueData
 {
+	/*
+	 * If this is an ovfl page, this stores the previous ovfl (or bucket)
+	 * blkno. If this is a bucket page, we reuse the field for a special
+	 * purpose: it holds the hashm_maxbucket value as of the time the page
+	 * was initialized or last split. This lets us detect whether the bucket
+	 * has been split after caching the HashMetaPageData. See
+	 * _hash_getbucketbuf_from_hashkey().
+	 */
 	BlockNumber hasho_prevblkno;	/* previous ovfl (or bucket) blkno */
 	BlockNumber hasho_nextblkno;	/* next ovfl blkno */
 	Bucket		hasho_bucket;	/* bucket number this pg belongs to */
@@ -305,6 +312,10 @@ extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno,
 			 int access, int flags);
 extern Buffer _hash_getbuf_with_condlock_cleanup(Relation rel,
 								   BlockNumber blkno, int flags);
+extern HashMetaPage _hash_getcachedmetap(Relation rel, Buffer metabuf);
+extern Buffer _hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey,
+								int access,
+								HashMetaPage *cachedmetap);
 extern Buffer _hash_getinitbuf(Relation rel, BlockNumber blkno);
 extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno,
 				ForkNumber forkNum);
#39Robert Haas
robertmhaas@gmail.com
In reply to: Mithun Cy (#38)
Re: Cache Hash Index meta page.

On Thu, Jan 26, 2017 at 1:48 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

The _v11 API was self-contained, but it does not hold a pin on the
metapage buffer, whereas in _v12 we hold the pin across two consecutive
reads of the metapage. I have taken your advice and am producing two
different patches.

Hmm. I think both of these APIs have some advantages. On the one
hand, passing metabuf sometimes allows you to avoid pin/unpin cycles -
e.g. in hashbulkdelete it makes a fair amount of sense to keep the
metabuf pinned once we've had to read it, just in case we need it
again. On the other hand, it's surprising that passing a value for
the metabuf forces the cache to be refreshed. I wonder if a good API
might be something like this:

HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
force_refresh);

If the cache is initialized and force_refresh is not true, then this
just returns the cached data, and the metabuf argument isn't used.
Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
the metabuffer, pin and lock it, use it to set the cache, release the
lock, and return with the pin still held. If *metabuf !=
InvalidBuffer, we assume it's pinned and return with it still pinned.
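
To make the contract concrete, a hypothetical caller might look like
this (a sketch of the proposed semantics, not code from any posted
patch; bucket_may_have_split is a placeholder for whatever
revalidation check the caller performs):

    Buffer          metabuf = InvalidBuffer;
    HashMetaPage    cachedmetap;

    /* First call: reads the metapage, fills the cache, leaves a pin. */
    cachedmetap = _hash_getcachedmetap(rel, &metabuf, false);

    /* ... lock the bucket's primary page and do some work ... */

    /* Refresh only if a concurrent split may have invalidated the cache. */
    if (bucket_may_have_split)          /* placeholder condition */
        cachedmetap = _hash_getcachedmetap(rel, &metabuf, true);

    /* Drop the pin the first call may have taken on our behalf. */
    if (BufferIsValid(metabuf))
        _hash_dropbuf(rel, metabuf);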

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#40Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Robert Haas (#39)
1 attachment(s)
Re: Cache Hash Index meta page.

HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
force_refresh);

If the cache is initialized and force_refresh is not true, then this
just returns the cached data, and the metabuf argument isn't used.
Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
the metabuffer, pin and lock it, use it to set the cache, release the
lock, and return with the pin still held. If *metabuf !=
InvalidBuffer, we assume it's pinned and return with it still pinned.

Thanks, Robert. I have made a new patch which does the same. I think
the code looks less complicated now.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_hash_index_meta_page_14.patchapplication/octet-stream; name=cache_hash_index_meta_page_14.patchDownload
commit f01780a9f2118898c688a04c451aaaeef343d9b6
Author: mithun <mithun@localhost.localdomain>
Date:   Sun Jan 29 14:37:46 2017 +0530

    cache_hash_index_meta_page_14.patch

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 01ea115..e4c5bd0 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -188,17 +188,8 @@ track of available overflow pages.
 
 The reader algorithm is:
 
-	pin meta page and take buffer content lock in shared mode
-	loop:
-		compute bucket number for target hash key
-		release meta page buffer content lock
-		if (correct bucket page is already locked)
-			break
-		release any existing bucket page buffer content lock (if a concurrent
-         split happened)
-		take the buffer content lock on bucket page in shared mode
-		retake meta page buffer content lock in shared mode
-	release pin on metapage
+	given a hash key, get the target bucket page locked for reading, using
+	the cached metapage; usage of the cached metapage is explained later.
 	if the target bucket is still being populated by a split:
 		release the buffer content lock on current bucket page
 		pin and acquire the buffer content lock on old bucket in shared mode
@@ -238,17 +229,8 @@ which this bucket is formed by split.
 
 The insertion algorithm is rather similar:
 
-	pin meta page and take buffer content lock in shared mode
-	loop:
-		compute bucket number for target hash key
-		release meta page buffer content lock
-		if (correct bucket page is already locked)
-			break
-		release any existing bucket page buffer content lock (if a concurrent
-         split happened)
-		take the buffer content lock on bucket page in exclusive mode
-		retake meta page buffer content lock in shared mode
-	release pin on metapage
+	given a hash key, get the target bucket page locked for writing, using
+	the cached metapage; usage of the cached metapage is explained later.
 -- (so far same as reader, except for acquisition of buffer content lock in
 	exclusive mode on primary bucket page)
 	if the bucket-being-split flag is set for a bucket and pin count on it is
@@ -290,6 +272,20 @@ When an inserter cannot find space in any existing page of a bucket, it
 must obtain an overflow page and add that page to the bucket's chain.
 Details of that part of the algorithm appear later.
 
+The usage of the cached metapage is as follows:
+
+	if the metapage cache is not set, read the metapage data, set the
+	cache, and hold a pin on the metapage
+	loop:
+		compute the bucket number for the target hash key; take the buffer
+		content lock on the bucket page, in shared or exclusive mode as
+		requested by the reader/insertion algorithm
+		if (the target bucket was not split after the metapage was cached)
+			break
+		release the bucket page buffer content lock; update the metapage
+		cache with the latest metapage data
+	release the pin on the metapage, if any
+
 The page split algorithm is entered whenever an inserter observes that the
 index is overfull (has a higher-than-wanted ratio of tuples to buckets).
 The algorithm attempts, but does not necessarily succeed, to split one
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index ec8ed33..4c35bdc 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -507,28 +507,25 @@ hashbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
 	Bucket		orig_maxbucket;
 	Bucket		cur_maxbucket;
 	Bucket		cur_bucket;
-	Buffer		metabuf;
+	Buffer		metabuf = InvalidBuffer;
 	HashMetaPage metap;
-	HashMetaPageData local_metapage;
+	HashMetaPage cachedmetap;
 
 	tuples_removed = 0;
 	num_index_tuples = 0;
 
 	/*
-	 * Read the metapage to fetch original bucket and tuple counts.  Also, we
-	 * keep a copy of the last-seen metapage so that we can use its
-	 * hashm_spares[] values to compute bucket page addresses.  This is a bit
-	 * hokey but perfectly safe, since the interesting entries in the spares
-	 * array cannot change under us; and it beats rereading the metapage for
-	 * each bucket.
+	 * Read the metapage to fetch original bucket and tuple counts.  We rely
+	 * on the cached metapage data, so its hashm_spares[] values can be used
+	 * to compute bucket page addresses.  This is a bit hokey but perfectly
+	 * safe, since the interesting entries in the spares array cannot change
+	 * under us; and it beats rereading the metapage for each bucket.
 	 */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	metap = HashPageGetMeta(BufferGetPage(metabuf));
-	orig_maxbucket = metap->hashm_maxbucket;
-	orig_ntuples = metap->hashm_ntuples;
-	memcpy(&local_metapage, metap, sizeof(local_metapage));
-	/* release the lock, but keep pin */
-	LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+	cachedmetap = _hash_getcachedmetap(rel, &metabuf, false);
+	Assert(cachedmetap != NULL);
+
+	orig_maxbucket = cachedmetap->hashm_maxbucket;
+	orig_ntuples = cachedmetap->hashm_ntuples;
 
 	/* Scan the buckets that we know exist */
 	cur_bucket = 0;
@@ -546,7 +543,7 @@ loop_top:
 		bool		split_cleanup = false;
 
 		/* Get address of bucket's start page */
-		bucket_blkno = BUCKET_TO_BLKNO(&local_metapage, cur_bucket);
+		bucket_blkno = BUCKET_TO_BLKNO(cachedmetap, cur_bucket);
 
 		blkno = bucket_blkno;
 
@@ -578,19 +575,22 @@ loop_top:
 			 * tuples left behind by the most recent split.  To prevent that,
 			 * now that the primary page of the target bucket has been locked
 			 * (and thus can't be further split), update our cached metapage
-			 * data.
+			 * data in that case.
 			 */
-			LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-			memcpy(&local_metapage, metap, sizeof(local_metapage));
-			LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+			if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber &&
+				bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+			{
+				cachedmetap = _hash_getcachedmetap(rel, &metabuf, true);
+				Assert(cachedmetap != NULL);
+			}
 		}
 
 		bucket_buf = buf;
 
 		hashbucketcleanup(rel, cur_bucket, bucket_buf, blkno, info->strategy,
-						  local_metapage.hashm_maxbucket,
-						  local_metapage.hashm_highmask,
-						  local_metapage.hashm_lowmask, &tuples_removed,
+						  cachedmetap->hashm_maxbucket,
+						  cachedmetap->hashm_highmask,
+						  cachedmetap->hashm_lowmask, &tuples_removed,
 						  &num_index_tuples, split_cleanup,
 						  callback, callback_state);
 
@@ -600,6 +600,9 @@ loop_top:
 		cur_bucket++;
 	}
 
+	if (BufferIsInvalid(metabuf))
+		metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+
 	/* Write-lock metapage and check for split since we started */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 	metap = HashPageGetMeta(BufferGetPage(metabuf));
@@ -607,9 +610,10 @@ loop_top:
 	if (cur_maxbucket != metap->hashm_maxbucket)
 	{
 		/* There's been a split, so process the additional bucket(s) */
-		cur_maxbucket = metap->hashm_maxbucket;
-		memcpy(&local_metapage, metap, sizeof(local_metapage));
 		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+		cachedmetap = _hash_getcachedmetap(rel, &metabuf, true);
+		Assert(cachedmetap != NULL);
+		cur_maxbucket = cachedmetap->hashm_maxbucket;
 		goto loop_top;
 	}
 
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 39c70d3..bec5ef3 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -32,9 +32,7 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	Buffer		bucket_buf;
 	Buffer		metabuf;
 	HashMetaPage metap;
-	BlockNumber blkno;
-	BlockNumber oldblkno;
-	bool		retry;
+	HashMetaPage usedmetap = NULL;
 	Page		metapage;
 	Page		page;
 	HashPageOpaque pageopaque;
@@ -42,9 +40,6 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 	bool		do_expand;
 	uint32		hashkey;
 	Bucket		bucket;
-	uint32		maxbucket;
-	uint32		highmask;
-	uint32		lowmask;
 
 	/*
 	 * Get the hash key for the item (it's stored in the index tuple itself).
@@ -57,10 +52,15 @@ _hash_doinsert(Relation rel, IndexTuple itup)
 								 * need to be consistent */
 
 restart_insert:
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
+
+	/*
+	 * Load the metapage. No lock is needed yet: the only field we access
+	 * is the page header's pd_pagesize_version, via HashMaxItemSize(), and
+	 * that is constant and cannot change under us.  But we do hold the
+	 * pin, so the metabuf can be reused when we update the metapage below.
+	 */
+	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
 	metapage = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(metapage);
 
 	/*
 	 * Check whether the item can fit on a hash page at all. (Eventually, we
@@ -76,66 +76,21 @@ restart_insert:
 						itemsz, HashMaxItemSize(metapage)),
 			errhint("Values larger than a buffer page cannot be indexed.")));
 
-	oldblkno = InvalidBlockNumber;
-	retry = false;
-
-	/*
-	 * Loop until we get a lock on the correct target bucket.
-	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/*
-		 * Copy bucket mapping info now; refer the comment in
-		 * _hash_expandtable where we copy this information before calling
-		 * _hash_splitbucket to see why this is okay.
-		 */
-		maxbucket = metap->hashm_maxbucket;
-		highmask = metap->hashm_highmask;
-		lowmask = metap->hashm_lowmask;
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked the primary page of
-		 * what is still the correct target bucket, we are done.  Otherwise,
-		 * drop any old lock before acquiring the new one.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch and lock the primary bucket page for the target bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_WRITE, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_WRITE,
+										  &usedmetap);
+	Assert(usedmetap != NULL);
 
 	/* remember the primary bucket buffer to release the pin on it at end. */
 	bucket_buf = buf;
 
 	page = BufferGetPage(buf);
 	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(pageopaque->hasho_bucket == bucket);
+
+	/*
+	 * _hash_getbucketbuf_from_hashkey has already verified hasho_bucket,
+	 * so it is safe to use here.
+	 */
+	bucket = pageopaque->hasho_bucket;
 
 	/*
 	 * If this bucket is in the process of being split, try to finish the
@@ -151,8 +106,10 @@ restart_insert:
 		/* release the lock on bucket buffer, before completing the split. */
 		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 
-		_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
-						   maxbucket, highmask, lowmask);
+		_hash_finish_split(rel, metabuf, buf, bucket,
+						   usedmetap->hashm_maxbucket,
+						   usedmetap->hashm_highmask,
+						   usedmetap->hashm_lowmask);
 
 		/* release the pin on old and meta buffer.  retry for insert. */
 		_hash_dropbuf(rel, buf);
@@ -225,6 +182,7 @@ restart_insert:
 	 */
 	LockBuffer(metabuf, BUFFER_LOCK_EXCLUSIVE);
 
+	metap = HashPageGetMeta(metapage);
 	metap->hashm_ntuples += 1;
 
 	/* Make sure this stays in sync with _hash_expandtable() */
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 9430794..c70d6bb 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -434,7 +434,13 @@ _hash_metapinit(Relation rel, double num_tuples, ForkNumber forkNum)
 		buf = _hash_getnewbuf(rel, BUCKET_TO_BLKNO(metap, i), forkNum);
 		pg = BufferGetPage(buf);
 		pageopaque = (HashPageOpaque) PageGetSpecialPointer(pg);
-		pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+		/*
+		 * Set hasho_prevblkno to the current hashm_maxbucket. This value
+		 * will be used to validate the cached HashMetaPageData. See
+		 * _hash_getbucketbuf_from_hashkey().
+		 */
+		pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
 		pageopaque->hasho_nextblkno = InvalidBlockNumber;
 		pageopaque->hasho_bucket = i;
 		pageopaque->hasho_flag = LH_BUCKET_PAGE;
@@ -845,6 +851,12 @@ _hash_splitbucket(Relation rel,
 	 */
 	oopaque->hasho_flag |= LH_BUCKET_BEING_SPLIT;
 
+	/*
+	 * Set hasho_prevblkno to the latest maxbucket, indicating the bucket
+	 * has been split and any cached HashMetaPageData must be rebuilt.
+	 * The same is done for the new bucket page below.
+	 */
+	oopaque->hasho_prevblkno = maxbucket;
 	npage = BufferGetPage(nbuf);
 
 	/*
@@ -852,7 +864,7 @@ _hash_splitbucket(Relation rel,
 	 * split is in progress.
 	 */
 	nopaque = (HashPageOpaque) PageGetSpecialPointer(npage);
-	nopaque->hasho_prevblkno = InvalidBlockNumber;
+	nopaque->hasho_prevblkno = maxbucket;
 	nopaque->hasho_nextblkno = InvalidBlockNumber;
 	nopaque->hasho_bucket = nbucket;
 	nopaque->hasho_flag = LH_BUCKET_PAGE | LH_BUCKET_BEING_POPULATED;
@@ -1191,3 +1203,130 @@ _hash_finish_split(Relation rel, Buffer metabuf, Buffer obuf, Bucket obucket,
 	LockBuffer(obuf, BUFFER_LOCK_UNLOCK);
 	hash_destroy(tidhtab);
 }
+
+/*
+ *	_hash_getcachedmetap() -- Returns cached metapage data.
+ *
+ *	metabuf : if it points to a valid buffer, the caller must hold a pin,
+ *	but no lock, on the metapage.
+ *	force_refresh : if set, update the cache with the latest metapage data.
+ *
+ *	This function never releases the pin on metabuf; that is the caller's
+ *	responsibility.
+ */
+HashMetaPage
+_hash_getcachedmetap(Relation rel, Buffer *metabuf, bool force_refresh)
+{
+	Page		page;
+
+	Assert(metabuf);
+	if (force_refresh || rel->rd_amcache == NULL)
+	{
+		if (rel->rd_amcache == NULL)
+			rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
+												 sizeof(HashMetaPageData));
+
+		/* Read the metapage. */
+		if (BufferIsValid(*metabuf))
+			LockBuffer(*metabuf, BUFFER_LOCK_SHARE);
+		else
+			*metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ,
+									LH_META_PAGE);
+
+		page = BufferGetPage(*metabuf);
+		memcpy(rel->rd_amcache, HashPageGetMeta(page),
+			   sizeof(HashMetaPageData));
+
+		/* Release metapage lock. Keep the pin. */
+		LockBuffer(*metabuf, BUFFER_LOCK_UNLOCK);
+	}
+
+	return (HashMetaPage) rel->rd_amcache;
+}
+
+/*
+ *	_hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given
+ *										 hashkey.
+ *
+ *	Bucket pages do not move or get removed once they are allocated.  This
+ *	gives us an opportunity to use the previously saved metapage contents
+ *	to reach the target bucket buffer, instead of reading from the metapage
+ *	buffer every time.  That saves one buffer access each time we want to
+ *	reach the target bucket buffer, which is a very helpful saving in
+ *	bufmgr traffic and contention.
+ *
+ *	The access type parameter (HASH_READ or HASH_WRITE) indicates whether the
+ *	bucket buffer has to be locked for reading or writing.
+ *
+ *	The out parameter cachedmetap is set to the metapage contents used for
+ *	the hashkey-to-bucket-buffer mapping.  Some callers need this info to
+ *	reach the old bucket in case of a bucket split; see _hash_doinsert().
+ */
+Buffer
+_hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access,
+								HashMetaPage *cachedmetap)
+{
+	HashMetaPage metap;
+	Buffer		buf;
+	Buffer		metabuf = InvalidBuffer;
+	Page		page;
+	Bucket		bucket;
+	BlockNumber blkno;
+	HashPageOpaque opaque;
+
+	/* We read from the target bucket buffer, so a lock is required. */
+	Assert(access == HASH_READ || access == HASH_WRITE);
+
+	metap = _hash_getcachedmetap(rel, &metabuf, false);
+	Assert(metap != NULL);
+
+	/*
+	 * Loop until we get a lock on the correct target bucket.
+	 */
+	for (;;)
+	{
+		/*
+		 * Compute the target bucket number, and convert to block number.
+		 */
+		bucket = _hash_hashkey2bucket(hashkey,
+									  metap->hashm_maxbucket,
+									  metap->hashm_highmask,
+									  metap->hashm_lowmask);
+
+		blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+		/* Fetch the primary bucket page for the bucket */
+		buf = _hash_getbuf(rel, blkno, access, LH_BUCKET_PAGE);
+		page = BufferGetPage(buf);
+		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+		Assert(opaque->hasho_bucket == bucket);
+
+		/*
+		 * Check whether this bucket has been split since we cached the
+		 * HashMetaPageData, by comparing the respective hashm_maxbucket
+		 * values.  If so, we need to re-read the metapage and recompute
+		 * the bucket number.
+		 */
+		if (opaque->hasho_prevblkno == InvalidBlockNumber ||
+			opaque->hasho_prevblkno <= metap->hashm_maxbucket)
+		{
+			/* OK, we have the right bucket; proceed to search in it. */
+			break;
+		}
+
+		/* First release the lock and pin on the bucket buffer. */
+		_hash_relbuf(rel, buf);
+
+		/* Update the cached meta page data. */
+		metap = _hash_getcachedmetap(rel, &metabuf, true);
+		Assert(metap != NULL);
+	}
+
+	if (BufferIsValid(metabuf))
+		_hash_dropbuf(rel, metabuf);
+
+	if (cachedmetap)
+		*cachedmetap = metap;
+
+	return buf;
+}
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index a59ad6f..af504b3 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -139,6 +139,7 @@ _hash_readprev(IndexScanDesc scan,
 	BlockNumber blkno;
 	Relation	rel = scan->indexRelation;
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
+	bool		haveprevblk = true;
 
 	blkno = (*opaquep)->hasho_prevblkno;
 
@@ -147,15 +148,20 @@ _hash_readprev(IndexScanDesc scan,
 	 * comments in _hash_first to know the reason of retaining pin.
 	 */
 	if (*bufp == so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)
+	{
 		LockBuffer(*bufp, BUFFER_LOCK_UNLOCK);
+		haveprevblk = false;
+	}
 	else
 		_hash_relbuf(rel, *bufp);
 
 	*bufp = InvalidBuffer;
 	/* check for interrupts while we're not holding any buffer lock */
 	CHECK_FOR_INTERRUPTS();
-	if (BlockNumberIsValid(blkno))
+
+	if (haveprevblk)
 	{
+		Assert(BlockNumberIsValid(blkno));
 		*bufp = _hash_getbuf(rel, blkno, HASH_READ,
 							 LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
 		*pagep = BufferGetPage(*bufp);
@@ -215,14 +221,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 	ScanKey		cur;
 	uint32		hashkey;
 	Bucket		bucket;
-	BlockNumber blkno;
-	BlockNumber oldblkno = InvalidBuffer;
-	bool		retry = false;
 	Buffer		buf;
-	Buffer		metabuf;
 	Page		page;
 	HashPageOpaque opaque;
-	HashMetaPage metap;
 	IndexTuple	itup;
 	ItemPointer current;
 	OffsetNumber offnum;
@@ -277,59 +278,15 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
 
 	so->hashso_sk_hash = hashkey;
 
-	/* Read the metapage */
-	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
-	page = BufferGetPage(metabuf);
-	metap = HashPageGetMeta(page);
+	buf = _hash_getbucketbuf_from_hashkey(rel, hashkey, HASH_READ, NULL);
+	page = BufferGetPage(buf);
+	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
 
 	/*
-	 * Loop until we get a lock on the correct target bucket.
+	 * _hash_getbucketbuf_from_hashkey has already verified hasho_bucket,
+	 * so it is safe to use here.
 	 */
-	for (;;)
-	{
-		/*
-		 * Compute the target bucket number, and convert to block number.
-		 */
-		bucket = _hash_hashkey2bucket(hashkey,
-									  metap->hashm_maxbucket,
-									  metap->hashm_highmask,
-									  metap->hashm_lowmask);
-
-		blkno = BUCKET_TO_BLKNO(metap, bucket);
-
-		/* Release metapage lock, but keep pin. */
-		LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
-
-		/*
-		 * If the previous iteration of this loop locked what is still the
-		 * correct target bucket, we are done.  Otherwise, drop any old lock
-		 * and lock what now appears to be the correct bucket.
-		 */
-		if (retry)
-		{
-			if (oldblkno == blkno)
-				break;
-			_hash_relbuf(rel, buf);
-		}
-
-		/* Fetch the primary bucket page for the bucket */
-		buf = _hash_getbuf(rel, blkno, HASH_READ, LH_BUCKET_PAGE);
-
-		/*
-		 * Reacquire metapage lock and check that no bucket split has taken
-		 * place while we were awaiting the bucket lock.
-		 */
-		LockBuffer(metabuf, BUFFER_LOCK_SHARE);
-		oldblkno = blkno;
-		retry = true;
-	}
-
-	/* done with the metapage */
-	_hash_dropbuf(rel, metabuf);
-
-	page = BufferGetPage(buf);
-	opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-	Assert(opaque->hasho_bucket == bucket);
+	bucket = opaque->hasho_bucket;
 
 	so->hashso_bucket_buf = buf;
 
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 69a3873..213d5fc 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -60,6 +60,13 @@ typedef uint32 Bucket;
 
 typedef struct HashPageOpaqueData
 {
+	/*
+	 * If this is an overflow page, this stores the previous overflow (or
+	 * bucket) blkno.  On a bucket page, we instead store the
+	 * hashm_maxbucket value as of when the page was initialized or last
+	 * split, which tells us whether the bucket has been split after the
+	 * HashMetaPageData was cached.  See _hash_getbucketbuf_from_hashkey().
+	 */
 	BlockNumber hasho_prevblkno;	/* previous ovfl (or bucket) blkno */
 	BlockNumber hasho_nextblkno;	/* next ovfl blkno */
 	Bucket		hasho_bucket;	/* bucket number this pg belongs to */
@@ -305,6 +312,10 @@ extern Buffer _hash_getbuf(Relation rel, BlockNumber blkno,
 			 int access, int flags);
 extern Buffer _hash_getbuf_with_condlock_cleanup(Relation rel,
 								   BlockNumber blkno, int flags);
+extern HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool force_refresh);
+extern Buffer _hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey,
+								int access,
+								HashMetaPage *cachedmetap);
 extern Buffer _hash_getinitbuf(Relation rel, BlockNumber blkno);
 extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno,
 				ForkNumber forkNum);
#41Michael Paquier
michael.paquier@gmail.com
In reply to: Mithun Cy (#40)
Re: Cache Hash Index meta page.

On Sun, Jan 29, 2017 at 6:13 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
force_refresh);

If the cache is initialized and force_refresh is not true, then this
just returns the cached data, and the metabuf argument isn't used.
Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
the metabuffer, pin and lock it, use it to set the cache, release the
lock, and return with the pin still held. If *metabuf !=
InvalidBuffer, we assume it's pinned and return with it still pinned.

Thanks, Robert. I have made a new patch which does the same. I think
the code looks less complicated now.

Moved to CF 2017-03.
--
Michael


#42Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#41)
Re: Cache Hash Index meta page.

On Wed, Feb 1, 2017 at 12:23 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Sun, Jan 29, 2017 at 6:13 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
force_refresh);

If the cache is initialized and force_refresh is not true, then this
just returns the cached data, and the metabuf argument isn't used.
Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
the metabuffer, pin and lock it, use it to set the cache, release the
lock, and return with the pin still held. If *metabuf !=
InvalidBuffer, we assume it's pinned and return with it still pinned.

Thanks, Robert. I have made a new patch which does the same. I think
the code looks less complicated now.

Moved to CF 2017-03.

Committed with some changes (which I noted in the commit message).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#43Erik Rijkers
er@xs4all.nl
In reply to: Robert Haas (#42)
Re: Cache Hash Index meta page.

On 2017-02-07 18:41, Robert Haas wrote:

Committed with some changes (which I noted in the commit message).

This has caused a warning with gcc 6.2.0:

hashpage.c: In function ‘_hash_getcachedmetap’:
hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
rel->rd_amcache = cache;
~~~~~~~~~~~~~~~~^~~~~~~

which hopefully can be prevented...

thanks,

Erik Rijkers


#44Mithun Cy
mithun.cy@enterprisedb.com
In reply to: Erik Rijkers (#43)
1 attachment(s)
Re: Cache Hash Index meta page.

On Tue, Feb 7, 2017 at 11:21 PM, Erik Rijkers <er@xs4all.nl> wrote:

On 2017-02-07 18:41, Robert Haas wrote:

Committed with some changes (which I noted in the commit message).

Thanks, Robert and all who have reviewed the patch and given their
valuable comments.

This has caused a warning with gcc 6.2.0:

hashpage.c: In function ‘_hash_getcachedmetap’:
hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
rel->rd_amcache = cache;
~~~~~~~~~~~~~~~~^~~~~~~

Yes, I also see the warning. I think the compiler is not able to see
that cache is always initialized and used only under the condition
(rel->rd_amcache == NULL).
I think that to make the compiler happy we can initialize cache to
NULL where it is defined.
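
For reference, the pattern reduces to something like this minimal
sketch (not the committed code; both guards test the same condition,
but gcc cannot correlate them across the intervening statements):

    char   *cache = NULL;       /* initializing here, as in the attached
                                 * patch, silences the warning */

    if (rel->rd_amcache == NULL)
        cache = MemoryContextAlloc(rel->rd_indexcxt,
                                   sizeof(HashMetaPageData));

    /* ... the metapage is read into the allocated chunk here ... */

    if (rel->rd_amcache == NULL)
        rel->rd_amcache = cache;    /* -Wmaybe-uninitialized fires here */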

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachments:

cache_metap_compiler_warning01application/octet-stream; name=cache_metap_compiler_warning01Download
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index d52f149..9485978 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -1220,7 +1220,7 @@ _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool force_refresh)
 	Assert(metabuf);
 	if (force_refresh || rel->rd_amcache == NULL)
 	{
-		char   *cache;
+		char   *cache = NULL;
 
 		/*
 		 * It's important that we don't set rd_amcache to an invalid
#45Robert Haas
robertmhaas@gmail.com
In reply to: Mithun Cy (#44)
Re: Cache Hash Index meta page.

On Tue, Feb 7, 2017 at 1:52 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

On Tue, Feb 7, 2017 at 11:21 PM, Erik Rijkers <er@xs4all.nl> wrote:

On 2017-02-07 18:41, Robert Haas wrote:

Committed with some changes (which I noted in the commit message).

Thanks, Robert and all who have reviewed the patch and given their
valuable comments.

This has caused a warning with gcc 6.2.0:

hashpage.c: In function ‘_hash_getcachedmetap’:
hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
rel->rd_amcache = cache;
~~~~~~~~~~~~~~~~^~~~~~~

Yes, I also see the warning. I think the compiler is not able to see
that cache is always initialized and used only under the condition
(rel->rd_amcache == NULL).
I think that to make the compiler happy we can initialize cache to
NULL where it is defined.

Thanks for the reports and patch. Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
