GIN improvements part2: fast scan
Hackers,
Attached is a patch implementing the "fast scan" technique for GIN. This is the
second patch of the GIN improvements series; see the 1st one here:
/messages/by-id/CAPpHfduxv-iL7aedwPW0W5fXrWGAKfxijWM63_hZujaCRxnmFQ@mail.gmail.com
This patch allows skipping parts of posting trees when scanning them is not
necessary. In particular, it solves the "frequent_term & rare_term" problem of
FTS.
It introduces a new interface method, pre_consistent, which behaves like
consistent, but:
1) allows false positives on input (check[])
2) is allowed to return false positives
For example, "frequent_term & rare_term" becomes pretty fast.
create table test as (select to_tsvector('english', 'bbb') as v from
generate_series(1,1000000));
insert into test (select to_tsvector('english', 'ddd') from
generate_series(1,10));
create index test_idx on test using gin (v);
postgres=# explain analyze select * from test where v @@
to_tsquery('english', 'bbb & ddd');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=942.75..7280.63 rows=5000 width=17)
(actual time=0.458..0.461 rows=10 loops=1)
Recheck Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
-> Bitmap Index Scan on test_idx (cost=0.00..941.50 rows=5000 width=0)
(actual time=0.449..0.449 rows=10 loops=1)
Index Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
Total runtime: 0.516 ms
(5 rows)
------
With best regards,
Alexander Korotkov.
Attachments: gin_fast_scan.1.patch.gz (application/x-gzip)
On Sat, Jun 15, 2013 at 2:55 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
attached patch implementing "fast scan" technique for GIN. This is second
patch of GIN improvements, see the 1st one here:/messages/by-id/CAPpHfduxv-iL7aedwPW0W5fXrWGAKfxijWM63_hZujaCRxnmFQ@mail.gmail.com
This patch allow to skip parts of posting trees when their scan is not
necessary. In particular, it solves "frequent_term & rare_term" problem of
FTS.
It introduces new interface method pre_consistent which behaves like
consistent, but:
1) allows false positives on input (check[])
2) allowed to return false positives
Some example: "frequent_term & rare_term" becomes pretty fast.
create table test as (select to_tsvector('english', 'bbb') as v from
generate_series(1,1000000));
insert into test (select to_tsvector('english', 'ddd') from
generate_series(1,10));
create index test_idx on test using gin (v);
postgres=# explain analyze select * from test where v @@
to_tsquery('english', 'bbb & ddd');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=942.75..7280.63 rows=5000 width=17)
(actual time=0.458..0.461 rows=10 loops=1)
Recheck Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
-> Bitmap Index Scan on test_idx (cost=0.00..941.50 rows=5000
width=0) (actual time=0.449..0.449 rows=10 loops=1)
Index Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
Total runtime: 0.516 ms
(5 rows)
The attached version of the patch has some refactoring and bug fixes.
------
With best regards,
Alexander Korotkov.
On 17.06.2013 15:55, Alexander Korotkov wrote:
On Sat, Jun 15, 2013 at 2:55 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
attached patch implementing "fast scan" technique for GIN. This is second
patch of GIN improvements, see the 1st one here:
/messages/by-id/CAPpHfduxv-iL7aedwPW0W5fXrWGAKfxijWM63_hZujaCRxnmFQ@mail.gmail.com
This patch allow to skip parts of posting trees when their scan is not
necessary. In particular, it solves "frequent_term & rare_term" problem of
FTS.
It introduces new interface method pre_consistent which behaves like
consistent, but:
1) allows false positives on input (check[])
2) allowed to return false positives
Some example: "frequent_term & rare_term" becomes pretty fast.
create table test as (select to_tsvector('english', 'bbb') as v from
generate_series(1,1000000));
insert into test (select to_tsvector('english', 'ddd') from
generate_series(1,10));
create index test_idx on test using gin (v);
postgres=# explain analyze select * from test where v @@
to_tsquery('english', 'bbb & ddd');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=942.75..7280.63 rows=5000 width=17)
(actual time=0.458..0.461 rows=10 loops=1)
Recheck Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
-> Bitmap Index Scan on test_idx (cost=0.00..941.50 rows=5000
width=0) (actual time=0.449..0.449 rows=10 loops=1)
Index Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
Total runtime: 0.516 ms
(5 rows)
Attached version of patch has some refactoring and bug fixes.
Good timing, I just started looking at this.
I think you'll need to explain how this works. There are no docs, and
almost no comments.
(and this shows how poorly I understand this, but) Why does this require
the "additional information" patch? What extra information do you store
on-disk, in the additional information?
The pre-consistent method is like the consistent method, but it allows
false positives. I think that's because during the scan, before having
scanned for all the keys, the gin AM doesn't yet know if the tuple
contains all of the keys. So it passes the keys it doesn't yet know
about as 'true' to pre-consistent. Could that be generalized, to pass a
tri-state instead of a boolean for each key to the pre-consistent
method? For each key, you would pass "true", "false", or "don't know". I
think you could then also speed up queries like "!english & bbb".
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 17, 2013 at 5:09 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 17.06.2013 15:55, Alexander Korotkov wrote:
On Sat, Jun 15, 2013 at 2:55 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
attached patch implementing "fast scan" technique for GIN. This is second
patch of GIN improvements, see the 1st one here:
/messages/by-id/CAPpHfduxv-iL7aedwPW0W5fXrWGAKfxijWM63_hZujaCRxnmFQ@mail.gmail.com
This patch allow to skip parts of posting trees when their scan is not
necessary. In particular, it solves "frequent_term & rare_term" problem of
FTS.
It introduces new interface method pre_consistent which behaves like
consistent, but:
1) allows false positives on input (check[])
2) allowed to return false positives
Some example: "frequent_term & rare_term" becomes pretty fast.
create table test as (select to_tsvector('english', 'bbb') as v from
generate_series(1,1000000));
insert into test (select to_tsvector('english', 'ddd') from
generate_series(1,10));
create index test_idx on test using gin (v);
postgres=# explain analyze select * from test where v @@
to_tsquery('english', 'bbb & ddd');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=942.75..7280.63 rows=5000 width=17)
(actual time=0.458..0.461 rows=10 loops=1)
Recheck Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
-> Bitmap Index Scan on test_idx (cost=0.00..941.50 rows=5000
width=0) (actual time=0.449..0.449 rows=10 loops=1)
Index Cond: (v @@ '''bbb'' & ''ddd'''::tsquery)
Total runtime: 0.516 ms
(5 rows)
Attached version of patch has some refactoring and bug fixes.
Good timing, I just started looking at this.
I think you'll need to explain how this works. There are no docs, and
almost no comments.
Sorry for that. I'll post a patch with docs and comments in a couple of days.
(and this shows how poorly I understand this, but) Why does this require
the "additional information" patch?
In principle, it doesn't require the "additional information" patch. The same
optimization could be done without "additional information". The reason it
requires the "additional information" patch is that otherwise I would have to
implement and maintain two versions of "fast scan": with and without
"additional information".
What extra information do you store on-disk, in the additional information?
It depends on the opclass. In the regex patch it is part of the graph associated
with a trigram. In full-text search it is word positions. In array similarity
search it could be the length of the array, for a better estimate of similarity
(not implemented yet). So it's anything that is stored with each ItemPointer
and is useful for consistent or for index-driven sorting (see patch #3).
The pre-consistent method is like the consistent method, but it allows
false positives. I think that's because during the scan, before having
scanned for all the keys, the gin AM doesn't yet know if the tuple contains
all of the keys. So it passes the keys it doesn't yet know about as 'true'
to pre-consistent.
Could that be generalized, to pass a tri-state instead of a boolean for
each key to the pre-consistent method? For each key, you would pass "true",
"false", or "don't know". I think you could then also speed up queries like
"!english & bbb".
I would like to illustrate that with an example. Imagine you have the fulltext
query "rare_term & frequent_term". The frequent term has a large posting tree,
while the rare term has only a small posting list containing iptr1, iptr2, and
iptr3. At first we get iptr1 from the posting list of the rare term; then we
would like to check whether we have to scan the part of the frequent term's
posting tree where iptr < iptr1. So we call pre_consistent([false, true]),
because we know that the rare term is not present for iptr < iptr1.
pre_consistent returns false, so we can start scanning the frequent term's
posting tree from iptr1. Similarly we can skip the gaps between iptr1 and
iptr2, iptr2 and iptr3, and from iptr3 to the maximum possible pointer.
And yes, it could be generalized to a tri-state instead of a boolean. I had
that idea first. But I found the cases where an exact "true" wins to be quite
narrow. Let's see it on an example. If you have "!term1 & term2" there are a
few cases:
1) term1 is rare, term2 is frequent => you can exclude only some entries
from the posting tree of term2; the performance benefit will be negligible
2) term1 is frequent, term2 is rare => you should actually scan the term2
posting list and then check the term1 posting tree, like in the example about
"rare_term & frequent_term", so there is no need for tri-state to handle
this situation
3) term1 is frequent, term2 is frequent => this case probably could be
optimized by tri-state. But in order to actually skip some parts of the term2
posting tree you need the item pointers in term1 to be sequential. That seems
to me to be a very narrow case.
4) term1 is rare, term2 is rare => no optimization needed
BTW, a version of the patch with some bug fixes is attached.
------
With best regards,
Alexander Korotkov.
Attachments: gin_fast_scan.3.patch.gz (application/x-gzip)
On 18.06.2013 23:59, Alexander Korotkov wrote:
I would like to illustrate that on example. Imagine you have fulltext query
"rare_term & frequent_term". Frequent term has large posting tree while
rare term has only small posting list containing iptr1, iptr2 and iptr3. At
first we get iptr1 from posting list of rare term, then we would like to
check whether we have to scan part of frequent term posting tree where iptr
< iptr1. So we call pre_consistent([false, true]), because we know that
rare term is not present for iptr < iptr2. pre_consistent returns false. So
we can start scanning frequent term posting tree from iptr1. Similarly we
can skip lags between iptr1 and iptr2, iptr2 and iptr3, from iptr3 to
maximum possible pointer.
Thanks, now I understand the rare-term & frequent-term problem. Couldn't
you do that with the existing consistent function? I don't see why you
need the new pre-consistent function for this.
- Heikki
On Wed, Jun 19, 2013 at 11:48 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 18.06.2013 23:59, Alexander Korotkov wrote:
I would like to illustrate that on example. Imagine you have fulltext query
"rare_term & frequent_term". Frequent term has large posting tree while
rare term has only small posting list containing iptr1, iptr2 and iptr3. At
first we get iptr1 from posting list of rare term, then we would like to
check whether we have to scan part of frequent term posting tree where iptr
< iptr1. So we call pre_consistent([false, true]), because we know that
rare term is not present for iptr < iptr2. pre_consistent returns false. So
we can start scanning frequent term posting tree from iptr1. Similarly we
can skip lags between iptr1 and iptr2, iptr2 and iptr3, from iptr3 to
maximum possible pointer.
Thanks, now I understand the rare-term & frequent-term problem. Couldn't
you do that with the existing consistent function? I don't see why you need
the new pre-consistent function for this.
In the case of two entries I can. But in the case of n entries things
become more complicated. Imagine you have a "term_1 & term_2 & ... & term_n"
query. When you get some item pointer from term_1 you can skip all the
lesser item pointers from term_2, term_3 ... term_n. But if all you have
for it is the consistent function, you have to call it with the following
check arguments:
1) [false, false, false, ... , false]
2) [false, true, false, ... , false]
3) [false, false, true, ... , false]
4) [false, true, true, ..., false]
......
i.e. you have to call it 2^(n-1) times. But if you know the query specifics
(i.e. in the opclass) it's typically easy to calculate exactly what we need in
a single pass. That's why I introduced pre_consistent.
------
With best regards,
Alexander Korotkov.
On Wed, Jun 19, 2013 at 12:30 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Wed, Jun 19, 2013 at 11:48 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 18.06.2013 23:59, Alexander Korotkov wrote:
I would like to illustrate that on example. Imagine you have fulltext query
"rare_term & frequent_term". Frequent term has large posting tree while
rare term has only small posting list containing iptr1, iptr2 and iptr3. At
first we get iptr1 from posting list of rare term, then we would like to
check whether we have to scan part of frequent term posting tree where iptr
< iptr1. So we call pre_consistent([false, true]), because we know that
rare term is not present for iptr < iptr2. pre_consistent returns false. So
we can start scanning frequent term posting tree from iptr1. Similarly we
can skip lags between iptr1 and iptr2, iptr2 and iptr3, from iptr3 to
maximum possible pointer.
Thanks, now I understand the rare-term & frequent-term problem. Couldn't
you do that with the existing consistent function? I don't see why you need
the new pre-consistent function for this.
In the case of two entries I can. But in the case of n entries things
becomes more complicated. Imagine you have "term_1 & term_2 & ... & term_n"
query. When you get some item pointer from term_1 you can skip all the
lesser item pointers from term_2, term_3 ... term_n. But if all you have
for it is consistent function you have to call it with following check
arguments:
1) [false, false, false, ... , false]
2) [false, true, false, ... , false]
3) [false, false, true, ... , false]
4) [false, true, true, ..., false]
......
i.e. you have to call it 2^(n-1) times.
To be precise, you don't need the first check argument I listed. So it's
2^(n-1)-1 calls.
------
With best regards,
Alexander Korotkov.
On 19.06.2013 11:30, Alexander Korotkov wrote:
On Wed, Jun 19, 2013 at 11:48 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 18.06.2013 23:59, Alexander Korotkov wrote:
I would like to illustrate that on example. Imagine you have fulltext query
"rare_term & frequent_term". Frequent term has large posting tree while
rare term has only small posting list containing iptr1, iptr2 and iptr3. At
first we get iptr1 from posting list of rare term, then we would like to
check whether we have to scan part of frequent term posting tree where iptr
< iptr1. So we call pre_consistent([false, true]), because we know that
rare term is not present for iptr < iptr2. pre_consistent returns false. So
we can start scanning frequent term posting tree from iptr1. Similarly we
can skip lags between iptr1 and iptr2, iptr2 and iptr3, from iptr3 to
maximum possible pointer.
Thanks, now I understand the rare-term & frequent-term problem. Couldn't
you do that with the existing consistent function? I don't see why you need
the new pre-consistent function for this.
In the case of two entries I can. But in the case of n entries things
becomes more complicated. Imagine you have "term_1 & term_2 & ... & term_n"
query. When you get some item pointer from term_1 you can skip all the
lesser item pointers from term_2, term_3 ... term_n. But if all you have
for it is consistent function you have to call it with following check
arguments:
1) [false, false, false, ... , false]
2) [false, true, false, ... , false]
3) [false, false, true, ... , false]
4) [false, true, true, ..., false]
......
i.e. you have to call it 2^(n-1) times. But if you know the query specific
(i.e. in opclass) it's typically easy to calculate exactly what we need in
single pass. That's why I introduced pre_consistent.
Hmm. So how does that work with the pre-consistent function? Don't you
need to call that 2^(n-1)-1 times as well?
- Heikki
On Wed, Jun 19, 2013 at 12:49 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 19.06.2013 11:30, Alexander Korotkov wrote:
On Wed, Jun 19, 2013 at 11:48 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 18.06.2013 23:59, Alexander Korotkov wrote:
I would like to illustrate that on example. Imagine you have fulltext query
"rare_term & frequent_term". Frequent term has large posting tree while
rare term has only small posting list containing iptr1, iptr2 and iptr3. At
first we get iptr1 from posting list of rare term, then we would like to
check whether we have to scan part of frequent term posting tree where iptr
< iptr1. So we call pre_consistent([false, true]), because we know that
rare term is not present for iptr < iptr2. pre_consistent returns false. So
we can start scanning frequent term posting tree from iptr1. Similarly we
can skip lags between iptr1 and iptr2, iptr2 and iptr3, from iptr3 to
maximum possible pointer.
Thanks, now I understand the rare-term & frequent-term problem. Couldn't
you do that with the existing consistent function? I don't see why you need
the new pre-consistent function for this.
In the case of two entries I can. But in the case of n entries things
becomes more complicated. Imagine you have "term_1 & term_2 & ... & term_n"
query. When you get some item pointer from term_1 you can skip all the
lesser item pointers from term_2, term_3 ... term_n. But if all you have
for it is consistent function you have to call it with following check
arguments:
1) [false, false, false, ... , false]
2) [false, true, false, ... , false]
3) [false, false, true, ... , false]
4) [false, true, true, ..., false]
......
i.e. you have to call it 2^(n-1) times. But if you know the query specific
(i.e. in opclass) it's typically easy to calculate exactly what we need in
single pass. That's why I introduced pre_consistent.
Hmm. So how does that work with the pre-consistent function? Don't you
need to call that 2^(n-1)-1 times as well?
I call pre-consistent once with [false, true, true, ..., true].
Pre-consistent knows that each true passed to it could be a false positive.
So, if it returns false, it guarantees that consistent will be false for all
possible combinations.
------
With best regards,
Alexander Korotkov.
On 19.06.2013 11:56, Alexander Korotkov wrote:
On Wed, Jun 19, 2013 at 12:49 PM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:

On 19.06.2013 11:30, Alexander Korotkov wrote:
On Wed, Jun 19, 2013 at 11:48 AM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:

On 18.06.2013 23:59, Alexander Korotkov wrote:
I would like to illustrate that with an example. Imagine you have a fulltext
query "rare_term & frequent_term". The frequent term has a large posting tree
while the rare term has only a small posting list containing iptr1, iptr2 and
iptr3.
At first we get iptr1 from the posting list of the rare term, then we would
like to check whether we have to scan the part of the frequent term's posting
tree where iptr < iptr1. So we call pre_consistent([false, true]), because we
know that the rare term is not present for iptr < iptr1. pre_consistent
returns false, so we can start scanning the frequent term's posting tree from
iptr1. Similarly we can skip the gaps between iptr1 and iptr2, iptr2 and
iptr3, and from iptr3 to the maximum possible pointer.

Thanks, now I understand the rare-term & frequent-term problem. Couldn't
you do that with the existing consistent function? I don't see why you need
the new pre-consistent function for this.

In the case of two entries I can. But in the case of n entries things
become more complicated. Imagine you have a "term_1 & term_2 & ... & term_n"
query. When you get some item pointer from term_1 you can skip all the
lesser item pointers from term_2, term_3 ... term_n. But if all you have
for it is the consistent function, you have to call it with the following
check arguments:
1) [false, false, false, ... , false]
2) [false, true, false, ... , false]
3) [false, false, true, ... , false]
4) [false, true, true, ..., false]
......
i.e. you have to call it 2^(n-1) times. But if you know the query specifics
(i.e. in the opclass) it's typically easy to calculate exactly what we need
in a single pass. That's why I introduced pre_consistent.

Hmm. So how does that work with the pre-consistent function? Don't you
need to call that 2^(n-1)-1 times as well?

I call pre-consistent once with [false, true, true, ..., true].
Pre-consistent knows that each true passed to it could be a false positive.
So, if it returns false, it guarantees that consistent will be false for all
possible combinations.
Ok, I see.
I spent some time pondering this. I'd like to find a way to do something
about this without requiring another user-defined function. A couple of
observations:
1. The profile of that rare-term & frequent-term query, without any
patch, looks like this:
28,55% postgres ginCompareItemPointers
19,36% postgres keyGetItem
15,20% postgres scanGetItem
7,75% postgres checkcondition_gin
6,25% postgres gin_tsquery_consistent
4,34% postgres TS_execute
3,85% postgres callConsistentFn
3,64% postgres FunctionCall8Coll
3,19% postgres check_stack_depth
2,60% postgres entryGetNextItem
1,35% postgres entryGetItem
1,25% postgres MemoryContextReset
1,12% postgres MemoryContextSwitchTo
0,31% libc-2.17.so __memcpy_ssse3_back
0,24% postgres collectMatchesForHeapRow
I was quite surprised by seeing ginCompareItemPointers at the top. It
turns out that it's relatively expensive to do comparisons in the format
we keep item pointers, packed in 6 bytes, in 3 int16s. I hacked together
a patch to convert ItemPointers into uint64s, when dealing with them in
memory. That helped quite a bit.
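The uint64 conversion can be sketched like this (simplified stand-ins for the real PostgreSQL structs in storage/itemptr.h): the 6-byte item pointer — a 32-bit block number stored as two 16-bit halves plus a 16-bit offset — packs losslessly into one integer whose numeric order matches (block, offset) order:

```c
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's on-disk item pointer layout:
 * a 32-bit block number kept as two 16-bit halves, plus a 16-bit
 * offset number; 6 bytes in total. */
typedef struct {
    uint16_t bi_hi;     /* high 16 bits of the block number */
    uint16_t bi_lo;     /* low 16 bits of the block number */
    uint16_t posid;     /* offset number within the block */
} ItemPointerData;

/* Pack into a uint64 whose integer ordering equals (block, offset)
 * ordering, so comparisons become a single machine compare. */
static inline uint64_t itemptr_to_uint64(ItemPointerData p)
{
    uint64_t block = ((uint64_t) p.bi_hi << 16) | p.bi_lo;
    return (block << 16) | p.posid;
}

/* One integer comparison replaces three int16 comparisons. */
static inline int itemptr_cmp(ItemPointerData a, ItemPointerData b)
{
    uint64_t ua = itemptr_to_uint64(a);
    uint64_t ub = itemptr_to_uint64(b);
    return (ua > ub) - (ua < ub);
}
```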
Another important thing in the above profile is that calling
consistent-function is taking up a lot of resources. And in the example
test case you gave, it's called with the same arguments every time.
Caching the result of consistent-function would be a big win.
I wrote a quick patch to do that caching, and together with the
itempointer hack, I was able to halve the runtime of that test case.
That's impressive, we probably should pursue that low-hanging fruit, but
it's still slower than your "fast scan" patch by a factor of 100x. So
clearly we do need an algorithmic improvement here, along the lines of
your patch, or something similar.
2. There's one trick we could do even without the pre-consistent
function, that would help the particular test case you gave. Suppose
that you have two search terms A and B. If you have just called
consistent on a row that matched term A, but not term B, you can skip
any subsequent rows in the scan that match A but not B. That means that
you can skip over to the next row that matches B. This is essentially
the same thing you do with the pre-consistent function, it's just a
degenerate case of it. That helps as long as the search contains only
one frequent term, but if it contains multiple, then you have to still
stop at every row that matches more than one of the frequent terms.
3. I'm still not totally convinced that we shouldn't just build the
"truth table" by calling the regular consistent function with all the
combinations of matching keys, as discussed above. I think that would
work pretty well in practice, for queries with up to 5-10 frequent
terms. Per the profiling, it probably would make sense to pre-compute
such a table anyway, to avoid calling consistent repeatedly.
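Pre-computing such a table is cheap to sketch (hypothetical code): with k frequent terms there are 2^k combinations, so for the 5-10 terms mentioned the table has at most about a thousand entries:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TABLE_KEYS 10

typedef bool (*ConsistentFn)(const bool *check, int nkeys);

/* Build a truth table over the k "frequent" keys: entry m holds the
 * result of consistent() with check[i] = ((m >> i) & 1).  This costs
 * 2^k consistent calls up front; afterwards each row in the scan is
 * answered by one table lookup.  Returns the number of entries. */
uint32_t buildTruthTable(ConsistentFn consistent, int nkeys, bool *table)
{
    uint32_t ncombs = (uint32_t) 1 << nkeys;
    for (uint32_t m = 0; m < ncombs; m++)
    {
        bool check[MAX_TABLE_KEYS];
        for (int i = 0; i < nkeys; i++)
            check[i] = (m >> i) & 1;
        table[m] = consistent(check, nkeys);
    }
    return ncombs;
}

/* Example: an AND-of-all-keys consistent function. */
bool consistentAnd(const bool *check, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
        if (!check[i])
            return false;
    return true;
}
```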
4. If we do go with a new function, I'd like to just call it
"consistent" (or consistent2 or something, to keep it separate from the
old consistent function), and pass it a tri-state input for each search
term. It might not be any different for the full-text search
implementation, or any of the other ones for that matter, but I think it
would be a more understandable API.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 21, 2013 at 11:43 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
On 19.06.2013 11:56, Alexander Korotkov wrote:
On Wed, Jun 19, 2013 at 12:49 PM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:

On 19.06.2013 11:30, Alexander Korotkov wrote:
On Wed, Jun 19, 2013 at 11:48 AM, Heikki Linnakangas<
hlinnakangas@vmware.com> wrote:
On 18.06.2013 23:59, Alexander Korotkov wrote:
I would like to illustrate that with an example. Imagine you have a fulltext
query "rare_term & frequent_term". The frequent term has a large posting tree
while the rare term has only a small posting list containing iptr1, iptr2 and
iptr3. At first we get iptr1 from the posting list of the rare term, then we
would like to check whether we have to scan the part of the frequent term's
posting tree where iptr < iptr1. So we call pre_consistent([false, true]),
because we know that the rare term is not present for iptr < iptr1.
pre_consistent returns false, so we can start scanning the frequent term's
posting tree from iptr1. Similarly we can skip the gaps between iptr1 and
iptr2, iptr2 and iptr3, and from iptr3 to the maximum possible pointer.

Thanks, now I understand the rare-term & frequent-term problem. Couldn't
you do that with the existing consistent function? I don't see why you need
the new pre-consistent function for this.

In the case of two entries I can. But in the case of n entries things
become more complicated. Imagine you have a "term_1 & term_2 & ... & term_n"
query. When you get some item pointer from term_1 you can skip all the
lesser item pointers from term_2, term_3 ... term_n. But if all you have
for it is the consistent function, you have to call it with the following
check arguments:
1) [false, false, false, ... , false]
2) [false, true, false, ... , false]
3) [false, false, true, ... , false]
4) [false, true, true, ..., false]
......
i.e. you have to call it 2^(n-1) times. But if you know the query specifics
(i.e. in the opclass) it's typically easy to calculate exactly what we need
in a single pass. That's why I introduced pre_consistent.

Hmm. So how does that work with the pre-consistent function? Don't you
need to call that 2^(n-1)-1 times as well?

I call pre-consistent once with [false, true, true, ..., true].
Pre-consistent knows that each true passed to it could be a false positive.
So, if it returns false, it guarantees that consistent will be false for all
possible combinations.

Ok, I see.
I spent some time pondering this. I'd like to find a way to do something
about this without requiring another user-defined function. A couple of
observations:

1. The profile of that rare-term & frequent-term query, without any patch,
looks like this:

28,55% postgres ginCompareItemPointers
19,36% postgres keyGetItem
15,20% postgres scanGetItem
7,75% postgres checkcondition_gin
6,25% postgres gin_tsquery_consistent
4,34% postgres TS_execute
3,85% postgres callConsistentFn
3,64% postgres FunctionCall8Coll
3,19% postgres check_stack_depth
2,60% postgres entryGetNextItem
1,35% postgres entryGetItem
1,25% postgres MemoryContextReset
1,12% postgres MemoryContextSwitchTo
0,31% libc-2.17.so __memcpy_ssse3_back
0,24% postgres collectMatchesForHeapRow

I was quite surprised by seeing ginCompareItemPointers at the top. It
turns out that it's relatively expensive to do comparisons in the format we
keep item pointers, packed in 6 bytes, in 3 int16s. I hacked together a
patch to convert ItemPointers into uint64s, when dealing with them in
memory. That helped quite a bit.

Another important thing in the above profile is that calling
consistent-function is taking up a lot of resources. And in the example
test case you gave, it's called with the same arguments every time. Caching
the result of consistent-function would be a big win.

I wrote a quick patch to do that caching, and together with the
itempointer hack, I was able to halve the runtime of that test case. That's
impressive, we probably should pursue that low-hanging fruit, but it's
still slower than your "fast scan" patch by a factor of 100x. So clearly we
do need an algorithmic improvement here, along the lines of your patch, or
something similar.
For sure, many advantages can be achieved without "fast scan". For example,
Sphinx is known to be fast, but it straightforwardly scans each posting list
just like GIN does now.
2. There's one trick we could do even without the pre-consistent function,
that would help the particular test case you gave. Suppose that you have
two search terms A and B. If you have just called consistent on a row that
matched term A, but not term B, you can skip any subsequent rows in the
scan that match A but not B. That means that you can skip over to the next
row that matches B. This is essentially the same thing you do with the
pre-consistent function, it's just a degenerate case of it. That helps as
long as the search contains only one frequent term, but if it contains
multiple, then you have to still stop at every row that matches more than
one of the frequent terms.
Yes, two terms case is confluent and there is no direct need of
preConsistent.
3. I'm still not totally convinced that we shouldn't just build the "truth
table" by calling the regular consistent function with all the combinations
of matching keys, as discussed above. I think that would work pretty well
in practice, for queries with up to 5-10 frequent terms. Per the profiling,
it probably would make sense to pre-compute such a table anyway, to avoid
calling consistent repeatedly.
Why do you mention 5-10 _frequent_ items? If we have 5-10 frequent items
and 20 rare items we would have to create "truth table" of frequent items
for each new combination of rare items. For 20 rare items truth
combinations can be unique for each item pointer, in that case you would
have to calculate small "truth table" of frequent items for each item
pointers. And then it can appear to be not so small. I mean it seems to me
that we should take into account both "frequent" and "rare" items when
talking about "truth table".
4. If we do go with a new function, I'd like to just call it "consistent"
(or consistent2 or something, to keep it separate from the old consistent
function), and pass it a tri-state input for each search term. It might not
be any different for the full-text search implementation, or any of the
other ones for that matter, but I think it would be a more understandable
API.

Understandable API makes sense. But for now, I can't see even a potential
usage of the third state (exact false). Also, with the preConsistent
interface "as is", in some cases we can use the old consistent method as both
consistent and preConsistent when it implements a monotonic boolean function.
For example, that's the case for the consistent function of the array
opclasses.
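The monotonicity point can be made concrete with a hedged sketch (hypothetical names, not actual opclass code). If the consistent function is monotone — flipping any check flag from false to true can only move the result from false toward true — then the same function serves as a valid pre-consistent when fed the optimistic "maybe" flags:

```c
#include <stdbool.h>

/* Containment-style consistent, as for an array-containment opclass:
 * the row matches only when every queried element is present.  It is
 * monotone: turning any check flag from false to true can never turn
 * the result from true to false. */
bool consistentContains(const bool *check, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
        if (!check[i])
            return false;
    return true;
}

/* Because of that monotonicity, the same function doubles as a valid
 * pre-consistent: pass the optimistic flags (true = "maybe present").
 * If it returns false even with every doubtful flag set to true, no
 * refinement of those flags can make the real consistent return true. */
bool preConsistentContains(const bool *maybe, int nkeys)
{
    return consistentContains(maybe, nkeys);
}
```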
A revised version of the patch is attached with more comments and docs.
------
With best regards,
Alexander Korotkov.
Attachments:
On Tue, Jun 25, 2013 at 2:20 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
4. If we do go with a new function, I'd like to just call it "consistent"
(or consistent2 or something, to keep it separate form the old consistent
function), and pass it a tri-state input for each search term. It might not
be any different for the full-text search implementation, or any of the
other ones for that matter, but I think it would be a more understandable
API.

Understandable API makes sense. But for now, I can't see even a potential
usage of the third state (exact false).

Typo here. I meant "exact true".
Also, with preConsistent interface "as is" in some cases we can use old
consistent method as both consistent and preConsistent when it implements
monotonous boolean function. For example, it's consistent function for
opclasses of arrays.
Now, I got the point of three state consistent: we can keep only one
consistent in opclasses that support new interface. exact true and exact
false values will be passed in the case of current patch consistent; exact
false and unknown will be passed in the case of current patch
preConsistent. That's reasonable.
------
With best regards,
Alexander Korotkov.
On 28.06.2013 22:31, Alexander Korotkov wrote:
Now, I got the point of three state consistent: we can keep only one
consistent in opclasses that support new interface. exact true and exact
false values will be passed in the case of current patch consistent; exact
false and unknown will be passed in the case of current patch
preConsistent. That's reasonable.
I'm going to mark this as "returned with feedback". For the next
version, I'd like to see the API changed per above. Also, I'd like us to
do something about the tidbitmap overhead, as a separate patch before
this, so that we can assess the actual benefit of this patch. And a new
test case that demonstrates the I/O benefits.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
this is a follow-up to the message I posted to the thread about
additional info in GIN.
I've applied both ginaddinfo.7.patch and gin_fast_scan.4.patch on commit
b8fd1a09, but I'm observing a lot of failures like this:
STATEMENT: SELECT id FROM messages WHERE body_tsvector @@
plainto_tsquery('english', 'email free') LIMIT 100
ERROR: buffer 238068 is not owned by resource owner Portal
There's a GIN index on messages(body_tsvector). I haven't dug into why
it fails like this.
Tomas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <hlinnakangas@vmware.com>
wrote:
On 28.06.2013 22:31, Alexander Korotkov wrote:
Now, I got the point of three state consistent: we can keep only one
consistent in opclasses that support new interface. exact true and exact
false values will be passed in the case of current patch consistent; exact
false and unknown will be passed in the case of current patch
preConsistent. That's reasonable.

I'm going to mark this as "returned with feedback". For the next version,
I'd like to see the API changed per above. Also, I'd like us to do
something about the tidbitmap overhead, as a separate patch before this, so
that we can assess the actual benefit of this patch. And a new test case
that demonstrates the I/O benefits.
Revised version of patch is attached.
Changes are so:
1) Patch rebased against packed posting lists; it does not depend on
additional information now.
2) New API with tri-state logic is introduced.
------
With best regards,
Alexander Korotkov.
Attachments:
gin-fast-scan.6.patch.gz (application/x-gzip)

On 14.11.2013 19:26, Alexander Korotkov wrote:
On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <hlinnakangas@vmware.com>
wrote:
On 28.06.2013 22:31, Alexander Korotkov wrote:
Now, I got the point of three state consistent: we can keep only one
consistent in opclasses that support new interface. exact true and exact
false values will be passed in the case of current patch consistent; exact
false and unknown will be passed in the case of current patch
preConsistent. That's reasonable.

I'm going to mark this as "returned with feedback". For the next version,
I'd like to see the API changed per above. Also, I'd like us to do
something about the tidbitmap overhead, as a separate patch before this, so
that we can assess the actual benefit of this patch. And a new test case
that demonstrates the I/O benefits.

Revised version of patch is attached.
Changes are so:
1) Patch rebased against packed posting lists; it does not depend on
additional information now.
2) New API with tri-state logic is introduced.
Thanks! A couple of thoughts after a 5-minute glance:
* documentation
* How about defining the tri-state consistent function to also return a
tri-state? True would mean that the tuple definitely matches, false
means the tuple definitely does not match, and Unknown means it might
match. Or does return value true with recheck==true have the same
effect? If I understood the patch right, returning Unknown or True
wouldn't actually make any difference, but it's conceivable that we
might come up with more optimizations in the future that could take
advantage of that. For example, for a query like "foo OR (bar AND baz)",
you could immediately return any tuples that match foo, and not bother
scanning for bar and baz at all.
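A hedged sketch of what a tri-state return could enable for "foo OR (bar AND baz)" (illustrative names, not the proposed API's actual identifiers):

```c
/* A three-valued logic in the spirit of the proposed API: FALSE and
 * TRUE are definite, MAYBE means "might match". */
typedef enum { TS_FALSE = 0, TS_TRUE = 1, TS_MAYBE = 2 } TriState;

static TriState tri_and(TriState a, TriState b)
{
    if (a == TS_FALSE || b == TS_FALSE)
        return TS_FALSE;                /* one definite false decides */
    if (a == TS_TRUE && b == TS_TRUE)
        return TS_TRUE;
    return TS_MAYBE;
}

static TriState tri_or(TriState a, TriState b)
{
    if (a == TS_TRUE || b == TS_TRUE)
        return TS_TRUE;                 /* one definite true decides */
    if (a == TS_FALSE && b == TS_FALSE)
        return TS_FALSE;
    return TS_MAYBE;
}

/* "foo OR (bar AND baz)": if foo is definitely true, the result is
 * TS_TRUE regardless of bar and baz — so their posting trees need
 * not be scanned at all. */
TriState eval_foo_or_bar_and_baz(TriState foo, TriState bar, TriState baz)
{
    return tri_or(foo, tri_and(bar, baz));
}
```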
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I checked out master and put together a test case using a small percentage
of production data for a known problem we have with Pg 9.2 and text search
scans.
A small percentage in this case means 10 million records randomly selected;
the full data set has a few billion records.
Tests ran for master successfully and I recorded timings.
Applied the patch included here to master along with
gin-packed-postinglists-14.patch.
Ran make clean; ./configure; make; make install.
make check (All 141 tests passed.)
initdb, import dump
The GIN index fails to build with a segfault.
DETAIL: Failed process was running: CREATE INDEX textsearch_gin_idx ON kp
USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT
NULL);
#0 XLogCheckBuffer (holdsExclusiveLock=1 '\001', lsn=lsn@entry=0x7fffcf341920,
bkpb=bkpb@entry=0x7fffcf341960, rdata=0x468f11 <ginFindLeafPage+529>,
rdata=0x468f11 <ginFindLeafPage+529>) at xlog.c:2339
#1 0x00000000004b9ddd in XLogInsert (rmid=rmid@entry=13 '\r',
info=info@entry=16 '\020', rdata=rdata@entry=0x7fffcf341bf0) at xlog.c:936
#2 0x0000000000468a9e in createPostingTree (index=0x7fa4e8d31030,
items=items@entry=0xfb55680, nitems=nitems@entry=762,
buildStats=buildStats@entry=0x7fffcf343dd0) at gindatapage.c:1324
#3 0x00000000004630c0 in buildFreshLeafTuple (buildStats=0x7fffcf343dd0,
nitem=762, items=0xfb55680, category=<optimized out>, key=34078256,
attnum=<optimized out>, ginstate=0x7fffcf341df0) at gininsert.c:281
#4 ginEntryInsert (ginstate=ginstate@entry=0x7fffcf341df0,
attnum=<optimized out>, key=34078256, category=<optimized out>,
items=0xfb55680, nitem=762,
buildStats=buildStats@entry=0x7fffcf343dd0) at gininsert.c:351
#5 0x00000000004635b0 in ginbuild (fcinfo=<optimized out>) at
gininsert.c:531
#6 0x0000000000718637 in OidFunctionCall3Coll
(functionId=functionId@entry=2738,
collation=collation@entry=0, arg1=arg1@entry=140346257507968,
arg2=arg2@entry=140346257510448, arg3=arg3@entry=32826432) at
fmgr.c:1649
#7 0x00000000004ce1da in index_build
(heapRelation=heapRelation@entry=0x7fa4e8d30680,
indexRelation=indexRelation@entry=0x7fa4e8d31030,
indexInfo=indexInfo@entry=0x1f4e440, isprimary=isprimary@entry=0
'\000', isreindex=isreindex@entry=0 '\000') at index.c:1963
#8 0x00000000004ceeaa in index_create
(heapRelation=heapRelation@entry=0x7fa4e8d30680,
indexRelationName=indexRelationName@entry=0x1f4e660
"textsearch_gin_knn_idx", indexRelationId=16395, indexRelationId@entry=0,
relFileNode=<optimized out>, indexInfo=indexInfo@entry=0x1f4e440,
indexColNames=indexColNames@entry=0x1f4f728,
accessMethodObjectId=accessMethodObjectId@entry=2742,
tableSpaceId=tableSpaceId@entry=0,
collationObjectId=collationObjectId@entry=0x1f4fcc8,
classObjectId=classObjectId@entry=0x1f4fce0,
coloptions=coloptions@entry=0x1f4fcf8,
reloptions=reloptions@entry=0, isprimary=0 '\000',
isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000',
allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
is_internal=0 '\000') at index.c:1082
#9 0x0000000000546a78 in DefineIndex (stmt=<optimized out>,
indexRelationId=indexRelationId@entry=0, is_alter_table=is_alter_table@entry=0
'\000',
check_rights=check_rights@entry=1 '\001', skip_build=skip_build@entry=0
'\000', quiet=quiet@entry=0 '\000') at indexcmds.c:594
#10 0x000000000065147e in ProcessUtilitySlow
(parsetree=parsetree@entry=0x1f7fb68,
queryString=0x1f7eb10 "CREATE INDEX textsearch_gin_idx ON kp USING gin
(to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);",
context=<optimized out>, params=params@entry=0x0,
completionTag=completionTag@entry=0x7fffcf344c10 "", dest=<optimized out>)
at utility.c:1163
#11 0x000000000065079e in standard_ProcessUtility (parsetree=0x1f7fb68,
queryString=<optimized out>, context=<optimized out>, params=0x0,
dest=<optimized out>, completionTag=0x7fffcf344c10 "") at utility.c:873
#12 0x000000000064de61 in PortalRunUtility (portal=portal@entry=0x1f4c350,
utilityStmt=utilityStmt@entry=0x1f7fb68, isTopLevel=isTopLevel@entry=1
'\001',
dest=dest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10
"") at pquery.c:1187
#13 0x000000000064e9e5 in PortalRunMulti (portal=portal@entry=0x1f4c350,
isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1f7ff08,
altdest=altdest@entry=0x1f7ff08,
completionTag=completionTag@entry=0x7fffcf344c10
"") at pquery.c:1318
#14 0x000000000064f459 in PortalRun (portal=portal@entry=0x1f4c350,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1
'\001',
dest=dest@entry=0x1f7ff08, altdest=altdest@entry=0x1f7ff08,
completionTag=completionTag@entry=0x7fffcf344c10 "") at pquery.c:816
#15 0x000000000064d2d5 in exec_simple_query (
query_string=0x1f7eb10 "CREATE INDEX textsearch_gin_idx ON kp USING gin
(to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);") at
postgres.c:1048
#16 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f2ad40,
dbname=0x1f2abf8 "rbt", username=<optimized out>) at postgres.c:3992
#17 0x000000000045b1b4 in BackendRun (port=0x1f47280) at postmaster.c:4085
#18 BackendStartup (port=0x1f47280) at postmaster.c:3774
#19 ServerLoop () at postmaster.c:1585
#20 0x000000000060d031 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x1f28b20)
at postmaster.c:1240
#21 0x000000000045bb25 in main (argc=3, argv=0x1f28b20) at main.c:196
On Thu, Nov 14, 2013 at 12:26 PM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:

On 28.06.2013 22:31, Alexander Korotkov wrote:
Now, I got the point of three state consistent: we can keep only one
consistent in opclasses that support new interface. exact true and exact
false values will be passed in the case of current patch consistent;
exact
false and unknown will be passed in the case of current patch
preConsistent. That's reasonable.

I'm going to mark this as "returned with feedback". For the next version,
I'd like to see the API changed per above. Also, I'd like us to do
something about the tidbitmap overhead, as a separate patch before this, so
that we can assess the actual benefit of this patch. And a new test case
that demonstrates the I/O benefits.

Revised version of patch is attached.
Changes are so:
1) Patch rebased against packed posting lists; it does not depend on
additional information now.
2) New API with tri-state logic is introduced.

------
With best regards,
Alexander Korotkov.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Nov 15, 2013 at 3:25 AM, Rod Taylor <rbt@simple-knowledge.com> wrote:
I checked out master and put together a test case using a small percentage
of production data for a known problem we have with Pg 9.2 and text search
scans. A small percentage in this case means 10 million records randomly
selected; the production table has a few billion records.
Tests ran for master successfully and I recorded timings.
Applied the patch included here to master along with
gin-packed-postinglists-14.patch.
Ran make clean; ./configure; make; make install.
make check (all 141 tests passed), initdb, import dump.
The GIN index fails to build with a segfault.
Thanks for testing. See fixed version in thread about packed posting lists.
------
With best regards,
Alexander Korotkov.
On Fri, Nov 15, 2013 at 12:34 AM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
Thanks! A couple of thoughts after a 5-minute glance:
* documentation
Will provide documented version this week.
* How about defining the tri-state consistent function to also return a
tri-state? True would mean that the tuple definitely matches, false means
the tuple definitely does not match, and Unknown means it might match. Or
does return value true with recheck==true have the same effect? If I
understood the patch right, returning Unknown or True wouldn't actually
make any difference, but it's conceivable that we might come up with more
optimizations in the future that could take advantage of that. For example,
for a query like "foo OR (bar AND baz)", you could immediately return any
tuples that match foo, and not bother scanning for bar and baz at all.
The meaning of the recheck flag when the input contains Unknown is
undefined now. :)
For instance, we could define it in the following ways:
1) Like returning Unknown, meaning that consistent called with true or
false in place of the input Unknown could return either true or false.
2) Consistent called with true or false in place of the input Unknown
could return recheck. This meaning is probably logical, but I don't see any
usage for it.
I'm not against the idea of a tri-state return value for consistent,
because it's a logical continuation of its tri-state input. However, I
don't see a use for distinguishing True and Unknown in the return value for
now. :)
In the example you give we can return foo matches immediately, but we have
to create the full bitmap, so we will have to scan (bar AND baz) anyway. We
could skip parts of the trees for bar and baz, but that's possible only
when foo contains a large run of sequential TIDs, so we can be sure that we
didn't miss any. This seems like a very narrow use case to me.
Another point is that one day we could probably return tuples immediately
in gingettuple, and with a LIMIT clause and no sorting we could then avoid
searching for further tuples. However, gingettuple was removed for
concurrency reasons, and my patches for index-based ordering didn't bring
it back in its previous form: they collect all the results and then return
them one by one.
------
With best regards,
Alexander Korotkov.
I tried again this morning using gin-packed-postinglists-16.patch and
gin-fast-scan.6.patch. No crashes.
It is about a 0.1% random sample of production data (10,000,000 records)
with the below structure. Pg was compiled with debug enabled in both cases.
Table "public.kp"
Column | Type | Modifiers
--------+---------+-----------
id | bigint | not null
string | text | not null
score1 | integer |
score2 | integer |
score3 | integer |
score4 | integer |
Indexes:
"kp_pkey" PRIMARY KEY, btree (id)
"kp_string_key" UNIQUE CONSTRAINT, btree (string)
"textsearch_gin_idx" gin (to_tsvector('simple'::regconfig, string))
WHERE score1 IS NOT NULL
This is the query tested. All data is in Pg buffer cache for these timings.
Words like "the" and "and" are very common (~9% of entries, each) and a
word like "hotel" is much less common (~0.2% of entries).
SELECT id,string
FROM kp
WHERE score1 IS NOT NULL
AND to_tsvector('simple', string) @@ to_tsquery('simple', ?)
-- ? is substituted with the query strings
ORDER BY score1 DESC, score2 ASC
LIMIT 1000;
Limit (cost=56.04..56.04 rows=1 width=37) (actual time=250.010..250.032
rows=142 loops=1)
-> Sort (cost=56.04..56.04 rows=1 width=37) (actual
time=250.008..250.017 rows=142 loops=1)
Sort Key: score1, score2
Sort Method: quicksort Memory: 36kB
-> Bitmap Heap Scan on kp (cost=52.01..56.03 rows=1 width=37)
(actual time=249.711..249.945 rows=142 loops=1)
Recheck Cond: ((to_tsvector('simple'::regconfig, string) @@
'''hotel'' & ''and'' & ''the'''::tsquery) AND (score1 IS NOT NULL))
-> Bitmap Index Scan on textsearch_gin_idx
(cost=0.00..52.01 rows=1 width=0) (actual time=249.681..249.681 rows=142
loops=1)
Index Cond: (to_tsvector('simple'::regconfig, string)
@@ '''hotel'' & ''and'' & ''the'''::tsquery)
Total runtime: 250.096 ms
Times are from \timing on.
MASTER
=======
the: 888.436 ms 926.609 ms 885.502 ms
and: 944.052 ms 937.732 ms 920.050 ms
hotel: 53.992 ms 57.039 ms 65.581 ms
and & the & hotel: 260.308 ms 248.275 ms 248.098 ms
These numbers roughly match what we get with Pg 9.2. The time saved
between 'the' and 'and & the & hotel' is mostly heap lookups for the score
and the final sort.
The size of the index on disk is about 2% smaller in the patched version.
PATCHED
=======
the: 1055.169 ms 1081.976 ms 1083.021 ms
and: 912.173 ms 949.364 ms 965.261 ms
hotel: 62.591 ms 64.341 ms 62.923 ms
and & the & hotel: 268.577 ms 259.293 ms 257.408 ms
hotel & and & the: 253.574 ms 258.071 ms 250.280 ms
I was hoping that the 'and & the & hotel' case would improve with this
patch to be closer to the 'hotel' search, as I thought that was the kind of
thing it targeted. Unfortunately, it did not. I actually applied the
patches, compiled, initdb/load data, and ran it again thinking I made a
mistake.
Reordering the terms 'hotel & and & the' doesn't change the result.
On Fri, Nov 15, 2013 at 1:51 AM, Alexander Korotkov <aekorotkov@gmail.com>wrote:
Show quoted text
On Fri, Nov 15, 2013 at 3:25 AM, Rod Taylor <rbt@simple-knowledge.com>wrote:
I checked out master and put together a test case using a small
percentage of production data for a known problem we have with Pg 9.2 and
text search scans.A small percentage in this case means 10 million records randomly
selected; has a few billion records.Tests ran for master successfully and I recorded timings.
Applied the patch included here to master along with
gin-packed-postinglists-14.patch.
Run make clean; ./configure; make; make install.
make check (All 141 tests passed.)initdb, import dump
The GIN index fails to build with a segfault.
Thanks for testing. See fixed version in thread about packed posting lists.
------
With best regards,
Alexander Korotkov.
On Fri, Nov 15, 2013 at 6:57 PM, Rod Taylor <pg@rbt.ca> wrote:
Oh, in this patch the new consistent method is implemented only for the
array opclass, not for tsvector. Will be fixed soon.
BTW, was the index 2% smaller or 2 times smaller? If it's only 2% smaller
then I need to know more about your dataset :)
------
With best regards,
Alexander Korotkov.
2%.
It's essentially sentence fragments from 1 to 5 words in length. I wasn't
expecting it to be much smaller.
10 recent value selections:
white vinegar reduce color running
vinegar cure uti
cane vinegar acidity depends parameter
how remedy fir clogged shower
use vinegar sensitive skin
home remedies removing rust heating
does non raw apple cider
home remedies help maintain healthy
can vinegar mess up your
apple cide vineger ph balance
regards,
Rod
On Fri, Nov 15, 2013 at 11:18 PM, Rod Taylor <rod.taylor@gmail.com> wrote:
2%.
It's essentially sentence fragments from 1 to 5 words in length. I wasn't
expecting it to be much smaller.
10 recent value selections:
white vinegar reduce color running
vinegar cure uti
cane vinegar acidity depends parameter
how remedy fir clogged shower
use vinegar sensitive skin
home remedies removing rust heating
does non raw apple cider
home remedies help maintain healthy
can vinegar mess up your
apple cide vineger ph balance
I didn't get why it's not significantly smaller. Is it possible to share a
dump?
------
With best regards,
Alexander Korotkov.
On Fri, Nov 15, 2013 at 2:26 PM, Alexander Korotkov <aekorotkov@gmail.com>wrote:
I didn't get why it's not significantly smaller. Is it possible to share a
dump?
Sorry, I reported that incorrectly. It's not something I was actually
looking for and didn't pay much attention to at the time.
The patched index is 58% of the 9.4 master size. 212 MB instead of 365 MB.
On Fri, Nov 15, 2013 at 11:39 PM, Rod Taylor <rod.taylor@gmail.com> wrote:
Sorry, I reported that incorrectly. The patched index is 58% of the 9.4
master size: 212 MB instead of 365 MB.
Good. That meets my expectations :)
You mention that both master and patched versions were compiled with debug.
Was cassert enabled?
------
With best regards,
Alexander Korotkov.
On 11/14/13, 12:26 PM, Alexander Korotkov wrote:
Revised version of patch is attached.
This doesn't build:
ginget.c: In function ‘scanPage’:
ginget.c:1108:2: warning: implicit declaration of function ‘GinDataLeafPageGetPostingListEnd’ [-Wimplicit-function-declaration]
ginget.c:1108:9: warning: assignment makes pointer from integer without a cast [enabled by default]
ginget.c:1109:18: error: ‘GinDataLeafIndexCount’ undeclared (first use in this function)
ginget.c:1109:18: note: each undeclared identifier is reported only once for each function it appears in
ginget.c:1111:3: error: unknown type name ‘GinDataLeafItemIndex’
ginget.c:1111:3: warning: implicit declaration of function ‘GinPageGetIndexes’ [-Wimplicit-function-declaration]
ginget.c:1111:57: error: subscripted value is neither array nor pointer nor vector
ginget.c:1112:12: error: request for member ‘pageOffset’ in something not a structure or union
ginget.c:1115:38: error: request for member ‘iptr’ in something not a structure or union
ginget.c:1118:230: error: request for member ‘pageOffset’ in something not a structure or union
ginget.c:1119:16: error: request for member ‘iptr’ in something not a structure or union
ginget.c:1123:233: error: request for member ‘pageOffset’ in something not a structure or union
ginget.c:1136:3: warning: implicit declaration of function ‘ginDataPageLeafReadItemPointer’ [-Wimplicit-function-declaration]
ginget.c:1136:7: warning: assignment makes pointer from integer without a cast [enabled by default]
On Sat, Nov 16, 2013 at 12:10 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
On 11/14/13, 12:26 PM, Alexander Korotkov wrote:
Revised version of patch is attached.
This doesn't build:
This patch applies on top of the GIN packed posting lists patch; it
doesn't compile separately.
------
With best regards,
Alexander Korotkov.
On Fri, Nov 15, 2013 at 11:42 PM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:
You mention that both master and patched versions were compiled with debug.
Was cassert enabled?
In the attached version of the patch, the tsvector opclass is enabled to
use fast scan. You can retry your tests.
------
With best regards,
Alexander Korotkov.
Attachments:
gin-fast-scan.7.patch.gz (application/x-gzip)
On Fri, Nov 15, 2013 at 2:42 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Fri, Nov 15, 2013 at 11:39 PM, Rod Taylor <rod.taylor@gmail.com> wrote:
The patched index is 58% of the 9.4 master size. 212 MB instead of 365 MB.
Good. That meets my expectations :)
You mention that both master and patched versions were compiled with debug.
Was cassert enabled?
Just debug. I try not to do performance tests with assertions on.
Patch 7 gives the results I was looking for on this small sampling of data.
gin-fast-scan.6.patch/9.4 master performance
=================
the: 1147.413 ms 1159.360 ms 1122.549 ms
and: 1035.540 ms 999.514 ms 1003.042 ms
hotel: 57.670 ms 61.152 ms 58.862 ms
and & the & hotel: 266.121 ms 256.711 ms 267.011 ms
hotel & and & the: 260.213 ms 254.055 ms 255.611 ms
gin-fast-scan.7.patch
=================
the: 1091.735 ms 1068.909 ms 1076.474 ms
and: 985.690 ms 972.833 ms 948.286 ms
hotel: 60.756 ms 59.028 ms 57.836 ms
and & the & hotel: 50.391 ms 38.715 ms 46.168 ms
hotel & and & the: 45.395 ms 40.880 ms 43.978 ms
Thanks,
Rod
On Fri, Nov 15, 2013 at 11:19 AM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:
On Fri, Nov 15, 2013 at 12:34 AM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
On 14.11.2013 19:26, Alexander Korotkov wrote:
On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
On 28.06.2013 22:31, Alexander Korotkov wrote:
Now, I got the point of three state consistent: we can keep only one
consistent in opclasses that support new interface. Exact true and exact
false values will be passed in the case of current patch consistent; exact
false and unknown will be passed in the case of current patch
preConsistent. That's reasonable.
I'm going to mark this as "returned with feedback". For the next version,
I'd like to see the API changed per above. Also, I'd like us to do
something about the tidbitmap overhead, as a separate patch before this, so
that we can assess the actual benefit of this patch. And a new test case
that demonstrates the I/O benefits.
Revised version of patch is attached. Changes are so:
1) Patch rebased against packed posting lists; it does not depend on
additional information now.
2) New API with tri-state logic is introduced.
Thanks! A couple of thoughts after a 5-minute glance:
* documentation
Will provide documented version this week.
* How about defining the tri-state consistent function to also return a
tri-state? True would mean that the tuple definitely matches, false means
the tuple definitely does not match, and Unknown means it might match. Or
does return value true with recheck==true have the same effect? If I
understood the patch right, returning Unknown or True wouldn't actually
make any difference, but it's conceivable that we might come up with more
optimizations in the future that could take advantage of that. For example,
for a query like "foo OR (bar AND baz)", you could immediately return any
tuples that match foo, and not bother scanning for bar and baz at all.
The meaning of the recheck flag when the input contains unknown is undefined now.
:)
For instance, we could define it in following ways:
1) Returning Unknown could mean that consistent with true or false
substituted for the input Unknown could return either true or false.
2) Consistent with true or false substituted for the input Unknown could
return recheck. This meaning is probably logical, but I don't see any usage
of it.
I'm not against the idea of a tri-state return value for consistent, because
it's a logical continuation of its tri-state input. However, I don't see any
use in distinguishing True and Unknown in the return value for now :)
In the example you give we can return foo immediately, but we have to create
a full bitmap. So we will have to scan (bar AND baz) anyway. We could skip
parts of the trees for bar and baz. But that's possible only when foo
contains a large amount of sequential TIDs, so we can be sure that we didn't
miss any TIDs. This seems to be a very narrow use-case to me.
Another point is that one day we probably could immediately return tuples
in gingettuple. And with a LIMIT clause and no sorting we don't need to
search for other tuples. However, gingettuple was removed for concurrency
reasons. And my patches for index-based ordering didn't return it in the
previous manner: they collect all the results and then return them
one-by-one.
I'm trying to make fastscan work with GinFuzzySearchLimit. Then I figured
out that I don't understand how GinFuzzySearchLimit works. Why, with
GinFuzzySearchLimit, can startScan return without doing startScanKey? Is it
a bug?
------
With best regards,
Alexander Korotkov.
On Wed, Nov 20, 2013 at 3:06 AM, Alexander Korotkov <aekorotkov@gmail.com>wrote:
On Fri, Nov 15, 2013 at 11:19 AM, Alexander Korotkov <aekorotkov@gmail.com
wrote:
I'm trying to make fastscan work with GinFuzzySearchLimit. Then I figured
out that I don't understand how GinFuzzySearchLimit works. Why, with
GinFuzzySearchLimit, can startScan return without doing startScanKey? Is it
a bug?
Revised version of patch is attached. Changes are so:
1) Support for GinFuzzySearchLimit.
2) Some documentation.
Question about GinFuzzySearchLimit is still relevant.
------
With best regards,
Alexander Korotkov.
Attachments:
On Thu, Nov 21, 2013 at 12:14 AM, Alexander Korotkov
<aekorotkov@gmail.com>wrote:
On Wed, Nov 20, 2013 at 3:06 AM, Alexander Korotkov <aekorotkov@gmail.com>wrote:
Revised version of patch is attached. Changes are so:
1) Support for GinFuzzySearchLimit.
2) Some documentation.
Question about GinFuzzySearchLimit is still relevant.
Attached version is rebased against last version of packed posting lists.
------
With best regards,
Alexander Korotkov.
Attachments:
gin-fast-scan.9.patch.gz (application/x-gzip)
On 01/14/2014 05:35 PM, Alexander Korotkov wrote:
On Thu, Nov 21, 2013 at 12:14 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
Revised version of patch is attached. Changes are so:
1) Support for GinFuzzySearchLimit.
2) Some documentation.
Question about GinFuzzySearchLimit is still relevant.
Attached version is rebased against last version of packed posting lists.
Quick question: the ginEnableFastScan is just for debugging/testing
purposes, right? There's no reason anyone would turn that off in production.
We should remove it before committing, but I guess it's useful while
we're still hacking..
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jan 14, 2014 at 11:07 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:
On 01/14/2014 05:35 PM, Alexander Korotkov wrote:
On Thu, Nov 21, 2013 at 12:14 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
Revised version of patch is attached. Changes are so:
1) Support for GinFuzzySearchLimit.
2) Some documentation.
Question about GinFuzzySearchLimit is still relevant.
Attached version is rebased against last version of packed posting lists.
Quick question: the ginEnableFastScan is just for debugging/testing
purposes, right? There's no reason anyone would turn that off in production.
We should remove it before committing, but I guess it's useful while we're
still hacking..
Yes, ginEnableFastScan is for debugging and testing.
------
With best regards,
Alexander Korotkov.
On 01/14/2014 05:35 PM, Alexander Korotkov wrote:
Attached version is rebased against last version of packed posting lists.
Thanks!
I think we're missing a trick with multi-key queries. We know that when
multiple scan keys are used, they are ANDed together, so we can do the
skip optimization even without the new tri-state consistent function.
To get started, I propose the three attached patches. These only
implement the optimization for the multi-key case, which doesn't require
any changes to the consistent functions and hence no catalog changes.
Admittedly this isn't anywhere near as useful in practice as the single
key case, but let's go for the low-hanging fruit first. This
nevertheless introduces some machinery that will be needed by the full
patch anyway.
I structured the code somewhat differently than your patch. There is no
separate fast-path for the case where the optimization applies. Instead,
I'm passing the advancePast variable all the way down to where the next
batch of items are loaded from the posting tree. keyGetItem is now
responsible for advancing the entry streams, and the logic in
scanGetItem has been refactored so that it advances advancePast
aggressively, as soon as one of the key streams lets us conclude that no
items < a certain point can match.
scanGetItem might yet need to be refactored when we get to the full
preconsistent check stuff, but one step at a time.
The first patch is the most interesting one, and contains the
scanGetItem changes. The second patch allows seeking to the right
segment in a posting tree page, and the third allows starting the
posting tree scan from root, when skipping items (instead of just
following the right-links).
Here are some simple performance test results, demonstrating the effect
of each of these patches. This is a best-case scenario. I don't think
these patches have any adverse effects even in the worst-case scenario,
although I haven't actually tried hard to measure that. I used this to
create a test table:
create table foo (intarr int[]);
-- Every row contains 0 (frequent term), and a unique number.
insert into foo select array[0,g] from generate_series(1, 10000000) g;
-- Add another tuple with 0, 1 combo physically to the end of the table.
insert into foo values (array[0,1]);
The query I used is this:
postgres=# select count(*) from foo where intarr @> array[0] and intarr
@> array[1];
count
-------
2
(1 row)
I measured the time that query takes, and the number of pages hit, using
"explain (analyze, buffers true) ...".
patches          time (ms)   buffers
------------------------------------
unpatched        650         1316
patch 1          0.52        1316
patches 1+2      0.50        1316
patches 1+2+3    0.13        15
So, the second patch isn't doing much in this particular case. But it's
trivial, and I think it will make a difference in other queries where
you have the opportunity skip, but return a lot of tuples overall.
In summary, these are fairly small patches, and useful on their own, so I
think these should be committed now. But please take a look and see if
the logic in scanGetItem/keyGetItem looks correct to you. After this, I
think the main fast scan logic will go into keyGetItem.
PS. I find it a bit surprising that in your patch, you're completely
bailing out if there are any partial-match keys involved. Is there some
fundamental reason for that, or just not implemented?
- Heikki
Attachments:
0001-Optimize-GIN-multi-key-queries.patch (text/x-diff)
From 53e33c931c41f5ff8bb22ecfc011e717d2dbb9fd Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 23 Jan 2014 15:41:43 +0200
Subject: [PATCH 1/3] Optimize GIN multi-key queries.
In a multi-key search, ie. something like "col @> 'foo' AND col @> 'bar'",
as soon as we find the next item that matches the first criterion, we don't
need to check the second criterion for TIDs smaller than the first match.
That saves a lot of effort, especially if the first term is rare while the
second occurs very frequently.
Based on ideas from Alexander Korotkov's fast scan patch
---
src/backend/access/gin/ginget.c | 465 ++++++++++++++++++++++------------------
1 file changed, 255 insertions(+), 210 deletions(-)
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 4bdbd45..4de7a10 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -68,29 +68,6 @@ callConsistentFn(GinState *ginstate, GinScanKey key)
}
/*
- * Tries to refind previously taken ItemPointer on a posting page.
- */
-static bool
-needToStepRight(Page page, ItemPointer item)
-{
- if (GinPageGetOpaque(page)->flags & GIN_DELETED)
- /* page was deleted by concurrent vacuum */
- return true;
-
- if (ginCompareItemPointers(item, GinDataPageGetRightBound(page)) > 0
- && !GinPageRightMost(page))
- {
- /*
- * the item we're looking is > the right bound of the page, so it
- * can't be on this page.
- */
- return true;
- }
-
- return false;
-}
-
-/*
* Goes to the next page if current offset is outside of bounds
*/
static bool
@@ -447,8 +424,7 @@ restartScanEntry:
page = BufferGetPage(entry->buffer);
/*
- * Copy page content to memory to avoid keeping it locked for
- * a long time.
+ * Load the first page into memory.
*/
entry->list = GinDataLeafPageGetItems(page, &entry->nlist);
@@ -518,90 +494,87 @@ startScan(IndexScanDesc scan)
}
/*
- * Gets next ItemPointer from PostingTree. Note, that we copy
- * page into GinScanEntry->list array and unlock page, but keep it pinned
- * to prevent interference with vacuum
+ * Load the next batch of item pointers from a posting tree.
+ *
+ * Note that we copy the page into GinScanEntry->list array and unlock it, but
+ * keep it pinned to prevent interference with vacuum.
*/
static void
-entryGetNextItem(GinState *ginstate, GinScanEntry entry)
+entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advancePast)
{
Page page;
int i;
+ LockBuffer(entry->buffer, GIN_SHARE);
+ page = BufferGetPage(entry->buffer);
for (;;)
{
- if (entry->offset < entry->nlist)
+ entry->offset = InvalidOffsetNumber;
+ if (entry->list)
{
- entry->curItem = entry->list[entry->offset++];
- return;
+ pfree(entry->list);
+ entry->list = NULL;
+ entry->nlist = 0;
}
- LockBuffer(entry->buffer, GIN_SHARE);
- page = BufferGetPage(entry->buffer);
- for (;;)
+ /*
+ * We've processed all the entries on this page. If it was the last
+ * page in the tree, we're done.
+ */
+ if (GinPageRightMost(page))
{
- /*
- * It's needed to go by right link. During that we should refind
- * first ItemPointer greater that stored
- */
- if (GinPageRightMost(page))
- {
- UnlockReleaseBuffer(entry->buffer);
- ItemPointerSetInvalid(&entry->curItem);
- entry->buffer = InvalidBuffer;
- entry->isFinished = TRUE;
- return;
- }
+ UnlockReleaseBuffer(entry->buffer);
+ entry->buffer = InvalidBuffer;
+ entry->isFinished = TRUE;
+ return;
+ }
- entry->buffer = ginStepRight(entry->buffer,
- ginstate->index,
- GIN_SHARE);
- page = BufferGetPage(entry->buffer);
+ if (GinPageGetOpaque(page)->flags & GIN_DELETED)
+ continue; /* page was deleted by concurrent vacuum */
- entry->offset = InvalidOffsetNumber;
- if (entry->list)
- {
- pfree(entry->list);
- entry->list = NULL;
- }
+ /*
+ * Step to next page, following the right link. then find the first
+ * ItemPointer greater than advancePast.
+ */
+ entry->buffer = ginStepRight(entry->buffer,
+ ginstate->index,
+ GIN_SHARE);
+ page = BufferGetPage(entry->buffer);
+ /*
+ * The first item > advancePast might not be on this page, but
+ * somewhere to the right, if the page was split. Keep following
+ * the right-links until we re-find the correct page.
+ */
+ if (!GinPageRightMost(page) &&
+ ginCompareItemPointers(&advancePast, GinDataPageGetRightBound(page)) >= 0)
+ {
/*
- * If the page was concurrently split, we have to re-find the
- * item we were stopped on. If the page was split more than once,
- * the item might not be on this page, but somewhere to the right.
- * Keep following the right-links until we re-find the correct
- * page.
+ * the item we're looking is > the right bound of the page, so it
+ * can't be on this page.
*/
- if (ItemPointerIsValid(&entry->curItem) &&
- needToStepRight(page, &entry->curItem))
- {
- continue;
- }
+ continue;
+ }
- entry->list = GinDataLeafPageGetItems(page, &entry->nlist);
+ entry->list = GinDataLeafPageGetItems(page, &entry->nlist);
- /* re-find the item we were stopped on. */
- if (ItemPointerIsValid(&entry->curItem))
+ if (ItemPointerIsValid(&advancePast))
+ {
+ for (i = 0; i < entry->nlist; i++)
{
- for (i = 0; i < entry->nlist; i++)
+ if (ginCompareItemPointers(&advancePast, &entry->list[i]) < 0)
{
- if (ginCompareItemPointers(&entry->curItem,
- &entry->list[i]) < 0)
- {
- LockBuffer(entry->buffer, GIN_UNLOCK);
- entry->offset = i + 1;
- entry->curItem = entry->list[entry->offset - 1];
- return;
- }
+ LockBuffer(entry->buffer, GIN_UNLOCK);
+ entry->offset = i;
+ return;
}
}
- else
- {
- LockBuffer(entry->buffer, GIN_UNLOCK);
- entry->offset = 1; /* scan all items on the page. */
- entry->curItem = entry->list[entry->offset - 1];
- return;
- }
+ }
+ else
+ {
+ LockBuffer(entry->buffer, GIN_UNLOCK);
+ entry->offset = 0; /* scan all items on the page. */
+ return;
}
}
}
@@ -610,10 +583,10 @@ entryGetNextItem(GinState *ginstate, GinScanEntry entry)
#define dropItem(e) ( gin_rand() > ((double)GinFuzzySearchLimit)/((double)((e)->predictNumberResult)) )
/*
- * Sets entry->curItem to next heap item pointer for one entry of one scan key,
- * or sets entry->isFinished to TRUE if there are no more.
+ * Sets entry->curItem to next heap item pointer > advancePast, for one entry
+ * of one scan key, or sets entry->isFinished to TRUE if there are no more.
*
- * Item pointers must be returned in ascending order.
+ * Item pointers are returned in ascending order.
*
* Note: this can return a "lossy page" item pointer, indicating that the
* entry potentially matches all items on that heap page. However, it is
@@ -623,12 +596,20 @@ entryGetNextItem(GinState *ginstate, GinScanEntry entry)
* current implementation this is guaranteed by the behavior of tidbitmaps.
*/
static void
-entryGetItem(GinState *ginstate, GinScanEntry entry)
+entryGetItem(GinState *ginstate, GinScanEntry entry,
+ ItemPointerData advancePast)
{
Assert(!entry->isFinished);
+ Assert(!ItemPointerIsValid(&entry->curItem) ||
+ ginCompareItemPointers(&entry->curItem, &advancePast) <= 0);
+
if (entry->matchBitmap)
{
+ /* A bitmap result */
+ BlockNumber advancePastBlk = GinItemPointerGetBlockNumber(&advancePast);
+ OffsetNumber advancePastOff = GinItemPointerGetOffsetNumber(&advancePast);
+
do
{
if (entry->matchResult == NULL ||
@@ -646,6 +627,18 @@ entryGetItem(GinState *ginstate, GinScanEntry entry)
}
/*
+ * If all the matches on this page are <= advancePast, skip
+ * to next page.
+ */
+ if (entry->matchResult->blockno < advancePastBlk ||
+ (entry->matchResult->blockno == advancePastBlk &&
+ entry->matchResult->offsets[entry->offset] <= advancePastOff))
+ {
+ entry->offset = entry->matchResult->ntuples;
+ continue;
+ }
+
+ /*
* Reset counter to the beginning of entry->matchResult. Note:
* entry->offset is still greater than matchResult->ntuples if
* matchResult is lossy. So, on next call we will get next
@@ -670,6 +663,17 @@ entryGetItem(GinState *ginstate, GinScanEntry entry)
break;
}
+ if (entry->matchResult->blockno == advancePastBlk)
+ {
+ /*
+ * Skip to the right offset on this page. We already checked
+ * in above loop that there is at least one item > advancePast
+ * on the page.
+ */
+ while (entry->matchResult->offsets[entry->offset] <= advancePastOff)
+ entry->offset++;
+ }
+
ItemPointerSet(&entry->curItem,
entry->matchResult->blockno,
entry->matchResult->offsets[entry->offset]);
@@ -678,29 +682,48 @@ entryGetItem(GinState *ginstate, GinScanEntry entry)
}
else if (!BufferIsValid(entry->buffer))
{
- entry->offset++;
- if (entry->offset <= entry->nlist)
- entry->curItem = entry->list[entry->offset - 1];
- else
+ /* A posting list from an entry tuple */
+ do
{
- ItemPointerSetInvalid(&entry->curItem);
- entry->isFinished = TRUE;
- }
+ if (entry->offset >= entry->nlist)
+ {
+ ItemPointerSetInvalid(&entry->curItem);
+ entry->isFinished = TRUE;
+ break;
+ }
+
+ entry->curItem = entry->list[entry->offset++];
+ } while (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0);
+ /* XXX: shouldn't we apply the fuzzy search limit here? */
}
else
{
+ /* A posting tree */
do
{
- entryGetNextItem(ginstate, entry);
- } while (entry->isFinished == FALSE &&
- entry->reduceResult == TRUE &&
- dropItem(entry));
+ /* If we've processed the current batch, load more items */
+ while (entry->offset >= entry->nlist)
+ {
+ entryLoadMoreItems(ginstate, entry, advancePast);
+
+ if (entry->isFinished)
+ {
+ ItemPointerSetInvalid(&entry->curItem);
+ return;
+ }
+ }
+
+ entry->curItem = entry->list[entry->offset++];
+
+ } while (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0 &&
+ entry->reduceResult == TRUE && dropItem(entry));
}
}
/*
- * Identify the "current" item among the input entry streams for this scan key,
- * and test whether it passes the scan key qual condition.
+ * Identify the "current" item among the input entry streams for this scan key
+ * that is greater than advancePast, and test whether it passes the scan key
+ * qual condition.
*
* The current item is the smallest curItem among the inputs. key->curItem
* is set to that value. key->curItemMatches is set to indicate whether that
@@ -719,7 +742,8 @@ entryGetItem(GinState *ginstate, GinScanEntry entry)
* logic in scanGetItem.)
*/
static void
-keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key)
+keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
+ ItemPointerData advancePast)
{
ItemPointerData minItem;
ItemPointerData curPageLossy;
@@ -729,11 +753,20 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key)
GinScanEntry entry;
bool res;
MemoryContext oldCtx;
+ bool allFinished;
Assert(!key->isFinished);
/*
- * Find the minimum of the active entry curItems.
+ * We might have already tested this item; if so, no need to repeat work.
+ * (Note: the ">" case can happen, if minItem is exact but we previously
+ * had to set curItem to a lossy-page pointer.)
+ */
+ if (ginCompareItemPointers(&key->curItem, &advancePast) > 0)
+ return;
+
+ /*
+ * Find the minimum item > advancePast among the active entry streams.
*
* Note: a lossy-page entry is encoded by a ItemPointer with max value for
* offset (0xffff), so that it will sort after any exact entries for the
@@ -741,16 +774,33 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key)
* pointers, which is good.
*/
ItemPointerSetMax(&minItem);
-
+ allFinished = true;
for (i = 0; i < key->nentries; i++)
{
entry = key->scanEntry[i];
- if (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem, &minItem) < 0)
- minItem = entry->curItem;
+
+ /*
+ * Advance this stream if necessary.
+ *
+ * In particular, since entry->curItem was initialized with
+ * ItemPointerSetMin, this ensures we fetch the first item for each
+ * entry on the first call.
+ */
+ while (entry->isFinished == FALSE &&
+ ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ {
+ entryGetItem(ginstate, entry, advancePast);
+ }
+
+ if (!entry->isFinished)
+ {
+ allFinished = FALSE;
+ if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
+ minItem = entry->curItem;
+ }
}
- if (ItemPointerIsMax(&minItem))
+ if (allFinished)
{
/* all entries are finished */
key->isFinished = TRUE;
@@ -758,15 +808,7 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key)
}
/*
- * We might have already tested this item; if so, no need to repeat work.
- * (Note: the ">" case can happen, if minItem is exact but we previously
- * had to set curItem to a lossy-page pointer.)
- */
- if (ginCompareItemPointers(&key->curItem, &minItem) >= 0)
- return;
-
- /*
- * OK, advance key->curItem and perform consistentFn test.
+ * OK, set key->curItem and perform consistentFn test.
*/
key->curItem = minItem;
@@ -895,117 +937,120 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key)
* keyGetItem() the combination logic is known only to the consistentFn.
*/
static bool
-scanGetItem(IndexScanDesc scan, ItemPointer advancePast,
+scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
ItemPointerData *item, bool *recheck)
{
GinScanOpaque so = (GinScanOpaque) scan->opaque;
- GinState *ginstate = &so->ginstate;
- ItemPointerData myAdvancePast = *advancePast;
uint32 i;
- bool allFinished;
bool match;
- for (;;)
+ /*----------
+ * Advance the scan keys in lock-step, until we find an item that
+ * matches all the keys. If any key reports isFinished, meaning its
+ * subset of the entries is exhausted, we can stop. Otherwise, set
+ * *item to the next matching item.
+ *
+ * Now *item contains the first ItemPointer after previous result that
+ * passed the consistentFn check for that exact TID, or a lossy reference
+ * to the same page.
+ *
+ * This logic works only if a keyGetItem stream can never contain both
+ * exact and lossy pointers for the same page. Else we could have a
+ * case like
+ *
+ * stream 1 stream 2
+ * ... ...
+ * 42/6 42/7
+ * 50/1 42/0xffff
+ * ... ...
+ *
+ * We would conclude that 42/6 is not a match and advance stream 1,
+ * thus never detecting the match to the lossy pointer in stream 2.
+ * (keyGetItem has a similar problem versus entryGetItem.)
+ *----------
+ */
+ ItemPointerSetMin(item);
+ do
{
- /*
- * Advance any entries that are <= myAdvancePast. In particular,
- * since entry->curItem was initialized with ItemPointerSetMin, this
- * ensures we fetch the first item for each entry on the first call.
- */
- allFinished = TRUE;
-
- for (i = 0; i < so->totalentries; i++)
- {
- GinScanEntry entry = so->entries[i];
-
- while (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem,
- &myAdvancePast) <= 0)
- entryGetItem(ginstate, entry);
-
- if (entry->isFinished == FALSE)
- allFinished = FALSE;
- }
-
- if (allFinished)
- {
- /* all entries exhausted, so we're done */
- return false;
- }
-
- /*
- * Perform the consistentFn test for each scan key. If any key
- * reports isFinished, meaning its subset of the entries is exhausted,
- * we can stop. Otherwise, set *item to the minimum of the key
- * curItems.
- */
- ItemPointerSetMax(item);
-
- for (i = 0; i < so->nkeys; i++)
+ match = true;
+ for (i = 0; i < so->nkeys && match; i++)
{
GinScanKey key = so->keys + i;
- keyGetItem(&so->ginstate, so->tempCtx, key);
+ /* Fetch the next item for this key. */
+ keyGetItem(&so->ginstate, so->tempCtx, key, advancePast);
if (key->isFinished)
- return false; /* finished one of keys */
-
- if (ginCompareItemPointers(&key->curItem, item) < 0)
- *item = key->curItem;
- }
+ return false;
- Assert(!ItemPointerIsMax(item));
+ /*
+ * If it's not a match, we can immediately conclude that nothing
+ * <= this item matches, without checking the rest of the keys.
+ */
+ if (!key->curItemMatches)
+ {
+ advancePast = key->curItem;
+ match = false;
+ break;
+ }
- /*----------
- * Now *item contains first ItemPointer after previous result.
- *
- * The item is a valid hit only if all the keys succeeded for either
- * that exact TID, or a lossy reference to the same page.
- *
- * This logic works only if a keyGetItem stream can never contain both
- * exact and lossy pointers for the same page. Else we could have a
- * case like
- *
- * stream 1 stream 2
- * ... ...
- * 42/6 42/7
- * 50/1 42/0xffff
- * ... ...
- *
- * We would conclude that 42/6 is not a match and advance stream 1,
- * thus never detecting the match to the lossy pointer in stream 2.
- * (keyGetItem has a similar problem versus entryGetItem.)
- *----------
- */
- match = true;
- for (i = 0; i < so->nkeys; i++)
- {
- GinScanKey key = so->keys + i;
+ /*
+ * It's a match. We can conclude that nothing < matches, so
+ * the other key streams can skip to this item.
+ * Beware of lossy pointers, though; for a lossy pointer, we
+ * can only conclude that nothing smaller than this *page*
+ * matches.
+ */
+ advancePast = key->curItem;
+ if (ItemPointerIsLossyPage(&advancePast))
+ {
+ advancePast.ip_posid = 0;
+ }
+ else
+ {
+ Assert(advancePast.ip_posid > 0);
+ advancePast.ip_posid--;
+ }
- if (key->curItemMatches)
+ /*
+ * If this is the first key, remember this location as a
+ * potential match.
+ *
+ * Otherwise, check if this is the same item that we checked the
+ * previous keys for (or a lossy pointer for the same page). If
+ * not, loop back to check the previous keys for this item (we
+ * will check this key again too, but keyGetItem returns quickly
+ * for that)
+ */
+ if (i == 0)
{
- if (ginCompareItemPointers(item, &key->curItem) == 0)
- continue;
- if (ItemPointerIsLossyPage(&key->curItem) &&
- GinItemPointerGetBlockNumber(&key->curItem) ==
- GinItemPointerGetBlockNumber(item))
- continue;
+ *item = key->curItem;
+ }
+ else
+ {
+ if (ItemPointerIsLossyPage(&key->curItem) ||
+ ItemPointerIsLossyPage(item))
+ {
+ Assert (GinItemPointerGetBlockNumber(&key->curItem) >= GinItemPointerGetBlockNumber(item));
+ match = (GinItemPointerGetBlockNumber(&key->curItem) ==
+ GinItemPointerGetBlockNumber(item));
+ }
+ else
+ {
+ Assert(ginCompareItemPointers(&key->curItem, item) >= 0);
+ match = (ginCompareItemPointers(&key->curItem, item) == 0);
+ }
}
- match = false;
- break;
}
+ } while (!match);
- if (match)
- break;
-
- /*
- * No hit. Update myAdvancePast to this TID, so that on the next pass
- * we'll move to the next possible entry.
- */
- myAdvancePast = *item;
- }
+ Assert(!ItemPointerIsMin(item));
/*
+ * Now *item contains the first ItemPointer after previous result that
+ * passed the consistentFn check for that exact TID, or a lossy reference
+ * to the same page.
+ *
* We must return recheck = true if any of the keys are marked recheck.
*/
*recheck = false;
@@ -1536,7 +1581,7 @@ gingetbitmap(PG_FUNCTION_ARGS)
{
CHECK_FOR_INTERRUPTS();
- if (!scanGetItem(scan, &iptr, &iptr, &recheck))
+ if (!scanGetItem(scan, iptr, &iptr, &recheck))
break;
if (ItemPointerIsLossyPage(&iptr))
--
1.8.5.2
Attachment: 0002-Further-optimize-the-multi-key-GIN-searches.patch (text/x-diff)
From fdd605789c069e40b7a001c689222a906ebdd6f5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 23 Jan 2014 15:47:54 +0200
Subject: [PATCH 2/3] Further optimize the multi-key GIN searches.
If we're skipping past a certain TID, avoid decoding posting list segments
that only contain smaller TIDs.
---
src/backend/access/gin/gindatapage.c | 32 +++++++++++++++++++++++++++++---
src/backend/access/gin/ginget.c | 6 ++++--
src/include/access/gin_private.h | 2 +-
3 files changed, 34 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 8504f4c..a339028 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -97,18 +97,44 @@ static void dataPlaceToPageLeafSplit(Buffer buf,
/*
* Read all TIDs from leaf data page to single uncompressed array.
+ *
+ * If advancePast is valid, the caller is only interested in TIDs > advancePast.
+ * This function can still return items smaller than that, so the caller
+ * must still check them, but passing it allows this function to skip some
+ * items as an optimization.
*/
ItemPointer
-GinDataLeafPageGetItems(Page page, int *nitems)
+GinDataLeafPageGetItems(Page page, int *nitems, ItemPointerData advancePast)
{
ItemPointer result;
if (GinPageIsCompressed(page))
{
- GinPostingList *ptr = GinDataLeafPageGetPostingList(page);
+ GinPostingList *seg = GinDataLeafPageGetPostingList(page);
Size len = GinDataLeafPageGetPostingListSize(page);
+ Pointer endptr = ((Pointer) seg) + len;
+ GinPostingList *next;
- result = ginPostingListDecodeAllSegments(ptr, len, nitems);
+ /* Skip to the segment containing advancePast+1 */
+ if (ItemPointerIsValid(&advancePast))
+ {
+ next = GinNextPostingListSegment(seg);
+ while ((Pointer) next < endptr &&
+ ginCompareItemPointers(&next->first, &advancePast) <= 0)
+ {
+ seg = next;
+ next = GinNextPostingListSegment(seg);
+ }
+ len = endptr - (Pointer) seg;
+ }
+
+ if (len > 0)
+ result = ginPostingListDecodeAllSegments(seg, len, nitems);
+ else
+ {
+ result = palloc(0);
+ *nitems = 0;
+ }
}
else
{
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 4de7a10..5d7738f 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -400,6 +400,7 @@ restartScanEntry:
BlockNumber rootPostingTree = GinGetPostingTree(itup);
GinBtreeStack *stack;
Page page;
+ ItemPointerData minItem;
/*
* We should unlock entry page before touching posting tree to
@@ -426,7 +427,8 @@ restartScanEntry:
/*
* Load the first page into memory.
*/
- entry->list = GinDataLeafPageGetItems(page, &entry->nlist);
+ ItemPointerSetMin(&minItem);
+ entry->list = GinDataLeafPageGetItems(page, &entry->nlist, minItem);
entry->predictNumberResult = stack->predictNumber * entry->nlist;
@@ -556,7 +558,7 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
continue;
}
- entry->list = GinDataLeafPageGetItems(page, &entry->nlist);
+ entry->list = GinDataLeafPageGetItems(page, &entry->nlist, advancePast);
if (ItemPointerIsValid(&advancePast))
{
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 3f92c37..8c350b9 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -692,7 +692,7 @@ extern ItemPointer ginReadTuple(GinState *ginstate, OffsetNumber attnum,
IndexTuple itup, int *nitems);
/* gindatapage.c */
-extern ItemPointer GinDataLeafPageGetItems(Page page, int *nitems);
+extern ItemPointer GinDataLeafPageGetItems(Page page, int *nitems, ItemPointerData advancePast);
extern int GinDataLeafPageGetItemsToTbm(Page page, TIDBitmap *tbm);
extern BlockNumber createPostingTree(Relation index,
ItemPointerData *items, uint32 nitems,
--
1.8.5.2
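The segment-skip loop that this patch adds to GinDataLeafPageGetItems can be pictured with a toy model. This is an illustration only, not the patch's C code: a segment here is just a pair of its first TID and its already-decoded items, mirroring how each compressed posting-list segment stores its first TID.

```python
# Toy posting-list segments: (first TID in segment, the segment's TIDs).
segments = [(1, [1, 5, 9]), (10, [10, 40, 90]), (100, [100, 500])]

def items_after(segments, advance_past):
    """Decode only segments that can contain TIDs > advance_past.

    Mirrors the patch's skip loop: while the *next* segment still starts
    at or below advance_past, the current segment holds only smaller
    TIDs and need not be decoded at all.
    """
    start = 0
    while start + 1 < len(segments) and segments[start + 1][0] <= advance_past:
        start += 1
    out = []
    for first, tids in segments[start:]:
        out.extend(tids)          # stand-in for decoding the segment
    return out  # may still contain items <= advance_past; the caller filters

print(items_after(segments, 42))   # -> [10, 40, 90, 100, 500]
```

Note that, as in the patch, the result may still include items <= advance_past (here, nothing below the chosen segment's first TID is skipped), which is why the callers in ginget.c still compare against advancePast.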
Attachment: 0003-Further-optimize-GIN-multi-key-searches.patch (text/x-diff)
From a030774cc5fd9720c988e65b500e8ab8a5fb4d7e Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 23 Jan 2014 16:55:51 +0200
Subject: [PATCH 3/3] Further optimize GIN multi-key searches.
When skipping over some items in a posting tree, re-find the new location
by descending the tree from root, rather than walking the right links.
This can save a lot of I/O.
---
src/backend/access/gin/gindatapage.c | 9 ++--
src/backend/access/gin/ginget.c | 90 +++++++++++++++++++++++++++---------
src/include/access/gin_private.h | 3 +-
3 files changed, 75 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index a339028..2961e5c 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -1619,16 +1619,15 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
* Starts a new scan on a posting tree.
*/
GinBtreeStack *
-ginScanBeginPostingTree(Relation index, BlockNumber rootBlkno)
+ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno)
{
- GinBtreeData btree;
GinBtreeStack *stack;
- ginPrepareDataScan(&btree, index, rootBlkno);
+ ginPrepareDataScan(btree, index, rootBlkno);
- btree.fullScan = TRUE;
+ btree->fullScan = TRUE;
- stack = ginFindLeafPage(&btree, TRUE);
+ stack = ginFindLeafPage(btree, TRUE);
return stack;
}
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 5d7738f..72c9e61 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -99,12 +99,13 @@ static void
scanPostingTree(Relation index, GinScanEntry scanEntry,
BlockNumber rootPostingTree)
{
+ GinBtreeData btree;
GinBtreeStack *stack;
Buffer buffer;
Page page;
/* Descend to the leftmost leaf page */
- stack = ginScanBeginPostingTree(index, rootPostingTree);
+ stack = ginScanBeginPostingTree(&btree, index, rootPostingTree);
buffer = stack->buffer;
IncrBufferRefCount(buffer); /* prevent unpin in freeGinBtreeStack */
@@ -412,7 +413,8 @@ restartScanEntry:
LockBuffer(stackEntry->buffer, GIN_UNLOCK);
needUnlock = FALSE;
- stack = ginScanBeginPostingTree(ginstate->index, rootPostingTree);
+ stack = ginScanBeginPostingTree(&entry->btree, ginstate->index,
+ rootPostingTree);
entry->buffer = stack->buffer;
/*
@@ -506,8 +508,50 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
{
Page page;
int i;
+ bool stepright;
+
+ /*
+ * We have two strategies for finding the correct page: step right from
+ * the current page, or descend the tree again from the root. If
+ * advancePast equals the current item, the next matching item should be
+ * on the next page, so we step right. Otherwise, descend from root.
+ */
+ if (ginCompareItemPointers(&entry->curItem, &advancePast) == 0)
+ {
+ stepright = true;
+ LockBuffer(entry->buffer, GIN_SHARE);
+ }
+ else
+ {
+ GinBtreeStack *stack;
+
+ ReleaseBuffer(entry->buffer);
+
+ /*
+ * Set the search key, and find the correct leaf page.
+ *
+ * XXX: This is off by one, we're searching for an item > advancePast,
+ * but we're asking the tree for the next item >= advancePast. It only
+ * makes a difference in the corner case that advancePast is the
+ * right bound of a page, in which case we'll scan one page
+ * unnecessarily. Other than that it's harmless.
+ */
+ entry->btree.itemptr = advancePast;
+ entry->btree.fullScan = false;
+ stack = ginFindLeafPage(&entry->btree, true);
+
+ /* we don't need the stack, just the buffer. */
+ entry->buffer = stack->buffer;
+ IncrBufferRefCount(entry->buffer);
+ freeGinBtreeStack(stack);
+ stepright = false;
+ }
+
+ elog(LOG, "entryLoadMoreItems, %u/%u, skip: %d",
+ ItemPointerGetBlockNumber(&advancePast),
+ ItemPointerGetOffsetNumber(&advancePast),
+ !stepright);
- LockBuffer(entry->buffer, GIN_SHARE);
page = BufferGetPage(entry->buffer);
for (;;)
{
@@ -519,31 +563,35 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
entry->nlist = 0;
}
- /*
- * We've processed all the entries on this page. If it was the last
- * page in the tree, we're done.
- */
- if (GinPageRightMost(page))
+ if (stepright)
{
- UnlockReleaseBuffer(entry->buffer);
- entry->buffer = InvalidBuffer;
- entry->isFinished = TRUE;
- return;
+ /*
+ * We've processed all the entries on this page. If it was the last
+ * page in the tree, we're done.
+ */
+ if (GinPageRightMost(page))
+ {
+ UnlockReleaseBuffer(entry->buffer);
+ entry->buffer = InvalidBuffer;
+ entry->isFinished = TRUE;
+ return;
+ }
+
+ /*
+ * Step to next page, following the right link. then find the first
+ * ItemPointer greater than advancePast.
+ */
+ entry->buffer = ginStepRight(entry->buffer,
+ ginstate->index,
+ GIN_SHARE);
+ page = BufferGetPage(entry->buffer);
}
+ stepright = true;
if (GinPageGetOpaque(page)->flags & GIN_DELETED)
continue; /* page was deleted by concurrent vacuum */
/*
- * Step to next page, following the right link. then find the first
- * ItemPointer greater than advancePast.
- */
- entry->buffer = ginStepRight(entry->buffer,
- ginstate->index,
- GIN_SHARE);
- page = BufferGetPage(entry->buffer);
-
- /*
* The first item > advancePast might not be on this page, but
* somewhere to the right, if the page was split. Keep following
* the right-links until we re-find the correct page.
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 8c350b9..a12dfc3 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -703,7 +703,7 @@ extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
ItemPointerData *items, uint32 nitem,
GinStatsData *buildStats);
-extern GinBtreeStack *ginScanBeginPostingTree(Relation index, BlockNumber rootBlkno);
+extern GinBtreeStack *ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno);
extern void ginDataFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage);
extern void ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno);
@@ -803,6 +803,7 @@ typedef struct GinScanEntryData
bool isFinished;
bool reduceResult;
uint32 predictNumberResult;
+ GinBtreeData btree;
} GinScanEntryData;
typedef struct GinScanOpaqueData
--
1.8.5.2
On 23.1.2014 17:22, Heikki Linnakangas wrote:
On 01/14/2014 05:35 PM, Alexander Korotkov wrote:
Attached version is rebased against last version of packed posting lists.
Thanks!
I think we're missing a trick with multi-key queries. We know that when
multiple scan keys are used, they are ANDed together, so we can do the
skip optimization even without the new tri-state consistent function.

To get started, I propose the three attached patches. These only
implement the optimization for the multi-key case, which doesn't require
any changes to the consistent functions and hence no catalog changes.
Admittedly this isn't anywhere near as useful in practice as the single
key case, but let's go for the low-hanging fruit first. This
nevertheless introduces some machinery that will be needed by the full
patch anyway.

I structured the code somewhat differently than your patch. There is no
separate fast-path for the case where the optimization applies. Instead,
I'm passing the advancePast variable all the way down to where the next
batch of items are loaded from the posting tree. keyGetItem is now
responsible for advancing the entry streams, and the logic in
scanGetItem has been refactored so that it advances advancePast
aggressively, as soon as one of the key streams lets us conclude that no
items < a certain point can match.

scanGetItem might yet need to be refactored when we get to the full
preconsistent check stuff, but one step at a time.

The first patch is the most interesting one, and contains the
scanGetItem changes. The second patch allows seeking to the right
segment in a posting tree page, and the third allows starting the
posting tree scan from root, when skipping items (instead of just
following the right-links).

Here are some simple performance test results, demonstrating the effect
of each of these patches. This is a best-case scenario. I don't think
these patches have any adverse effects even in the worst-case scenario,
although I haven't actually tried hard to measure that. I used this to
create a test table:

create table foo (intarr int[]);
-- Every row contains 0 (frequent term), and a unique number.
insert into foo select array[0,g] from generate_series(1, 10000000) g;
-- Add another tuple with 0, 1 combo physically to the end of the table.
insert into foo values (array[0,1]);

The query I used is this:
postgres=# select count(*) from foo where intarr @> array[0] and intarr
@> array[1];
count
-------
2
(1 row)

I measured the time that query takes, and the number of pages hit, using
"explain (analyze, buffers true) ...".

patches         time (ms)   buffers
-------------   ---------   -------
unpatched       650         1316
patch 1         0.52        1316
patches 1+2     0.50        1316
patches 1+2+3   0.13        15

So, the second patch isn't doing much in this particular case. But it's
trivial, and I think it will make a difference in other queries where
you have the opportunity to skip, but return a lot of tuples overall.

In summary, these are fairly small patches, and useful on their own, so I
think these should be committed now. But please take a look and see if
the logic in scanGetItem/keyGetItem looks correct to you. After this, I
think the main fast scan logic will go into keyGetItem.

PS. I find it a bit surprising that in your patch, you're completely
bailing out if there are any partial-match keys involved. Is there some
fundamental reason for that, or just not implemented?
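The lock-step advancement described above, where every key stream may skip straight to a shared advancePast bound, can be sketched in miniature. This is an illustrative toy, not the patch's C code: TIDs are plain integers, entry streams are sorted Python lists, and `first_greater` stands in for the posting-tree descent.

```python
from bisect import bisect_right

def first_greater(stream, x):
    """Index of the first item > x in a sorted list (len(stream) if none)."""
    return bisect_right(stream, x)

def intersect_with_skips(streams):
    """Yield items present in every stream, advancing an advance_past
    bound aggressively, in the spirit of the patched scanGetItem/keyGetItem."""
    advance_past = 0              # nothing returned yet; items start at 1
    while True:
        candidates = []
        for s in streams:
            i = first_greater(s, advance_past)
            if i == len(s):
                return            # one stream exhausted: no more matches
            candidates.append(s[i])
        hi = max(candidates)
        if all(c == hi for c in candidates):
            yield hi              # all streams agree: a match
            advance_past = hi
        else:
            # No item < hi can match all streams, so every stream may
            # skip past hi - 1 (the big win for the frequent stream).
            advance_past = hi - 1

# "frequent_term & rare_term": the frequent stream has a million items,
# the rare one only two; the loop inspects only a handful of positions.
freq = list(range(1, 1_000_001))
rare = [5, 700_000]
print(list(intersect_with_skips([freq, rare])))   # -> [5, 700000]
```

In the real index the skip is what lets patch 2 avoid decoding posting-list segments and patch 3 re-descend the posting tree from the root instead of walking right-links.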
I've done some initial testing (with all the three patches applied)
today to see if there are any crashes or obvious failures and found none
so far. The only issue I've noticed is this LOG message in ginget.c:
elog(LOG, "entryLoadMoreItems, %u/%u, skip: %d",
ItemPointerGetBlockNumber(&advancePast),
ItemPointerGetOffsetNumber(&advancePast),
!stepright);
which produces an enormous amount of messages. I suppose that was used for
debugging purposes and shouldn't be there?
I plan to do more thorough testing over the weekend, but I'd like to
make sure I understand what to expect. My understanding is that this
patch should:
- give the same results as the current code (e.g. the fulltext should
not return different rows / change the ts_rank etc.)
- improve the performance of fulltext queries
Are there any obvious rules for what queries will benefit most from this?
The queries generated by the tool I'm using for testing are mostly of
this form:
SELECT id FROM messages
WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...')
ORDER BY ts_rank(...) DESC LIMIT :n;
with varying number of words and LIMIT values. During the testing today
I haven't noticed any obvious performance difference, but I haven't
spent much time on that.
regards
Tomas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jan 24, 2014 at 6:48 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
I plan to do more thorough testing over the weekend, but I'd like to
make sure I understand what to expect. My understanding is that this
patch should:

- give the same results as the current code (e.g. the fulltext should
not return different rows / change the ts_rank etc.)
- improve the performance of fulltext queries
Are there any obvious rules for what queries will benefit most from this?
The queries generated by the tool I'm using for testing are mostly of
this form:

SELECT id FROM messages
WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...')
ORDER BY ts_rank(...) DESC LIMIT :n;

with varying number of words and LIMIT values. During the testing today
I haven't noticed any obvious performance difference, but I haven't
spent much time on that.
These patches optimize only queries with multiple WHERE clauses. For
instance:
SELECT id FROM messages
WHERE body_tsvector @@ plainto_tsquery('english', 'word1')
AND body_tsvector @@ plainto_tsquery('english', 'word2')
ORDER BY ts_rank(...) DESC LIMIT :n;
Optimizations inside a single clause will be provided as a separate patch.
------
With best regards,
Alexander Korotkov.
On Thu, Jan 23, 2014 at 8:22 PM, Heikki Linnakangas <hlinnakangas@vmware.com>
wrote:
On 01/14/2014 05:35 PM, Alexander Korotkov wrote:
Attached version is rebased against last version of packed posting lists.
Thanks!
I think we're missing a trick with multi-key queries. We know that when
multiple scan keys are used, they are ANDed together, so we can do the skip
optimization even without the new tri-state consistent function.

To get started, I propose the three attached patches. These only implement
the optimization for the multi-key case, which doesn't require any changes
to the consistent functions and hence no catalog changes. Admittedly this
isn't anywhere near as useful in practice as the single key case, but let's
go for the low-hanging fruit first. This nevertheless introduces some
machinery that will be needed by the full patch anyway.

I structured the code somewhat differently than your patch. There is no
separate fast-path for the case where the optimization applies. Instead,
I'm passing the advancePast variable all the way down to where the next
batch of items are loaded from the posting tree. keyGetItem is now
responsible for advancing the entry streams, and the logic in scanGetItem
has been refactored so that it advances advancePast aggressively, as soon
as one of the key streams lets us conclude that no items < a certain point
can match.

scanGetItem might yet need to be refactored when we get to the full
preconsistent check stuff, but one step at a time.

The first patch is the most interesting one, and contains the scanGetItem
changes. The second patch allows seeking to the right segment in a posting
tree page, and the third allows starting the posting tree scan from root,
when skipping items (instead of just following the right-links).

Here are some simple performance test results, demonstrating the effect of
each of these patches. This is a best-case scenario. I don't think these
patches have any adverse effects even in the worst-case scenario, although I
haven't actually tried hard to measure that. I used this to create a test
table:

create table foo (intarr int[]);
-- Every row contains 0 (frequent term), and a unique number.
insert into foo select array[0,g] from generate_series(1, 10000000) g;
-- Add another tuple with 0, 1 combo physically to the end of the table.
insert into foo values (array[0,1]);

The query I used is this:
postgres=# select count(*) from foo where intarr @> array[0] and intarr @>
array[1];
count
-------
2
(1 row)

I measured the time that query takes, and the number of pages hit, using
"explain (analyze, buffers true) ...":

patches         time (ms)   buffers
-------------   ---------   -------
unpatched       650         1316
patch 1         0.52        1316
patches 1+2     0.50        1316
patches 1+2+3   0.13        15

So, the second patch isn't doing much in this particular case. But it's
trivial, and I think it will make a difference in other queries where you
have the opportunity to skip, but return a lot of tuples overall.

In summary, these are fairly small patches, and useful on their own, so I
think these should be committed now. But please take a look and see if the
logic in scanGetItem/keyGetItem looks correct to you. After this, I think
the main fast scan logic will go into keyGetItem.
Good, thanks! Now I can reimplement fast scan based on these patches.
PS. I find it a bit surprising that in your patch, you're completely
bailing out if there are any partial-match keys involved. Is there some
fundamental reason for that, or just not implemented?
Just not implemented. I think there are two possible approaches to handle it:
1) Handle partial-match keys like OR on matching keys.
2) Implement keyAdvancePast for bitmap.
The first approach seems useful with a low number of keys. Probably, we
should implement automatic switching between them.
------
With best regards,
Alexander Korotkov.
On 01/24/2014 01:58 PM, Alexander Korotkov wrote:
On Thu, Jan 23, 2014 at 8:22 PM, Heikki Linnakangas <hlinnakangas@vmware.com>
wrote:
In summary, these are fairly small patches, and useful on their own, so I
think these should be committed now. But please take a look and see if the
logic in scanGetItem/keyGetItem looks correct to you. After this, I think
the main fast scan logic will go into keyGetItem.

Good, thanks! Now I can reimplement fast scan based on these patches.
I hope we're not wasting effort doing the same thing, but I was also
hacking that; here's what I got. It applies on top of the previous set
of patches.
I decided to focus on the ginget.c changes, and worry about the catalog
changes later. Instead, I added an abstraction for calling the ternary
consistent function, with a shim implementation that checks if there is
only one UNKNOWN argument, and tries the regular boolean consistent
function "both ways" for the UNKNOWN argument. That's the same strategy
we were already using when dealing with a key with one lossy page, so I
refactored that to also use the new ternary consistent function.
That abstraction can be used to do other optimizations in the future.
For example, building the truth table like I suggested a long time ago,
or simply checking for some common cases, like if the consistent
function implements plain AND logic. Or caching the results of expensive
consistent functions. And obviously, that is where the call to the
opclass-provided tri-state consistent function will go to.
Then, I rewrote keyGetItem to make use of the ternary consistent
function, to perform the "pre-consistent" check. That is the core logic
from your patch. Currently (ie. without the patch), we loop through all
the entries, and advance them to the next item pointer > advancePast,
and then perform the consistent check for the smallest item among those.
With the patch, we first determine the smallest item pointer among the
entries with curItem > advancePast, and call that minItem. The others
are considered as "unknown", as they might contain an item X where
advancePast < X < minItem. Normally, we would have to load the next item
from that entry to determine if X exists, but we can skip it if the
ternary pre-consistent function says that X cannot match anyway.
In addition to that, I'm using the ternary consistent function to check
if minItem is a match, even if we haven't loaded all the entries yet.
That's less important, but I think for something like "rare1 | (rare2 &
frequent)" it might be useful. It would allow us to skip fetching
'frequent', when we already know that 'rare1' matches for the current
item. I'm not sure if that's worth the cycles, but it seemed like an
obvious thing to do, now that we have the ternary consistent function.
(hmm, I should put the above paragraphs in a comment in the patch)
This isn't exactly the same structure as in your patch, but I found the
concept easier to understand when written this way. I did not implement
the sorting of the entries. It seems like a sensible thing to do, but
I'd like to see a test case that shows the difference before bothering
with it. If we do it, a binary heap probably makes more sense than
keeping the array fully sorted.
There's one tradeoff here that should be mentioned: In most cases, it's
extremely cheap to fetch the next item from an entry stream. We load a
page worth of items into the array, so it's just a matter of pulling the
next item from the array. Instead of trying to "refute" such items based
on other entries, would it be better to load them and call the
consistent function the normal way for them? Refuting might slash all
the entries in one consistent check, but OTOH, when refuting fails, the
consistent check was a waste of cycles. If we only tried to refute items
when the alternative would be to load a new page, there would be less
chance of a performance regression from this patch.
Anyway, how does this patch look to you? Did I get the logic correct?
Do you have some test data and/or queries that you could share easily?
- Heikki
Attachments:
0001-Add-the-concept-of-a-ternary-consistent-check-and-us.patchtext/x-diff; name=0001-Add-the-concept-of-a-ternary-consistent-check-and-us.patchDownload
From eb9c6a202cbb0ab03181cb19a434deb6082da497 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 23 Jan 2014 23:08:43 +0200
Subject: [PATCH 1/1] Add the concept of a ternary consistent check, and use it
to skip entries.
When we have loaded the next item from some, but not all, entries in a scan,
it might be possible to prove that there cannot be any matches with smaller
item pointer coming from the other entries. In that case, we can
fast-forward those entries to the smallest item among the already-fetched
sources.
There is no support for opclass-defined ternary consistent functions yet,
but there is a shim function that calls the regular, boolean, consistent
function "both ways", when only one input is unknown.
Per the concept by Alexander Korotkov
---
src/backend/access/gin/Makefile | 2 +-
src/backend/access/gin/ginget.c | 414 ++++++++++++++++++++++----------------
src/backend/access/gin/ginlogic.c | 136 +++++++++++++
src/include/access/gin_private.h | 23 ++-
4 files changed, 397 insertions(+), 178 deletions(-)
create mode 100644 src/backend/access/gin/ginlogic.c
diff --git a/src/backend/access/gin/Makefile b/src/backend/access/gin/Makefile
index aabc62f..db4f496 100644
--- a/src/backend/access/gin/Makefile
+++ b/src/backend/access/gin/Makefile
@@ -14,6 +14,6 @@ include $(top_builddir)/src/Makefile.global
OBJS = ginutil.o gininsert.o ginxlog.o ginentrypage.o gindatapage.o \
ginbtree.o ginscan.o ginget.o ginvacuum.o ginarrayproc.o \
- ginbulk.o ginfast.o ginpostinglist.o
+ ginbulk.o ginfast.o ginpostinglist.o ginlogic.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 72c9e61..a5b2eab 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -32,41 +32,6 @@ typedef struct pendingPosition
bool *hasMatchKey;
} pendingPosition;
-
-/*
- * Convenience function for invoking a key's consistentFn
- */
-static bool
-callConsistentFn(GinState *ginstate, GinScanKey key)
-{
- /*
- * If we're dealing with a dummy EVERYTHING key, we don't want to call the
- * consistentFn; just claim it matches.
- */
- if (key->searchMode == GIN_SEARCH_MODE_EVERYTHING)
- {
- key->recheckCurItem = false;
- return true;
- }
-
- /*
- * Initialize recheckCurItem in case the consistentFn doesn't know it
- * should set it. The safe assumption in that case is to force recheck.
- */
- key->recheckCurItem = true;
-
- return DatumGetBool(FunctionCall8Coll(&ginstate->consistentFn[key->attnum - 1],
- ginstate->supportCollation[key->attnum - 1],
- PointerGetDatum(key->entryRes),
- UInt16GetDatum(key->strategy),
- key->query,
- UInt32GetDatum(key->nuserentries),
- PointerGetDatum(key->extra_data),
- PointerGetDatum(&key->recheckCurItem),
- PointerGetDatum(key->queryValues),
- PointerGetDatum(key->queryCategories)));
-}
-
/*
* Goes to the next page if current offset is outside of bounds
*/
@@ -460,6 +425,8 @@ startScanKey(GinState *ginstate, GinScanKey key)
key->curItemMatches = false;
key->recheckCurItem = false;
key->isFinished = false;
+
+ GinInitConsistentMethod(ginstate, key);
}
static void
@@ -798,18 +765,19 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
ItemPointerData minItem;
ItemPointerData curPageLossy;
uint32 i;
- uint32 lossyEntry;
bool haveLossyEntry;
GinScanEntry entry;
- bool res;
MemoryContext oldCtx;
bool allFinished;
+ bool allUnknown;
+ int minUnknown;
+ GinLogicValue res;
Assert(!key->isFinished);
/*
* We might have already tested this item; if so, no need to repeat work.
- * (Note: the ">" case can happen, if minItem is exact but we previously
+ * (Note: the ">" case can happen, if advancePast is exact but we previously
* had to set curItem to a lossy-page pointer.)
*/
if (ginCompareItemPointers(&key->curItem, &advancePast) > 0)
@@ -823,155 +791,256 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
* same page. So we'll prefer to return exact pointers not lossy
* pointers, which is good.
*/
- ItemPointerSetMax(&minItem);
- allFinished = true;
- for (i = 0; i < key->nentries; i++)
+ oldCtx = CurrentMemoryContext;
+
+ for (;;)
{
- entry = key->scanEntry[i];
+ ItemPointerSetMax(&minItem);
+ allFinished = true;
+ allUnknown = true;
+ minUnknown = -1;
+ for (i = 0; i < key->nentries; i++)
+ {
+ entry = key->scanEntry[i];
- /*
- * Advance this stream if necessary.
- *
- * In particular, since entry->curItem was initialized with
- * ItemPointerSetMin, this ensures we fetch the first item for each
- * entry on the first call.
- */
- while (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ if (entry->isFinished)
+ continue;
+ allFinished = false;
+
+ if (!entry->isFinished &&
+ ginCompareItemPointers(&entry->curItem, &advancePast) > 0)
+ {
+ allUnknown = false;
+ if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
+ minItem = entry->curItem;
+ }
+ else if (minUnknown == -1)
+ minUnknown = i;
+ }
+
+ if (allFinished)
{
- entryGetItem(ginstate, entry, advancePast);
+ /* all entries are finished */
+ key->isFinished = TRUE;
+ return;
}
- if (!entry->isFinished)
+ if (allUnknown)
{
- allFinished = FALSE;
- if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
- minItem = entry->curItem;
+ /*
+ * We must have an item from at least one source to have a match.
+ * Fetch the next item > advancePast from the first (non-finished)
+ * entry stream.
+ */
+ entry = key->scanEntry[minUnknown];
+ entryGetItem(ginstate, entry, advancePast);
+ continue;
}
- }
- if (allFinished)
- {
- /* all entries are finished */
- key->isFinished = TRUE;
- return;
- }
+ /*
+ * We now have minItem set to the minimum among input streams *that*
+ * we know. Some streams might be in unknown state, meaning we don't
+ * know the next value from that input.
+ *
+ * Determine if any items between advancePast and minItem might match.
+ * Such items might come from one of the unknown sources, but it's
+ * possible that the consistent function can refute them all, ie.
+ * the consistent logic says that they cannot match without any of the
+ * sources that we have loaded.
+ */
+ if (minUnknown != -1)
+ {
+ for (i = 0; i < key->nentries; i++)
+ {
+ entry = key->scanEntry[i];
+ if (entry->isFinished)
+ key->entryRes[i] = GIN_FALSE;
+ else if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ {
+ /* this source is 'unloaded' */
+ key->entryRes[i] = GIN_MAYBE;
+ }
+ else
+ {
+ /*
+ * we know the next item from this source to be >= minItem,
+ * hence it's false for any items before < minItem
+ */
+ key->entryRes[i] = GIN_FALSE;
+ }
+ }
- /*
- * OK, set key->curItem and perform consistentFn test.
- */
- key->curItem = minItem;
+ MemoryContextSwitchTo(tempCtx);
+ res = key->triConsistentFn(key);
+ MemoryContextSwitchTo(oldCtx);
- /*
- * Lossy-page entries pose a problem, since we don't know the correct
- * entryRes state to pass to the consistentFn, and we also don't know what
- * its combining logic will be (could be AND, OR, or even NOT). If the
- * logic is OR then the consistentFn might succeed for all items in the
- * lossy page even when none of the other entries match.
- *
- * If we have a single lossy-page entry then we check to see if the
- * consistentFn will succeed with only that entry TRUE. If so, we return
- * a lossy-page pointer to indicate that the whole heap page must be
- * checked. (On subsequent calls, we'll do nothing until minItem is past
- * the page altogether, thus ensuring that we never return both regular
- * and lossy pointers for the same page.)
- *
- * This idea could be generalized to more than one lossy-page entry, but
- * ideally lossy-page entries should be infrequent so it would seldom be
- * the case that we have more than one at once. So it doesn't seem worth
- * the extra complexity to optimize that case. If we do find more than
- * one, we just punt and return a lossy-page pointer always.
- *
- * Note that only lossy-page entries pointing to the current item's page
- * should trigger this processing; we might have future lossy pages in the
- * entry array, but they aren't relevant yet.
- */
- ItemPointerSetLossyPage(&curPageLossy,
- GinItemPointerGetBlockNumber(&key->curItem));
+ if (res == GIN_FALSE)
+ {
+ /*
+ * All items between advancePast and minItem have been refuted.
+ * Proceed with minItem.
+ */
+ advancePast = minItem;
+ advancePast.ip_posid--;
+ }
+ else
+ {
+ /*
+ * There might be matches smaller than minItem coming from one
+ * of the unknown sources. Load more items, and retry.
+ */
+ entry = key->scanEntry[minUnknown];
+ entryGetItem(ginstate, entry, advancePast);
+ continue;
+ }
+ }
- lossyEntry = 0;
- haveLossyEntry = false;
- for (i = 0; i < key->nentries; i++)
- {
- entry = key->scanEntry[i];
- if (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem, &curPageLossy) == 0)
+ /*
+ * Ok, we now know that there are no matches < minItem. Proceed to
+ * check if it's a match.
+ */
+ key->curItem = minItem;
+ ItemPointerSetLossyPage(&curPageLossy,
+ GinItemPointerGetBlockNumber(&minItem));
+
+ /*
+ * Lossy-page entries pose a problem, since we don't know the correct
+ * entryRes state to pass to the consistentFn, and we also don't know
+ * what its combining logic will be (could be AND, OR, or even NOT).
+ * If the logic is OR then the consistentFn might succeed for all items
+ * in the lossy page even when none of the other entries match.
+ *
+ * Our strategy is to call the tri-state consistent function, with the
+ * lossy-page entries set to MAYBE, and all the other entries FALSE.
+ * If it returns FALSE, none of the lossy items alone are enough for a
+ * match, so we don't need to return a lossy-page pointer. Otherwise,
+ * return a lossy-page pointer to indicate that the whole heap page must
+ * be checked. (On subsequent calls, we'll do nothing until minItem is
+ * past the page altogether, thus ensuring that we never return both
+ * regular and lossy pointers for the same page.)
+ *
+ * An exception is that we don't need to try it both ways (ie. pass
+ * MAYBE) if the lossy pointer is in a "hidden" entry, because the
+ * consistentFn's result can't depend on that (but mark the result as
+ * 'recheck').
+ *
+ * Note that only lossy-page entries pointing to the current item's
+ * page should trigger this processing; we might have future lossy
+ * pages in the entry array, but they aren't relevant yet.
+ */
+ haveLossyEntry = false;
+ for (i = 0; i < key->nentries; i++)
{
- if (haveLossyEntry)
+ entry = key->scanEntry[i];
+ if (entry->isFinished == FALSE &&
+ ginCompareItemPointers(&entry->curItem, &curPageLossy) == 0)
{
- /* Multiple lossy entries, punt */
+ key->entryRes[i] = GIN_MAYBE;
+ haveLossyEntry = true;
+ }
+ else
+ key->entryRes[i] = GIN_FALSE;
+ }
+
+ if (haveLossyEntry)
+ {
+ MemoryContextSwitchTo(tempCtx);
+ res = key->triConsistentFn(key);
+ MemoryContextSwitchTo(oldCtx);
+
+ if (res == GIN_TRUE || res == GIN_MAYBE)
+ {
+ /* Some of the lossy items on the heap page might match, punt */
key->curItem = curPageLossy;
key->curItemMatches = true;
key->recheckCurItem = true;
return;
}
- lossyEntry = i;
- haveLossyEntry = true;
}
- }
- /* prepare for calling consistentFn in temp context */
- oldCtx = MemoryContextSwitchTo(tempCtx);
+ /*
+ * Let's call the consistent function to check if this is a match.
+ *
+ * At this point we know that we don't need to return a lossy
+ * whole-page pointer, but we might have matches for individual exact
+ * item pointers, possibly in combination with a lossy pointer. Pass
+ * lossy pointers as MAYBE to the ternary consistent function, to
+ * let it decide if this tuple satisfies the overall key, even though
+ * we don't know whether the lossy entries match.
+ *
+ * We might also not have advanced all the entry streams up to this
+ * point yet. It's possible that the consistent function can
+ * nevertheless decide that this is definitely a match or not a match,
+ * even though we don't know if those unknown entries match, so we
+ * pass them as MAYBE.
+ */
+ for (i = 0; i < key->nentries; i++)
+ {
+ entry = key->scanEntry[i];
+ if (entry->isFinished)
+ key->entryRes[i] = GIN_FALSE;
+ else if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ key->entryRes[i] = GIN_MAYBE; /* not loaded yet */
+ else if (ginCompareItemPointers(&entry->curItem, &curPageLossy) == 0)
+ key->entryRes[i] = GIN_MAYBE;
+ else if (ginCompareItemPointers(&entry->curItem, &minItem) == 0)
+ key->entryRes[i] = GIN_TRUE;
+ else
+ key->entryRes[i] = GIN_FALSE;
+ }
- if (haveLossyEntry)
- {
- /* Single lossy-page entry, so see if whole page matches */
- memset(key->entryRes, FALSE, key->nentries);
- key->entryRes[lossyEntry] = TRUE;
+ MemoryContextSwitchTo(tempCtx);
+ res = key->triConsistentFn(key);
+ MemoryContextSwitchTo(oldCtx);
- if (callConsistentFn(ginstate, key))
+ switch (res)
{
- /* Yes, so clean up ... */
- MemoryContextSwitchTo(oldCtx);
- MemoryContextReset(tempCtx);
-
- /* and return lossy pointer for whole page */
- key->curItem = curPageLossy;
- key->curItemMatches = true;
- key->recheckCurItem = true;
- return;
- }
- }
+ case GIN_TRUE:
+ key->curItemMatches = true;
+ /* triConsistentFn set recheckCurItem */
+ break;
- /*
- * At this point we know that we don't need to return a lossy whole-page
- * pointer, but we might have matches for individual exact item pointers,
- * possibly in combination with a lossy pointer. Our strategy if there's
- * a lossy pointer is to try the consistentFn both ways and return a hit
- * if it accepts either one (forcing the hit to be marked lossy so it will
- * be rechecked). An exception is that we don't need to try it both ways
- * if the lossy pointer is in a "hidden" entry, because the consistentFn's
- * result can't depend on that.
- *
- * Prepare entryRes array to be passed to consistentFn.
- */
- for (i = 0; i < key->nentries; i++)
- {
- entry = key->scanEntry[i];
- if (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem, &key->curItem) == 0)
- key->entryRes[i] = TRUE;
- else
- key->entryRes[i] = FALSE;
- }
- if (haveLossyEntry)
- key->entryRes[lossyEntry] = TRUE;
+ case GIN_FALSE:
+ key->curItemMatches = false;
+ break;
- res = callConsistentFn(ginstate, key);
+ case GIN_MAYBE:
+ /*
+ * The consistent function cannot decide with the information
+ * we've got. If there are any "unknown" sources left, advance
+ * one of them and try again, in the hope that it can decide
+ * with the extra information.
+ */
+ if (minUnknown != -1)
+ {
+ entry = key->scanEntry[minUnknown];
+ entryGetItem(ginstate, entry, advancePast);
+ continue;
+ }
+ key->curItemMatches = true;
+ key->recheckCurItem = true;
+ break;
- if (!res && haveLossyEntry && lossyEntry < key->nuserentries)
- {
- /* try the other way for the lossy item */
- key->entryRes[lossyEntry] = FALSE;
+ default:
+ /*
+ * the 'default' case shouldn't happen, but if the consistent
+ * function returns something bogus, this is the safe result
+ */
+ key->curItemMatches = true;
+ key->recheckCurItem = true;
+ break;
+ }
- res = callConsistentFn(ginstate, key);
+ /*
+ * We have a tuple, and we know if it matches or not. If it's a
+ * non-match, we could continue to find the next matching tuple, but
+ * let's break out and give scanGetItem a chance to advance the other
+ * keys. They might be able to skip past to a much higher TID, allowing
+ * us to save work.
+ */
+ break;
}
- key->curItemMatches = res;
- /* If we matched a lossy entry, force recheckCurItem = true */
- if (haveLossyEntry)
- key->recheckCurItem = true;
-
/* clean up after consistentFn calls */
MemoryContextSwitchTo(oldCtx);
MemoryContextReset(tempCtx);
@@ -1064,7 +1133,7 @@ scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
/*
* If this is the first key, remember this location as a
- * potential match.
+ * potential match, and proceed to check the rest of the keys.
*
* Otherwise, check if this is the same item that we checked the
* previous keys for (or a lossy pointer for the same page). If
@@ -1075,21 +1144,20 @@ scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
if (i == 0)
{
*item = key->curItem;
+ continue;
+ }
+
+ if (ItemPointerIsLossyPage(&key->curItem) ||
+ ItemPointerIsLossyPage(item))
+ {
+ Assert (GinItemPointerGetBlockNumber(&key->curItem) >= GinItemPointerGetBlockNumber(item));
+ match = (GinItemPointerGetBlockNumber(&key->curItem) ==
+ GinItemPointerGetBlockNumber(item));
}
else
{
- if (ItemPointerIsLossyPage(&key->curItem) ||
- ItemPointerIsLossyPage(item))
- {
- Assert (GinItemPointerGetBlockNumber(&key->curItem) >= GinItemPointerGetBlockNumber(item));
- match = (GinItemPointerGetBlockNumber(&key->curItem) ==
- GinItemPointerGetBlockNumber(item));
- }
- else
- {
- Assert(ginCompareItemPointers(&key->curItem, item) >= 0);
- match = (ginCompareItemPointers(&key->curItem, item) == 0);
- }
+ Assert(ginCompareItemPointers(&key->curItem, item) >= 0);
+ match = (ginCompareItemPointers(&key->curItem, item) == 0);
}
}
} while (!match);
@@ -1306,7 +1374,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos)
{
GinScanKey key = so->keys + i;
- memset(key->entryRes, FALSE, key->nentries);
+ memset(key->entryRes, GIN_FALSE, key->nentries);
}
memset(pos->hasMatchKey, FALSE, so->nkeys);
@@ -1563,7 +1631,7 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
{
GinScanKey key = so->keys + i;
- if (!callConsistentFn(&so->ginstate, key))
+ if (!key->boolConsistentFn(key))
{
match = false;
break;
diff --git a/src/backend/access/gin/ginlogic.c b/src/backend/access/gin/ginlogic.c
new file mode 100644
index 0000000..e499c6e
--- /dev/null
+++ b/src/backend/access/gin/ginlogic.c
@@ -0,0 +1,136 @@
+/*-------------------------------------------------------------------------
+ *
+ * ginlogic.c
+ * routines for performing binary- and ternary-logic consistent checks.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/access/gin/ginlogic.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/gin_private.h"
+#include "access/reloptions.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "storage/indexfsm.h"
+#include "storage/lmgr.h"
+
+/*
+ * A dummy consistent function for an EVERYTHING key. Just claim it matches.
+ */
+static bool
+trueConsistentFn(GinScanKey key)
+{
+ key->recheckCurItem = false;
+ return true;
+}
+static GinLogicValue
+trueTriConsistentFn(GinScanKey key)
+{
+ return GIN_MAYBE;
+}
+
+/*
+ * A function for calling a regular, binary logic, consistent function.
+ */
+static bool
+normalBoolConsistentFn(GinScanKey key)
+{
+ /*
+ * Initialize recheckCurItem in case the consistentFn doesn't know it
+ * should set it. The safe assumption in that case is to force recheck.
+ */
+ key->recheckCurItem = true;
+
+ return DatumGetBool(FunctionCall8Coll(key->consistentFmgrInfo,
+ key->collation,
+ PointerGetDatum(key->entryRes),
+ UInt16GetDatum(key->strategy),
+ key->query,
+ UInt32GetDatum(key->nuserentries),
+ PointerGetDatum(key->extra_data),
+ PointerGetDatum(&key->recheckCurItem),
+ PointerGetDatum(key->queryValues),
+ PointerGetDatum(key->queryCategories)));
+}
+
+/*
+ * This function implements a tri-state consistency check, using a boolean
+ * consistent function provided by the opclass.
+ *
+ * If there is only one MAYBE input, our strategy is to try the consistentFn
+ * both ways. If it returns TRUE for both, the tuple matches regardless of
+ * the MAYBE input, so we return TRUE. Likewise, if it returns FALSE for both,
+ * we return FALSE. Otherwise return MAYBE.
+ */
+static GinLogicValue
+shimTriConsistentFn(GinScanKey key)
+{
+ bool foundMaybe = false;
+ int maybeEntry = -1;
+ int i;
+ bool boolResult1;
+ bool boolResult2;
+ bool recheck1;
+ bool recheck2;
+
+ for (i = 0; i < key->nentries; i++)
+ {
+ if (key->entryRes[i] == GIN_MAYBE)
+ {
+ if (foundMaybe)
+ return GIN_MAYBE; /* more than one MAYBE input */
+ maybeEntry = i;
+ foundMaybe = true;
+ }
+ }
+
+ /*
+ * If none of the inputs were MAYBE, we can just call the consistent
+ * function as is.
+ */
+ if (!foundMaybe)
+ return normalBoolConsistentFn(key);
+
+ /* Try the consistent function with the maybe-input set both ways */
+ key->entryRes[maybeEntry] = GIN_TRUE;
+ boolResult1 = normalBoolConsistentFn(key);
+ recheck1 = key->recheckCurItem;
+
+ key->entryRes[maybeEntry] = GIN_FALSE;
+ boolResult2 = normalBoolConsistentFn(key);
+ recheck2 = key->recheckCurItem;
+
+ if (!boolResult1 && !boolResult2)
+ return GIN_FALSE;
+
+ key->recheckCurItem = recheck1 || recheck2;
+ if (boolResult1 && boolResult2)
+ return GIN_TRUE;
+ else
+ return GIN_MAYBE;
+}
+
+void
+GinInitConsistentMethod(GinState *ginstate, GinScanKey key)
+{
+ if (key->searchMode == GIN_SEARCH_MODE_EVERYTHING)
+ {
+ key->boolConsistentFn = trueConsistentFn;
+ key->triConsistentFn = trueTriConsistentFn;
+ }
+ else
+ {
+ key->consistentFmgrInfo = &ginstate->consistentFn[key->attnum - 1];
+ key->collation = ginstate->supportCollation[key->attnum - 1];
+ key->boolConsistentFn = normalBoolConsistentFn;
+ key->triConsistentFn = shimTriConsistentFn;
+ }
+}
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index a12dfc3..6d6a49a 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -17,6 +17,8 @@
#include "storage/bufmgr.h"
#include "utils/rbtree.h"
+typedef struct GinScanKeyData *GinScanKey;
+typedef struct GinScanEntryData *GinScanEntry;
/*
* Page opaque data in an inverted index page.
@@ -588,6 +590,19 @@ extern OffsetNumber gintuple_get_attrnum(GinState *ginstate, IndexTuple tuple);
extern Datum gintuple_get_key(GinState *ginstate, IndexTuple tuple,
GinNullCategory *category);
+/* ginlogic.c */
+
+enum
+{
+ GIN_FALSE = 0,
+ GIN_TRUE = 1,
+ GIN_MAYBE = 2
+} GinLogicValueEnum;
+
+typedef char GinLogicValue;
+
+extern void GinInitConsistentMethod(GinState *ginstate, GinScanKey key);
+
/* gininsert.c */
extern Datum ginbuild(PG_FUNCTION_ARGS);
extern Datum ginbuildempty(PG_FUNCTION_ARGS);
@@ -733,10 +748,6 @@ extern void ginVacuumPostingTreeLeaf(Relation rel, Buffer buf, GinVacuumState *g
* nuserentries is the number that extractQueryFn returned (which is what
* we report to consistentFn). The "user" entries must come first.
*/
-typedef struct GinScanKeyData *GinScanKey;
-
-typedef struct GinScanEntryData *GinScanEntry;
-
typedef struct GinScanKeyData
{
/* Real number of entries in scanEntry[] (always > 0) */
@@ -749,6 +760,10 @@ typedef struct GinScanKeyData
/* array of check flags, reported to consistentFn */
bool *entryRes;
+ bool (*boolConsistentFn) (GinScanKey key);
+ GinLogicValue (*triConsistentFn) (GinScanKey key);
+ FmgrInfo *consistentFmgrInfo;
+ Oid collation;
/* other data needed for calling consistentFn */
Datum query;
--
1.8.5.2
On 23.1.2014 17:22, Heikki Linnakangas wrote:
I measured the time that query takes, and the number of pages hit, using
"explain (analyze, buffers true) ...":

patches         time (ms)   buffers
-------------   ---------   -------
unpatched       650         1316
patch 1         0.52        1316
patches 1+2     0.50        1316
patches 1+2+3   0.13        15

So, the second patch isn't doing much in this particular case. But it's
trivial, and I think it will make a difference in other queries where
you have the opportunity to skip, but return a lot of tuples overall.

In summary, these are fairly small patches, and useful on their own, so I
think these should be committed now. But please take a look and see if
the logic in scanGetItem/keyGetItem looks correct to you. After this, I
think the main fast scan logic will go into keyGetItem.
Hi,
I've done some testing of the three patches today, and I've ran into an
infinite loop caused by the third patch. I don't know why exactly it
gets stuck, but with patches #1+#2 it works fine, and after applying #3
it runs infinitely.
I can't point to a particular line / condition causing this, but this is
what I see in 'perf top':
54.16% postgres [.] gingetbitmap
32.38% postgres [.] ginPostingListDecodeAllSegments
3.03% libc-2.17.so [.] 0x000000000007fb88
I've tracked it down to this loop in ginget.c:840 (I've added the
logging for debugging / demonstration purposes):
=====================================================================
elog(WARNING, "scanning entries");
elog(WARNING, "advacepast=(%u,%d)",
BlockIdGetBlockNumber(&advancePast.ip_blkid),
advancePast.ip_posid);
while (entry->isFinished == FALSE &&
ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
{
elog(WARNING, "current=(%u,%d)",
BlockIdGetBlockNumber(&entry->curItem.ip_blkid),
entry->curItem.ip_posid);
entryGetItem(ginstate, entry, advancePast);
}
elog(WARNING, "entries scanned");
=====================================================================
which is executed repeatedly, but the last invocation gets stuck and
produces this output:
WARNING: scanning entries
WARNING: advacepast=(172058,0)
LOG: entryLoadMoreItems, 172058/0, skip: 1
WARNING: getting item current=(171493,7)
WARNING: getting item current=(116833,2)
WARNING: getting item current=(116833,3)
WARNING: getting item current=(116833,4)
WARNING: getting item current=(116833,5)
WARNING: getting item current=(116838,1)
WARNING: getting item current=(116838,2)
... increasing sequence of block IDs ...
WARNING: getting item current=(170743,5)
WARNING: getting item current=(170746,4)
WARNING: getting item current=(171493,7)
LOG: entryLoadMoreItems, 172058/0, skip: 1
WARNING: getting item current=(116833,2)
WARNING: getting item current=(116833,3)
WARNING: getting item current=(116833,4)
WARNING: getting item current=(116833,5)
... and repeat
=====================================================================
Not sure what went wrong, though - I suspect it does not set the
isFinished flag or something like that, but I don't know where/when
that should happen.
This is rather easy to reproduce - download the dump I already provided
two weeks ago [http://www.fuzzy.cz/tmp/message-b.data.gz] and load it
into a simple table:
CREATE TABLE msgs (body tsvector);
COPY msgs FROM '/tmp/message-b.data';
CREATE INDEX msgidx ON msgs USING gin(body);
ANALYZE msgs;
And then run this query:
SELECT body FROM msgs
WHERE body @@ plainto_tsquery('english','string | x')
AND body @@ plainto_tsquery('english','versions | equivalent')
AND body @@ plainto_tsquery('english','usually | contain');
It should run infinitely. I suspect it's not perfectly stable, i.e., this
query may work fine while another one blocks. In that case try to
run this [http://www.fuzzy.cz/tmp/random-queries.sql] - it's a file with
1000 generated queries, at least one of them should block (that's how I
discovered the issue).
regards
Tomas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 01/25/2014 09:44 PM, Tomas Vondra wrote:
This is rather easy to reproduce - download the dump I already provided
two weeks ago [http://www.fuzzy.cz/tmp/message-b.data.gz] and load it
into a simple table:

CREATE TABLE msgs (body tsvector);
COPY msgs FROM '/tmp/message-b.data';
CREATE INDEX msgidx ON msgs USING gin(body);
ANALYZE msgs;

And then run this query:

SELECT body FROM msgs
WHERE body @@ plainto_tsquery('english','string | x')
AND body @@ plainto_tsquery('english','versions | equivalent')
AND body @@ plainto_tsquery('english','usually | contain');

It should run infinitely. I suspect it's not perfectly stable, i.e., this
query may work fine while another one blocks. In that case try to
run this [http://www.fuzzy.cz/tmp/random-queries.sql] - it's a file with
1000 generated queries, at least one of them should block (that's how I
discovered the issue).
Thanks, that's a great test suite! Indeed, it did get stuck for me as well.
I tracked it down to a logic bug in entryGetItem; an && should've been
||. Also, it tickled an assertion in the debug LOG statement that
bleated "entryLoadMoreItems, %u/%u, skip: %d" (I was using
ItemPointerGetBlockNumber, which contains a check that the argument is a
valid item pointer, which it isn't always in this case). Fixed that too.
Attached is a new version of the patch set, with those bugs fixed.
One interesting detail that I noticed while testing that query:
Using EXPLAIN (BUFFERS) shows that we're actually accessing *more* pages
with the patches than without them. The culprit is patch #3, which makes
us re-descend the posting tree from the root, rather than just stepping
right from the current page. I was very surprised by that at first - the
patch was supposed to *reduce* the number of pages accessed, by not
having to walk through all the leaf pages. But in this case, even when
you're skipping some items, almost always the next item you're
interested in is on the next posting tree page, so re-descending the
tree is a waste and you land on the right sibling of the original page
anyway.
It's not a big waste, though. The upper levels of the tree are almost
certainly in cache, so it's just a matter of some extra lw-locking and
binary searching, which is cheap compared to actually decoding and
copying all the items from the correct page. Indeed, I couldn't see any
meaningful difference in query time with or without the patch. (I'm sure
a different query that allows more skipping would show the patch to be a
win - this was a worst-case scenario.)
Alexander's patch contained a more complicated method for re-finding the
right leaf page. It ascended the tree back up the same path it had
originally descended, which is potentially faster if the tree is
many levels deep and you only skip forward a little. In practice,
posting trees are very compact, so to have a tree taller than 2 levels,
it must contain millions of items. A 4-level tree would be humongous. So
I don't think it's worth the complexity to try anything smarter than
just re-descending from root. However, that difference in implementation
shows up in EXPLAIN (BUFFERS) output; since Alexander's patch kept the
stack of upper pages pinned, ascending and descending the tree did not
increase the counters of pages accessed (I believe; I didn't actually
test it), while descending from root does.
- Heikki
Attachments:
0001-Optimize-GIN-multi-key-queries.patch (text/x-diff)
From 39bb64a56c43afa45b18b6d04376002005f22769 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 23 Jan 2014 15:41:43 +0200
Subject: [PATCH 1/4] Optimize GIN multi-key queries.
In a multi-key search, ie. something like "col @> 'foo' AND col @> 'bar'",
as soon as we find the next item that matches the first criterion, we don't
need to check the second criterion for TIDs smaller than the first match.
That saves a lot of effort, especially if the first term is rare while the
second occurs very frequently.
Based on ideas from Alexander Korotkov's fast scan patch
---
src/backend/access/gin/ginget.c | 456 ++++++++++++++++++++++------------------
1 file changed, 246 insertions(+), 210 deletions(-)
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 4bdbd45..4e4b51a 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -68,29 +68,6 @@ callConsistentFn(GinState *ginstate, GinScanKey key)
}
/*
- * Tries to refind previously taken ItemPointer on a posting page.
- */
-static bool
-needToStepRight(Page page, ItemPointer item)
-{
- if (GinPageGetOpaque(page)->flags & GIN_DELETED)
- /* page was deleted by concurrent vacuum */
- return true;
-
- if (ginCompareItemPointers(item, GinDataPageGetRightBound(page)) > 0
- && !GinPageRightMost(page))
- {
- /*
- * the item we're looking is > the right bound of the page, so it
- * can't be on this page.
- */
- return true;
- }
-
- return false;
-}
-
-/*
* Goes to the next page if current offset is outside of bounds
*/
static bool
@@ -447,8 +424,7 @@ restartScanEntry:
page = BufferGetPage(entry->buffer);
/*
- * Copy page content to memory to avoid keeping it locked for
- * a long time.
+ * Load the first page into memory.
*/
entry->list = GinDataLeafPageGetItems(page, &entry->nlist);
@@ -518,88 +494,76 @@ startScan(IndexScanDesc scan)
}
/*
- * Gets next ItemPointer from PostingTree. Note, that we copy
- * page into GinScanEntry->list array and unlock page, but keep it pinned
- * to prevent interference with vacuum
+ * Load the next batch of item pointers from a posting tree.
+ *
+ * Note that we copy the page into GinScanEntry->list array and unlock it, but
+ * keep it pinned to prevent interference with vacuum.
*/
static void
-entryGetNextItem(GinState *ginstate, GinScanEntry entry)
+entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advancePast)
{
Page page;
int i;
+ LockBuffer(entry->buffer, GIN_SHARE);
+ page = BufferGetPage(entry->buffer);
for (;;)
{
- if (entry->offset < entry->nlist)
+ entry->offset = InvalidOffsetNumber;
+ if (entry->list)
{
- entry->curItem = entry->list[entry->offset++];
- return;
+ pfree(entry->list);
+ entry->list = NULL;
+ entry->nlist = 0;
}
- LockBuffer(entry->buffer, GIN_SHARE);
- page = BufferGetPage(entry->buffer);
- for (;;)
+ /*
+ * We've processed all the entries on this page. If it was the last
+ * page in the tree, we're done.
+ */
+ if (GinPageRightMost(page))
{
- /*
- * It's needed to go by right link. During that we should refind
- * first ItemPointer greater that stored
- */
- if (GinPageRightMost(page))
- {
- UnlockReleaseBuffer(entry->buffer);
- ItemPointerSetInvalid(&entry->curItem);
- entry->buffer = InvalidBuffer;
- entry->isFinished = TRUE;
- return;
- }
+ UnlockReleaseBuffer(entry->buffer);
+ entry->buffer = InvalidBuffer;
+ entry->isFinished = TRUE;
+ return;
+ }
- entry->buffer = ginStepRight(entry->buffer,
- ginstate->index,
- GIN_SHARE);
- page = BufferGetPage(entry->buffer);
+ if (GinPageGetOpaque(page)->flags & GIN_DELETED)
+ continue; /* page was deleted by concurrent vacuum */
- entry->offset = InvalidOffsetNumber;
- if (entry->list)
- {
- pfree(entry->list);
- entry->list = NULL;
- }
+ /*
+ * Step to next page, following the right link. then find the first
+ * ItemPointer greater than advancePast.
+ */
+ entry->buffer = ginStepRight(entry->buffer,
+ ginstate->index,
+ GIN_SHARE);
+ page = BufferGetPage(entry->buffer);
+ /*
+ * The first item > advancePast might not be on this page, but
+ * somewhere to the right, if the page was split. Keep following
+ * the right-links until we re-find the correct page.
+ */
+ if (!GinPageRightMost(page) &&
+ ginCompareItemPointers(&advancePast, GinDataPageGetRightBound(page)) >= 0)
+ {
/*
- * If the page was concurrently split, we have to re-find the
- * item we were stopped on. If the page was split more than once,
- * the item might not be on this page, but somewhere to the right.
- * Keep following the right-links until we re-find the correct
- * page.
+ * the item we're looking is > the right bound of the page, so it
+ * can't be on this page.
*/
- if (ItemPointerIsValid(&entry->curItem) &&
- needToStepRight(page, &entry->curItem))
- {
- continue;
- }
+ continue;
+ }
- entry->list = GinDataLeafPageGetItems(page, &entry->nlist);
+ entry->list = GinDataLeafPageGetItems(page, &entry->nlist);
- /* re-find the item we were stopped on. */
- if (ItemPointerIsValid(&entry->curItem))
- {
- for (i = 0; i < entry->nlist; i++)
- {
- if (ginCompareItemPointers(&entry->curItem,
- &entry->list[i]) < 0)
- {
- LockBuffer(entry->buffer, GIN_UNLOCK);
- entry->offset = i + 1;
- entry->curItem = entry->list[entry->offset - 1];
- return;
- }
- }
- }
- else
+ for (i = 0; i < entry->nlist; i++)
+ {
+ if (ginCompareItemPointers(&advancePast, &entry->list[i]) < 0)
{
LockBuffer(entry->buffer, GIN_UNLOCK);
- entry->offset = 1; /* scan all items on the page. */
- entry->curItem = entry->list[entry->offset - 1];
+ entry->offset = i;
return;
}
}
@@ -610,10 +574,10 @@ entryGetNextItem(GinState *ginstate, GinScanEntry entry)
#define dropItem(e) ( gin_rand() > ((double)GinFuzzySearchLimit)/((double)((e)->predictNumberResult)) )
/*
- * Sets entry->curItem to next heap item pointer for one entry of one scan key,
- * or sets entry->isFinished to TRUE if there are no more.
+ * Sets entry->curItem to next heap item pointer > advancePast, for one entry
+ * of one scan key, or sets entry->isFinished to TRUE if there are no more.
*
- * Item pointers must be returned in ascending order.
+ * Item pointers are returned in ascending order.
*
* Note: this can return a "lossy page" item pointer, indicating that the
* entry potentially matches all items on that heap page. However, it is
@@ -623,12 +587,20 @@ entryGetNextItem(GinState *ginstate, GinScanEntry entry)
* current implementation this is guaranteed by the behavior of tidbitmaps.
*/
static void
-entryGetItem(GinState *ginstate, GinScanEntry entry)
+entryGetItem(GinState *ginstate, GinScanEntry entry,
+ ItemPointerData advancePast)
{
Assert(!entry->isFinished);
+ Assert(!ItemPointerIsValid(&entry->curItem) ||
+ ginCompareItemPointers(&entry->curItem, &advancePast) <= 0);
+
if (entry->matchBitmap)
{
+ /* A bitmap result */
+ BlockNumber advancePastBlk = GinItemPointerGetBlockNumber(&advancePast);
+ OffsetNumber advancePastOff = GinItemPointerGetOffsetNumber(&advancePast);
+
do
{
if (entry->matchResult == NULL ||
@@ -646,6 +618,18 @@ entryGetItem(GinState *ginstate, GinScanEntry entry)
}
/*
+ * If all the matches on this page are <= advancePast, skip
+ * to next page.
+ */
+ if (entry->matchResult->blockno < advancePastBlk ||
+ (entry->matchResult->blockno == advancePastBlk &&
+ entry->matchResult->offsets[entry->offset] <= advancePastOff))
+ {
+ entry->offset = entry->matchResult->ntuples;
+ continue;
+ }
+
+ /*
* Reset counter to the beginning of entry->matchResult. Note:
* entry->offset is still greater than matchResult->ntuples if
* matchResult is lossy. So, on next call we will get next
@@ -670,6 +654,17 @@ entryGetItem(GinState *ginstate, GinScanEntry entry)
break;
}
+ if (entry->matchResult->blockno == advancePastBlk)
+ {
+ /*
+ * Skip to the right offset on this page. We already checked
+ * in above loop that there is at least one item > advancePast
+ * on the page.
+ */
+ while (entry->matchResult->offsets[entry->offset] <= advancePastOff)
+ entry->offset++;
+ }
+
ItemPointerSet(&entry->curItem,
entry->matchResult->blockno,
entry->matchResult->offsets[entry->offset]);
@@ -678,29 +673,48 @@ entryGetItem(GinState *ginstate, GinScanEntry entry)
}
else if (!BufferIsValid(entry->buffer))
{
- entry->offset++;
- if (entry->offset <= entry->nlist)
- entry->curItem = entry->list[entry->offset - 1];
- else
+ /* A posting list from an entry tuple */
+ do
{
- ItemPointerSetInvalid(&entry->curItem);
- entry->isFinished = TRUE;
- }
+ if (entry->offset >= entry->nlist)
+ {
+ ItemPointerSetInvalid(&entry->curItem);
+ entry->isFinished = TRUE;
+ break;
+ }
+
+ entry->curItem = entry->list[entry->offset++];
+ } while (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0);
+ /* XXX: shouldn't we apply the fuzzy search limit here? */
}
else
{
+ /* A posting tree */
do
{
- entryGetNextItem(ginstate, entry);
- } while (entry->isFinished == FALSE &&
- entry->reduceResult == TRUE &&
- dropItem(entry));
+ /* If we've processed the current batch, load more items */
+ while (entry->offset >= entry->nlist)
+ {
+ entryLoadMoreItems(ginstate, entry, advancePast);
+
+ if (entry->isFinished)
+ {
+ ItemPointerSetInvalid(&entry->curItem);
+ return;
+ }
+ }
+
+ entry->curItem = entry->list[entry->offset++];
+
+ } while (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0 ||
+ (entry->reduceResult == TRUE && dropItem(entry)));
}
}
/*
- * Identify the "current" item among the input entry streams for this scan key,
- * and test whether it passes the scan key qual condition.
+ * Identify the "current" item among the input entry streams for this scan key
+ * that is greater than advancePast, and test whether it passes the scan key
+ * qual condition.
*
* The current item is the smallest curItem among the inputs. key->curItem
* is set to that value. key->curItemMatches is set to indicate whether that
@@ -719,7 +733,8 @@ entryGetItem(GinState *ginstate, GinScanEntry entry)
* logic in scanGetItem.)
*/
static void
-keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key)
+keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
+ ItemPointerData advancePast)
{
ItemPointerData minItem;
ItemPointerData curPageLossy;
@@ -729,11 +744,20 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key)
GinScanEntry entry;
bool res;
MemoryContext oldCtx;
+ bool allFinished;
Assert(!key->isFinished);
/*
- * Find the minimum of the active entry curItems.
+ * We might have already tested this item; if so, no need to repeat work.
+ * (Note: the ">" case can happen, if minItem is exact but we previously
+ * had to set curItem to a lossy-page pointer.)
+ */
+ if (ginCompareItemPointers(&key->curItem, &advancePast) > 0)
+ return;
+
+ /*
+ * Find the minimum item > advancePast among the active entry streams.
*
* Note: a lossy-page entry is encoded by a ItemPointer with max value for
* offset (0xffff), so that it will sort after any exact entries for the
@@ -741,16 +765,33 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key)
* pointers, which is good.
*/
ItemPointerSetMax(&minItem);
-
+ allFinished = true;
for (i = 0; i < key->nentries; i++)
{
entry = key->scanEntry[i];
- if (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem, &minItem) < 0)
- minItem = entry->curItem;
+
+ /*
+ * Advance this stream if necessary.
+ *
+ * In particular, since entry->curItem was initialized with
+ * ItemPointerSetMin, this ensures we fetch the first item for each
+ * entry on the first call.
+ */
+ while (entry->isFinished == FALSE &&
+ ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ {
+ entryGetItem(ginstate, entry, advancePast);
+ }
+
+ if (!entry->isFinished)
+ {
+ allFinished = FALSE;
+ if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
+ minItem = entry->curItem;
+ }
}
- if (ItemPointerIsMax(&minItem))
+ if (allFinished)
{
/* all entries are finished */
key->isFinished = TRUE;
@@ -758,15 +799,7 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key)
}
/*
- * We might have already tested this item; if so, no need to repeat work.
- * (Note: the ">" case can happen, if minItem is exact but we previously
- * had to set curItem to a lossy-page pointer.)
- */
- if (ginCompareItemPointers(&key->curItem, &minItem) >= 0)
- return;
-
- /*
- * OK, advance key->curItem and perform consistentFn test.
+ * OK, set key->curItem and perform consistentFn test.
*/
key->curItem = minItem;
@@ -895,117 +928,120 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key)
* keyGetItem() the combination logic is known only to the consistentFn.
*/
static bool
-scanGetItem(IndexScanDesc scan, ItemPointer advancePast,
+scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
ItemPointerData *item, bool *recheck)
{
GinScanOpaque so = (GinScanOpaque) scan->opaque;
- GinState *ginstate = &so->ginstate;
- ItemPointerData myAdvancePast = *advancePast;
uint32 i;
- bool allFinished;
bool match;
- for (;;)
+ /*----------
+ * Advance the scan keys in lock-step, until we find an item that
+ * matches all the keys. If any key reports isFinished, meaning its
+ * subset of the entries is exhausted, we can stop. Otherwise, set
+ * *item to the next matching item.
+ *
+ * Now *item contains the first ItemPointer after previous result that
+ * passed the consistentFn check for that exact TID, or a lossy reference
+ * to the same page.
+ *
+ * This logic works only if a keyGetItem stream can never contain both
+ * exact and lossy pointers for the same page. Else we could have a
+ * case like
+ *
+ * stream 1 stream 2
+ * ... ...
+ * 42/6 42/7
+ * 50/1 42/0xffff
+ * ... ...
+ *
+ * We would conclude that 42/6 is not a match and advance stream 1,
+ * thus never detecting the match to the lossy pointer in stream 2.
+ * (keyGetItem has a similar problem versus entryGetItem.)
+ *----------
+ */
+ ItemPointerSetMin(item);
+ do
{
- /*
- * Advance any entries that are <= myAdvancePast. In particular,
- * since entry->curItem was initialized with ItemPointerSetMin, this
- * ensures we fetch the first item for each entry on the first call.
- */
- allFinished = TRUE;
-
- for (i = 0; i < so->totalentries; i++)
- {
- GinScanEntry entry = so->entries[i];
-
- while (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem,
- &myAdvancePast) <= 0)
- entryGetItem(ginstate, entry);
-
- if (entry->isFinished == FALSE)
- allFinished = FALSE;
- }
-
- if (allFinished)
- {
- /* all entries exhausted, so we're done */
- return false;
- }
-
- /*
- * Perform the consistentFn test for each scan key. If any key
- * reports isFinished, meaning its subset of the entries is exhausted,
- * we can stop. Otherwise, set *item to the minimum of the key
- * curItems.
- */
- ItemPointerSetMax(item);
-
- for (i = 0; i < so->nkeys; i++)
+ match = true;
+ for (i = 0; i < so->nkeys && match; i++)
{
GinScanKey key = so->keys + i;
- keyGetItem(&so->ginstate, so->tempCtx, key);
+ /* Fetch the next item for this key. */
+ keyGetItem(&so->ginstate, so->tempCtx, key, advancePast);
if (key->isFinished)
- return false; /* finished one of keys */
-
- if (ginCompareItemPointers(&key->curItem, item) < 0)
- *item = key->curItem;
- }
+ return false;
- Assert(!ItemPointerIsMax(item));
+ /*
+ * If it's not a match, we can immediately conclude that nothing
+ * <= this item matches, without checking the rest of the keys.
+ */
+ if (!key->curItemMatches)
+ {
+ advancePast = key->curItem;
+ match = false;
+ break;
+ }
- /*----------
- * Now *item contains first ItemPointer after previous result.
- *
- * The item is a valid hit only if all the keys succeeded for either
- * that exact TID, or a lossy reference to the same page.
- *
- * This logic works only if a keyGetItem stream can never contain both
- * exact and lossy pointers for the same page. Else we could have a
- * case like
- *
- * stream 1 stream 2
- * ... ...
- * 42/6 42/7
- * 50/1 42/0xffff
- * ... ...
- *
- * We would conclude that 42/6 is not a match and advance stream 1,
- * thus never detecting the match to the lossy pointer in stream 2.
- * (keyGetItem has a similar problem versus entryGetItem.)
- *----------
- */
- match = true;
- for (i = 0; i < so->nkeys; i++)
- {
- GinScanKey key = so->keys + i;
+ /*
+ * It's a match. We can conclude that nothing < matches, so
+ * the other key streams can skip to this item.
+ * Beware of lossy pointers, though; for a lossy pointer, we
+ * can only conclude that nothing smaller than this *page*
+ * matches.
+ */
+ advancePast = key->curItem;
+ if (ItemPointerIsLossyPage(&advancePast))
+ {
+ advancePast.ip_posid = 0;
+ }
+ else
+ {
+ Assert(advancePast.ip_posid > 0);
+ advancePast.ip_posid--;
+ }
- if (key->curItemMatches)
+ /*
+ * If this is the first key, remember this location as a
+ * potential match.
+ *
+ * Otherwise, check if this is the same item that we checked the
+ * previous keys for (or a lossy pointer for the same page). If
+ * not, loop back to check the previous keys for this item (we
+ * will check this key again too, but keyGetItem returns quickly
+ * for that)
+ */
+ if (i == 0)
{
- if (ginCompareItemPointers(item, &key->curItem) == 0)
- continue;
- if (ItemPointerIsLossyPage(&key->curItem) &&
- GinItemPointerGetBlockNumber(&key->curItem) ==
- GinItemPointerGetBlockNumber(item))
- continue;
+ *item = key->curItem;
+ }
+ else
+ {
+ if (ItemPointerIsLossyPage(&key->curItem) ||
+ ItemPointerIsLossyPage(item))
+ {
+ Assert (GinItemPointerGetBlockNumber(&key->curItem) >= GinItemPointerGetBlockNumber(item));
+ match = (GinItemPointerGetBlockNumber(&key->curItem) ==
+ GinItemPointerGetBlockNumber(item));
+ }
+ else
+ {
+ Assert(ginCompareItemPointers(&key->curItem, item) >= 0);
+ match = (ginCompareItemPointers(&key->curItem, item) == 0);
+ }
}
- match = false;
- break;
}
+ } while (!match);
- if (match)
- break;
-
- /*
- * No hit. Update myAdvancePast to this TID, so that on the next pass
- * we'll move to the next possible entry.
- */
- myAdvancePast = *item;
- }
+ Assert(!ItemPointerIsMin(item));
/*
+ * Now *item contains the first ItemPointer after previous result that
+ * passed the consistentFn check for that exact TID, or a lossy reference
+ * to the same page.
+ *
* We must return recheck = true if any of the keys are marked recheck.
*/
*recheck = false;
@@ -1536,7 +1572,7 @@ gingetbitmap(PG_FUNCTION_ARGS)
{
CHECK_FOR_INTERRUPTS();
- if (!scanGetItem(scan, &iptr, &iptr, &recheck))
+ if (!scanGetItem(scan, iptr, &iptr, &recheck))
break;
if (ItemPointerIsLossyPage(&iptr))
--
1.8.5.2
0002-Further-optimize-the-multi-key-GIN-searches.patch (text/x-diff)
From 85e27d2aa08d134e03cb81026111c890c4778fb0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 23 Jan 2014 15:47:54 +0200
Subject: [PATCH 2/4] Further optimize the multi-key GIN searches.
If we're skipping past a certain TID, avoid decoding posting list segments
that only contain smaller TIDs.
---
src/backend/access/gin/gindatapage.c | 32 +++++++++++++++++++++++++++++---
src/backend/access/gin/ginget.c | 6 ++++--
src/include/access/gin_private.h | 2 +-
3 files changed, 34 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 91934f0..534dae3 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -97,18 +97,44 @@ static void dataPlaceToPageLeafSplit(Buffer buf,
/*
* Read all TIDs from leaf data page to single uncompressed array.
+ *
+ * If advancePast is valid, the caller is only interested in TIDs > advancePast.
+ * This function can still return items smaller than that, so the caller
+ * must still check them, but passing it allows this function to skip some
+ * items as an optimization.
*/
ItemPointer
-GinDataLeafPageGetItems(Page page, int *nitems)
+GinDataLeafPageGetItems(Page page, int *nitems, ItemPointerData advancePast)
{
ItemPointer result;
if (GinPageIsCompressed(page))
{
- GinPostingList *ptr = GinDataLeafPageGetPostingList(page);
+ GinPostingList *seg = GinDataLeafPageGetPostingList(page);
Size len = GinDataLeafPageGetPostingListSize(page);
+ Pointer endptr = ((Pointer) seg) + len;
+ GinPostingList *next;
- result = ginPostingListDecodeAllSegments(ptr, len, nitems);
+ /* Skip to the segment containing advancePast+1 */
+ if (ItemPointerIsValid(&advancePast))
+ {
+ next = GinNextPostingListSegment(seg);
+ while ((Pointer) next < endptr &&
+ ginCompareItemPointers(&next->first, &advancePast) <= 0)
+ {
+ seg = next;
+ next = GinNextPostingListSegment(seg);
+ }
+ len = endptr - (Pointer) seg;
+ }
+
+ if (len > 0)
+ result = ginPostingListDecodeAllSegments(seg, len, nitems);
+ else
+ {
+ result = palloc(0);
+ *nitems = 0;
+ }
}
else
{
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 4e4b51a..e303700 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -400,6 +400,7 @@ restartScanEntry:
BlockNumber rootPostingTree = GinGetPostingTree(itup);
GinBtreeStack *stack;
Page page;
+ ItemPointerData minItem;
/*
* We should unlock entry page before touching posting tree to
@@ -426,7 +427,8 @@ restartScanEntry:
/*
* Load the first page into memory.
*/
- entry->list = GinDataLeafPageGetItems(page, &entry->nlist);
+ ItemPointerSetMin(&minItem);
+ entry->list = GinDataLeafPageGetItems(page, &entry->nlist, minItem);
entry->predictNumberResult = stack->predictNumber * entry->nlist;
@@ -556,7 +558,7 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
continue;
}
- entry->list = GinDataLeafPageGetItems(page, &entry->nlist);
+ entry->list = GinDataLeafPageGetItems(page, &entry->nlist, advancePast);
for (i = 0; i < entry->nlist; i++)
{
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 3f92c37..8c350b9 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -692,7 +692,7 @@ extern ItemPointer ginReadTuple(GinState *ginstate, OffsetNumber attnum,
IndexTuple itup, int *nitems);
/* gindatapage.c */
-extern ItemPointer GinDataLeafPageGetItems(Page page, int *nitems);
+extern ItemPointer GinDataLeafPageGetItems(Page page, int *nitems, ItemPointerData advancePast);
extern int GinDataLeafPageGetItemsToTbm(Page page, TIDBitmap *tbm);
extern BlockNumber createPostingTree(Relation index,
ItemPointerData *items, uint32 nitems,
--
1.8.5.2
0003-Further-optimize-GIN-multi-key-searches.patch (text/x-diff)
From 13792c0d32fc97f2b0a4ff6543858f0ef0d8c2a7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 23 Jan 2014 16:55:51 +0200
Subject: [PATCH 3/4] Further optimize GIN multi-key searches.
When skipping over some items in a posting tree, re-find the new location
by descending the tree from root, rather than walking the right links.
This can save a lot of I/O.
---
src/backend/access/gin/gindatapage.c | 9 ++--
src/backend/access/gin/ginget.c | 90 +++++++++++++++++++++++++++---------
src/include/access/gin_private.h | 3 +-
3 files changed, 75 insertions(+), 27 deletions(-)
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 534dae3..2f86c6a 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -1635,16 +1635,15 @@ ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
* Starts a new scan on a posting tree.
*/
GinBtreeStack *
-ginScanBeginPostingTree(Relation index, BlockNumber rootBlkno)
+ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno)
{
- GinBtreeData btree;
GinBtreeStack *stack;
- ginPrepareDataScan(&btree, index, rootBlkno);
+ ginPrepareDataScan(btree, index, rootBlkno);
- btree.fullScan = TRUE;
+ btree->fullScan = TRUE;
- stack = ginFindLeafPage(&btree, TRUE);
+ stack = ginFindLeafPage(btree, TRUE);
return stack;
}
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index e303700..4285a03 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -99,12 +99,13 @@ static void
scanPostingTree(Relation index, GinScanEntry scanEntry,
BlockNumber rootPostingTree)
{
+ GinBtreeData btree;
GinBtreeStack *stack;
Buffer buffer;
Page page;
/* Descend to the leftmost leaf page */
- stack = ginScanBeginPostingTree(index, rootPostingTree);
+ stack = ginScanBeginPostingTree(&btree, index, rootPostingTree);
buffer = stack->buffer;
IncrBufferRefCount(buffer); /* prevent unpin in freeGinBtreeStack */
@@ -412,7 +413,8 @@ restartScanEntry:
LockBuffer(stackEntry->buffer, GIN_UNLOCK);
needUnlock = FALSE;
- stack = ginScanBeginPostingTree(ginstate->index, rootPostingTree);
+ stack = ginScanBeginPostingTree(&entry->btree, ginstate->index,
+ rootPostingTree);
entry->buffer = stack->buffer;
/*
@@ -506,8 +508,50 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
{
Page page;
int i;
+ bool stepright;
+
+ /*
+ * We have two strategies for finding the correct page: step right from
+ * the current page, or descend the tree again from the root. If
+ * advancePast equals the current item, the next matching item should be
+ * on the next page, so we step right. Otherwise, descend from root.
+ */
+ if (ginCompareItemPointers(&entry->curItem, &advancePast) == 0)
+ {
+ stepright = true;
+ LockBuffer(entry->buffer, GIN_SHARE);
+ }
+ else
+ {
+ GinBtreeStack *stack;
+
+ ReleaseBuffer(entry->buffer);
+
+ /*
+ * Set the search key, and find the correct leaf page.
+ *
+ * XXX: This is off by one, we're searching for an item > advancePast,
+ * but we're asking the tree for the next item >= advancePast. It only
+ * makes a difference in the corner case that advancePast is the
+ * right bound of a page, in which case we'll scan one page
+ * unnecessarily. Other than that it's harmless.
+ */
+ entry->btree.itemptr = advancePast;
+ entry->btree.fullScan = false;
+ stack = ginFindLeafPage(&entry->btree, true);
+
+ /* we don't need the stack, just the buffer. */
+ entry->buffer = stack->buffer;
+ IncrBufferRefCount(entry->buffer);
+ freeGinBtreeStack(stack);
+ stepright = false;
+ }
+
+ elog(DEBUG2, "entryLoadMoreItems, %u/%u, skip: %d",
+ GinItemPointerGetBlockNumber(&advancePast),
+ GinItemPointerGetOffsetNumber(&advancePast),
+ !stepright);
- LockBuffer(entry->buffer, GIN_SHARE);
page = BufferGetPage(entry->buffer);
for (;;)
{
@@ -519,31 +563,35 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
entry->nlist = 0;
}
- /*
- * We've processed all the entries on this page. If it was the last
- * page in the tree, we're done.
- */
- if (GinPageRightMost(page))
+ if (stepright)
{
- UnlockReleaseBuffer(entry->buffer);
- entry->buffer = InvalidBuffer;
- entry->isFinished = TRUE;
- return;
+ /*
+ * We've processed all the entries on this page. If it was the last
+ * page in the tree, we're done.
+ */
+ if (GinPageRightMost(page))
+ {
+ UnlockReleaseBuffer(entry->buffer);
+ entry->buffer = InvalidBuffer;
+ entry->isFinished = TRUE;
+ return;
+ }
+
+ /*
+ * Step to next page, following the right link. then find the first
+ * ItemPointer greater than advancePast.
+ */
+ entry->buffer = ginStepRight(entry->buffer,
+ ginstate->index,
+ GIN_SHARE);
+ page = BufferGetPage(entry->buffer);
}
+ stepright = true;
if (GinPageGetOpaque(page)->flags & GIN_DELETED)
continue; /* page was deleted by concurrent vacuum */
/*
- * Step to next page, following the right link. then find the first
- * ItemPointer greater than advancePast.
- */
- entry->buffer = ginStepRight(entry->buffer,
- ginstate->index,
- GIN_SHARE);
- page = BufferGetPage(entry->buffer);
-
- /*
* The first item > advancePast might not be on this page, but
* somewhere to the right, if the page was split. Keep following
* the right-links until we re-find the correct page.
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 8c350b9..a12dfc3 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -703,7 +703,7 @@ extern void GinPageDeletePostingItem(Page page, OffsetNumber offset);
extern void ginInsertItemPointers(Relation index, BlockNumber rootBlkno,
ItemPointerData *items, uint32 nitem,
GinStatsData *buildStats);
-extern GinBtreeStack *ginScanBeginPostingTree(Relation index, BlockNumber rootBlkno);
+extern GinBtreeStack *ginScanBeginPostingTree(GinBtree btree, Relation index, BlockNumber rootBlkno);
extern void ginDataFillRoot(GinBtree btree, Page root, BlockNumber lblkno, Page lpage, BlockNumber rblkno, Page rpage);
extern void ginPrepareDataScan(GinBtree btree, Relation index, BlockNumber rootBlkno);
@@ -803,6 +803,7 @@ typedef struct GinScanEntryData
bool isFinished;
bool reduceResult;
uint32 predictNumberResult;
+ GinBtreeData btree;
} GinScanEntryData;
typedef struct GinScanOpaqueData
--
1.8.5.2
Attachment: 0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch (text/x-diff)
>From c9087c8d5d3501deceb433966206d4d69e135042 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Thu, 23 Jan 2014 23:08:43 +0200
Subject: [PATCH 4/4] Add the concept of a ternary consistent check, and use it
to skip entries.
When we have loaded the next item from some, but not all, entries in a scan,
it might be possible to prove that there cannot be any matches with a
smaller item pointer coming from the other entries. In that case, we can
fast-forward those entries to the smallest item among the already-fetched
sources.
There is no support for opclass-defined ternary consistent functions yet,
but there is a shim function that calls the regular, boolean, consistent
function "both ways", when only one input is unknown.
Per the concept by Alexander Korotkov
---
src/backend/access/gin/Makefile | 2 +-
src/backend/access/gin/ginget.c | 414 ++++++++++++++++++++++----------------
src/backend/access/gin/ginlogic.c | 136 +++++++++++++
src/include/access/gin_private.h | 23 ++-
4 files changed, 397 insertions(+), 178 deletions(-)
create mode 100644 src/backend/access/gin/ginlogic.c
diff --git a/src/backend/access/gin/Makefile b/src/backend/access/gin/Makefile
index aabc62f..db4f496 100644
--- a/src/backend/access/gin/Makefile
+++ b/src/backend/access/gin/Makefile
@@ -14,6 +14,6 @@ include $(top_builddir)/src/Makefile.global
OBJS = ginutil.o gininsert.o ginxlog.o ginentrypage.o gindatapage.o \
ginbtree.o ginscan.o ginget.o ginvacuum.o ginarrayproc.o \
- ginbulk.o ginfast.o ginpostinglist.o
+ ginbulk.o ginfast.o ginpostinglist.o ginlogic.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 4285a03..f2f9dc6 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -32,41 +32,6 @@ typedef struct pendingPosition
bool *hasMatchKey;
} pendingPosition;
-
-/*
- * Convenience function for invoking a key's consistentFn
- */
-static bool
-callConsistentFn(GinState *ginstate, GinScanKey key)
-{
- /*
- * If we're dealing with a dummy EVERYTHING key, we don't want to call the
- * consistentFn; just claim it matches.
- */
- if (key->searchMode == GIN_SEARCH_MODE_EVERYTHING)
- {
- key->recheckCurItem = false;
- return true;
- }
-
- /*
- * Initialize recheckCurItem in case the consistentFn doesn't know it
- * should set it. The safe assumption in that case is to force recheck.
- */
- key->recheckCurItem = true;
-
- return DatumGetBool(FunctionCall8Coll(&ginstate->consistentFn[key->attnum - 1],
- ginstate->supportCollation[key->attnum - 1],
- PointerGetDatum(key->entryRes),
- UInt16GetDatum(key->strategy),
- key->query,
- UInt32GetDatum(key->nuserentries),
- PointerGetDatum(key->extra_data),
- PointerGetDatum(&key->recheckCurItem),
- PointerGetDatum(key->queryValues),
- PointerGetDatum(key->queryCategories)));
-}
-
/*
* Goes to the next page if current offset is outside of bounds
*/
@@ -460,6 +425,8 @@ startScanKey(GinState *ginstate, GinScanKey key)
key->curItemMatches = false;
key->recheckCurItem = false;
key->isFinished = false;
+
+ GinInitConsistentMethod(ginstate, key);
}
static void
@@ -789,18 +756,19 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
ItemPointerData minItem;
ItemPointerData curPageLossy;
uint32 i;
- uint32 lossyEntry;
bool haveLossyEntry;
GinScanEntry entry;
- bool res;
MemoryContext oldCtx;
bool allFinished;
+ bool allUnknown;
+ int minUnknown;
+ GinLogicValue res;
Assert(!key->isFinished);
/*
* We might have already tested this item; if so, no need to repeat work.
- * (Note: the ">" case can happen, if minItem is exact but we previously
+ * (Note: the ">" case can happen, if advancePast is exact but we previously
* had to set curItem to a lossy-page pointer.)
*/
if (ginCompareItemPointers(&key->curItem, &advancePast) > 0)
@@ -814,155 +782,256 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
* same page. So we'll prefer to return exact pointers not lossy
* pointers, which is good.
*/
- ItemPointerSetMax(&minItem);
- allFinished = true;
- for (i = 0; i < key->nentries; i++)
+ oldCtx = CurrentMemoryContext;
+
+ for (;;)
{
- entry = key->scanEntry[i];
+ ItemPointerSetMax(&minItem);
+ allFinished = true;
+ allUnknown = true;
+ minUnknown = -1;
+ for (i = 0; i < key->nentries; i++)
+ {
+ entry = key->scanEntry[i];
- /*
- * Advance this stream if necessary.
- *
- * In particular, since entry->curItem was initialized with
- * ItemPointerSetMin, this ensures we fetch the first item for each
- * entry on the first call.
- */
- while (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ if (entry->isFinished)
+ continue;
+ allFinished = false;
+
+ if (!entry->isFinished &&
+ ginCompareItemPointers(&entry->curItem, &advancePast) > 0)
+ {
+ allUnknown = false;
+ if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
+ minItem = entry->curItem;
+ }
+ else if (minUnknown == -1)
+ minUnknown = i;
+ }
+
+ if (allFinished)
{
- entryGetItem(ginstate, entry, advancePast);
+ /* all entries are finished */
+ key->isFinished = TRUE;
+ return;
}
- if (!entry->isFinished)
+ if (allUnknown)
{
- allFinished = FALSE;
- if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
- minItem = entry->curItem;
+ /*
+ * We must have an item from at least one source to have a match.
+ * Fetch the next item > advancePast from the first (non-finished)
+ * entry stream.
+ */
+ entry = key->scanEntry[minUnknown];
+ entryGetItem(ginstate, entry, advancePast);
+ continue;
}
- }
- if (allFinished)
- {
- /* all entries are finished */
- key->isFinished = TRUE;
- return;
- }
+ /*
+ * We now have minItem set to the minimum among input streams *that*
+ * we know. Some streams might be in unknown state, meaning we don't
+ * know the next value from that input.
+ *
+ * Determine if any items between advancePast and minItem might match.
+ * Such items might come from one of the unknown sources, but it's
+ * possible that the consistent function can refute them all, ie.
+ * the consistent logic says that they cannot match without any of the
+ * sources that we have loaded.
+ */
+ if (minUnknown != -1)
+ {
+ for (i = 0; i < key->nentries; i++)
+ {
+ entry = key->scanEntry[i];
+ if (entry->isFinished)
+ key->entryRes[i] = GIN_FALSE;
+ else if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ {
+ /* this source is 'unloaded' */
+ key->entryRes[i] = GIN_MAYBE;
+ }
+ else
+ {
+ /*
+ * we know the next item from this source to be >= minItem,
+ * hence it's false for any item < minItem
+ */
+ key->entryRes[i] = GIN_FALSE;
+ }
+ }
- /*
- * OK, set key->curItem and perform consistentFn test.
- */
- key->curItem = minItem;
+ MemoryContextSwitchTo(tempCtx);
+ res = key->triConsistentFn(key);
+ MemoryContextSwitchTo(oldCtx);
- /*
- * Lossy-page entries pose a problem, since we don't know the correct
- * entryRes state to pass to the consistentFn, and we also don't know what
- * its combining logic will be (could be AND, OR, or even NOT). If the
- * logic is OR then the consistentFn might succeed for all items in the
- * lossy page even when none of the other entries match.
- *
- * If we have a single lossy-page entry then we check to see if the
- * consistentFn will succeed with only that entry TRUE. If so, we return
- * a lossy-page pointer to indicate that the whole heap page must be
- * checked. (On subsequent calls, we'll do nothing until minItem is past
- * the page altogether, thus ensuring that we never return both regular
- * and lossy pointers for the same page.)
- *
- * This idea could be generalized to more than one lossy-page entry, but
- * ideally lossy-page entries should be infrequent so it would seldom be
- * the case that we have more than one at once. So it doesn't seem worth
- * the extra complexity to optimize that case. If we do find more than
- * one, we just punt and return a lossy-page pointer always.
- *
- * Note that only lossy-page entries pointing to the current item's page
- * should trigger this processing; we might have future lossy pages in the
- * entry array, but they aren't relevant yet.
- */
- ItemPointerSetLossyPage(&curPageLossy,
- GinItemPointerGetBlockNumber(&key->curItem));
+ if (res == GIN_FALSE)
+ {
+ /*
+ * All items between advancePast and minItem have been refuted.
+ * Proceed with minItem.
+ */
+ advancePast = minItem;
+ advancePast.ip_posid--;
+ }
+ else
+ {
+ /*
+ * There might be matches smaller than minItem coming from one
+ * of the unknown sources. Load more items, and retry.
+ */
+ entry = key->scanEntry[minUnknown];
+ entryGetItem(ginstate, entry, advancePast);
+ continue;
+ }
+ }
- lossyEntry = 0;
- haveLossyEntry = false;
- for (i = 0; i < key->nentries; i++)
- {
- entry = key->scanEntry[i];
- if (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem, &curPageLossy) == 0)
+ /*
+ * Ok, we now know that there are no matches < minItem. Proceed to
+ * check if it's a match.
+ */
+ key->curItem = minItem;
+ ItemPointerSetLossyPage(&curPageLossy,
+ GinItemPointerGetBlockNumber(&minItem));
+
+ /*
+ * Lossy-page entries pose a problem, since we don't know the correct
+ * entryRes state to pass to the consistentFn, and we also don't know
+ * what its combining logic will be (could be AND, OR, or even NOT).
+ * If the logic is OR then the consistentFn might succeed for all items
+ * in the lossy page even when none of the other entries match.
+ *
+ * Our strategy is to call the tri-state consistent function, with the
+ * lossy-page entries set to MAYBE, and all the other entries FALSE.
+ * If it returns FALSE, none of the lossy items alone are enough for a
+ * match, so we don't need to return a lossy-page pointer. Otherwise,
+ * return a lossy-page pointer to indicate that the whole heap page must
+ * be checked. (On subsequent calls, we'll do nothing until minItem is
+ * past the page altogether, thus ensuring that we never return both
+ * regular and lossy pointers for the same page.)
+ *
+ * An exception is that we don't need to try it both ways (ie. pass
+ * MAYBE) if the lossy pointer is in a "hidden" entry, because the
+ * consistentFn's result can't depend on that (but mark the result as
+ * 'recheck').
+ *
+ * Note that only lossy-page entries pointing to the current item's
+ * page should trigger this processing; we might have future lossy
+ * pages in the entry array, but they aren't relevant yet.
+ */
+ haveLossyEntry = false;
+ for (i = 0; i < key->nentries; i++)
{
- if (haveLossyEntry)
+ entry = key->scanEntry[i];
+ if (entry->isFinished == FALSE &&
+ ginCompareItemPointers(&entry->curItem, &curPageLossy) == 0)
{
- /* Multiple lossy entries, punt */
+ key->entryRes[i] = GIN_MAYBE;
+ haveLossyEntry = true;
+ }
+ else
+ key->entryRes[i] = GIN_FALSE;
+ }
+
+ if (haveLossyEntry)
+ {
+ MemoryContextSwitchTo(tempCtx);
+ res = key->triConsistentFn(key);
+ MemoryContextSwitchTo(oldCtx);
+
+ if (res == GIN_TRUE || res == GIN_MAYBE)
+ {
+ /* Some of the lossy items on the heap page might match, punt */
key->curItem = curPageLossy;
key->curItemMatches = true;
key->recheckCurItem = true;
return;
}
- lossyEntry = i;
- haveLossyEntry = true;
}
- }
- /* prepare for calling consistentFn in temp context */
- oldCtx = MemoryContextSwitchTo(tempCtx);
+ /*
+ * Let's call the consistent function to check if this is a match.
+ *
+ * At this point we know that we don't need to return a lossy
+ * whole-page pointer, but we might have matches for individual exact
+ * item pointers, possibly in combination with a lossy pointer. Pass
+ * lossy pointers as MAYBE to the ternary consistent function, to
+ * let it decide if this tuple satisfies the overall key, even though
+ * we don't know whether the lossy entries match.
+ *
+ * We might also not have advanced all the entry streams up to this
+ * point yet. It's possible that the consistent function can
+ * nevertheless decide that this is definitely a match or not a match,
+ * even though we don't know if those unknown entries match, so we
+ * pass them as MAYBE.
+ */
+ for (i = 0; i < key->nentries; i++)
+ {
+ entry = key->scanEntry[i];
+ if (entry->isFinished)
+ key->entryRes[i] = GIN_FALSE;
+ else if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ key->entryRes[i] = GIN_MAYBE; /* not loaded yet */
+ else if (ginCompareItemPointers(&entry->curItem, &curPageLossy) == 0)
+ key->entryRes[i] = GIN_MAYBE;
+ else if (ginCompareItemPointers(&entry->curItem, &minItem) == 0)
+ key->entryRes[i] = GIN_TRUE;
+ else
+ key->entryRes[i] = GIN_FALSE;
+ }
- if (haveLossyEntry)
- {
- /* Single lossy-page entry, so see if whole page matches */
- memset(key->entryRes, FALSE, key->nentries);
- key->entryRes[lossyEntry] = TRUE;
+ MemoryContextSwitchTo(tempCtx);
+ res = key->triConsistentFn(key);
+ MemoryContextSwitchTo(oldCtx);
- if (callConsistentFn(ginstate, key))
+ switch (res)
{
- /* Yes, so clean up ... */
- MemoryContextSwitchTo(oldCtx);
- MemoryContextReset(tempCtx);
-
- /* and return lossy pointer for whole page */
- key->curItem = curPageLossy;
- key->curItemMatches = true;
- key->recheckCurItem = true;
- return;
- }
- }
+ case GIN_TRUE:
+ key->curItemMatches = true;
+ /* triConsistentFn set recheckCurItem */
+ break;
- /*
- * At this point we know that we don't need to return a lossy whole-page
- * pointer, but we might have matches for individual exact item pointers,
- * possibly in combination with a lossy pointer. Our strategy if there's
- * a lossy pointer is to try the consistentFn both ways and return a hit
- * if it accepts either one (forcing the hit to be marked lossy so it will
- * be rechecked). An exception is that we don't need to try it both ways
- * if the lossy pointer is in a "hidden" entry, because the consistentFn's
- * result can't depend on that.
- *
- * Prepare entryRes array to be passed to consistentFn.
- */
- for (i = 0; i < key->nentries; i++)
- {
- entry = key->scanEntry[i];
- if (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem, &key->curItem) == 0)
- key->entryRes[i] = TRUE;
- else
- key->entryRes[i] = FALSE;
- }
- if (haveLossyEntry)
- key->entryRes[lossyEntry] = TRUE;
+ case GIN_FALSE:
+ key->curItemMatches = false;
+ break;
- res = callConsistentFn(ginstate, key);
+ case GIN_MAYBE:
+ /*
+ * The consistent function cannot decide with the information
+ * we've got. If there are any "unknown" sources left, advance
+ * one of them and try again, in the hope that it can decide
+ * with the extra information.
+ */
+ if (minUnknown != -1)
+ {
+ entry = key->scanEntry[minUnknown];
+ entryGetItem(ginstate, entry, advancePast);
+ continue;
+ }
+ key->curItemMatches = true;
+ key->recheckCurItem = true;
+ break;
- if (!res && haveLossyEntry && lossyEntry < key->nuserentries)
- {
- /* try the other way for the lossy item */
- key->entryRes[lossyEntry] = FALSE;
+ default:
+ /*
+ * the 'default' case shouldn't happen, but if the consistent
+ * function returns something bogus, this is the safe result
+ */
+ key->curItemMatches = true;
+ key->recheckCurItem = true;
+ break;
+ }
- res = callConsistentFn(ginstate, key);
+ /*
+ * We have a tuple, and we know whether it matches. If it's a
+ * non-match, we could continue to find the next matching tuple, but
+ * let's break out and give scanGetItem a chance to advance the other
+ * keys. They might be able to skip past to a much higher TID, allowing
+ * us to save work.
+ */
+ break;
}
- key->curItemMatches = res;
- /* If we matched a lossy entry, force recheckCurItem = true */
- if (haveLossyEntry)
- key->recheckCurItem = true;
-
/* clean up after consistentFn calls */
MemoryContextSwitchTo(oldCtx);
MemoryContextReset(tempCtx);
@@ -1055,7 +1124,7 @@ scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
/*
* If this is the first key, remember this location as a
- * potential match.
+ * potential match, and proceed to check the rest of the keys.
*
* Otherwise, check if this is the same item that we checked the
* previous keys for (or a lossy pointer for the same page). If
@@ -1066,21 +1135,20 @@ scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
if (i == 0)
{
*item = key->curItem;
+ continue;
+ }
+
+ if (ItemPointerIsLossyPage(&key->curItem) ||
+ ItemPointerIsLossyPage(item))
+ {
+ Assert (GinItemPointerGetBlockNumber(&key->curItem) >= GinItemPointerGetBlockNumber(item));
+ match = (GinItemPointerGetBlockNumber(&key->curItem) ==
+ GinItemPointerGetBlockNumber(item));
}
else
{
- if (ItemPointerIsLossyPage(&key->curItem) ||
- ItemPointerIsLossyPage(item))
- {
- Assert (GinItemPointerGetBlockNumber(&key->curItem) >= GinItemPointerGetBlockNumber(item));
- match = (GinItemPointerGetBlockNumber(&key->curItem) ==
- GinItemPointerGetBlockNumber(item));
- }
- else
- {
- Assert(ginCompareItemPointers(&key->curItem, item) >= 0);
- match = (ginCompareItemPointers(&key->curItem, item) == 0);
- }
+ Assert(ginCompareItemPointers(&key->curItem, item) >= 0);
+ match = (ginCompareItemPointers(&key->curItem, item) == 0);
}
}
} while (!match);
@@ -1297,7 +1365,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos)
{
GinScanKey key = so->keys + i;
- memset(key->entryRes, FALSE, key->nentries);
+ memset(key->entryRes, GIN_FALSE, key->nentries);
}
memset(pos->hasMatchKey, FALSE, so->nkeys);
@@ -1554,7 +1622,7 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
{
GinScanKey key = so->keys + i;
- if (!callConsistentFn(&so->ginstate, key))
+ if (!key->boolConsistentFn(key))
{
match = false;
break;
diff --git a/src/backend/access/gin/ginlogic.c b/src/backend/access/gin/ginlogic.c
new file mode 100644
index 0000000..e499c6e
--- /dev/null
+++ b/src/backend/access/gin/ginlogic.c
@@ -0,0 +1,136 @@
+/*-------------------------------------------------------------------------
+ *
+ * ginlogic.c
+ * routines for performing binary- and ternary-logic consistent checks.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/access/gin/ginlogic.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/gin_private.h"
+#include "access/reloptions.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "storage/indexfsm.h"
+#include "storage/lmgr.h"
+
+/*
+ * A dummy consistent function for an EVERYTHING key. Just claim it matches.
+ */
+static bool
+trueConsistentFn(GinScanKey key)
+{
+ key->recheckCurItem = false;
+ return true;
+}
+static GinLogicValue
+trueTriConsistentFn(GinScanKey key)
+{
+ return GIN_MAYBE;
+}
+
+/*
+ * A function for calling a regular, binary logic, consistent function.
+ */
+static bool
+normalBoolConsistentFn(GinScanKey key)
+{
+ /*
+ * Initialize recheckCurItem in case the consistentFn doesn't know it
+ * should set it. The safe assumption in that case is to force recheck.
+ */
+ key->recheckCurItem = true;
+
+ return DatumGetBool(FunctionCall8Coll(key->consistentFmgrInfo,
+ key->collation,
+ PointerGetDatum(key->entryRes),
+ UInt16GetDatum(key->strategy),
+ key->query,
+ UInt32GetDatum(key->nuserentries),
+ PointerGetDatum(key->extra_data),
+ PointerGetDatum(&key->recheckCurItem),
+ PointerGetDatum(key->queryValues),
+ PointerGetDatum(key->queryCategories)));
+}
+
+/*
+ * This function implements a tri-state consistency check, using a boolean
+ * consistent function provided by the opclass.
+ *
+ * If there is only one MAYBE input, our strategy is to try the consistentFn
+ * both ways. If it returns TRUE for both, the tuple matches regardless of
+ * the MAYBE input, so we return TRUE. Likewise, if it returns FALSE for both,
+ * we return FALSE. Otherwise return MAYBE.
+ */
+static GinLogicValue
+shimTriConsistentFn(GinScanKey key)
+{
+ bool foundMaybe = false;
+ int maybeEntry = -1;
+ int i;
+ bool boolResult1;
+ bool boolResult2;
+ bool recheck1;
+ bool recheck2;
+
+ for (i = 0; i < key->nentries; i++)
+ {
+ if (key->entryRes[i] == GIN_MAYBE)
+ {
+ if (foundMaybe)
+ return GIN_MAYBE; /* more than one MAYBE input */
+ maybeEntry = i;
+ foundMaybe = true;
+ }
+ }
+
+ /*
+ * If none of the inputs were MAYBE, we can just call the consistent
+ * function as is.
+ */
+ if (!foundMaybe)
+ return normalBoolConsistentFn(key);
+
+ /* Try the consistent function with the maybe-input set both ways */
+ key->entryRes[maybeEntry] = GIN_TRUE;
+ boolResult1 = normalBoolConsistentFn(key);
+ recheck1 = key->recheckCurItem;
+
+ key->entryRes[maybeEntry] = GIN_FALSE;
+ boolResult2 = normalBoolConsistentFn(key);
+ recheck2 = key->recheckCurItem;
+
+ if (!boolResult1 && !boolResult2)
+ return GIN_FALSE;
+
+ key->recheckCurItem = recheck1 || recheck2;
+ if (boolResult1 && boolResult2)
+ return GIN_TRUE;
+ else
+ return GIN_MAYBE;
+}
+
+void
+GinInitConsistentMethod(GinState *ginstate, GinScanKey key)
+{
+ if (key->searchMode == GIN_SEARCH_MODE_EVERYTHING)
+ {
+ key->boolConsistentFn = trueConsistentFn;
+ key->triConsistentFn = trueTriConsistentFn;
+ }
+ else
+ {
+ key->consistentFmgrInfo = &ginstate->consistentFn[key->attnum - 1];
+ key->collation = ginstate->supportCollation[key->attnum - 1];
+ key->boolConsistentFn = normalBoolConsistentFn;
+ key->triConsistentFn = shimTriConsistentFn;
+ }
+}
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index a12dfc3..6d6a49a 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -17,6 +17,8 @@
#include "storage/bufmgr.h"
#include "utils/rbtree.h"
+typedef struct GinScanKeyData *GinScanKey;
+typedef struct GinScanEntryData *GinScanEntry;
/*
* Page opaque data in an inverted index page.
@@ -588,6 +590,19 @@ extern OffsetNumber gintuple_get_attrnum(GinState *ginstate, IndexTuple tuple);
extern Datum gintuple_get_key(GinState *ginstate, IndexTuple tuple,
GinNullCategory *category);
+/* ginlogic.c */
+
+enum
+{
+ GIN_FALSE = 0,
+ GIN_TRUE = 1,
+ GIN_MAYBE = 2
+} GinLogicValueEnum;
+
+typedef char GinLogicValue;
+
+extern void GinInitConsistentMethod(GinState *ginstate, GinScanKey key);
+
/* gininsert.c */
extern Datum ginbuild(PG_FUNCTION_ARGS);
extern Datum ginbuildempty(PG_FUNCTION_ARGS);
@@ -733,10 +748,6 @@ extern void ginVacuumPostingTreeLeaf(Relation rel, Buffer buf, GinVacuumState *g
* nuserentries is the number that extractQueryFn returned (which is what
* we report to consistentFn). The "user" entries must come first.
*/
-typedef struct GinScanKeyData *GinScanKey;
-
-typedef struct GinScanEntryData *GinScanEntry;
-
typedef struct GinScanKeyData
{
/* Real number of entries in scanEntry[] (always > 0) */
@@ -749,6 +760,10 @@ typedef struct GinScanKeyData
/* array of check flags, reported to consistentFn */
bool *entryRes;
+ bool (*boolConsistentFn) (GinScanKey key);
+ GinLogicValue (*triConsistentFn) (GinScanKey key);
+ FmgrInfo *consistentFmgrInfo;
+ Oid collation;
/* other data needed for calling consistentFn */
Datum query;
--
1.8.5.2
Hi!
On 25.1.2014 22:21, Heikki Linnakangas wrote:
Attached is a new version of the patch set, with those bugs fixed.
I've done a bunch of tests with all the 4 patches applied, and it seems
to work now. I've done tests with various conditions (AND/OR, number of
words, number of conditions) and so far I did not get any crashes,
infinite loops or anything like that.
I've also compared the results to 9.3 - by dumping the database and
running the same set of queries on both machines, and indeed I got 100%
match.
I also did some performance tests, and that's when I started to worry.
For example, I generated and ran 1000 queries that look like this:
SELECT id FROM messages
WHERE body_tsvector @@ to_tsquery('english','(header & 53 & 32 &
useful & dropped)')
ORDER BY ts_rank(body_tsvector, to_tsquery('english','(header & 53 &
32 & useful & dropped)')) DESC;
i.e. in this case the query always was 5 words connected by AND. This
query is a pretty common pattern for fulltext search - search for a list
of words and give me the best ranked results.
On 9.3, the script was running for ~23 seconds, on patched HEAD it was
~40. It's perfectly reproducible, I've repeated the test several times
with exactly the same results. The test is CPU bound, there's no I/O
activity at all. I got the same results with more queries (~100k).
Attached is a simple chart with x-axis used for durations measured on
9.3.2, y-axis used for durations measured on patched HEAD. The chart
makes it obvious that a vast majority of queries is up to 2x slower on
HEAD.
Only about 50 queries are faster on HEAD, and >700 queries are more than
50% slower on HEAD (i.e. if the query took 100ms on 9.3, it takes >150ms
on HEAD).
Typically, the EXPLAIN ANALYZE looks something like this (on 9.3):
http://explain.depesz.com/s/5tv
and on HEAD (same query):
http://explain.depesz.com/s/1lI
Clearly the main difference is in the "Bitmap Index Scan" which takes
60ms on 9.3 and 120ms on HEAD.
On 9.3 the "perf top" looks like this:
34.79% postgres [.] gingetbitmap
28.96% postgres [.] ginCompareItemPointers
9.36% postgres [.] TS_execute
5.36% postgres [.] check_stack_depth
3.57% postgres [.] FunctionCall8Coll
while on 9.4 it looks like this:
28.20% postgres [.] gingetbitmap
21.17% postgres [.] TS_execute
8.08% postgres [.] check_stack_depth
7.11% postgres [.] FunctionCall8Coll
4.34% postgres [.] shimTriConsistentFn
Not sure how to interpret that, though. For example where did the
ginCompareItemPointers go? I suspect it's thanks to inlining, and that
it might be related to the performance decrease. Or maybe not.
I've repeated the test several times, checked all I could think of, but
I've found nothing so far. The flags were exactly the same in both cases
(just --enable-debug and nothing else).
regards
Tomas
On 2014-01-26 07:24:58 +0100, Tomas Vondra wrote:
Not sure how to interpret that, though. For example where did the
ginCompareItemPointers go? I suspect it's thanks to inlining, and that
it might be related to the performance decrease. Or maybe not.
Try recompiling with CFLAGS="-fno-omit-frame-pointer -O2" and then use
perf record -g. That gives you a hierarchical profile which often makes
such questions easier to answer.
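Concretely, the rebuild-and-profile steps might look like this (a sketch: the build path, job count, and backend pid are placeholders, not from the thread):

```shell
# Rebuild with frame pointers kept, so perf can walk the call stack
./configure --enable-debug CFLAGS="-fno-omit-frame-pointer -O2"
make -j4 && make install

# Record a call-graph profile of the backend running the test queries
perf record -g -p $BACKEND_PID -- sleep 30

# Hierarchical report: shows which callers the inlined
# ginCompareItemPointers time was folded into
perf report -g
```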
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 01/26/2014 08:24 AM, Tomas Vondra wrote:
Hi!
On 25.1.2014 22:21, Heikki Linnakangas wrote:
Attached is a new version of the patch set, with those bugs fixed.
I've done a bunch of tests with all the 4 patches applied, and it seems
to work now. I've done tests with various conditions (AND/OR, number of
words, number of conditions) and I so far I did not get any crashes,
infinite loops or anything like that.I've also compared the results to 9.3 - by dumping the database and
running the same set of queries on both machines, and indeed I got 100%
match.I also did some performance tests, and that's when I started to worry.
For example, I generated and ran 1000 queries that look like this:
SELECT id FROM messages
WHERE body_tsvector @@ to_tsquery('english','(header & 53 & 32 &
useful & dropped)')
ORDER BY ts_rank(body_tsvector, to_tsquery('english','(header & 53 &
32 & useful & dropped)')) DESC;i.e. in this case the query always was 5 words connected by AND. This
query is a pretty common pattern for fulltext search - sort by a list of
words and give me the best ranked results.On 9.3, the script was running for ~23 seconds, on patched HEAD it was
~40. It's perfectly reproducible, I've repeated the test several times
with exactly the same results. The test is CPU bound, there's no I/O
activity at all. I got the same results with more queries (~100k).Attached is a simple chart with x-axis used for durations measured on
9.3.2, y-axis used for durations measured on patched HEAD. It's obvious
a vast majority of queries is up to 2x slower - that's pretty obvious
from the chart.Only about 50 queries are faster on HEAD, and >700 queries are more than
50% slower on HEAD (i.e. if the query took 100ms on 9.3, it takes >150ms
on HEAD).Typically, the EXPLAIN ANALYZE looks something like this (on 9.3):
http://explain.depesz.com/s/5tv
and on HEAD (same query):
http://explain.depesz.com/s/1lI
Clearly the main difference is in the "Bitmap Index Scan" which takes
60ms on 9.3 and 120ms on HEAD.

On 9.3 the "perf top" looks like this:
34.79% postgres [.] gingetbitmap
28.96% postgres [.] ginCompareItemPointers
9.36% postgres [.] TS_execute
5.36% postgres [.] check_stack_depth
3.57% postgres [.] FunctionCall8Coll

while on 9.4 it looks like this:
28.20% postgres [.] gingetbitmap
21.17% postgres [.] TS_execute
8.08% postgres [.] check_stack_depth
7.11% postgres [.] FunctionCall8Coll
4.34% postgres [.] shimTriConsistentFn

Not sure how to interpret that, though. For example where did the
ginCompareItemPointers go? I suspect it's thanks to inlining, and that
it might be related to the performance decrease. Or maybe not.
Yeah, inlining makes it disappear from the profile, and spreads that
time to the functions calling it.
The profile tells us that the consistent function is called a lot more
than before. That is expected - with the fast scan feature, we're
calling consistent not only for potential matches, but also to refute
TIDs based on just a few entries matching. If that's effective, it
allows us to skip many TIDs and avoid consistent calls, which
compensates, but if it's not effective, it's just overhead.
I would actually expect it to be fairly effective for that query, so
that's a bit surprising. I added counters to see where the calls are
coming from, and it seems that about 80% of the calls are actually
coming from this little feature I explained earlier:
In addition to that, I'm using the ternary consistent function to check
if minItem is a match, even if we haven't loaded all the entries yet.
That's less important, but I think for something like "rare1 | (rare2 &
frequent)" it might be useful. It would allow us to skip fetching
'frequent', when we already know that 'rare1' matches for the current
item. I'm not sure if that's worth the cycles, but it seemed like an
obvious thing to do, now that we have the ternary consistent function.
So, that clearly isn't worth the cycles :-). At least not with an
expensive consistent function; it might be worthwhile if we pre-build
the truth-table, or cache the results of the consistent function.
Attached is a quick patch to remove that, on top of all the other
patches, if you want to test the effect.
- Heikki
Attachments:
load-all-entries-before-consistent-check-1.patchtext/x-diff; name=load-all-entries-before-consistent-check-1.patchDownload
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index f2f9dc6..76a70a0 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -895,6 +895,25 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
GinItemPointerGetBlockNumber(&minItem));
/*
+ * We might not have loaded all the entry streams for this TID. We
+ * could call the consistent function, passing MAYBE for those entries,
+ * to see if it can decide if this TID matches based on the information
+ * we have. But if the consistent-function is expensive, and cannot
+ * in fact decide with partial information, that could be a big loss.
+ * So, loop back to load the missing entries, before calling the
+ * consistent function.
+ */
+ if (minUnknown != -1)
+ {
+ for (i = minUnknown; i < key->nentries; i++)
+ {
+ entry = key->scanEntry[i];
+ if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ entryGetItem(ginstate, entry, advancePast);
+ }
+ }
+
+ /*
* Lossy-page entries pose a problem, since we don't know the correct
* entryRes state to pass to the consistentFn, and we also don't know
* what its combining logic will be (could be AND, OR, or even NOT).
@@ -996,18 +1015,6 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
break;
case GIN_MAYBE:
- /*
- * The consistent function cannot decide with the information
- * we've got. If there are any "unknown" sources left, advance
- * one of them and try again, in the hope that it can decide
- * with the extra information.
- */
- if (minUnknown != -1)
- {
- entry = key->scanEntry[minUnknown];
- entryGetItem(ginstate, entry, advancePast);
- continue;
- }
key->curItemMatches = true;
key->recheckCurItem = true;
break;
On 26.1.2014 17:14, Heikki Linnakangas wrote:
I would actually expect it to be fairly effective for that query, so
that's a bit surprising. I added counters to see where the calls are
coming from, and it seems that about 80% of the calls are actually
coming from this little the feature I explained earlier:In addition to that, I'm using the ternary consistent function to check
if minItem is a match, even if we haven't loaded all the entries yet.
That's less important, but I think for something like "rare1 | (rare2 &
frequent)" it might be useful. It would allow us to skip fetching
'frequent', when we already know that 'rare1' matches for the current
item. I'm not sure if that's worth the cycles, but it seemed like an
obvious thing to do, now that we have the ternary consistent function.So, that clearly isn't worth the cycles :-). At least not with an
expensive consistent function; it might be worthwhile if we pre-build
the truth-table, or cache the results of the consistent function.Attached is a quick patch to remove that, on top of all the other
patches, if you want to test the effect.
Indeed, the patch significantly improved the performance. The total
runtime is almost exactly the same as on 9.3 (~22 seconds for 1000
queries). The timing chart (patched vs. 9.3) is attached.
A table with number of queries with duration ratio below some threshold
looks like this:
threshold | count | percentage
-------------------------------------
0.5 | 3 | 0.3%
0.75 | 45 | 4.5%
0.9 | 224 | 22.4%
1.0 | 667 | 66.7%
1.05 | 950 | 95.0%
1.1 | 992 | 99.2%
A ratio is just a measure of how much time it took compared to 9.3
ratio = (duration on patched HEAD) / (duration on 9.3)
The table is cumulative, e.g. values in the 0.9 row mean that for 224
queries the duration with the patches was below 90% of the duration on 9.3.
IMHO the table suggests with the last patch we're fine - majority of
queries (~66%) is faster than on 9.3, and the tail is very short. There
are just 2 queries that took more than 15% longer, compared to 9.3. And
we're talking about 20ms vs. 30ms, so chances are this is just a random
noise.
So IMHO we can go ahead, and maybe tune this a bit more in the future.
regards
Tomas
Attachments:
On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
Every single change you did in fast scan seems to be reasonable, but
testing shows that something went wrong. Simple test with 3 words of
different selectivities.
After applying your patches:
# select count(*) from fts_test where fti @@ plainto_tsquery('english',
'gin index select');
count
───────
627
(1 row)
Time: 21,252 ms
In original fast-scan:
# select count(*) from fts_test where fti @@ plainto_tsquery('english',
'gin index select');
count
───────
627
(1 row)
Time: 3,382 ms
I'm trying to get deeper into it.
------
With best regards,
Alexander Korotkov.
On Mon, Jan 27, 2014 at 2:32 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
I had two guesses about why it's become so much slower than in my original
fast-scan:
1) Not using native consistent function
2) Not sorting entries
I attach two patches which roll back these two features (sorry for the awful
quality of the second). The native consistent function accelerates things
significantly, as expected. It seems that sorting entries has almost no
effect. However, it's still not as fast as the initial fast-scan:
# select count(*) from fts_test where fti @@ plainto_tsquery('english',
'gin index select');
count
───────
627
(1 row)
Time: 5,381 ms
Tomas, could you rerun your tests with the first and then with both of these
patches applied on top of Heikki's patches?
------
With best regards,
Alexander Korotkov.
Attachments:
0005-Ternary-consistent-implementation.patchapplication/octet-stream; name=0005-Ternary-consistent-implementation.patchDownload
diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml
new file mode 100644
index 9ffa8be..c8a6b50
*** a/doc/src/sgml/gin.sgml
--- b/doc/src/sgml/gin.sgml
***************
*** 216,221 ****
--- 216,230 ----
arrays previously returned by <function>extractQuery</>.
<literal>extra_data</> is the extra-data array returned by
<function>extractQuery</>, or <symbol>NULL</symbol> if none.
+ <function>consistent</> can be declared as either the 4th or the 6th
+ support function of the opclass. If it is declared as the 6th, it must
+ support tri-state logic, which can be used for the fast scan technique
+ that accelerates gin index scans by skipping parts of large posting trees.
+ The tri-state version of <function>consistent</> accepts
+ <literal>UNKNOWN</> values in the <literal>check</> array. Such a value
+ means that the indexed item may or may not contain the corresponding
+ query key. <function>consistent</> may return <literal>UNKNOWN</> as well
+ when the available information is insufficient for an exact answer.
</para>
<para>
diff --git a/src/backend/access/gin/README b/src/backend/access/gin/README
new file mode 100644
index 3f0c3e2..6750769
*** a/src/backend/access/gin/README
--- b/src/backend/access/gin/README
*************** page-deletions safe; it stamps the delet
*** 335,340 ****
--- 335,346 ----
deleted pages around with the right-link intact until all concurrent scans
have finished.)
+ Fast scan
+ ---------
+
+ Fast scan is a technique which allows skipping parts of large posting trees
+ during a gin index scan.
+
Compatibility
-------------
*************** posting list fits in the space occupied
*** 365,370 ****
--- 371,400 ----
assume that the compressed version of the page, with the dead items removed,
takes less space than the old uncompressed version.
+ Fast scan
+ ---------
+
+ Fast scan is a technique that allows skipping parts of large posting trees during
+ gin index scans. Fast scan is based on a tri-state consistent function. A tri-state
+ consistent function must support the following values:
+ 1) TRUE
+ 2) FALSE
+ 3) UNKNOWN
+ GIN passes UNKNOWN in the check array for those keys whose parts of posting trees
+ it tries to skip. If the consistent function returns FALSE then it can actually
+ skip that part of the posting tree.
+
+ In more detail, fast scan works as follows:
+ 1) Keep entries sorted by their current TIDs, descending.
+ 2) Try to skip some of the entries at the end of the sorted order: pass
+ TRUE for the first part of the entries and UNKNOWN for the others, moving the
+ border splitting TRUE and UNKNOWN until finding where consistent begins to
+ return FALSE.
+ 3) If the tri-state consistent returns FALSE, skip the part of the shortest entry
+ that was UNKNOWN (other entries could be skipped later for a greater value).
+ 4) If the tri-state consistent doesn't return FALSE, then call the exact
+ consistent function and advance the entries' TIDs like a regular gin index scan.
+
Limitations
-----------
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
new file mode 100644
index 486f2ef..d561d60
*** a/src/backend/access/gin/ginutil.c
--- b/src/backend/access/gin/ginutil.c
*************** initGinState(GinState *state, Relation i
*** 67,75 ****
fmgr_info_copy(&(state->extractQueryFn[i]),
index_getprocinfo(index, i + 1, GIN_EXTRACTQUERY_PROC),
CurrentMemoryContext);
! fmgr_info_copy(&(state->consistentFn[i]),
! index_getprocinfo(index, i + 1, GIN_CONSISTENT_PROC),
! CurrentMemoryContext);
/*
* Check opclass capability to do partial match.
--- 67,89 ----
fmgr_info_copy(&(state->extractQueryFn[i]),
index_getprocinfo(index, i + 1, GIN_EXTRACTQUERY_PROC),
CurrentMemoryContext);
! /*
! * Check opclass capability to do tri-state logic consistent check.
! */
! if (index_getprocid(index, i + 1, GIN_CONSISTENT_TRISTATE_PROC) != InvalidOid)
! {
! fmgr_info_copy(&(state->consistentFn[i]),
! index_getprocinfo(index, i + 1, GIN_CONSISTENT_TRISTATE_PROC),
! CurrentMemoryContext);
! state->consistentSupportMaybe[i] = true;
! }
! else
! {
! fmgr_info_copy(&(state->consistentFn[i]),
! index_getprocinfo(index, i + 1, GIN_CONSISTENT_PROC),
! CurrentMemoryContext);
! state->consistentSupportMaybe[i] = false;
! }
/*
* Check opclass capability to do partial match.
diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
new file mode 100644
index 9f6e8e9..e50278b
*** a/src/backend/utils/adt/tsginidx.c
--- b/src/backend/utils/adt/tsginidx.c
***************
*** 15,20 ****
--- 15,21 ----
#include "access/gin.h"
#include "access/skey.h"
+ #include "miscadmin.h"
#include "tsearch/ts_type.h"
#include "tsearch/ts_utils.h"
#include "utils/builtins.h"
*************** gin_extract_tsquery(PG_FUNCTION_ARGS)
*** 172,183 ****
typedef struct
{
QueryItem *first_item;
! bool *check;
int *map_item_operand;
bool *need_recheck;
} GinChkVal;
! static bool
checkcondition_gin(void *checkval, QueryOperand *val)
{
GinChkVal *gcv = (GinChkVal *) checkval;
--- 173,184 ----
typedef struct
{
QueryItem *first_item;
! GinLogicValue *check;
int *map_item_operand;
bool *need_recheck;
} GinChkVal;
! static GinLogicValue
checkcondition_gin(void *checkval, QueryOperand *val)
{
GinChkVal *gcv = (GinChkVal *) checkval;
*************** checkcondition_gin(void *checkval, Query
*** 194,203 ****
return gcv->check[j];
}
Datum
gin_tsquery_consistent(PG_FUNCTION_ARGS)
{
! bool *check = (bool *) PG_GETARG_POINTER(0);
/* StrategyNumber strategy = PG_GETARG_UINT16(1); */
TSQuery query = PG_GETARG_TSQUERY(2);
--- 195,254 ----
return gcv->check[j];
}
+ /*
+ * Evaluate tsquery boolean expression.
+ *
+ * chkcond is a callback function used to evaluate each VAL node in the query.
+ * checkval can be used to pass information to the callback. TS_execute doesn't
+ * do anything with it.
+ * if calcnot is false, NOT expressions are always evaluated to be true. This
+ * is used in ranking.
+ */
+ static GinLogicValue
+ TS_execute_tri_state(QueryItem *curitem, void *checkval, bool calcnot,
+ GinLogicValue (*chkcond) (void *checkval, QueryOperand *val))
+ {
+ GinLogicValue result;
+ /* since this function recurses, it could be driven to stack overflow */
+ check_stack_depth();
+
+ if (curitem->type == QI_VAL)
+ return chkcond(checkval, (QueryOperand *) curitem);
+
+ switch (curitem->qoperator.oper)
+ {
+ case OP_NOT:
+ result = TS_execute_tri_state(curitem + 1, checkval, calcnot, chkcond);
+ if (result == GIN_MAYBE)
+ return result;
+ return !result;
+
+ case OP_AND:
+ result = TS_execute_tri_state(curitem + curitem->qoperator.left, checkval, calcnot, chkcond);
+ if (result == GIN_TRUE)
+ return TS_execute_tri_state(curitem + 1, checkval, calcnot, chkcond);
+ else
+ return result;
+
+ case OP_OR:
+ result = TS_execute_tri_state(curitem + curitem->qoperator.left, checkval, calcnot, chkcond);
+ if (result == GIN_FALSE)
+ return TS_execute_tri_state(curitem + 1, checkval, calcnot, chkcond);
+ else
+ return result;
+
+ default:
+ elog(ERROR, "unrecognized operator: %d", curitem->qoperator.oper);
+ }
+
+ /* not reachable, but keep compiler quiet */
+ return false;
+ }
+
Datum
gin_tsquery_consistent(PG_FUNCTION_ARGS)
{
! GinLogicValue *check = (GinLogicValue *) PG_GETARG_POINTER(0);
/* StrategyNumber strategy = PG_GETARG_UINT16(1); */
TSQuery query = PG_GETARG_TSQUERY(2);
*************** gin_tsquery_consistent(PG_FUNCTION_ARGS)
*** 205,211 ****
/* int32 nkeys = PG_GETARG_INT32(3); */
Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4);
bool *recheck = (bool *) PG_GETARG_POINTER(5);
! bool res = FALSE;
/* The query requires recheck only if it involves weights */
*recheck = false;
--- 256,262 ----
/* int32 nkeys = PG_GETARG_INT32(3); */
Pointer *extra_data = (Pointer *) PG_GETARG_POINTER(4);
bool *recheck = (bool *) PG_GETARG_POINTER(5);
! GinLogicValue res = GIN_FALSE;
/* The query requires recheck only if it involves weights */
*recheck = false;
*************** gin_tsquery_consistent(PG_FUNCTION_ARGS)
*** 224,233 ****
gcv.map_item_operand = (int *) (extra_data[0]);
gcv.need_recheck = recheck;
! res = TS_execute(GETQUERY(query),
! &gcv,
! true,
! checkcondition_gin);
}
PG_RETURN_BOOL(res);
--- 275,284 ----
gcv.map_item_operand = (int *) (extra_data[0]);
gcv.need_recheck = recheck;
! res = TS_execute_tri_state(GETQUERY(query),
! &gcv,
! true,
! checkcondition_gin);
}
PG_RETURN_BOOL(res);
diff --git a/src/include/access/gin.h b/src/include/access/gin.h
new file mode 100644
index 03e58c9..9c77a0e
*** a/src/include/access/gin.h
--- b/src/include/access/gin.h
***************
*** 23,29 ****
#define GIN_EXTRACTQUERY_PROC 3
#define GIN_CONSISTENT_PROC 4
#define GIN_COMPARE_PARTIAL_PROC 5
! #define GINNProcs 5
/*
* searchMode settings for extractQueryFn.
--- 23,30 ----
#define GIN_EXTRACTQUERY_PROC 3
#define GIN_CONSISTENT_PROC 4
#define GIN_COMPARE_PARTIAL_PROC 5
! #define GIN_CONSISTENT_TRISTATE_PROC 6
! #define GINNProcs 6
/*
* searchMode settings for extractQueryFn.
*************** typedef struct GinStatsData
*** 46,51 ****
--- 47,61 ----
int32 ginVersion;
} GinStatsData;
+ enum
+ {
+ GIN_FALSE = 0,
+ GIN_TRUE = 1,
+ GIN_MAYBE = 2
+ } GinLogicValueEnum;
+
+ typedef char GinLogicValue;
+
/* GUC parameter */
extern PGDLLIMPORT int GinFuzzySearchLimit;
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
new file mode 100644
index 6d6a49a..8f83137
*** a/src/include/access/gin_private.h
--- b/src/include/access/gin_private.h
*************** typedef struct GinState
*** 355,360 ****
--- 355,362 ----
bool canPartialMatch[INDEX_MAX_KEYS];
/* Collations to pass to the support functions */
Oid supportCollation[INDEX_MAX_KEYS];
+ /* Does the consistent function support unknown values? */
+ bool consistentSupportMaybe[INDEX_MAX_KEYS];
} GinState;
*************** extern Datum gintuple_get_key(GinState *
*** 592,606 ****
/* ginlogic.c */
- enum
- {
- GIN_FALSE = 0,
- GIN_TRUE = 1,
- GIN_MAYBE = 2
- } GinLogicValueEnum;
-
- typedef char GinLogicValue;
-
extern void GinInitConsistentMethod(GinState *ginstate, GinScanKey key);
/* gininsert.c */
--- 594,599 ----
diff --git a/src/include/catalog/pg_am.h b/src/include/catalog/pg_am.h
new file mode 100644
index 4f46ddd..759ea70
*** a/src/include/catalog/pg_am.h
--- b/src/include/catalog/pg_am.h
*************** DESCR("hash index access method");
*** 126,132 ****
DATA(insert OID = 783 ( gist 0 8 f t f f t t f t t t f 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup - gistcostestimate gistoptions ));
DESCR("GiST index access method");
#define GIST_AM_OID 783
! DATA(insert OID = 2742 ( gin 0 5 f f f f t t f f t f f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup - gincostestimate ginoptions ));
DESCR("GIN index access method");
#define GIN_AM_OID 2742
DATA(insert OID = 4000 ( spgist 0 5 f f f f f t f t f f f 0 spginsert spgbeginscan spggettuple spggetbitmap spgrescan spgendscan spgmarkpos spgrestrpos spgbuild spgbuildempty spgbulkdelete spgvacuumcleanup spgcanreturn spgcostestimate spgoptions ));
--- 126,132 ----
DATA(insert OID = 783 ( gist 0 8 f t f f t t f t t t f 0 gistinsert gistbeginscan gistgettuple gistgetbitmap gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbuildempty gistbulkdelete gistvacuumcleanup - gistcostestimate gistoptions ));
DESCR("GiST index access method");
#define GIST_AM_OID 783
! DATA(insert OID = 2742 ( gin 0 6 f f f f t t f f t f f 0 gininsert ginbeginscan - gingetbitmap ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbuildempty ginbulkdelete ginvacuumcleanup - gincostestimate ginoptions ));
DESCR("GIN index access method");
#define GIN_AM_OID 2742
DATA(insert OID = 4000 ( spgist 0 5 f f f f f t f t f f f 0 spginsert spgbeginscan spggettuple spggetbitmap spgrescan spgendscan spgmarkpos spgrestrpos spgbuild spgbuildempty spgbulkdelete spgvacuumcleanup spgcanreturn spgcostestimate spgoptions ));
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
new file mode 100644
index c090be4..ce09c64
*** a/src/include/catalog/pg_amproc.h
--- b/src/include/catalog/pg_amproc.h
*************** DATA(insert ( 1029 600 600 8 3064 ));
*** 223,349 ****
DATA(insert ( 2745 1007 1007 1 351 ));
DATA(insert ( 2745 1007 1007 2 2743 ));
DATA(insert ( 2745 1007 1007 3 2774 ));
! DATA(insert ( 2745 1007 1007 4 2744 ));
DATA(insert ( 2745 1009 1009 1 360 ));
DATA(insert ( 2745 1009 1009 2 2743 ));
DATA(insert ( 2745 1009 1009 3 2774 ));
! DATA(insert ( 2745 1009 1009 4 2744 ));
DATA(insert ( 2745 1015 1015 1 360 ));
DATA(insert ( 2745 1015 1015 2 2743 ));
DATA(insert ( 2745 1015 1015 3 2774 ));
! DATA(insert ( 2745 1015 1015 4 2744 ));
DATA(insert ( 2745 1023 1023 1 357 ));
DATA(insert ( 2745 1023 1023 2 2743 ));
DATA(insert ( 2745 1023 1023 3 2774 ));
! DATA(insert ( 2745 1023 1023 4 2744 ));
DATA(insert ( 2745 1561 1561 1 1596 ));
DATA(insert ( 2745 1561 1561 2 2743 ));
DATA(insert ( 2745 1561 1561 3 2774 ));
! DATA(insert ( 2745 1561 1561 4 2744 ));
DATA(insert ( 2745 1000 1000 1 1693 ));
DATA(insert ( 2745 1000 1000 2 2743 ));
DATA(insert ( 2745 1000 1000 3 2774 ));
! DATA(insert ( 2745 1000 1000 4 2744 ));
DATA(insert ( 2745 1014 1014 1 1078 ));
DATA(insert ( 2745 1014 1014 2 2743 ));
DATA(insert ( 2745 1014 1014 3 2774 ));
! DATA(insert ( 2745 1014 1014 4 2744 ));
DATA(insert ( 2745 1001 1001 1 1954 ));
DATA(insert ( 2745 1001 1001 2 2743 ));
DATA(insert ( 2745 1001 1001 3 2774 ));
! DATA(insert ( 2745 1001 1001 4 2744 ));
DATA(insert ( 2745 1002 1002 1 358 ));
DATA(insert ( 2745 1002 1002 2 2743 ));
DATA(insert ( 2745 1002 1002 3 2774 ));
! DATA(insert ( 2745 1002 1002 4 2744 ));
DATA(insert ( 2745 1182 1182 1 1092 ));
DATA(insert ( 2745 1182 1182 2 2743 ));
DATA(insert ( 2745 1182 1182 3 2774 ));
! DATA(insert ( 2745 1182 1182 4 2744 ));
DATA(insert ( 2745 1021 1021 1 354 ));
DATA(insert ( 2745 1021 1021 2 2743 ));
DATA(insert ( 2745 1021 1021 3 2774 ));
! DATA(insert ( 2745 1021 1021 4 2744 ));
DATA(insert ( 2745 1022 1022 1 355 ));
DATA(insert ( 2745 1022 1022 2 2743 ));
DATA(insert ( 2745 1022 1022 3 2774 ));
! DATA(insert ( 2745 1022 1022 4 2744 ));
DATA(insert ( 2745 1041 1041 1 926 ));
DATA(insert ( 2745 1041 1041 2 2743 ));
DATA(insert ( 2745 1041 1041 3 2774 ));
! DATA(insert ( 2745 1041 1041 4 2744 ));
DATA(insert ( 2745 651 651 1 926 ));
DATA(insert ( 2745 651 651 2 2743 ));
DATA(insert ( 2745 651 651 3 2774 ));
! DATA(insert ( 2745 651 651 4 2744 ));
DATA(insert ( 2745 1005 1005 1 350 ));
DATA(insert ( 2745 1005 1005 2 2743 ));
DATA(insert ( 2745 1005 1005 3 2774 ));
! DATA(insert ( 2745 1005 1005 4 2744 ));
DATA(insert ( 2745 1016 1016 1 842 ));
DATA(insert ( 2745 1016 1016 2 2743 ));
DATA(insert ( 2745 1016 1016 3 2774 ));
! DATA(insert ( 2745 1016 1016 4 2744 ));
DATA(insert ( 2745 1187 1187 1 1315 ));
DATA(insert ( 2745 1187 1187 2 2743 ));
DATA(insert ( 2745 1187 1187 3 2774 ));
! DATA(insert ( 2745 1187 1187 4 2744 ));
DATA(insert ( 2745 1040 1040 1 836 ));
DATA(insert ( 2745 1040 1040 2 2743 ));
DATA(insert ( 2745 1040 1040 3 2774 ));
! DATA(insert ( 2745 1040 1040 4 2744 ));
DATA(insert ( 2745 1003 1003 1 359 ));
DATA(insert ( 2745 1003 1003 2 2743 ));
DATA(insert ( 2745 1003 1003 3 2774 ));
! DATA(insert ( 2745 1003 1003 4 2744 ));
DATA(insert ( 2745 1231 1231 1 1769 ));
DATA(insert ( 2745 1231 1231 2 2743 ));
DATA(insert ( 2745 1231 1231 3 2774 ));
! DATA(insert ( 2745 1231 1231 4 2744 ));
DATA(insert ( 2745 1028 1028 1 356 ));
DATA(insert ( 2745 1028 1028 2 2743 ));
DATA(insert ( 2745 1028 1028 3 2774 ));
! DATA(insert ( 2745 1028 1028 4 2744 ));
DATA(insert ( 2745 1013 1013 1 404 ));
DATA(insert ( 2745 1013 1013 2 2743 ));
DATA(insert ( 2745 1013 1013 3 2774 ));
! DATA(insert ( 2745 1013 1013 4 2744 ));
DATA(insert ( 2745 1183 1183 1 1107 ));
DATA(insert ( 2745 1183 1183 2 2743 ));
DATA(insert ( 2745 1183 1183 3 2774 ));
! DATA(insert ( 2745 1183 1183 4 2744 ));
DATA(insert ( 2745 1185 1185 1 1314 ));
DATA(insert ( 2745 1185 1185 2 2743 ));
DATA(insert ( 2745 1185 1185 3 2774 ));
! DATA(insert ( 2745 1185 1185 4 2744 ));
DATA(insert ( 2745 1270 1270 1 1358 ));
DATA(insert ( 2745 1270 1270 2 2743 ));
DATA(insert ( 2745 1270 1270 3 2774 ));
! DATA(insert ( 2745 1270 1270 4 2744 ));
DATA(insert ( 2745 1563 1563 1 1672 ));
DATA(insert ( 2745 1563 1563 2 2743 ));
DATA(insert ( 2745 1563 1563 3 2774 ));
! DATA(insert ( 2745 1563 1563 4 2744 ));
DATA(insert ( 2745 1115 1115 1 2045 ));
DATA(insert ( 2745 1115 1115 2 2743 ));
DATA(insert ( 2745 1115 1115 3 2774 ));
! DATA(insert ( 2745 1115 1115 4 2744 ));
DATA(insert ( 2745 791 791 1 377 ));
DATA(insert ( 2745 791 791 2 2743 ));
DATA(insert ( 2745 791 791 3 2774 ));
! DATA(insert ( 2745 791 791 4 2744 ));
DATA(insert ( 2745 1024 1024 1 380 ));
DATA(insert ( 2745 1024 1024 2 2743 ));
DATA(insert ( 2745 1024 1024 3 2774 ));
! DATA(insert ( 2745 1024 1024 4 2744 ));
DATA(insert ( 2745 1025 1025 1 381 ));
DATA(insert ( 2745 1025 1025 2 2743 ));
DATA(insert ( 2745 1025 1025 3 2774 ));
! DATA(insert ( 2745 1025 1025 4 2744 ));
DATA(insert ( 3659 3614 3614 1 3724 ));
DATA(insert ( 3659 3614 3614 2 3656 ));
DATA(insert ( 3659 3614 3614 3 3657 ));
! DATA(insert ( 3659 3614 3614 4 3658 ));
DATA(insert ( 3659 3614 3614 5 2700 ));
DATA(insert ( 3626 3614 3614 1 3622 ));
DATA(insert ( 3683 3615 3615 1 3668 ));
--- 223,349 ----
DATA(insert ( 2745 1007 1007 1 351 ));
DATA(insert ( 2745 1007 1007 2 2743 ));
DATA(insert ( 2745 1007 1007 3 2774 ));
! DATA(insert ( 2745 1007 1007 6 2744 ));
DATA(insert ( 2745 1009 1009 1 360 ));
DATA(insert ( 2745 1009 1009 2 2743 ));
DATA(insert ( 2745 1009 1009 3 2774 ));
! DATA(insert ( 2745 1009 1009 6 2744 ));
DATA(insert ( 2745 1015 1015 1 360 ));
DATA(insert ( 2745 1015 1015 2 2743 ));
DATA(insert ( 2745 1015 1015 3 2774 ));
! DATA(insert ( 2745 1015 1015 6 2744 ));
DATA(insert ( 2745 1023 1023 1 357 ));
DATA(insert ( 2745 1023 1023 2 2743 ));
DATA(insert ( 2745 1023 1023 3 2774 ));
! DATA(insert ( 2745 1023 1023 6 2744 ));
DATA(insert ( 2745 1561 1561 1 1596 ));
DATA(insert ( 2745 1561 1561 2 2743 ));
DATA(insert ( 2745 1561 1561 3 2774 ));
! DATA(insert ( 2745 1561 1561 6 2744 ));
DATA(insert ( 2745 1000 1000 1 1693 ));
DATA(insert ( 2745 1000 1000 2 2743 ));
DATA(insert ( 2745 1000 1000 3 2774 ));
! DATA(insert ( 2745 1000 1000 6 2744 ));
DATA(insert ( 2745 1014 1014 1 1078 ));
DATA(insert ( 2745 1014 1014 2 2743 ));
DATA(insert ( 2745 1014 1014 3 2774 ));
! DATA(insert ( 2745 1014 1014 6 2744 ));
DATA(insert ( 2745 1001 1001 1 1954 ));
DATA(insert ( 2745 1001 1001 2 2743 ));
DATA(insert ( 2745 1001 1001 3 2774 ));
! DATA(insert ( 2745 1001 1001 6 2744 ));
DATA(insert ( 2745 1002 1002 1 358 ));
DATA(insert ( 2745 1002 1002 2 2743 ));
DATA(insert ( 2745 1002 1002 3 2774 ));
! DATA(insert ( 2745 1002 1002 6 2744 ));
DATA(insert ( 2745 1182 1182 1 1092 ));
DATA(insert ( 2745 1182 1182 2 2743 ));
DATA(insert ( 2745 1182 1182 3 2774 ));
! DATA(insert ( 2745 1182 1182 6 2744 ));
DATA(insert ( 2745 1021 1021 1 354 ));
DATA(insert ( 2745 1021 1021 2 2743 ));
DATA(insert ( 2745 1021 1021 3 2774 ));
! DATA(insert ( 2745 1021 1021 6 2744 ));
DATA(insert ( 2745 1022 1022 1 355 ));
DATA(insert ( 2745 1022 1022 2 2743 ));
DATA(insert ( 2745 1022 1022 3 2774 ));
! DATA(insert ( 2745 1022 1022 6 2744 ));
DATA(insert ( 2745 1041 1041 1 926 ));
DATA(insert ( 2745 1041 1041 2 2743 ));
DATA(insert ( 2745 1041 1041 3 2774 ));
! DATA(insert ( 2745 1041 1041 6 2744 ));
DATA(insert ( 2745 651 651 1 926 ));
DATA(insert ( 2745 651 651 2 2743 ));
DATA(insert ( 2745 651 651 3 2774 ));
! DATA(insert ( 2745 651 651 6 2744 ));
DATA(insert ( 2745 1005 1005 1 350 ));
DATA(insert ( 2745 1005 1005 2 2743 ));
DATA(insert ( 2745 1005 1005 3 2774 ));
! DATA(insert ( 2745 1005 1005 6 2744 ));
DATA(insert ( 2745 1016 1016 1 842 ));
DATA(insert ( 2745 1016 1016 2 2743 ));
DATA(insert ( 2745 1016 1016 3 2774 ));
! DATA(insert ( 2745 1016 1016 6 2744 ));
DATA(insert ( 2745 1187 1187 1 1315 ));
DATA(insert ( 2745 1187 1187 2 2743 ));
DATA(insert ( 2745 1187 1187 3 2774 ));
! DATA(insert ( 2745 1187 1187 6 2744 ));
DATA(insert ( 2745 1040 1040 1 836 ));
DATA(insert ( 2745 1040 1040 2 2743 ));
DATA(insert ( 2745 1040 1040 3 2774 ));
! DATA(insert ( 2745 1040 1040 6 2744 ));
DATA(insert ( 2745 1003 1003 1 359 ));
DATA(insert ( 2745 1003 1003 2 2743 ));
DATA(insert ( 2745 1003 1003 3 2774 ));
! DATA(insert ( 2745 1003 1003 6 2744 ));
DATA(insert ( 2745 1231 1231 1 1769 ));
DATA(insert ( 2745 1231 1231 2 2743 ));
DATA(insert ( 2745 1231 1231 3 2774 ));
! DATA(insert ( 2745 1231 1231 6 2744 ));
DATA(insert ( 2745 1028 1028 1 356 ));
DATA(insert ( 2745 1028 1028 2 2743 ));
DATA(insert ( 2745 1028 1028 3 2774 ));
! DATA(insert ( 2745 1028 1028 6 2744 ));
DATA(insert ( 2745 1013 1013 1 404 ));
DATA(insert ( 2745 1013 1013 2 2743 ));
DATA(insert ( 2745 1013 1013 3 2774 ));
! DATA(insert ( 2745 1013 1013 6 2744 ));
DATA(insert ( 2745 1183 1183 1 1107 ));
DATA(insert ( 2745 1183 1183 2 2743 ));
DATA(insert ( 2745 1183 1183 3 2774 ));
! DATA(insert ( 2745 1183 1183 6 2744 ));
DATA(insert ( 2745 1185 1185 1 1314 ));
DATA(insert ( 2745 1185 1185 2 2743 ));
DATA(insert ( 2745 1185 1185 3 2774 ));
! DATA(insert ( 2745 1185 1185 6 2744 ));
DATA(insert ( 2745 1270 1270 1 1358 ));
DATA(insert ( 2745 1270 1270 2 2743 ));
DATA(insert ( 2745 1270 1270 3 2774 ));
! DATA(insert ( 2745 1270 1270 6 2744 ));
DATA(insert ( 2745 1563 1563 1 1672 ));
DATA(insert ( 2745 1563 1563 2 2743 ));
DATA(insert ( 2745 1563 1563 3 2774 ));
! DATA(insert ( 2745 1563 1563 6 2744 ));
DATA(insert ( 2745 1115 1115 1 2045 ));
DATA(insert ( 2745 1115 1115 2 2743 ));
DATA(insert ( 2745 1115 1115 3 2774 ));
! DATA(insert ( 2745 1115 1115 6 2744 ));
DATA(insert ( 2745 791 791 1 377 ));
DATA(insert ( 2745 791 791 2 2743 ));
DATA(insert ( 2745 791 791 3 2774 ));
! DATA(insert ( 2745 791 791 6 2744 ));
DATA(insert ( 2745 1024 1024 1 380 ));
DATA(insert ( 2745 1024 1024 2 2743 ));
DATA(insert ( 2745 1024 1024 3 2774 ));
! DATA(insert ( 2745 1024 1024 6 2744 ));
DATA(insert ( 2745 1025 1025 1 381 ));
DATA(insert ( 2745 1025 1025 2 2743 ));
DATA(insert ( 2745 1025 1025 3 2774 ));
! DATA(insert ( 2745 1025 1025 6 2744 ));
DATA(insert ( 3659 3614 3614 1 3724 ));
DATA(insert ( 3659 3614 3614 2 3656 ));
DATA(insert ( 3659 3614 3614 3 3657 ));
! DATA(insert ( 3659 3614 3614 6 3658 ));
DATA(insert ( 3659 3614 3614 5 2700 ));
DATA(insert ( 3626 3614 3614 1 3622 ));
DATA(insert ( 3683 3615 3615 1 3668 ));
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
new file mode 100644
index 26abe8a..3b91877
*** a/src/test/regress/expected/opr_sanity.out
--- b/src/test/regress/expected/opr_sanity.out
*************** WHERE p2.opfmethod = p1.oid AND p3.ampro
*** 1306,1312 ****
p4.amproclefttype = p3.amproclefttype AND
p4.amprocrighttype = p3.amprocrighttype)
NOT BETWEEN
! (CASE WHEN p1.amname IN ('btree', 'gist', 'gin') THEN p1.amsupport - 1
ELSE p1.amsupport END)
AND p1.amsupport;
amname | opfname | amproclefttype | amprocrighttype
--- 1306,1313 ----
p4.amproclefttype = p3.amproclefttype AND
p4.amprocrighttype = p3.amprocrighttype)
NOT BETWEEN
! (CASE WHEN p1.amname IN ('btree', 'gist') THEN p1.amsupport - 1
! WHEN p1.amname = 'gin' THEN p1.amsupport - 2
ELSE p1.amsupport END)
AND p1.amsupport;
amname | opfname | amproclefttype | amprocrighttype
*************** FROM pg_am am JOIN pg_opclass op ON opcm
*** 1333,1339 ****
amproclefttype = amprocrighttype AND amproclefttype = opcintype
WHERE am.amname = 'btree' OR am.amname = 'gist' OR am.amname = 'gin'
GROUP BY amname, amsupport, opcname, amprocfamily
! HAVING (count(*) != amsupport AND count(*) != amsupport - 1)
OR amprocfamily IS NULL;
amname | opcname | count
--------+---------+-------
--- 1334,1341 ----
amproclefttype = amprocrighttype AND amproclefttype = opcintype
WHERE am.amname = 'btree' OR am.amname = 'gist' OR am.amname = 'gin'
GROUP BY amname, amsupport, opcname, amprocfamily
! HAVING (count(*) != amsupport AND count(*) != amsupport - 1 AND
! (count(*) != amsupport - 2 OR am.amname <> 'gin'))
OR amprocfamily IS NULL;
amname | opcname | count
--------+---------+-------
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
new file mode 100644
index 40e1be2..6dd93f9
*** a/src/test/regress/sql/opr_sanity.sql
--- b/src/test/regress/sql/opr_sanity.sql
*************** WHERE p2.opfmethod = p1.oid AND p3.ampro
*** 1002,1008 ****
p4.amproclefttype = p3.amproclefttype AND
p4.amprocrighttype = p3.amprocrighttype)
NOT BETWEEN
! (CASE WHEN p1.amname IN ('btree', 'gist', 'gin') THEN p1.amsupport - 1
ELSE p1.amsupport END)
AND p1.amsupport;
--- 1002,1009 ----
p4.amproclefttype = p3.amproclefttype AND
p4.amprocrighttype = p3.amprocrighttype)
NOT BETWEEN
! (CASE WHEN p1.amname IN ('btree', 'gist') THEN p1.amsupport - 1
! WHEN p1.amname = 'gin' THEN p1.amsupport - 2
ELSE p1.amsupport END)
AND p1.amsupport;
*************** FROM pg_am am JOIN pg_opclass op ON opcm
*** 1024,1030 ****
amproclefttype = amprocrighttype AND amproclefttype = opcintype
WHERE am.amname = 'btree' OR am.amname = 'gist' OR am.amname = 'gin'
GROUP BY amname, amsupport, opcname, amprocfamily
! HAVING (count(*) != amsupport AND count(*) != amsupport - 1)
OR amprocfamily IS NULL;
-- Unfortunately, we can't check the amproc link very well because the
--- 1025,1032 ----
amproclefttype = amprocrighttype AND amproclefttype = opcintype
WHERE am.amname = 'btree' OR am.amname = 'gist' OR am.amname = 'gin'
GROUP BY amname, amsupport, opcname, amprocfamily
! HAVING (count(*) != amsupport AND count(*) != amsupport - 1 AND
! (count(*) != amsupport - 2 OR am.amname <> 'gin'))
OR amprocfamily IS NULL;
-- Unfortunately, we can't check the amproc link very well because the
Attachment: 0006-Sort-entries.patch (application/octet-stream)
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
new file mode 100644
index 76a70a0..00a45bb
*** a/src/backend/access/gin/ginget.c
--- b/src/backend/access/gin/ginget.c
*************** entryGetItem(GinState *ginstate, GinScan
*** 729,734 ****
--- 729,754 ----
}
/*
+ * Comparison function for scan entry indexes. Sorts them by descending of
+ * curItem assuming lossy page is lowest item pointer in page.
+ */
+ static int
+ cmpEntries(const void *a1, const void *a2, void *arg)
+ {
+ const GinScanKey key = (const GinScanKey)arg;
+ int i1 = *(const int *)a1;
+ int i2 = *(const int *)a2;
+ ItemPointerData iptr1 = key->scanEntry[i1]->curItem;
+ ItemPointerData iptr2 = key->scanEntry[i2]->curItem;
+
+ if (ItemPointerIsLossyPage(&iptr1))
+ iptr1.ip_posid = 0;
+ if (ItemPointerIsLossyPage(&iptr2))
+ iptr2.ip_posid = 0;
+ return -ginCompareItemPointers(&iptr1, &iptr2);
+ }
+
+ /*
* Identify the "current" item among the input entry streams for this scan key
* that is greater than advancePast, and test whether it passes the scan key
* qual condition.
*************** keyGetItem(GinState *ginstate, MemoryCon
*** 763,768 ****
--- 783,789 ----
bool allUnknown;
int minUnknown;
GinLogicValue res;
+ int *entryIndexes;
Assert(!key->isFinished);
*************** keyGetItem(GinState *ginstate, MemoryCon
*** 783,791 ****
--- 804,856 ----
* pointers, which is good.
*/
oldCtx = CurrentMemoryContext;
+ entryIndexes = (int *)MemoryContextAlloc(tempCtx,
+ sizeof(int) * key->nentries);
+ for (i = 0; i < key->nentries; i++)
+ entryIndexes[i] = i;
for (;;)
{
+ restart:
+ qsort_arg(entryIndexes, key->nentries, sizeof(int), cmpEntries, key);
+ for (i = 0; i < key->nentries; i++)
+ key->entryRes[i] = GIN_MAYBE;
+ for (i = 0; i < key->nentries - 1; i++)
+ {
+ uint32 minPredictNumberResult;
+ int minPredictNumberResultIndex = -1;
+
+ key->entryRes[entryIndexes[i]] = GIN_FALSE;
+
+ if (ginCompareItemPointers(
+ &key->scanEntry[entryIndexes[i]]->curItem,
+ &key->scanEntry[entryIndexes[i + 1]]->curItem) == 0)
+ continue;
+
+ MemoryContextSwitchTo(tempCtx);
+ res = key->triConsistentFn(key);
+ MemoryContextSwitchTo(oldCtx);
+ if (res == GIN_FALSE)
+ {
+ int j;
+ advancePast = key->scanEntry[entryIndexes[i]]->curItem;
+ advancePast.ip_posid--;
+ for (j = i + 1; j < key->nentries; j++)
+ {
+ GinScanEntry entry = key->scanEntry[entryIndexes[j]];
+ if (minPredictNumberResultIndex == -1 ||
+ entry->predictNumberResult < minPredictNumberResult)
+ {
+ minPredictNumberResult = entry->predictNumberResult;
+ minPredictNumberResultIndex = entryIndexes[j];
+ }
+ }
+ Assert(minPredictNumberResultIndex >= 0);
+ entryGetItem(ginstate, key->scanEntry[minPredictNumberResultIndex], advancePast);
+ goto restart;
+ }
+ }
+
ItemPointerSetMax(&minItem);
allFinished = true;
allUnknown = true;
*************** keyGetItem(GinState *ginstate, MemoryCon
*** 798,812 ****
continue;
allFinished = false;
! if (!entry->isFinished &&
! ginCompareItemPointers(&entry->curItem, &advancePast) > 0)
! {
! allUnknown = false;
! if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
! minItem = entry->curItem;
! }
! else if (minUnknown == -1)
! minUnknown = i;
}
if (allFinished)
--- 863,873 ----
continue;
allFinished = false;
! if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
! entryGetItem(ginstate, entry, advancePast);
!
! if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
! minItem = entry->curItem;
}
if (allFinished)
*************** keyGetItem(GinState *ginstate, MemoryCon
*** 816,891 ****
return;
}
- if (allUnknown)
- {
- /*
- * We must have an item from at least one source to have a match.
- * Fetch the next item > advancePast from the first (non-finished)
- * entry stream.
- */
- entry = key->scanEntry[minUnknown];
- entryGetItem(ginstate, entry, advancePast);
- continue;
- }
-
- /*
- * We now have minItem set to the minimum among input streams *that*
- * we know. Some streams might be in unknown state, meaning we don't
- * know the next value from that input.
- *
- * Determine if any items between advancePast and minItem might match.
- * Such items might come from one of the unknown sources, but it's
- * possible that the consistent function can refute them all, ie.
- * the consistent logic says that they cannot match without any of the
- * sources that we have loaded.
- */
- if (minUnknown != -1)
- {
- for (i = 0; i < key->nentries; i++)
- {
- entry = key->scanEntry[i];
- if (entry->isFinished)
- key->entryRes[i] = GIN_FALSE;
- else if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
- {
- /* this source is 'unloaded' */
- key->entryRes[i] = GIN_MAYBE;
- }
- else
- {
- /*
- * we know the next item from this source to be >= minItem,
- * hence it's false for any items before < minItem
- */
- key->entryRes[i] = GIN_FALSE;
- }
- }
-
- MemoryContextSwitchTo(tempCtx);
- res = key->triConsistentFn(key);
- MemoryContextSwitchTo(oldCtx);
-
- if (res == GIN_FALSE)
- {
- /*
- * All items between advancePast and minItem have been refuted.
- * Proceed with minItem.
- */
- advancePast = minItem;
- advancePast.ip_posid--;
- }
- else
- {
- /*
- * There might be matches smaller than minItem coming from one
- * of the unknown sources. Load more items, and retry.
- */
- entry = key->scanEntry[minUnknown];
- entryGetItem(ginstate, entry, advancePast);
- continue;
- }
- }
-
/*
* Ok, we now know that there are no matches < minItem. Proceed to
* check if it's a match.
--- 877,882 ----
*************** keyGetItem(GinState *ginstate, MemoryCon
*** 895,919 ****
GinItemPointerGetBlockNumber(&minItem));
/*
- * We might not have loaded all the entry streams for this TID. We
- * could call the consistent function, passing MAYBE for those entries,
- * to see if it can decide if this TID matches based on the information
- * we have. But if the consistent-function is expensive, and cannot
- * in fact decide with partial information, that could be a big loss.
- * So, loop back to load the missing entries, before calling the
- * consistent function.
- */
- if (minUnknown != -1)
- {
- for (i = minUnknown; i < key->nentries; i++)
- {
- entry = key->scanEntry[i];
- if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
- entryGetItem(ginstate, entry, advancePast);
- }
- }
-
- /*
* Lossy-page entries pose a problem, since we don't know the correct
* entryRes state to pass to the consistentFn, and we also don't know
* what its combining logic will be (could be AND, OR, or even NOT).
--- 886,891 ----
On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
In addition to that, I'm using the ternary consistent function to check
if minItem is a match, even if we haven't loaded all the entries yet.
That's less important, but I think for something like "rare1 | (rare2 &
frequent)" it might be useful. It would allow us to skip fetching
'frequent', when we already know that 'rare1' matches for the current
item. I'm not sure if that's worth the cycles, but it seemed like an
obvious thing to do, now that we have the ternary consistent function.

So, that clearly isn't worth the cycles :-). At least not with an
expensive consistent function; it might be worthwhile if we pre-build the
truth-table, or cache the results of the consistent function.
I believe caching consistent function results is essentially the same as a
lazy truth table. I think it's a good option to use with the two-state
consistent function. However, I don't think it's a reason to give up on the
three-state consistent function, because the number of entries could be large.
------
With best regards,
Alexander Korotkov.
On 27.1.2014 16:30, Alexander Korotkov wrote:
On Mon, Jan 27, 2014 at 2:32 PM, Alexander Korotkov
<aekorotkov@gmail.com <mailto:aekorotkov@gmail.com>> wrote:

I attach two patches which roll back these two features (sorry for the awful
quality of the second). The native consistent function accelerates things
significantly, as expected. It seems that sorting the entries has almost no
effect. However, it's still not as fast as the initial fast-scan:

# select count(*) from fts_test where fti @@ plainto_tsquery('english',
'gin index select');
 count
───────
   627
(1 row)

Time: 5,381 ms
Tomas, could you rerun your tests with first and both these patches
applied against patches by Heikki?
Done, and the results are somewhat disappointing.
I've generated 1000 queries with either 3 or 6 words, based on how often
they occur in the documents. For example, 1% means that 1% of the
documents contain the word. In this case, I've used the ranges 0-2%, 1-3%
and 3-9%.
Which gives six combinations:
| 0-2% | 1-3% | 3-9% |
--------------------------------
3 words | | | |
--------------------------------
6 words | | | |
--------------------------------
Each word had a ~5% probability of being negated (i.e. "!" in front of it).
So these queries are a bit different from the ones I ran yesterday.
Then I ran those scripts on:
* 9.3
* 9.4 with Heikki's patches (9.4-heikki)
* 9.4 with Heikki's and first patch (9.4-alex-1)
* 9.4 with Heikki's and both patches (9.4-alex-2)
I've always created a new DB, loaded the data, done VACUUM (FREEZE,
ANALYZE) and then ran the script 5x but only measured the fifth run.
The full results are available here (and attached as ODT, but just the
numbers without the charts)
https://docs.google.com/spreadsheet/ccc?key=0Alm8ruV3ChcgdHJfZTdOY2JBSlkwZjNuWGlIaGM0REE
On all the charts the x-axis is "how long it took without the patch" and
the y-axis is "how much longer it took with the patch". 1 means exactly
the same, >1 slower, <1 faster. Sometimes one (or both) of the axes is
log-scale. The durations are in microseconds (i.e. 1e-6 sec).
I'll analyze the results for 3-words first.
Heikki's patch seems fine, at least compared to 9.3. See for example
the heikki-vs-9.3.png image. This is the case with 3 words, each
contained in less than 2% of documents (i.e. rare words). The vast majority
of the queries are much faster, and the ~1.0 results are below 1
millisecond, which is somewhat tricky to measure.
Now, see alexander-1.png / alexander-2.png, for one / both of the
patches, compared to results with Heikki's patches. Not really good,
IMHO, especially for the first patch - most of the queries are much
slower, even by an order of magnitude. The second patch fixes the worst
cases, but does not really make it better than 9.4-heikki.
It however gets better as the words become more common. See for example
alexander-common-words.png - which once again compares 9.4-alex-1 vs.
9.4-heikki on 3 words in the 3-9% range. This time the performance is
rather consistently better.
On 6 words the results are similar, i.e. bad with rare words but getting
better on the more common ones. Except that in this case it never gets
better than 9.4-heikki.
I can provide the queries but without the dataset I'm testing this on,
that's pretty useless. I'll try to analyze this a bit more later today,
but I'm afraid I don't have the necessary insight.
regards
Tomas
On 01/28/2014 05:54 AM, Tomas Vondra wrote:
Then I ran those scripts on:
* 9.3
* 9.4 with Heikki's patches (9.4-heikki)
* 9.4 with Heikki's and first patch (9.4-alex-1)
* 9.4 with Heikki's and both patches (9.4-alex-2)
It would be good to also test with unpatched 9.4 (ie. git master). The
packed posting lists patch might account for a large part of the
differences between 9.3 and the patched 9.4 versions.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 28.1.2014 08:29, Heikki Linnakangas wrote:
On 01/28/2014 05:54 AM, Tomas Vondra wrote:
Then I ran those scripts on:
* 9.3
* 9.4 with Heikki's patches (9.4-heikki)
* 9.4 with Heikki's and first patch (9.4-alex-1)
* 9.4 with Heikki's and both patches (9.4-alex-2)

It would be good to also test with unpatched 9.4 (ie. git master). The
packed posting lists patch might account for a large part of the
differences between 9.3 and the patched 9.4 versions.

- Heikki
Hi,
the e-mail I sent yesterday apparently did not make it into the list,
probably because of the attachments, so I'll just link them this time.
I added the results from 9.4 master to the spreadsheet:
https://docs.google.com/spreadsheet/ccc?key=0Alm8ruV3ChcgdHJfZTdOY2JBSlkwZjNuWGlIaGM0REE
It's a bit cumbersome to analyze though, so I've quickly hacked up a
simple jqplot page that allows comparing the results. It's available
here: http://www.fuzzy.cz/tmp/gin/
It's likely there are some quirks and issues - let me know about them.
The ODT with the data is available here:
http://www.fuzzy.cz/tmp/gin/gin-scan-benchmarks.ods
Three quick basic observations:
(1) The current 9.4 master is consistently better than 9.3 by about 15%
on rare words, and up to 30% on common words. See the charts for
6-word queries:
http://www.fuzzy.cz/tmp/gin/6-words-rare-94-vs-93.png
With 3-word queries the effects are even stronger & clearer,
especially with the common words.
(2) Heikki's patches seem to work OK, i.e. improve the performance, but
only with rare words.
http://www.fuzzy.cz/tmp/gin/heikki-vs-94-rare.png
With 3 words the impact is much stronger than with 6 words,
presumably because it depends on how frequent the combination of
words is (~ multiplication of probabilities). See
http://www.fuzzy.cz/tmp/gin/heikki-vs-94-3-common-words.png
http://www.fuzzy.cz/tmp/gin/heikki-vs-94-6-common-words.png
for comparison of 9.4 master vs. 9.4+heikki's patches.
(3) A file with explain plans for 4 queries suffering ~2x slowdown,
and explain plans with 9.4 master and Heikki's patches is available
here:
http://www.fuzzy.cz/tmp/gin/queries.txt
All the queries have 6 common words, and the explain plans look
just fine to me - exactly like the plans for other queries.
Two things now caught my eye. First, some of these queries actually
have words repeated - either exactly, like "database & database", or
in negated form, like "!anything & anything". Second, while
generating the queries, I use "dumb" frequency, where only exact
matches count, i.e. "write != written" etc. But the actual number
of hits may be much higher - for example "write" matches exactly
just 5% of the documents, but using @@ it matches more than 20%.
I don't know if that's the actual cause though.
regards
Tomas
On 01/30/2014 01:53 AM, Tomas Vondra wrote:
(3) A file with explain plans for 4 queries suffering ~2x slowdown,
and explain plans with 9.4 master and Heikki's patches is available
here: http://www.fuzzy.cz/tmp/gin/queries.txt
All the queries have 6 common words, and the explain plans look
just fine to me - exactly like the plans for other queries.

Two things now caught my eye. First some of these queries actually
have words repeated - either exactly like "database & database" or
in negated form like "!anything & anything". Second, while
generating the queries, I use "dumb" frequency, where only exact
matches count. I.e. "write != written" etc. But the actual number
of hits may be much higher - for example "write" matches exactly
just 5% documents, but using @@ it matches more than 20%.

I don't know if that's the actual cause though.
I tried these queries with the data set you posted here:
/messages/by-id/52E4141E.60609@fuzzy.cz. The reason
for the slowdown is the extra consistent calls it causes. That's
expected - the patch certainly does call consistent in situations where
it didn't before, and if the "pre-consistent" checks are not able to
eliminate many tuples, you lose.
So, what can we do to mitigate that? Some ideas:
1. Implement the catalog changes from Alexander's patch. That ought to
remove the overhead, as you only need to call the consistent function
once, not "both ways". OTOH, currently we only call the consistent
function if there is only one unknown column. If, with the catalog
changes, we always call the consistent function even when there are more
unknown columns, we might end up calling it even more often.
2. Cache the result of the consistent function.
3. Make the consistent function cheaper. (How? Magic?)
4. Use some kind of a heuristic, and stop doing the pre-consistent
checks if they're not effective. Not sure what the heuristic would look
like.
The caching we could easily do. It's very simple and very effective, as
long as the number of entries is limited. The amount of space
required to cache all combinations grows exponentially, so it's only
feasible for up to 10 or so entries.
- Heikki
On 01/30/2014 01:53 AM, Tomas Vondra wrote:
(3) A file with explain plans for 4 queries suffering ~2x slowdown,
and explain plans with 9.4 master and Heikki's patches is available
here: http://www.fuzzy.cz/tmp/gin/queries.txt
All the queries have 6 common words, and the explain plans look
just fine to me - exactly like the plans for other queries.

Two things now caught my eye. First some of these queries actually
have words repeated - either exactly like "database & database" or
in negated form like "!anything & anything". Second, while
generating the queries, I use "dumb" frequency, where only exact
matches count. I.e. "write != written" etc. But the actual number
of hits may be much higher - for example "write" matches exactly
just 5% documents, but using @@ it matches more than 20%.

I don't know if that's the actual cause though.
Ok, here's another variant of these patches. Compared to git master, it
does three things:
1. It adds the concept of a ternary consistent function internally, but no
catalog changes. It's implemented by calling the regular boolean
consistent function "both ways".
2. Use a binary heap to get the "next" item from the entries in a scan.
I'm pretty sure this makes sense, because arguably it makes the code
more readable, and reduces the number of item pointer comparisons
significantly for queries with a lot of entries.
3. Only perform the pre-consistent check to try skipping entries, if we
don't already have the next item from the entry loaded in the array.
This is a tradeoff, you will lose some of the performance gain you might
get from pre-consistent checks, but it also limits the performance loss
you might get from doing useless pre-consistent checks.
So taken together, I would expect this patch to make some of the
performance gains less impressive, but also limit the loss we saw with
some of the other patches.
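For illustration, deriving a ternary result from a boolean consistent function by calling it "both ways" can be sketched roughly like this. This is a simplified standalone sketch, not the patch's actual code: the two-call shortcut is only conclusive for operators monotone in check[] (plain AND/OR-style queries), and all the names here are invented.

```c
#include <stdbool.h>

#define MAX_KEYS 32

typedef enum
{
	TRI_FALSE,
	TRI_TRUE,
	TRI_MAYBE
} TriValue;

/* Demo boolean consistent function with AND semantics over all keys. */
static bool
and_consistent(const bool *check, int nkeys)
{
	for (int i = 0; i < nkeys; i++)
		if (!check[i])
			return false;
	return true;
}

/*
 * Ternary shim: call the boolean consistent function "both ways", once
 * with every MAYBE input forced to false and once forced to true.  If
 * the two calls agree, the result holds for every assignment of the
 * unknown inputs (valid when the function is monotone in check[]);
 * otherwise report MAYBE.
 */
static TriValue
shim_tri_consistent(bool (*consistent) (const bool *, int),
					const TriValue *check, int nkeys)
{
	bool		lo[MAX_KEYS];
	bool		hi[MAX_KEYS];

	for (int i = 0; i < nkeys; i++)
	{
		lo[i] = (check[i] == TRI_TRUE);		/* MAYBE -> false */
		hi[i] = (check[i] != TRI_FALSE);	/* MAYBE -> true */
	}

	bool		rlo = consistent(lo, nkeys);
	bool		rhi = consistent(hi, nkeys);

	if (rlo == rhi)
		return rlo ? TRI_TRUE : TRI_FALSE;
	return TRI_MAYBE;
}
```

A TRI_FALSE result here is what allows skipping entries: it proves no assignment of the unloaded (MAYBE) inputs can produce a match.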
Tomas, could you run your test suite with this patch, please?
- Heikki
Attachments:
gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch (text/x-diff)
diff --git a/src/backend/access/gin/Makefile b/src/backend/access/gin/Makefile
index aabc62f..db4f496 100644
--- a/src/backend/access/gin/Makefile
+++ b/src/backend/access/gin/Makefile
@@ -14,6 +14,6 @@ include $(top_builddir)/src/Makefile.global
OBJS = ginutil.o gininsert.o ginxlog.o ginentrypage.o gindatapage.o \
ginbtree.o ginscan.o ginget.o ginvacuum.o ginarrayproc.o \
- ginbulk.o ginfast.o ginpostinglist.o
+ ginbulk.o ginfast.o ginpostinglist.o ginlogic.o
include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index a45d722..f2ea962 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -32,41 +32,6 @@ typedef struct pendingPosition
bool *hasMatchKey;
} pendingPosition;
-
-/*
- * Convenience function for invoking a key's consistentFn
- */
-static bool
-callConsistentFn(GinState *ginstate, GinScanKey key)
-{
- /*
- * If we're dealing with a dummy EVERYTHING key, we don't want to call the
- * consistentFn; just claim it matches.
- */
- if (key->searchMode == GIN_SEARCH_MODE_EVERYTHING)
- {
- key->recheckCurItem = false;
- return true;
- }
-
- /*
- * Initialize recheckCurItem in case the consistentFn doesn't know it
- * should set it. The safe assumption in that case is to force recheck.
- */
- key->recheckCurItem = true;
-
- return DatumGetBool(FunctionCall8Coll(&ginstate->consistentFn[key->attnum - 1],
- ginstate->supportCollation[key->attnum - 1],
- PointerGetDatum(key->entryRes),
- UInt16GetDatum(key->strategy),
- key->query,
- UInt32GetDatum(key->nuserentries),
- PointerGetDatum(key->extra_data),
- PointerGetDatum(&key->recheckCurItem),
- PointerGetDatum(key->queryValues),
- PointerGetDatum(key->queryCategories)));
-}
-
/*
* Goes to the next page if current offset is outside of bounds
*/
@@ -453,13 +418,31 @@ restartScanEntry:
freeGinBtreeStack(stackEntry);
}
+static int
+entryCmp(Datum a, Datum b, void *arg)
+{
+ GinScanEntry ea = (GinScanEntry) DatumGetPointer(a);
+ GinScanEntry eb = (GinScanEntry) DatumGetPointer(b);
+ return -ginCompareItemPointers(&ea->curItem, &eb->curItem);
+}
+
static void
startScanKey(GinState *ginstate, GinScanKey key)
{
+ int i;
ItemPointerSetMin(&key->curItem);
key->curItemMatches = false;
key->recheckCurItem = false;
key->isFinished = false;
+
+ GinInitConsistentMethod(ginstate, key);
+
+ key->entryHeap = binaryheap_allocate(key->nentries, entryCmp, NULL);
+ for (i = 0; i < key->nentries; i++)
+ binaryheap_add(key->entryHeap, PointerGetDatum(key->scanEntry[i]));
+
+ key->nunloaded = 0;
+ key->unloadedEntries = palloc(key->nentries * sizeof(GinScanEntry));
}
static void
@@ -649,6 +632,11 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
*
* Item pointers are returned in ascending order.
*
+ * If 'ifcheap' is passed as TRUE, the function only advances curItem if it's
+ * relatively cheap to do so. In the current implementation, "cheap" means
+ * that the next item is already loaded in the entry->list array. Returns
+ * TRUE if curItem was advanced, FALSE otherwise.
+ *
* Note: this can return a "lossy page" item pointer, indicating that the
* entry potentially matches all items on that heap page. However, it is
* not allowed to return both a lossy page pointer and exact (regular)
@@ -656,9 +644,9 @@ entryLoadMoreItems(GinState *ginstate, GinScanEntry entry, ItemPointerData advan
* logic in keyGetItem and scanGetItem; see comment in scanGetItem.) In the
* current implementation this is guaranteed by the behavior of tidbitmaps.
*/
-static void
+static bool
entryGetItem(GinState *ginstate, GinScanEntry entry,
- ItemPointerData advancePast)
+ ItemPointerData advancePast, bool ifcheap)
{
Assert(!entry->isFinished);
@@ -768,12 +756,15 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
/* If we've processed the current batch, load more items */
while (entry->offset >= entry->nlist)
{
+ if (ifcheap)
+ return false;
+
entryLoadMoreItems(ginstate, entry, advancePast);
if (entry->isFinished)
{
ItemPointerSetInvalid(&entry->curItem);
- return;
+ return true;
}
}
@@ -782,6 +773,7 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
} while (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0 ||
(entry->reduceResult == TRUE && dropItem(entry)));
}
+ return true;
}
/*
@@ -812,180 +804,291 @@ keyGetItem(GinState *ginstate, MemoryContext tempCtx, GinScanKey key,
ItemPointerData minItem;
ItemPointerData curPageLossy;
uint32 i;
- uint32 lossyEntry;
bool haveLossyEntry;
GinScanEntry entry;
- bool res;
MemoryContext oldCtx;
- bool allFinished;
+ bool gotItem;
+ GinLogicValue res;
Assert(!key->isFinished);
/*
* We might have already tested this item; if so, no need to repeat work.
- * (Note: the ">" case can happen, if minItem is exact but we previously
+ * (Note: the ">" case can happen, if advancePast is exact but we previously
* had to set curItem to a lossy-page pointer.)
*/
if (ginCompareItemPointers(&key->curItem, &advancePast) > 0)
return;
- /*
- * Find the minimum item > advancePast among the active entry streams.
- *
- * Note: a lossy-page entry is encoded by a ItemPointer with max value for
- * offset (0xffff), so that it will sort after any exact entries for the
- * same page. So we'll prefer to return exact pointers not lossy
- * pointers, which is good.
- */
- ItemPointerSetMax(&minItem);
- allFinished = true;
- for (i = 0; i < key->nentries; i++)
- {
- entry = key->scanEntry[i];
+ oldCtx = CurrentMemoryContext;
+ for (;;)
+ {
/*
- * Advance this stream if necessary.
+ * Find the minimum item > advancePast among the active entry streams.
*
- * In particular, since entry->curItem was initialized with
- * ItemPointerSetMin, this ensures we fetch the first item for each
- * entry on the first call.
+ * Note: a lossy-page entry is encoded by a ItemPointer with max value
+ * for offset (0xffff), so that it will sort after any exact entries
+ * for the same page. So we'll prefer to return exact pointers not
+ * lossy pointers, which is good.
*/
- while (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ ItemPointerSetMax(&minItem);
+ gotItem = false;
+ while (!binaryheap_empty(key->entryHeap))
{
- entryGetItem(ginstate, entry, advancePast);
+ entry = (GinScanEntry) DatumGetPointer(binaryheap_first(key->entryHeap));
+ if (entry->isFinished)
+ binaryheap_remove_first(key->entryHeap);
+ else if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ {
+ /* Advance this entry, if it's cheap to do so */
+ if (entryGetItem(ginstate, entry, advancePast, true))
+ binaryheap_replace_first(key->entryHeap, PointerGetDatum(entry));
+ else
+ {
+ binaryheap_remove_first(key->entryHeap);
+ key->unloadedEntries[key->nunloaded++] = entry;
+ }
+ }
+ else
+ {
+ gotItem = true;
+ minItem = entry->curItem;
+ break;
+ }
}
- if (!entry->isFinished)
+ /*
+ * We must have an item from at least one source to have a match.
+ * Fetch the next item > advancePast from one of the streams.
+ */
+ if (!gotItem)
{
- allFinished = FALSE;
- if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
- minItem = entry->curItem;
+ if (key->nunloaded == 0)
+ {
+ /* all entries are finished */
+ key->isFinished = TRUE;
+ return;
+ }
+
+ entry = key->unloadedEntries[--key->nunloaded];
+ entryGetItem(ginstate, entry, advancePast, false);
+ binaryheap_add(key->entryHeap, PointerGetDatum(entry));
+ continue;
}
- }
- if (allFinished)
- {
- /* all entries are finished */
- key->isFinished = TRUE;
- return;
- }
+ /*
+ * We now have minItem set to the minimum among the input streams whose
+ * next item we know. Some streams might still be in an unknown state,
+ * meaning we don't know their next value yet.
+ *
+ * Determine if any items between advancePast and minItem might match.
+ * Such items might come from one of the unknown sources, but it's
+ * possible that the consistent function can refute them all, i.e.
+ * the consistent logic says that they cannot match without any of the
+ * sources that we have loaded.
+ */
+ if (key->nunloaded > 0)
+ {
+ for (i = 0; i < key->nentries; i++)
+ {
+ entry = key->scanEntry[i];
+ if (entry->isFinished)
+ key->entryRes[i] = GIN_FALSE;
+ else if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ {
+ /* this source is 'unloaded' */
+ key->entryRes[i] = GIN_MAYBE;
+ }
+ else
+ {
+ /*
+ * we know the next item from this source to be >= minItem,
+ * hence it's FALSE for any item < minItem
+ */
+ key->entryRes[i] = GIN_FALSE;
+ }
+ }
- /*
- * OK, set key->curItem and perform consistentFn test.
- */
- key->curItem = minItem;
+ MemoryContextSwitchTo(tempCtx);
+ res = key->triConsistentFn(key);
+ MemoryContextSwitchTo(oldCtx);
- /*
- * Lossy-page entries pose a problem, since we don't know the correct
- * entryRes state to pass to the consistentFn, and we also don't know what
- * its combining logic will be (could be AND, OR, or even NOT). If the
- * logic is OR then the consistentFn might succeed for all items in the
- * lossy page even when none of the other entries match.
- *
- * If we have a single lossy-page entry then we check to see if the
- * consistentFn will succeed with only that entry TRUE. If so, we return
- * a lossy-page pointer to indicate that the whole heap page must be
- * checked. (On subsequent calls, we'll do nothing until minItem is past
- * the page altogether, thus ensuring that we never return both regular
- * and lossy pointers for the same page.)
- *
- * This idea could be generalized to more than one lossy-page entry, but
- * ideally lossy-page entries should be infrequent so it would seldom be
- * the case that we have more than one at once. So it doesn't seem worth
- * the extra complexity to optimize that case. If we do find more than
- * one, we just punt and return a lossy-page pointer always.
- *
- * Note that only lossy-page entries pointing to the current item's page
- * should trigger this processing; we might have future lossy pages in the
- * entry array, but they aren't relevant yet.
- */
- ItemPointerSetLossyPage(&curPageLossy,
- GinItemPointerGetBlockNumber(&key->curItem));
+ if (res == GIN_FALSE)
+ {
+ /*
+ * All items between advancePast and minItem have been refuted.
+ * Proceed with minItem.
+ */
+ advancePast = minItem;
+ advancePast.ip_posid--;
+ }
+ else
+ {
+ /*
+ * There might be matches smaller than minItem coming from one
+ * of the unknown sources. Load more items, and retry.
+ */
+ entry = key->unloadedEntries[--key->nunloaded];
+ entryGetItem(ginstate, entry, advancePast, false);
+ binaryheap_add(key->entryHeap, PointerGetDatum(entry));
+ continue;
+ }
+ }
- lossyEntry = 0;
- haveLossyEntry = false;
- for (i = 0; i < key->nentries; i++)
- {
- entry = key->scanEntry[i];
- if (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem, &curPageLossy) == 0)
+ /*
+ * Ok, we now know that there are no matches < minItem. Proceed to
+ * check if it's a match.
+ */
+ key->curItem = minItem;
+ advancePast = minItem;
+ advancePast.ip_posid--;
+ ItemPointerSetLossyPage(&curPageLossy,
+ GinItemPointerGetBlockNumber(&minItem));
+
+ /*
+ * We might not have loaded all the entry streams for this TID. We
+ * could call the consistent function, passing MAYBE for those entries,
+ * to see if it can decide if this TID matches based on the information
+ * we have. But if the consistent-function is expensive, and cannot
+ * in fact decide with partial information, that could be a big loss.
+ * So, loop back to load the missing entries, before calling the
+ * consistent function.
+ */
+ while (key->nunloaded > 0)
+ {
+ entry = key->unloadedEntries[--key->nunloaded];
+ entryGetItem(ginstate, entry, advancePast, false);
+ binaryheap_add(key->entryHeap, PointerGetDatum(entry));
+ }
+
+
+ /*
+ * Lossy-page entries pose a problem, since we don't know the correct
+ * entryRes state to pass to the consistentFn, and we also don't know
+ * what its combining logic will be (could be AND, OR, or even NOT).
+ * If the logic is OR then the consistentFn might succeed for all items
+ * in the lossy page even when none of the other entries match.
+ *
+ * Our strategy is to call the tri-state consistent function, with the
+ * lossy-page entries set to MAYBE, and all the other entries FALSE.
+ * If it returns FALSE, none of the lossy items alone are enough for a
+ * match, so we don't need to return a lossy-page pointer. Otherwise,
+ * return a lossy-page pointer to indicate that the whole heap page must
+ * be checked. (On subsequent calls, we'll do nothing until minItem is
+ * past the page altogether, thus ensuring that we never return both
+ * regular and lossy pointers for the same page.)
+ *
+ * An exception is that we don't need to try it both ways (ie. pass
+ * MAYBE) if the lossy pointer is in a "hidden" entry, because the
+ * consistentFn's result can't depend on that (but mark the result as
+ * 'recheck').
+ *
+ * Note that only lossy-page entries pointing to the current item's
+ * page should trigger this processing; we might have future lossy
+ * pages in the entry array, but they aren't relevant yet.
+ */
+ haveLossyEntry = false;
+ for (i = 0; i < key->nentries; i++)
+ {
+ entry = key->scanEntry[i];
+ if (entry->isFinished == FALSE &&
+ ginCompareItemPointers(&entry->curItem, &curPageLossy) == 0)
+ {
+ key->entryRes[i] = GIN_MAYBE;
+ haveLossyEntry = true;
+ }
+ else
+ key->entryRes[i] = GIN_FALSE;
+ }
+
+ if (haveLossyEntry)
{
- if (haveLossyEntry)
+ MemoryContextSwitchTo(tempCtx);
+ res = key->triConsistentFn(key);
+ MemoryContextSwitchTo(oldCtx);
+
+ if (res == GIN_TRUE || res == GIN_MAYBE)
{
- /* Multiple lossy entries, punt */
+ /* Some of the lossy items on the heap page might match, punt */
key->curItem = curPageLossy;
key->curItemMatches = true;
key->recheckCurItem = true;
return;
}
- lossyEntry = i;
- haveLossyEntry = true;
}
- }
- /* prepare for calling consistentFn in temp context */
- oldCtx = MemoryContextSwitchTo(tempCtx);
+ /*
+ * Let's call the consistent function to check if this is a match.
+ *
+ * At this point we know that we don't need to return a lossy
+ * whole-page pointer, but we might have matches for individual exact
+ * item pointers, possibly in combination with a lossy pointer. Pass
+ * lossy pointers as MAYBE to the ternary consistent function, to
+ * let it decide if this tuple satisfies the overall key, even though
+ * we don't know whether the lossy entries match.
+ *
+ * We might also not have advanced all the entry streams up to this
+ * point yet. It's possible that the consistent function can
+ * nevertheless decide that this is definitely a match or not a match,
+ * even though we don't know if those unknown entries match, so we
+ * pass them as MAYBE.
+ */
+ for (i = 0; i < key->nentries; i++)
+ {
+ entry = key->scanEntry[i];
+ if (entry->isFinished)
+ key->entryRes[i] = GIN_FALSE;
+ else if (ginCompareItemPointers(&entry->curItem, &advancePast) <= 0)
+ key->entryRes[i] = GIN_MAYBE; /* not loaded yet */
+ else if (ginCompareItemPointers(&entry->curItem, &curPageLossy) == 0)
+ key->entryRes[i] = GIN_MAYBE;
+ else if (ginCompareItemPointers(&entry->curItem, &minItem) == 0)
+ key->entryRes[i] = GIN_TRUE;
+ else
+ key->entryRes[i] = GIN_FALSE;
+ }
- if (haveLossyEntry)
- {
- /* Single lossy-page entry, so see if whole page matches */
- memset(key->entryRes, FALSE, key->nentries);
- key->entryRes[lossyEntry] = TRUE;
+ MemoryContextSwitchTo(tempCtx);
+ res = key->triConsistentFn(key);
+ MemoryContextSwitchTo(oldCtx);
- if (callConsistentFn(ginstate, key))
+ switch (res)
{
- /* Yes, so clean up ... */
- MemoryContextSwitchTo(oldCtx);
- MemoryContextReset(tempCtx);
-
- /* and return lossy pointer for whole page */
- key->curItem = curPageLossy;
- key->curItemMatches = true;
- key->recheckCurItem = true;
- return;
- }
- }
+ case GIN_TRUE:
+ key->curItemMatches = true;
+ /* triConsistentFn set recheckCurItem */
+ break;
- /*
- * At this point we know that we don't need to return a lossy whole-page
- * pointer, but we might have matches for individual exact item pointers,
- * possibly in combination with a lossy pointer. Our strategy if there's
- * a lossy pointer is to try the consistentFn both ways and return a hit
- * if it accepts either one (forcing the hit to be marked lossy so it will
- * be rechecked). An exception is that we don't need to try it both ways
- * if the lossy pointer is in a "hidden" entry, because the consistentFn's
- * result can't depend on that.
- *
- * Prepare entryRes array to be passed to consistentFn.
- */
- for (i = 0; i < key->nentries; i++)
- {
- entry = key->scanEntry[i];
- if (entry->isFinished == FALSE &&
- ginCompareItemPointers(&entry->curItem, &key->curItem) == 0)
- key->entryRes[i] = TRUE;
- else
- key->entryRes[i] = FALSE;
- }
- if (haveLossyEntry)
- key->entryRes[lossyEntry] = TRUE;
+ case GIN_FALSE:
+ key->curItemMatches = false;
+ break;
- res = callConsistentFn(ginstate, key);
+ case GIN_MAYBE:
+ key->curItemMatches = true;
+ key->recheckCurItem = true;
+ break;
- if (!res && haveLossyEntry && lossyEntry < key->nuserentries)
- {
- /* try the other way for the lossy item */
- key->entryRes[lossyEntry] = FALSE;
+ default:
+ /*
+ * the 'default' case shouldn't happen, but if the consistent
+ * function returns something bogus, this is the safe result
+ */
+ key->curItemMatches = true;
+ key->recheckCurItem = true;
+ break;
+ }
- res = callConsistentFn(ginstate, key);
+ /*
+ * We have a tuple, and we know if it matches or not. If it's a
+ * non-match, we could continue to find the next matching tuple, but
+ * let's break out and give scanGetItem a chance to advance the other
+ * keys. They might be able to skip past to a much higher TID, allowing
+ * us to save work.
+ */
+ break;
}
- key->curItemMatches = res;
- /* If we matched a lossy entry, force recheckCurItem = true */
- if (haveLossyEntry)
- key->recheckCurItem = true;
-
/* clean up after consistentFn calls */
MemoryContextSwitchTo(oldCtx);
MemoryContextReset(tempCtx);
@@ -1080,7 +1183,7 @@ scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
/*
* If this is the first key, remember this location as a
- * potential match.
+ * potential match, and proceed to check the rest of the keys.
*
* Otherwise, check if this is the same item that we checked the
* previous keys for (or a lossy pointer for the same page). If
@@ -1091,21 +1194,20 @@ scanGetItem(IndexScanDesc scan, ItemPointerData advancePast,
if (i == 0)
{
*item = key->curItem;
+ continue;
+ }
+
+ if (ItemPointerIsLossyPage(&key->curItem) ||
+ ItemPointerIsLossyPage(item))
+ {
+ Assert (GinItemPointerGetBlockNumber(&key->curItem) >= GinItemPointerGetBlockNumber(item));
+ match = (GinItemPointerGetBlockNumber(&key->curItem) ==
+ GinItemPointerGetBlockNumber(item));
}
else
{
- if (ItemPointerIsLossyPage(&key->curItem) ||
- ItemPointerIsLossyPage(item))
- {
- Assert (GinItemPointerGetBlockNumber(&key->curItem) >= GinItemPointerGetBlockNumber(item));
- match = (GinItemPointerGetBlockNumber(&key->curItem) ==
- GinItemPointerGetBlockNumber(item));
- }
- else
- {
- Assert(ginCompareItemPointers(&key->curItem, item) >= 0);
- match = (ginCompareItemPointers(&key->curItem, item) == 0);
- }
+ Assert(ginCompareItemPointers(&key->curItem, item) >= 0);
+ match = (ginCompareItemPointers(&key->curItem, item) == 0);
}
}
} while (!match);
@@ -1322,7 +1424,7 @@ collectMatchesForHeapRow(IndexScanDesc scan, pendingPosition *pos)
{
GinScanKey key = so->keys + i;
- memset(key->entryRes, FALSE, key->nentries);
+ memset(key->entryRes, GIN_FALSE, key->nentries);
}
memset(pos->hasMatchKey, FALSE, so->nkeys);
@@ -1579,7 +1681,7 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
{
GinScanKey key = so->keys + i;
- if (!callConsistentFn(&so->ginstate, key))
+ if (!key->boolConsistentFn(key))
{
match = false;
break;
diff --git a/src/backend/access/gin/ginlogic.c b/src/backend/access/gin/ginlogic.c
new file mode 100644
index 0000000..e499c6e
--- /dev/null
+++ b/src/backend/access/gin/ginlogic.c
@@ -0,0 +1,136 @@
+/*-------------------------------------------------------------------------
+ *
+ * ginlogic.c
+ * routines for performing binary- and ternary-logic consistent checks.
+ *
+ *
+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/access/gin/ginlogic.c
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/gin_private.h"
+#include "access/reloptions.h"
+#include "catalog/pg_collation.h"
+#include "catalog/pg_type.h"
+#include "miscadmin.h"
+#include "storage/indexfsm.h"
+#include "storage/lmgr.h"
+
+/*
+ * A dummy consistent function for an EVERYTHING key. Just claim it matches.
+ */
+static bool
+trueConsistentFn(GinScanKey key)
+{
+ key->recheckCurItem = false;
+ return true;
+}
+static GinLogicValue
+trueTriConsistentFn(GinScanKey key)
+{
+ return GIN_MAYBE;
+}
+
+/*
+ * A function for calling a regular, binary logic, consistent function.
+ */
+static bool
+normalBoolConsistentFn(GinScanKey key)
+{
+ /*
+ * Initialize recheckCurItem in case the consistentFn doesn't know it
+ * should set it. The safe assumption in that case is to force recheck.
+ */
+ key->recheckCurItem = true;
+
+ return DatumGetBool(FunctionCall8Coll(key->consistentFmgrInfo,
+ key->collation,
+ PointerGetDatum(key->entryRes),
+ UInt16GetDatum(key->strategy),
+ key->query,
+ UInt32GetDatum(key->nuserentries),
+ PointerGetDatum(key->extra_data),
+ PointerGetDatum(&key->recheckCurItem),
+ PointerGetDatum(key->queryValues),
+ PointerGetDatum(key->queryCategories)));
+}
+
+/*
+ * This function implements a tri-state consistency check, using a boolean
+ * consistent function provided by the opclass.
+ *
+ * If there is only one MAYBE input, our strategy is to try the consistentFn
+ * both ways. If it returns TRUE for both, the tuple matches regardless of
+ * the MAYBE input, so we return TRUE. Likewise, if it returns FALSE for both,
+ * we return FALSE. Otherwise return MAYBE.
+ */
+static GinLogicValue
+shimTriConsistentFn(GinScanKey key)
+{
+ bool foundMaybe = false;
+ int maybeEntry = -1;
+ int i;
+ bool boolResult1;
+ bool boolResult2;
+ bool recheck1;
+ bool recheck2;
+
+ for (i = 0; i < key->nentries; i++)
+ {
+ if (key->entryRes[i] == GIN_MAYBE)
+ {
+ if (foundMaybe)
+ return GIN_MAYBE; /* more than one MAYBE input */
+ maybeEntry = i;
+ foundMaybe = true;
+ }
+ }
+
+ /*
+ * If none of the inputs were MAYBE, we can just call the consistent
+ * function as is.
+ */
+ if (!foundMaybe)
+ return normalBoolConsistentFn(key);
+
+ /* Try the consistent function with the maybe-input set both ways */
+ key->entryRes[maybeEntry] = GIN_TRUE;
+ boolResult1 = normalBoolConsistentFn(key);
+ recheck1 = key->recheckCurItem;
+
+ key->entryRes[maybeEntry] = GIN_FALSE;
+ boolResult2 = normalBoolConsistentFn(key);
+ recheck2 = key->recheckCurItem;
+
+ if (!boolResult1 && !boolResult2)
+ return GIN_FALSE;
+
+ key->recheckCurItem = recheck1 || recheck2;
+ if (boolResult1 && boolResult2)
+ return GIN_TRUE;
+ else
+ return GIN_MAYBE;
+}
+
+void
+GinInitConsistentMethod(GinState *ginstate, GinScanKey key)
+{
+ if (key->searchMode == GIN_SEARCH_MODE_EVERYTHING)
+ {
+ key->boolConsistentFn = trueConsistentFn;
+ key->triConsistentFn = trueTriConsistentFn;
+ }
+ else
+ {
+ key->consistentFmgrInfo = &ginstate->consistentFn[key->attnum - 1];
+ key->collation = ginstate->supportCollation[key->attnum - 1];
+ key->boolConsistentFn = normalBoolConsistentFn;
+ key->triConsistentFn = shimTriConsistentFn;
+ }
+}
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index bb0ab31..2b733ab 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -13,10 +13,13 @@
#include "access/genam.h"
#include "access/gin.h"
#include "access/itup.h"
+#include "lib/binaryheap.h"
#include "fmgr.h"
#include "storage/bufmgr.h"
#include "utils/rbtree.h"
+typedef struct GinScanKeyData *GinScanKey;
+typedef struct GinScanEntryData *GinScanEntry;
/*
* Page opaque data in an inverted index page.
@@ -588,6 +591,19 @@ extern OffsetNumber gintuple_get_attrnum(GinState *ginstate, IndexTuple tuple);
extern Datum gintuple_get_key(GinState *ginstate, IndexTuple tuple,
GinNullCategory *category);
+/* ginlogic.c */
+
+typedef enum
+{
+ GIN_FALSE = 0,
+ GIN_TRUE = 1,
+ GIN_MAYBE = 2
+} GinLogicValueEnum;
+
+typedef char GinLogicValue;
+
+extern void GinInitConsistentMethod(GinState *ginstate, GinScanKey key);
+
/* gininsert.c */
extern Datum ginbuild(PG_FUNCTION_ARGS);
extern Datum ginbuildempty(PG_FUNCTION_ARGS);
@@ -732,10 +748,6 @@ extern void ginVacuumPostingTreeLeaf(Relation rel, Buffer buf, GinVacuumState *g
* nuserentries is the number that extractQueryFn returned (which is what
* we report to consistentFn). The "user" entries must come first.
*/
-typedef struct GinScanKeyData *GinScanKey;
-
-typedef struct GinScanEntryData *GinScanEntry;
-
typedef struct GinScanKeyData
{
/* Real number of entries in scanEntry[] (always > 0) */
@@ -746,8 +758,19 @@ typedef struct GinScanKeyData
/* array of GinScanEntry pointers, one per extracted search condition */
GinScanEntry *scanEntry;
+ /* A heap containing unfinished entries, with curItem > advancePast */
+ binaryheap *entryHeap;
+
+ /* An array containing unfinished entries with curItem <= advancePast */
+ GinScanEntry *unloadedEntries;
+ int nunloaded;
+
/* array of check flags, reported to consistentFn */
bool *entryRes;
+ bool (*boolConsistentFn) (GinScanKey key);
+ GinLogicValue (*triConsistentFn) (GinScanKey key);
+ FmgrInfo *consistentFmgrInfo;
+ Oid collation;
/* other data needed for calling consistentFn */
Datum query;
On 2.2.2014 11:45, Heikki Linnakangas wrote:
> On 01/30/2014 01:53 AM, Tomas Vondra wrote:
>> (3) A file with explain plans for 4 queries suffering ~2x slowdown,
>> and explain plans with 9.4 master and Heikki's patches is available
>> here: http://www.fuzzy.cz/tmp/gin/queries.txt
>>
>> All the queries have 6 common words, and the explain plans look
>> just fine to me - exactly like the plans for other queries.
>>
>> Two things now caught my eye. First, some of these queries actually
>> have words repeated - either exactly, like "database & database", or
>> in negated form, like "!anything & anything". Second, while
>> generating the queries, I used "dumb" frequency, where only exact
>> matches count, i.e. "write != written" etc. But the actual number
>> of hits may be much higher - for example "write" matches exactly
>> just 5% of documents, but using @@ it matches more than 20%.
>>
>> I don't know if that's the actual cause though.
>
> Ok, here's another variant of these patches. Compared to git master, it
> does three things:
>
> 1. It adds the concept of a ternary consistent function internally, but
> no catalog changes. It's implemented by calling the regular boolean
> consistent function "both ways".
>
> 2. Use a binary heap to get the "next" item from the entries in a scan.
> I'm pretty sure this makes sense, because arguably it makes the code
> more readable, and it reduces the number of item pointer comparisons
> significantly for queries with a lot of entries.
>
> 3. Only perform the pre-consistent check to try skipping entries if we
> don't already have the next item from the entry loaded in the array.
> This is a tradeoff: you will lose some of the performance gain you might
> get from pre-consistent checks, but it also limits the performance loss
> you might get from doing useless pre-consistent checks.
>
> So taken together, I would expect this patch to make some of the
> performance gains less impressive, but also limit the loss we saw with
> some of the other patches.
>
> Tomas, could you run your test suite with this patch, please?

Sure, will do. Do I get it right that this should be applied instead of
the four patches you've posted earlier?

Tomas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3.2.2014 00:13, Tomas Vondra wrote:
> [...]
>
> Sure, will do. Do I get it right that this should be applied instead of
> the four patches you've posted earlier?
So, I was curious and did some basic testing - I've repeated the tests on
current HEAD and 'HEAD with the new patch'. The complete data are
available at [http://www.fuzzy.cz/tmp/gin/gin-scan-benchmarks.ods] and
I've updated the charts at [http://www.fuzzy.cz/tmp/gin/] too.
Look for branches named 9.4-head-2 and 9.4-heikki-2.
To me it seems that:
(1) The main issue was that with common words, it used to be much
slower than HEAD (or 9.3). This seems to be fixed, i.e. it's not
slower than before. See
http://www.fuzzy.cz/tmp/gin/3-common-words.png (previous patch)
http://www.fuzzy.cz/tmp/gin/3-common-words-new.png (new patch)
for comparison vs. 9.4 HEAD. With the new patch there's no slowdown,
which seems nice. Compared to 9.3 it looks like this:
http://www.fuzzy.cz/tmp/gin/3-common-words-new-vs-93.png
so there's a significant speedup (thanks to the modified layout).
(2) The question is whether the new patch works fine on rare words. See
this for comparison of the patches against HEAD:
http://www.fuzzy.cz/tmp/gin/3-rare-words.png
http://www.fuzzy.cz/tmp/gin/3-rare-words-new.png
and this is the comparison of the two patches:
http://www.fuzzy.cz/tmp/gin/patches-rare-words.png
That seems fine to me - some queries are slower, but we're talking
about queries taking 1 or 2 ms, so the measurement error is probably
the main cause of the differences.
(3) With higher numbers of frequent words, the differences (vs. HEAD or
the previous patch) are not as dramatic as in (1) - the new patch
is consistently ~20% faster.
Tomas
Attachments:
3-common-words.png (image/png)