Collect frequency statistics for arrays

Started by Alexander Korotkov about 14 years ago, 49 messages
#1 Alexander Korotkov
aekorotkov@gmail.com
4 attachment(s)

Hi!

Here is an updated version of the patch. Changes since the reviewed
version:
1) A distinct slot is used for the length histogram.
2) Standard statistics are collected for arrays.
3) Most common values and most common elements are mapped to distinct
columns of the pg_stats view, because both are calculated for arrays.
4) The description of the lossy counting algorithm was copied from
compute_tsvector_stats, with corresponding changes.
5) Comments about assumptions were added to the estimation functions.
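
For readers unfamiliar with item 4, here is a minimal sketch of the textbook lossy counting scheme. It is not the code from the patch or from compute_tsvector_stats; the function name lossy_count, the epsilon parameter, and the data layout are made up for illustration only.

```python
import math


def lossy_count(stream, epsilon):
    """Approximate frequency counts with textbook lossy counting.

    Hypothetical helper for illustration. Returns (counts, n), where
    counts maps item -> [f, delta]: f is the count observed since the
    item was (re)inserted and delta is the maximum possible undercount.
    """
    width = math.ceil(1 / epsilon)  # bucket width
    counts = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in counts:
            counts[item][0] += 1
        else:
            # A new item may have been pruned in any earlier bucket.
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # Bucket boundary: drop items that cannot be frequent.
            stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
            for key in stale:
                del counts[key]
    return counts, n


if __name__ == "__main__":
    # Made-up stream: "a" on every other position, unique items elsewhere.
    sample = ["a" if i % 2 == 0 else "x%d" % i for i in range(100)]
    table, total = lossy_count(sample, 0.1)
    print(table, total)
```

Items that survive pruning have a true frequency of at least f and at most f + delta, so memory stays bounded while frequent items are kept. How the patch feeds array elements into the stream is not shown here.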

Accuracy testing

The following files are attached:
datasets.sql - SQL script which generates the test datasets
arrayanalyze.php - PHP script which does the accuracy testing
results.sql - dump of the table with the test results

As the test results show, the estimates are quite accurate in most test
cases. When the length of the constant array exceeds 30, the estimate of
"column <@ const" is very inaccurate for the array_test3 table. This
happens because the length histogram is skipped in that case to avoid
high CPU usage during estimation (see array_sel.c:888).

------
With best regards,
Alexander Korotkov.

Attachments:

arrayanalyze-0.6.patch.gz (application/x-gzip)
datasets.sql (text/x-sql; charset=US-ASCII)
arrayanalyze.php (application/x-httpd-php)
results.sql.gz (application/x-gzip)
����Nresults.sql��[�m]v�=o��z�b`+���Vb�I��AQ"K�HT���{f��������W$%T�d}��{�9����[k����?����W�������w��w�������~�������/��o���g�'��_���~��_��7��������W�����?�������G�����7���������<��?���������������������o���ws��?��~w����o����~���?������_��o���?�7����������������������_����?�O�)�������������������_��_|������������o����o������������������/~�������W�O��������q<��������?�������������_����w��g�����������_�o~�/���W����o~�/~���9>�o��{���+>�����W�������������O~u�?~�����g��?|��������7������7��7���?�_�����w����������Ws�:<�������?��_��������O���k���������?�������������?�7�������c������~����?��?xLz0�����?��Y|�|�_����~�}���W����7���8g�O���������?�O����?�������������������{���Y�����_}�]����>[�O\y>�k��L��Z?�����|���s9��\i����6��\?r{��}���_>��r�����~�H����>k�l�sn���Q��gX�%�o����?���?o�s$����������o}���|��r��R�q�����_W>�����������������O]��~�������������~�����_����8����8~wM���O/�x�u|��9���?������~��������w��u��������g�x�7���W�)�J���<��7������1������*��
�'|�=��_}����'NO�_o��[���u���5+�
<���l���$�#����Z������dF��9~��p�������z|�u\*����+����Z>�SK���g>"\��/<Z}��N��?��cm��u�l���>��2pA��[y"�w�^���>\��x�>>1���7U�������D_PA�����p[����>����o|����7�*�����q�g%�+�S�<���s�������������t��=�D����#}��x���q��Di�S=���;|�Q���{Px��A�S:~���B�Nco���_���������4���x�g\��>f:����	�x)�o:������9�-���;����N�N�.Po��2�����
�(b6����C�xu����|zM�����&>��/p��Yv}w�p�2�i������~9N������������]?��y/ixzs}��x��O���P���\G��g��x#�����Zy�y������xb�s}�}��C���O`�	������5.��EvA�z������-������y�?o��x���X15���q�Q��3
���q�C�?�$���MrX|��$�����Z�x<O%��D&C����p�2�������'��#���f����z'�w}���w�����(�yX�q��*p���r�F����,{�
�O?�oo�������!��
����o�t8]nE�����k�=>F}�c*�Zv���(C\$GI���xh����Q�x����'����N���P�#�������y���>Mq��?H�;8c�XG8�����xr#���B8#)>�N��19�#����*Y<���P��b�!���x�����:�r���[a��,9����v ����c�����������)����am��_���O���R+�v|���Y��F0��q<��95nq������(�t�\�$�����75y]:_����g�y����f��uo�G���He��u���?�v��������Z#�v���}����.~��6���+e�]
�����>v�%������e<^�}9��A
{�'m|����������h�*%f�U���u��y��xU|�{���^X=�.Fq	x���zK����O��h�G��9`M�B���P+��at>��|�(����z����?����_]
n��(��5�� G�z]�6W6->u��<��y��j|d����L?��Y����G�hq\�����=K���8������l��l�s|w;�#�*�n����4]Z�4���_=���3��NKq	��{�OJ�:��]�@��<���J|D<>#~K����Ke�������z#>AG���x��=���p�9�C|��|G��/_B}����S���@����_�]��W( �Pd4�Q�Q�Q�}<����$�v��E�q�,4T�2��}����p�q�����m��*L�����x����2^_*
��%_a�w9rJ$�B*�o�pF���tU�=�������S|5�2v��q���Q����"F��e��O-9�G�"�Q���uWV�u�g��X�X��������:�=�b���z��f9;��_M�=[������W~���8�%zE�SAz�|F|�y	�9�
�~a7y<�G�����j�F�Z�����qH()���C�e��vvE;{F��������x�Z<3W���'x\���Gd(�yG��yEs|�g���$HQT3$���s<~eY/�	�n�b�t%���G�T��_�X�k�/��I��z����r5���v��t�]��B���������R��/��4�]w��)�Q�qT�p�]j=�qZ�@[?�c+�Y�8��_���a�	��3>��#z����5��[��.�<���{���;#���krY����?�I���������Z�c�����c�X=�`���{���������U�K������������b��E.}{�u1�|��9.
|��7���=��p�mO���)��*n������M��T��vK�C!`$q���_��<*�G�}�Miq���?���	��u\"�9%�o�|���;4�H#�S|������t��|i������";�t�z;_o@������_�����9�����^������$�������v#_~�=j�3Fl����W�>���)�����"���l����Q�y���-��~�G�^���n�s$�}��]j��������Y;r��{^>A=gx�|����}-����m�
�Q�K�X/!�n�/�}k�k]���@�4��r���}�;��V�������>����]Va)n�.]����`�f�v���y\�!��Z�G�O��|���R����8�<o��3t	Vt]�2����"��[��[��Lq
;��
��Q�n#��K�G�|�����}G�u���c@�=�9�W���"�s������
Q�7�������g��~<���<���?��;�48^D���������*q�.X���c�e������Kiy�����e��r�~�}����!{���������Ra�s^��������8�z�g�]���/��_@�k[n�H�`%�L����^�zx�|�PF
�Z����Kn�����������KI����������5�"4���w�<��r���+���~+{�o>�^���?�5���9J�e�,�����K�?o�#�A������h��V���||%���G�So��
u��{�>��5_�%G�L'�~=ra�v������%�Kr�of�|��ybu����z�f,�
�����������^�5U�Dh����
�;�������>�&\����01�����!Ee�=������|fg�sE����[�~�>�� ]�;�xX�,6"��>��P����/������u�gX{��k�eg	��S����.{:�]�M��v�u������������c����Z
��U���m��q��g��@?.����2W?������hl����6�r/7���778�J����k����Y��{�������g�U�7P�]��)"q3��ibGi�/��(�}��%1��V���=������>�kb�T})�c\�K�LS�����4.5T�U0x�Yb�(R>�3�������]�|�����6�G/O\:$�u��n}����nM[� /'$__Tv�;N�o~?��T�B�G������h����C:1�u8>�����.c��&>�~$�����i���]����t�3�
L�a�
)����q� ���������t����r�v)C��x.�k�s�V�#��F��^�F�7�{^�	�/�����K�<^U��F<�x�����e��C�~K�->i*tW)^O�6�>��Gz|����g��^�?��Kxg�C�tv�1����k�;���@Z->�������O���T���q�x����Je��T�=��x���-�u(u{�����>c����\�v�i�eK�9�����K5�.3�����,��$��������z-�*p��]Y.w0�@�������=��_$�v}�)�+���o����J�d{9����N���Run�u���>R"k��(G��,�i��.'���}Y��{�9������_1�>��x$,�j�*=��R�������������Z�_o�����E��T�]_�]�Q�����)OV�]�p}�N��x%�E�U~����]�2j�@w������>��~_��K�o�� �a�<��c;��T����wQFt#�(s�+W��~	.��Uz�WB�d�{���q_��;B��������Q_>s��
f���[�'Ye��o3��we�������z������Gz����{����c>2��5�rr����3:�f:O+�>��Tnp�������R���y4+�����o�>P�g���������!1�����a_x;��������f�!�X�.�tK�����l��9�Yr]�d���x���s�|2��]�.���j05����� .��N�-X��4�>�W���F�/�g����l�U�s�}V<�����%]�2������/�X�^�����1��_�+J~2���{�n�i>w-�x�>k����I�j���M���R�[to��M���OzM�����e��.���+��c@�t���z��N�w\�y{jLst��=5�ZO���!��|R:o����r�����$\���Z�+�y>`��//�����JO���c=����y�g����.8X���~4l6G6��9���q?�}x���G*�(�E�u;
���>�2w���q���;v��������Q���Y����8��%���N����>Hf[����p���m{I�H7N�]i>��>��+?��w���/�+�K�/�����^FQW8��������Z��{:����C1
:�'�������������������i����2������0�T���Q�t�k��v\�����lZ�S�|M�g�^�#���a
���6�+���K��I����<���R�9�]����)x!�"�=�|!�]�=�c�w5W4��O�����?��N��)V]G��]�����:���sE.�S�����>��1�~I��"�M�y+�.�U���X��yE������(o��D����}���'�h�v}����P�?Y�����_{
�O�G{5�W|&���#����=������R���{{J���g~U?�$g����n��.�=��;�����E��=%_�L���L�[�Q������yy.K^����^_����ED<�I~�~~qm\��������SS��f��%�O��?�����n������
cy5"~�:'_�U�����Z���zv\����?6����K^hX�=Y����qM���|(��������.�s����7��w����QV������=�}�H^��@���'H���>`?g�1�0��j
�k?^@{����=�=n�H
�O�}�o��=�]^�����]S����Xk����������[�{�)O���-��u�M?��F��u<�U����#3�%�����������������y3$='���
�/�bF�3����$�����.�g>��w���K}��=�57�������4R�M���<�����+�	J~���������7�<N�#��Z�n�x�\Y=1�������������7������g~��G��������SH:����[�y�?`��h?i��v#�.�n��%��W������x�xE�~��>�rR�`�r����hI��MO�j]2��d���ABr�K��b���[��������k��i"V=[�]�tUV�0�{����^1+���������L��T�rO�2c�Q�7����U�g7�����������v���[��<��"vy��r���@a�O?�����c�|��q��=.}%��\��^�w#�W��*���cry��Z�q������a�-�����FK���^{�H���_I������u\Y����H\���y�&��@vEJn�k
g�i�~i�_2��q��{������� r^G�^���,����z�2��)���z�bf���k��C��O���9����*�?��w ������W�l�'�7u�x�);�	"��Yz����k��^B��~�s�C}������:�����-������+�^����?U�
�/�����=e`�'>�c��?o_�-g�����<����.������h}m�����}��=��s�������E�z��K�+����@H)�}q~O�k�A�9g�O����T���G�On��w���9Q��>
��q~�
/�5E=w��������Z_QY����xU�7��t���|�]���>v%1��$sm���+����\k�������*J���/Av�4[�?�x��C�d�Q�;���9]���<��<w����8z;�}�3�)�>T"���.w��\����{���	�o`�S��b��Q�%=�������b=���r~�jI�����KD%?��o�|���|C�2�;zT{"r<����?������_�E	�W����r��S;�_������.
��n��%�}G����AI_A��J�6ax}?u`W����gb��z���<��h�}Xy�"�<�����Lg������"E���w�����x����|��<��2;���+b��kp����|3+�J�`���s	���q!R=���cXf��������/a���<�g$�I���������Y�q4�N����eJ��)w������\����>%<#
�M������9�6��w|�����������f���>SJ��d�9�������F�o�	w�p��V?7���������W�\^4�G�y$���-�Z��o4}�Hz�/{�g��x�(��v�Z��uL���Q�����F�U<$���g&���}�����Ky�����k~=�{����~1���+�K~F������/Jy�����l|�������l_���4�z`��w��s[�<*�|�����/�6D����_(����|3������ ��a�����}V����t����&_B�s��_�J�xN7��x@�i�q���e@?*��l@|�g<|���w��%�v�����3���w]O|���c�]�������*i��.����p�������;��s��r�}�k%<�������,Y���f/_��+�rW�������1��������E�|�����b�G	��{������\���:�>��|��X
��������D�Y�/8b���d<E�����r�k��}�����l����|�|?�.�
#��h|�9��1��������a<
	Ov����Y�����_�3�\�^���TIX1Q^�w���<��9���/<	��������|����~T���b��
���W~����C�����`���}�z��.Gyn���T�����aI~��o��1�N����x���qz����Q�#������\��;�/��=q��d��G����ai<i�������:y����?di��5^~���M4��/�/�j�}I�y�
<��_��������������}'i{������h����s�
8��R�cL�G�����y�R��"�w���?���������+.?�������A���Ho<��;���{��{v�|���{�=I����������;���|~�g��`�h�M^�u_N���~��|���z�;K��������L�tF�oO��������������/i���p{a���wMY�j����u{���)�/��Y�K���v�v����Bo^���� ����3���\����������=m�����Y�r�w@��=F��m�^:^B^w\G~��#�����o���2�{������C���]������2>j�������+���A~w/�a��C�����A,�nr�cn����B�Q���o\_}�w��]�Z��F���N����c�?s��B�^j��3��>��5��q��_s:�/��wp�o��o��������1�s�� L{%~��/<����\������@����[��m�������SU�{?W�o�����e�����e��q��Y~��+!�oVFBy���|p~����p����k���C�����Rj��Q��������U/���?���
�;e����.����� >��/��~�8��C���c����Y)G)�i�S7Q�����zOuq����S�`��?"8������Z������p�l�"���
��a��[�1h�m����M���8�������`�}�E����xWv�W3���K�0�%^��{/ND����Nx�O�K��g��K���JjmG
6��P����\h�E��������;��O�[��+��!?����|�4��$��U����!���5.R���_v�/{�������K��5@�d�|+�`��nG������.k����|6�>?�F�������������������,�&��*�)jy�J����$5�}��WW���U�x~��j�SJ��%y7�x�E�\��/vo�4�����7�!����m?~h��3��Kv�����0�����1���z�J~jrw���������X�en�<�E����x�h-��F����xw_v���/�E/�_)�L���~{��b��nY��}|g���'�/��_��/�;��1��8�����`��_p�>^l��
�'�w�*��y
��xL���9o�O�vD�5av����/|R���C%O����K��=$8�����U��!m�������X}�L��8�����_)��S��b�R�G���d��o�_R9�;����"K��������h�P�������;Y�7�(��V9��:�k	�K���<�'C��������N~�AG+��Vz��2A|s2�������w�/���)I��'�����!s��G�8_O����&
��/�b�������j�_�����}��}���~�����{�o�����%������g��S�O^�,+}#�.���5��d�} �;m�%7�D����������Y��!���;0��`�����N�}���}�O�x?>>;1?���{�wl�e�����D�yv����y�l���T�.�7=$�/�Q_`g� ������$�9�Q_R�~��|��~���%l�S���L����/~<�������/&Y���MW��j�;�/���L��`L?U����K���������s���NC��v�����;F���Z�p#�1���!�/n)�y������6��@K>�"����Ql4���
�z�)�[�y�_��9_A���O���`:��0�������
2����x�G�=�_N.��3������[<�_P����/^�/��_�p����E��]����������%��X1�[b�_V��������W���}H�(�*����
������0�`'q��c��������7��9T�)�?���:5�C]�gX�y��/����	�V�W���!�g�4��?��`\	?�Q����`��_N@�)"�/�]�lq�3O�?���e���K��?������?1v�:�:?�H?KJ�Gzf�����/����@�� �}`Z����<�S��������������O�E�P������T���O��c��?Y���������2��V��~*��bY�c������"�������b�g�E9_�~��WNX7S��������}���@�LXO]�4u���h\���q|����������������ae���+�s\���8R�������vP��0����_8�����J>Z���Wp��)�?�%��~��1��J�N4�����O��pUgj�����������o����.,�����z����a \�*����4���%}��i/�������>T}U�_��������?���V�^�����A�G���r��Wu<�>=R���k�.L�h	�&M���To�xk��7A����q�����%�����8z�I;N^k�������n������@��_\q&���&�g'h~R��m��Wc�~��E���Y��*��G{xs#Iot�~vZ����yn(�>M+:����2�^�q
��$sEd=��L����`�#?����{\����	�^�[��o����t
�S�{�D)��J��^�����u��v�t��#V=���6��(��t�ep�R>�G?�A?�������
� 5(��W�'[�c=�Z,�D|!a��B����:��[6T��OGRx�O�6����%'{����!x-8�QM�qd6�i���O��<�mu�����>%�+��q��`%��9�C��Ow��G���g��&���m�b��!������y;�g�/?�����]���z������c��u�q����MR�r=����x��C	�>;����
��K�O"w�P����I786��OG�9N��8���3o8*U�*�T?���I�3�QD�=�J�{D�D�����z��!��/�#Z?�/_R����fc��q��c���C�$��1��)���G1�{S�1Z<��5�������K||}��=~Dm�&����D+�}�u�7��L�U:.�����7���&
)3R����V}{��� �<�	i>�p9�v=����Nx�O�������	t�C�������p|�	�tV��
����[�c=E������3�����*�8V���#��������+�������Vw#�5�����P�w��V��C0	���xP��T������R��H:���H9��y�M�����R{�D���q�[~��G����E*�@���|�TS�~����QQ�)�v>|��K�igP`6��x{�rxR����tS�5�l�����x��HU�S��;y�e�X��W� x��8�;�.\S��UP��>*��>�ZE�~��(�Q%g�	pu��
D)�G���V���z��H�Q�<�[��)���d����9B6�Jk��=����uT����]��'����|(�
��iZ�#@��>����|8���8vGM�p��Q!<�����!{����uh����"��P�0�9�-�������[�[������Q8R�n5R!t������`� �LBfb����HC�F�����hq����H���x5� ���
$4X7e����I��������q���CX�GnH�������#���h6n+]��o�o��%�D��D��X�bBKO1�+m�'�������AQ����3/����Va����$��c�z�"��~�����5������&�7*���{��>�����e\S;���"5t4��K��"�$�����V�18�jGu��:���U��	��x��uT���}��3���/mV.HI��,=7���G�:	�����:����S��D_@��+q�������!j	����ev���/�3�F2�#��z
b�P#3�r�����_�zl����\�#J�w*0��2����eT]�'����{[f�%~>~Va���w����a����#���R���*�1��75b���}�K���:���	����0�[6���6q�����C��i�Z����;S����C���S9��J�{�2�GM*[�'&2���Q��"����F�2P/<�EzQ8�������M	�o�c�eWf�B}������D)�j��7�s6����ee�.�>��
�����<~�>~<B�x��=�Ufw�����y\�����R��U��J�4����g�2�e9Z[�~*J>H�����s8
\��2�D�i�q�SW����E}_Y���RZMU-����X�fDd�#d�b������ 9����aO�������L;N�v�������1�x���aI[���
�(��?��e��k���T���|Q���^��#
28���g2��.�R��R���T� �����:#��}�\�R�=�����Q1�=����� ��E�VU��N1��8�,�������8�����7<T�L`u�
>���Y�����?���G{6����H�2�A0�����8,�!hc��������XM��x�O?y9���*�U�<p+c"���������Q���P�q��Kg�������Q��)�E���o;�:,n���8D��+���)�����#���������1�C������5U�u|�9_"&���
#`���P�<x�&��\�U��]�jED��pZsN�������gGY�|�q�������my%����rV�>R���wkV��%�xbm�������O����B�``;
��"���lP!L>��2t�
uj!��u\����P~Q���w��H���������6����\�F	�w�p �SG�A8C��V���}�=��h��Wv�
|��i�PnWm�6����#b������^&����(KRI���`�oE��j�O��Fa����v*����R��������/+�`�a�(����iq�mf��q����	k56[X�������u���!X��E� 
�M���9~h��
E.�qA;JH���b;u��W��ay�r�C.�U�4�}X�)`1\�pi
�A-oA��F	���%�% �h��`8���g�f.��	��o�"�4-+�3�i�r��l��#���(
�'���5tS�:�>�qdt�y���m��@�3���L��������%V��gE�qT{B)�����:ni�>xf�>���y�J�������E�M��e���k�#��U7s���j��dJ"�&h��%
},�M4z �_?�+>����ca����m�u�.s�o*N��_G�I`Qz?��zG#��g�'
�jaC�W]EG���h�3+dL�a������������~�'����n
_s����l,��Db��"��n>���1
�P<����6����g�u�-�>��T����\2�x��{�?	�p$=G�`L�0���A)�J)������c�j@h��_a9j����+|w�e,M�d��.���i�
GU}�����K��Q�r%"cD���8.��!p�������4P��@X����y���T6�?�b��P��/�:����wGN��������e$��<^�dK���^:n�����������Z�^&'��`�
���iH/�������NTE����P+�����d�oo^�J
$���>:�Pj3�����
�6�h*Pst�.~`����W5�\�O�,4�M������}���s9w$W�l�	2}`����q�+G0�s������59a�T���4����Q�h
�&�����M0�`_l��8��p�m!��s �7��)�Tot��#��/S���������lG	��D�eT�{E��������SH
j�aX=���(��S��V���_�	��D�u�ExN�op1����ocdZ��l�8�<�Kp@Zp,Tf2'������s����mL���P/5����K7B?-9���t���*���F&��5��3��i�pSb���FX�h��P�<�B�i[E��YY'V�z1��{�
��~�w>q�:)5E��Nqa����g��� ���4��h�����������9ql�=���=�Gi����%�����m�!����������EH�b��<��3�3����I#��t�f�E������.'�$w���dBb��cma`�8&���*�Q3��/Qq��U��HV[����2B��F��m��@�SF7���a$��$H�ox�b�h�
5��
�a������lMp���-���
�_HW����Y�>�x�9��|9����OT��/���� `���,�q'���x�3u��p���@I`�c���c��,�U���G
�D���O1BP���O�i��"�(�Er�:�ZhP��|����Q�X��V�I7a�@1R�LRk��.���_�D
GmH|4��Y��@Y�)Hd(Qw�-�t���K�����E��S�
1p�����g������]Zp�f�>�T%\&��D�_�G��=�F��<9���Glp�fFb��.r����u�(O!�x����Vb����+�.�!AxQWT'��(q��Y�[�e�����������7������8B��'�]��_�4�*cP�i!9-?vqz+y9�c@\�]#�>xfZz����T��W���\��*�5�N������������	q��Ql|��6�f����e����X�3NQ}�,s�f���C�hA�L(*7������	=		j�>j�����,�����'�9*Bp1o��wW�KC�>kR=�Rg`�=z�-x�(_O�D�G��N��������l��z���x��f�!Qa�}^�|��jPM���RK��
b��s����V��,����wcGG|9�xb����2u��6���"
��H#��snD��JK�g�S�Uzf�!n�&U`�?2�J<tv'>�_14��A�(��/�C�Q�p�
�BPQt�X�����G�����&bKl���UN���l�
%���=V()���	�<��������v3 a,)E�-.�t�%!e�af7�]h0�G��:�y���ImsU��ZL��L�p�=]�x���y�TV���d(�eU�]X�0�����@��\T�L[P5�JR+�a����������P<�V��Zf]88�������dp �[a&
*���z�%�#�v!�a�q����c�<���&�����Rz����
M��D��xR�������
.��}���i+�C�d��r�"����]�����2���� �]�'���+,���i�5�������J�'�H"3�&r@�%_����*5��F������rc���7kP��ycT�D0W�n1Bg�B���"2�f]j'p���"�
�*��9�����f������p�\t4�xh��1���������E�P<��-��W�F�������z��F����9��F��%����&w���kq�a���6�J��2�+�����Hg.�H!���$(���(.������{�L��Z>j4; �-1����G�>�g���8d*a]Qga.*w�8��*�H�jB0���e+�Z�*F'/�N�l�������E���}M�F5Q
�5C��T�_T0��n��b���[7���
w��0���"�����6
�
q�	h�XU5v��hX�M0�+���Q�PL`I0�Fg��(83T"�>�3S��u��_06He@�#(!�lZ��8�[���8�$VTxx3h�S�P���m'��"���#�G� �t�.b0+D�����"� cp ���}��T���w�����{=����S�?}�Y*aI��A�0TG����^&Qu��������N����-Xx��0�`�B�}��,���fZ��
 �p?P�e��8����.cz�Tt�WD~��Do����Gx�����q����C+�D�����7V��8��(����U�Uw�����S��k����
����$Iw��E��hk�d����0|"���S�0�������=���zG}+n.�
��9���o�?'8������T_�ztL���
��0�*����WK&Xv�/��Z�D����%�^�a�*�X��va�g��Dm��F��U�0��R@����%*�/�-H>Y2p�����Q�'�)��^l����'�kZ�����G�y�>e
	�n��:����%bl��Kc6T��~��ts�<�f��V�����O�����vU:WdE�<0~�NM((�h�����
�qE,���g\�3�.3	�]L=���
��4
,c����a�=p#�>'��j�"�T�a��P�UfeY�c�9]�y���;|M�P���nn�f;��t�L8~i�`	����K�����n�+6T�GL�E��I��in��2��+�W��jO:�a��N��0F���2)� ��V5��W�,�	�
��k�8�W�%R6A��-��,���Rz�X�$B�A��JO�lr�#uf?j���a������?����(mq�_���bd�Su��Rv��"�u3���CxB��\NRY���v.��i�l��8`C]30�B����{E	�f>�QH��K0D������ eA����t��]	q\2��v�?z�VuD�;J�z<�7���0[#@���d��H�R:�t�B$�S$`
�t�a��u(�����C�Q��mM�	�Te����L���d���������g��Tv�x�h�J��lI@��#��I�|��������Ufy�-���A�8PnX�cB�3Y�����7:��}��?�0���lB5g���������Rn������i����x���n�P[�7��q�| �i��N.�qu�����"�:^`V����(���c���E�[p�r:�xF[��-�kW7|4�yXHV�j������*�/_@���!xQ��L�H�z�n��+
h���?�vg�$�a{s[h(+���
��W��<�b�����s'�lXC�D�������Lg�H�����%b�8���r$�B��w�]
�w
.|;����U��P!����������=X]3����C@�in���Q�,u�#�\Z�m=����m+?6q����
hT��3VTQ�Op��q�g�cgD7d������)����0��W�5��K��5X�q��q|_DxC������)����h!��(*�>�q'����e~M����D���T� zc��|��L�f���lS�\�����������L��x�-)@��(��	�Le�������X�+�x����3dI�S�v���,����xU�U@���l|U*��A�G��FzRB��m�:�$���zH]�i��{�|�����2.qo<���7j��>sY#��w��@J(�:(���AC��������R�R��4���3��`K
hr �����j�U�9���C���s��s]�� �I���T�D&�@��*/SVD6�.�
�7���M��R5-��|.�[��7��sK���'=2��X{g9����`���!�A����" 	��aQ����K>=�����R�+�S��d[�C���9�b�oK7r�"y5�6�`��^���8(����=���2��c���`U�hW��:���^*K!\.�l ~���11W� �&�]�w5���EG�2���({��?��v��GO��pLQf���Ym�j�[P6�"	�����"�2H���L��5�������Y�j���'��N���(|�BYgWj!�I)���_2o�B�"'��i�1h���IC�cQ�`�+j �6��D6�
-J��1��+F�����+Ie�7� *�I��1g����[�����fP�GG���ae:�z�����^�
Y���8HP���">����)�%wC�,��BdZ^������)�[�
��Upq���2�'�:�@���wu��
������S��P���a���������nV�D�X�
��lY<R H�h[�	qqY�2��k|w�X��$����3�j_�Rt����.(W���V�m���O.�?���,���q�[[j�p��f<��A����
OM�%���_�1��)p�� 2�G)d��u���e/M�%gH�5P�����Q��LUj�,g��[�&������*N$�c@
[��E�h������C�PH���	p\� XO%p�%qS�����(f�
�Kt���8f��
�1���q��"�O��P;���	��W��p@�,n�z�%��`(�>aFk�%~�dO����\w�r���u	@.t[�,�QF�s���-e%��|�s���o��g4}��m�����w�.5�lLy���	��a%���L[-��GJ\�
5-	�W5���?MJ�p4���*���Q�bv2�@4A���3@%EZ`�	�.O(����q
��""�"Nn
�L�6s��g�O�k��Z��0%��W�f�����VA�Y9���::;%!lm�8���Q]5�T�� ����M�8	�f/le#�.���U�Vt kf���#�����iF���{S��f-���X�m�4t)�T=��P@mgA���ns[O�	�C��
Z�������,��X<�YNAY`s�'��"��xe�`�p���
���*xG��%�@~]��,��lK{���S>�����G�y�D�����e� ld�w����i���������Dr�����.�+��6��kEQ��DC�5/�k�(Q���8?�����5dY��KQZ��Q��*t|g���N�{4��4��z��"���r5��X���p�hf��V=�Z��!����-s/�|q���=l�$����As�A�����R^^m
l������9��f�}���s�p���f��+jf�Z[����n�k��x������
$nm�6F�G2�xKq�_s�|V�}�g�Yzy����pv��+�<����@�p+�{W���OW�[4hm����><���t��Q	sK�����E.��cre8���Xlt���vO����jWWPP{|?�����2����/joLN.�6b�jx�������#�m��:_��,�.���������� �P��sjJ&�O�e��(��F��o������B�Ch2��������-yC@[+� �;2Fd��	}M����J:G��R
�H�A�n���;��Vm����>��~d�I�k%���i�q��'u	
�3-7	����P���'"�,O�^-�k�8�j��R,�Ufr�U��U�8�RO�]3�)q�E�	(�T�I����k���C�\�s��5r(F��\bm0.y�
��i��B25l��5�h��>
����
 �)������4'�c���Pv�c�+�'�w��g��M��To�b�C3��8���5$m��#�}�����;�a7���	~�hv��� 6OE\6����'L�q���O��J
}yJ�j���>�"�(�Jv����.�{�d�Q�{�i�����,L��������l������U�x`�s�aA_HY�]e�r���8U�$a,����)�ntM���o��O�����9*24%��L�n6�W{B=87S'I�qU��OZ��<�X7(����G�a)�K�Ce^�CP���W�5��*��	z�*�*�0����P�����H�����	�^�MKz�A���]��w��[J�����]]��@��n	��~���U����K�MT���1|�hZ�4�Az!_bR�$�Q?\��%��Dio�Y���.f�?k��}EhN��s�*�'Jt�Dy]rsX��a�HQM�E��64��@�� !�&������n�b���&B��>
v��v��5F  ��%#�Z���.KM-,5�8�O5L���(���F��R��� ����#@�.�r���$+��AQ�fA
�K�p�V
���l:�0 �dYqVI�P����2Gr7oC�����p�f�
�I�r��
I�Hs!>=!7��,�o���Tl�S�gN�85~�FtC.EX�/��e���G��[f�������rUm����F2�X
�l��:|��+�	B����$�j�^d1����fetf[����
E�A�']������O��)<��<�.�w��i����d��%���.��q�+�W�_]L2�1W��&�/�Q|N������ �j������a�H��#����,RD�@,����[A
@�M����+��b7�0�����Up��-y\�y���s|�k+�;�s�
�d7HUn������S�|���[
f>5W�p�_<%��imAF�or�HvM�`}���:o�?f�>���}x������[�������M]AfQ�/�����ao8�
�5T������Z
��ju�u8�[�5Ts��L������[;������=���&N
32����,JL,y�-��S��}��8%�L.�n�~�U��t�
h&#��,�KQ�i�C�F�b��a�;��/�z���h�a��� 2%a���t�9x��'3��V^Jr�9��WI��v�&���E$���������y�r�\}���f���X���: ���������JC��do|,|�4Y[�^:L�<J��w��+n=��o�����$��������xti���G�r"������8�P��<MZ�#����#g���F��lt�,���<AO���$]��]�V'��	��A(�_��t]�����&�~��M�B����}�T�7t�Y�y�-�1���h��<���s}�8#�N�j@�$[�K���i�K� ��l���a�Y.����5bJd��<	���M)��~��n�����K������4�0��Z�L1������q����*�9������(�m2b�?jH5��oJ�C��Sc��va��
��<����Ek%�E�0��K1_��4{mV���EY<
.|"���a�B|�K�`� ���2 ��K������A���g3
���=���C�w�d?�#^?���q�.�$
�<��6x����z�W�%�]g�]
T�8Iz�GL�V�S9�����J�lN��nl{�S`���V�i�b�^T�p�W��GQ�$�~$�ds��9����/g�D��VO��d�8F����	��������v�������
��,"��$�D���G�`!�
@�����&���p����^
)�$G��x�>Zj/��I�;�v[�`u��i�U����7n�S��)1��u�`���w�Z�S@�E�\Tmw��uA����i? E�����p]N
�({��p��!�b�,�������:2�e�����
K<���D��\l�4�il���
[o���.T
�ST{*��W�m���j�*��B,�h�R�>�I���R(����=��xlKV�vX|�O�����S�k$
�0����-v��D�`R��Tg��d��l�U����y#D�����Q�O���*p�MG�}R��z�#\��Z������*id?��`�D��j�����"�Uu������<&��t�����Re�B������BbP��d)���@��C8�4e�F��uti���V��R����|����+������[%���~��)�i����R�������f3�������._��c����hN�j��b.�9�@\6K�5E>�N���I
:%i�L����/����0�V�R��O�p�\��soF���t�1����\>��5�V�gg_�o�)�����&�ahhkQ�1��T��[�	�g3�x����X�q:zrwV�,��\�2��p�?�Q���������l��s���mLc��"�G�W�=I��C��n�f����Z�'������s{%^�d��mU+��`��Y���m�+�'����bP�&��b�W�K\��IVr�f��}S�A��A_
�\���<N��n��x
��a ��Ng��7*9Z�|��{4���w�N������a)����\!R�a�rO�m�I��{��j^dK#���W��r:�l�t@�}
��d��ZI>@C
V(T��c|����+���g�T2�+��=�s>>�^XiI�n���u\��Q��2MXhn��>{����b��F\A6"����(����`����a�#��}{=�}u����h�]���<���sr�[l���\p~�5�����?�4���|:�P�%�GM����N���j[l�T��pW=^����	���z�����i#���p�������L���$4��C����1���Nf���a��0"�L�����sZ��� /��tn�W:���y��
�5m�(�� ����xkXo�[������%�����t��v[0�6�9z����_�����1�U���X��������J'�	�6H���b3�rn���2�5�����+��20��e/����6<��S����]]��so�c��dsJr@�	
H�z��=2��Mn��p'����rQ-�P��J�� $���)�*gk,w-g�R?j~��i�m��xS<R�\@7�e	S�R�NP���Vha�N�{���T�Q[���Wm2���b������2��n+`>����������ON+37�f��G���,m�A�-�
����\4F$�4�m��:��l	�Ltr�fB���G��1
�>��**y;e�L �j�^��
��J��:<��F�h��b�:]y�B�C#;�8��[��G��[���1������+e�m�4�[��!���Gx�! ��E�L�1�Y������!��9�����iK��X������-GS��@��S!��������68L:E�f��D�OAU
�oZ��G����{&�+O������Ts&��
����z������&��d��^��wX�u]�\�]IL����N�TB��������P
K>j3�9��1���n��)�8�\��Ks���J$e��,�k��{&�t����q�g)Db����CppQ\��������f;�d���L��-��D���t�8�Q��
���)��^ �J����g$����(O?j}aw��MLa���-k��[�u7������Q�0��%V��n�s�v���b�q������UyA�,"A�VB����1�e��4LMF�]�9u�A��gF����*��dM1��^�UL�k��FH�+�Wp�{pBCFM�{2Sj�`��B����v�.:%m��f��v�WW��0Kg��E�6��xM`P�b��)X[BOR���3������\�^�KK��%��Lr�`%�e�L�cw�M������8�=3�1����[8���C�s^XJ������LSaQ��S������*?�mV�M�h�B:\�5��<$Q8�d�=4�<S�
������r���uqk,VX��7��t�|`��cP�"jM���y�Qy���-�c�Rn�`�n]@}Ezp���X��.�@���R��+w�&�Im�dNg5���2�E�>fHY���8����M��}�������x[�/�Q�P>�x����-��6��U������l#�*�3��blu��x7�����0��if2��V��}�zY��H9F��v#�
z;$�+�������DO�@DqQb3���~�'���4��/y:J4��Ky3Tb�.f E��6:I��@�Cc�l����gj!A�l�6GS^h�p���Pj[O��<����i���������=om(Q[���I������`Ros���)IQu������Mtb��vd���������
�����I�����*?�g�8�v	���m�wH�;T���Y�2T����!&P�ffx��f}�����D\O�@���[�JZ^���,��J�u��"�sQW����0r���`�?��G�����&e����
����IJ%B������O�N�7���2�(�h�"�Pz[���b�,K�f���r���<����p�,���e[L����"�&%�K����]��m�w���F|�G��U��K�v/R�%�
xx�����[���/��<��xw����G��HP:]E�f�3��j�O|���1��e�~��(�nY���*�t��c�<P�;��>��Q����h���q�LI��G�����u�H�xM�6t���C]a#��`�T���lg8��7!�f0��p�g�H.urg��	W���b��X�����uF%U=����lO[D��2��#���+x����^��R��>a�Y5;%R�����+yO�����:9/�����V��p1��7MZ�8�A#]���.�[e���a�Z���Ehy���m���y��gS���Gl���b)l��m�z���S�&�-'J,�dB��$�`�:d\t�SVN����sXU��2���`�����EN�

FPD��S1��f3�����=U��rd�bE��x�\��\�����"z)nI�5��@(@�A�L����S
|U�F��&����)�rU��FD9a��?QG=����MB��]�0�%��Q�$�P~�1�2m�7�|�nkxl1������Q��4"�b3��/F����A��[�L�E�����r;�����-��b�LC}���S����,�R���j�c�LU)wE$���5@�E��<�`�4OyY{�6
v�b���3���],�Vh��2�]�
��vd\M��m��V�Z�K�����L�|�%�4��j��������^��g������s����[[�'��1r�KS���j��P4�����K=T��[���H�p*hB>��"e���L��2�6YK����jq���x�H�Q��#������F�������fGB�����u��(v��@�D�G�v���j�"9�e�^i�F�AT����K3�����	�
�����|Bx�qqTA���[t�����}r
��r�����|+��Y�Pq���HS�� o��m�1����	/z	eX.���5�����P�T����R�4/Fw��IL�bB��#H!<���f����9����}U��H�5H����!��E��f4]����L��U%fI��''����X�0�T�`+gSr�%�������]�A3�����e�Dp����i�����-d��������*%����t�` <k�Q�����H�����U����k?������n������vL��"�_W�����]����������v��&�&�.����-�����K�r�
;���7�U�c8Kp���#95x�_�IwJ�i�-������������*���|�������[���j_�[(�d�,�qI���VFca4L���E@��c���G�mxTcZH_x?�qp2�����?�	Be��=���S�)�� �$�&�}���<��>���0f+�oO�6}���I���"BY1/�4�hkc���JM����T�i)JS��SH��7-Z�w�LZ}��`��Ds�S�@��6Q�a	L���9xx�,����J�J��
���E�Q��A4�XBe9 ��J��X*��m���B��j�bG�"����
Ze%���+kW2w� 9@�=
E�A%GaN�K�J���{c�z���������@.����X:����J�Vq����&4a�%�q�U�0��/�
?����AJb	H�CJ�b8��l1:s��D���t���o_p�*�r��L5L%uqhu.���L��&y?8�����{�-��m�M��G������w��+��5)�F��'>��Y�b�e��{Y���+�mI,{D�B�j�	������Q)���o���]e����N��5��V���bC/��m,��Dk�!��b�S6��x0��QJ�9����F/��=@;9�j���o��+����Y���m�P�
��.yap�G�H���o��+2^���X����5:b0
���hj�x(��&Z��e�a��4��}K�Z�n�d����y�4��k�/L0�9�
ohV��Y�#�A��n�KU�}�["y�"�����;���&��<:�����P�X�4�
������*��"�XN{R0\l7��1/q�R� ���M�Y<��]� O	`j�|�xWK��[��`�l�.�P�M^`L�[[V#F��&�s^��n����/�����-4V%y/���	q���}&�,Y@��^M��������j�*��b�x1�`��J<xR��FR�[�V���bXa��v�ae����I"	�Sg#K���hE,������Cvk���a�]������#Ut5f�v.�&��t���t��F+���a��MV����"y[p��x	`7WRj�
���eE���y�An��j,����>k�%Wr.Q����N�j�b:Q� C�~�
~W�������F�Dd�����=������;�V���nS�����.R!�x��#�*S�����i�B��~J�X,Cw������tWj�$4�Y�pX��T�X{�.#l?%~����L��`�������?����������T=V3{�����w<�-�����n 	W�
���[��j=T�m��;��*�N�t%|���S�GT�{m(�e��E)4qk����O��$�f�N@{������������J�-�2��IdM�)�M�x�="���Vda��_���j�J�.y����<�gHm��Y��"`���Ms����&����l����BlP�%R�~��?��Ia�b"�n��8���@Qj��W���:��n�R��e�^�L��e7]�Z�>
o�	�"����vo�"�mLI�V[�)\�a�ja��t���|c�������L�I�[N	 ��&�~-�w�`)�{�j��S�h��B�^�s����v��49a�S���J���W��\=W ��I������h��'H��r��D="�5�=;�l�F�w~R��Q	ftM���H9��&
ZZ.��b���DFgK��I��(�<�l����[��>���dm��������XqP�m����k�%���>�E�
u64&��8����c���HR��|��8�l�%���^#wWSx�uw2-���7w![����)��a�t�������kJny��[Z�	^���T�(4���pT�|��,�t��&�����O�G���6�2\.)�D��1}!��4���_�Z@r��2�C�^�G����7_&�-��LM7_rl�G�{�I���f<�D	$.	Ys^����h��;�y���N��M�^��y6����G�O�p����d6������ ����0�d�����)#�/'w?C~S\���=�]]�<{g��,���0�K����+o�%�
Ex��(�r�r�
#�7M�>���>"f5�g���[��y�r,��U��"�\��nKv��<��D=d6@D�d���vo�c�.7W�N�p���W��1hz�K��0�+���}9=����)����3�7�A�������B�(��t�j��������|�:���a�s�m�������UO7�@���	�@���"���k��y8Fk����[d���@L�����������W	�����C�?�'d�"��%��Q��������X�8�W�F?�I�����vk�$2Kq�M�~6����{w�q��r�Wc�P,@IN�K�z�����d���DMV�����g�T�tNj��xD�;M}�OJ�8/����"����sSP���
�7< ��z���r�q=����j�n��������0���uZ\QmG�o�mn\����$�K�*�.z;�`�0��/r���+�j�"�S*^��Z���p�\���N��M,���eR��p��|4Z���m�"��gH�h>X%�TiBL����q8���i0�|��%]��n���=�d�\��p��t8]����ISTU�����,���sI����6Az�sD��$y�n~�O)�����m�yLn��]�-	���S��\�b�2�&��}:'�"��5�u��~���������1�M���w�005����`]:I�\�#aF5���)��Tg���'�$�U���AI�+�)��^qb����|.{~�3���=�q.��C����:`�7Y�7W�������R^�|j@�I��]R��V\F��lp4�_x��4�mY�I�MZ���lQ��_�
)"����AF�7��`�i�9��"��ubG
+����}#�}q�W��L�i���Kv"v�M|5[�'v4�'O�'6&s����5���EYW���rT��� P��_�{�Z���."Cj���={6:B���@���� 'i���	���(�=��O��0S�8
���������o3��t��V~P��.�����q�� .��C3Ph*�UpA
��w��+�84[ ���"d���w����V���f��"�`4���T?O6�3���gW`(��~�Q��o%��Y��@��Q���2T���*@��������Fq�_�'�8����#�\�I��n�����^[q���P�P�d�?��Yo9�.����p��#A����/��q�n`�8',�w<C-B�pR��pUk�/��W���M5������������/�������,:77\b���6��mt�I��M�{W9O�������2y���n������Z�,���98/�UgDpP�=������s��I�7{�~$��V�����v#����o��'��O�sW?�X��z�������r�%|��A�3��Xo�(f*Oq��x�����qVEHQG�����A�r��R��mtkO�%�1��s
����R��M}��=WM
�A�|��{6b��2\�3y���	��i�5D��)�Z|�����*������c�<lhC|�Cf�7�������(E��1Z�|�}�L������Wo/��hhXC���<�BYY;�I�H1�0 ����1��Ro��3�����m�<m)��������g�9�/��E|���l2u��
����I_-���Z��7��]o��8�MgSQ��j�l��bb�2 �����;3D�����3��{����J5b�Q@��u1������Y���,
d���b��5��{�yS����.�n���+~QRx�c����}�K�ao�)p��5��:$+��;�&@�0L`A�r-)R�44��P\���o��[pb��GY�F2�����m��Kf�&����9���� �bui���}fg��is�j3�._����0�\�~���'�������
�� !I&Vl����0�dA9���^�,&H���X��,�\b����h!��4 j$��~����q�Q������R��Ny}jg�d4 v���0g�����iO�5��� ��������x�8=M��n�WZ�P]�b��c?Y_�i�E0���i��O����\6�x�t��y�6.�*�qn9���
1��F�GI/�Z���L�-���^���:U��N��S�h�W%������M�����N8��*2v�C�� �����o���4����������-�c�A(��ML�2"��Sk{�����Z����[ ��J8|���Y���[�����<.���W����6���y�+iS+�������=]]���y8;�����zUG�2u�o�o5�8���l���I2���Qe��
=P����V �rp��n��_
��I/"G��w2b6/���I�
�is���<r����y���"��el:���}Z4nj�i�`����Fy]����c`o���cx���xZ�����I@:��	�`
�����D\g}Q�Z�l�U��(2d$<<���yuYkd�-3��8j�'����F�k��AOP�����B�<�E���=������&��C�</�*
�l�G*�E	lK�GlM}�0����;M��W	�m
�c�m�pO^%�'?{I�F���}�$�9���b3�$g�l��V��u����[�^N�r�{�l��V�?<��7�-k���3�����u�}[�R�>FA��7 �������h�[���4�e)f��������6�n���
�#����R�g���
C�r9�p��r���T{QzC�<����/M��mawVpTB4��A�#��ix������@��LS��u��a��.6ml��B(�C���6;�5?\�,����`� )y����W�C�@���,7au2�v�i}:���INwF�$"�_P%����h�� ��tdu�_,��jc}��t�����x�\�0b��`��=57C�t
<:����>�d�����L�MA����\���t���.��0�D��Fz���~K���S�����X��U�6,�'���!'��Ke�`cC�~���m#������.�`$LUO�q0x��mg�r���l5w2��}��c�<5���@�W��v�2F��*�y(�sB��rI�����l`I������m�*��
�J:��m$�:{�D��q[�G����
�73:B~4OQs#
�F(+�4H�����/������be\����+jF�� Sb�t|�?[q�^���p����I��*!]�����I)VI%��8?�6����PYN��#Wn��Y�l�EM�t�
�^�+�M.iK��,�"���}
0�j����,���SZ��sl`��6<<����R���@�%�
�A��v#��,cZ'mJ1-s���A,�S������	���<����1�q���\��M����@����`��Q�����p\<��5y�=�v�
����\X�%�Q�H��u�{�C\z�5��xF`��;U�ft5�?09�ao�<�:��m��cV
��)�����E�a[B.X���k���G]�r�,d��Q��3���U��8�#v����n��sl�9�g�]�pf-��M�����2���3�-�:�U��{;�9z��h��M�Ln��I�C��|E�;�Z7��=L���T��@y�� -���!J'VR�PK�j W���m==�|�(^�hr\���b�Q�~<\�P�r?�w���9?��EB��"1�z��'	��J�9��SN������AU����TN
h?����:�aq;���L�9��Zj�+�sF�����=�@�	`WY<iTO�O�<�}R��z�c�o��>��R�y9��k�k������f�����C,�X ��Xr�?�Bc�����yY���JF��^16V�t��R*�^�Z�e����������sS@_Qjc�}	I���~�SOH$��E��fk�%6�0P?!q����a�|��}�N�8�}��D��J7F,-�����&�[n��N�$�5yM����dA��SA�9���m���p�Mr��1���V��*��^m�7����A�h��R����������a������g^(M�3���u�&�b��O��ape\���T�s%0�L���L$���M���x����bx\�����K
X�[��kc�e�&H4�H�t��9����`+��v�*���zV��<�}�e�,��K{���o��+$b���FE���������3������L���#me�b���3[g�l~��	0w��s����k��������
W��$����ms,�r�"�3��'X(���H� 8>}�o�,�����lGR�M"���E�ytL����1c��a��I�|����qe��n�7�,���!���8�(�1�
�DK���x%7�@�]�Ku#�!1�#�1�������n�+����~����X/����<�\�B<�����.E��]��-`E�b�I��������
#2Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#1)
1 attachment(s)
Re: Collect frequency statistics for arrays

Rebased with head.

------
With best regards,
Alexander Korotkov.

Attachments:

arrayanalyze-0.7.patch.gz (application/x-gzip)
������.���G�+�&)�G�U&&J�d8���
���d�Td�~c�����~����h;R��Tihe
����k������X
������8GZ�B��w&�"r\I�(������",�4��4�lb��\�
����M��������$��m�)VU�,����\Dpa��W�Z����Q4�>���:����E��lQ\�p�u����t����-jO��3������q�����:_�}�f��-�
��fO�b�����.��UG	w~A� �0�Q������yG7+Q����5�L�������w����;��8��Tz��*��X
�<#�M���m�<�o|,�z���$��.������{���y������
3c��T��r���`����>D��T�r��A�W���hds?Y-�����
��	���,p���v
�2�������D���t���bb[�OrE�Jj����W�+(�'�>�_\Q��������lE
���p�@��aH����Kt�{��+#pY�P��EWu���8�:�����E���e��7�jm,����n7����+��r�0��a��h����7����A�2��G�j!=-{3�N'�H�f�s�s�"�+HB�`�����d�8Z���Y�����.0�[�����I6��Y��0�nD�VU���!��k�8�n��:�~�<�u��U��V��F��s����h@1��4�L�����fb8�p*�Hi�<���Y����Z����#�O	�m;�����Z�UC;<�F~-�,O��GyH��v�W����'+�(����ee3���KZ�/mk��Dd�.����=O&6������t��=�����/&"J�vh\�3uk� RgG:J=�.lh;���loC"fF��$��q&6�E�.g�G�L.������)��"`^:xf��!�����y�i���b4L������2u^����Ss}�9x��>��e���,���q�Hf����V��cvJd��~W�Xc�0��<�x������������s�N��)L+�\ �ynmk��"��?i����DA���Y�Z�Sni�9��A���b�n�}�#������� �<��V9j��IY���������C-����mUQ��,���nu/Cdi����F�p���@�R�U�UA��A���>|�I�D��T�����
Li�%����
P|������B;W&8b]'����+��1)o���p�c\�Zj�A
�@��(��2�XB��`�9���B&���b�d��oB�,�WzLS��$sCps]�����.L�]$NY���B�����,GE����L%�3C���1�<~�a���[�Tl*����~�j����rP+��W��<#])�Y3����C�d�����m�d�=P��`���Z���v��?G|)���;���R�L��}�P��_�E��Da��L�/[����xd��o�k]���k�
����s�CG��C����j�,(_�,���'-N7���� �$�I�R��N�{��Aa�)-N���W���t�����{�<.+���Q>d���o��=q~��`h�C����Ngar|�>��?�?�(��������`M��d�A�r��H��
c�=|���@5�8�G	�*C��y<���3B���,7�Kf4e�vi��v?�����N�c�Bl\k�P���y�%�<��!r�:<<=����Ig�����8[U��F��8&^=@?I�;'����e3�F�~b�{es�Et��Q@E�$�Ps��+�Y��/����T�T_D-�P�R_�h��>P.l�avo��:��bh��VC,&���8��?����^�O"��������2�)�1����>X��^0�M�\V�5R�k{�0�D6c1�e[��n��++���'e���7f��I���R��ba?�S�a��z]rg���e���V������i�]��jK�m::���J�|�6�l�nR��:�[N?��>�-�B"��Y�h�9���<T�r2���%�?�V��c��d����hX6�M�=:��n�������u����:�z�������x+;h�>/�l{Yo�'Cmk�GAk���E�F<"x/�9P>�*�^��A7(�������Y�YZ�Z��!����1�^C������w���?��$W�o<�,�}�����T�F'���*��9�������z�).������3�����:�1<'����o���y�+�W]��8A�[uX��e-��:Y�������9,�.B���?^sTl�=SV����B�oTG�����h#�I�>j\�0*Q���<�.:�uO���:�au�Q?��E����������;��������gX.�"���}��R���(�G��r��1ae��v� Z������"������� l����8!���b#@�
d3�r��hvKw����0�F���~G�	q��e��&��JAp�/��t���f��Ujx)���*��f���2sv�M�r�g"�auM
38M�v&���9�y�6$s���h'�W�	a�iC��������m��*��6�*
8�p�nM����R�*Q�Lf�nt29
��&�D�|�_��x�j�8�db���X�9�{�������Q��TG�&-�'�g������9��=��+��v�E�X����p%��sH��w�����9�+���(����*Z�H��i�Y���k�����VF�V�;�Hqr}���2�;�lA�2����Y-�Fk'h�ECm�����N���O�����R�>$Iz��P��y
]}J��R���
��G�����������)�G�i�xQ��"<	(H�]#��B8�\�k6������8#'�����+��	��*����o|t5�+.�xYr_?w#Gt!��
R;D�����(��/��fs�d�?�qP��"aN��Z���2�����s�f'�Z���!��{{��sv������m�q�v�������R&M=�1!�����Y�'.�c�8M�qb��@�1�n�_���z*g;�Vt;���-�'�9����G����'��qb�2�^au$��R��q�"�w�<}��-�;���0�K���+^~x�$����R+F�#�?Kxk����P�~����7�l��j�L�_n9��o�4���S�8v�J3���4�-1�����&V0��V�V)],��y,_z�������q\�[� ���G��S�)&V��)��_9���f�[F'�
Y��7Ht����d��l��<���3���?Uj��z'v%c�&�����:R��7�MU��S�	��J�\f��
Ct���Z����b�
k2���N������I�b��<��t��mE��B-�h+�4(�M�ia2�dOS��_)j^T?��-9a2"JL��4�E�'���0�%�jr�qLIK��.�S`g@���n������	��
Q1*}��)��eO�R�+����/I?-DP�c�L�Y���$#���Rt�(�D_��p`y����On��$����4����w���
�+�q��At�,O(��K��6J=���E��U��������H��]8�I&�XG�'��d,�B�����������`H^&�'�s�Q�F?����1Nk�rK�25S^#�~	
�N�Rh�d��dN���c/�jFr��*�����&fg��:]�&���������{��_��xHS������4QQ������H"KG9�<�~E�[�*/����NS���TS���ZS�j��Vm�s�-��o���B��U��n����i<x,���+2*c�WL��"����x&�`���"0b����&H@9�IW�P�MWU�44ZQU:Yl]+��o�4���b���
e��kA�q��kF�P���G�VMsO�Q��Q�~�J��:��J���Zf��YI���Jcl��W�9����/<���v����;^y��ui�v�+�|��i�=�E7�x������H���%���~����
haI5[��3���jVT_���X�7��\gM�<�Cb����hQ����(I�u������r5� �F��Q���Y2A�M|n�6��!es��GJ5���q$�e8<��2i?�Fd� 9��
7'rU���9A����I���A�O������:U�k��'���{_����R�������?H�I��hTC?yw\���0�g��<`�������'��������o��A���/���������v�}�H7��?x��[U��|$j9�s��m(	5�a�x�E��-�P���c��fX�P�����D0	�����*���_�K��8�:��=�@���[��y��6���'���q��r����>�?�h@8����IB@�*4(�q��\����aZd�HF�T�VX
)��TXCOQ���
IEk��\^fB���UI�}��6 �X
�X���*���?z��w'A/�UZM���4Xm5��+x���=c�B_X1T�����au��I����A������}y�+�e�����V������K}�E��?'8Z?����Q�A��S /��W�X/�gm�J<�
M��*dW���]������"��Wd��YV����JV�V���&�a	.�V7@�����A�"k��"h�A'u�Q;���D1�E?�����<g����_������j��(@s~,rm��|�`�p��Vkm������'
*F}z�+��n��B]D�����9�hY ��I��_��0-�@�"(~���P4�PD��`�6�%���Nv��X����P����K*/�d}q98����8��/2������z[����po0>�^6�Ul��L�Ac��k��'4
�P������� *���b�!zA�����s����}���_�~�lA����b����}��Q~6�8�����t��t2�r����<�M�]���Q__-�i��_�9/Oxz�&o�� 3�f+h�
��/&���
C���uo�O����z���^���@��Z�����M���,P�������Q��D�7��7�G2#f~^����n^*#����	��z���3&��tf���<�<��\p2�����##���s*����,�i{�'���5�<���������x�=��m����l�)]!)����2��7�
������Tx�90rw�`a���:�p��E�c��:�mX���WU���z���/���?��>x��D�����nT(��=UZ��G��G���=Z��������<he,g���������w���������h'����n�n���	8�)�ewg��Q]<���TV�:���b���Zo��[�fs
:/�U���s�i}l~�,~�\�j�^=.z�*������_J��p`���b��H�*v�!*��W��|����J%*��� �|���
>��t]�b��?L�w�������}�wp|+�FC%O�����kv����E�W�q���(��eW6x�L
��o�L�`���h���I�J��
~'������z�=]R3��l��k!;h/4���-��*+���v0�=2f0S�������v��{�������+�)�=�L�6�*���<��+��y�l6�B6���5g7���������f���fB������M[ ���t��\���J��>�)/uc���Jk����fn��T�$8�������Wm/�����t��>�B���:����[`��JN�&>k>�n�����pzy�f������U����������	�������f�����MD�y�����i���4s��c�������������m�"<�������J�ndw�
}��C$�xI�ld�3��+��Z�^�]������ ��3�@iS�T+�������?bt����y��S�a��%x�Y���y'/X���O6 �\�
����wEGj2x�-�����	��I�E���4VW�jE��6i����?������b&�'��8�
C4p<���bx	`M�z�s�v����MI��l��[&��E4`�i!��t�	X�y�=����l�Z���ne�U[��9���Y�*�!��X+�z��E��@����[P�*��X0���~#`���K���%�(���K��`9����M�:���u��7�����?����c>���<7�
~�I���
sC��12��r#���v�t���;�u�K�;�����F\��{�+��Z\�So�
�[b_��P{��>m�#dL��it:{��[�8���o�a8>��[08���eD���[[&�n���0�@L��Yk�v<����������2�[��y�?4l�%���8�����[+������{0�%��
��i��H�l����nz��>M�P�[�6��|��eB����x4`���4�!�WL{�����]��LX�;��I���j��)���w����v�w�
���6-�
�_D�������|�����38��B�������;���Ew�w���?�2����;�����WQ5��jg�����}Qw���K���_Mug��[�V��R%�N�;��r��YU\w���x���[vW�~o}������c��\���ew���_yvWW�=����{P����_�vsd�Q��z%���.]�������+��8�B%(0^qR�����5�('��gXY��=��!����ct�Mv~3y�v(,�zI�D�R�dI�E���i0�%�q�p|�LN����`�Z�����C��BJz����'�IW����I'1��MO�;���S?���?��qw����d^���W���axQ^R�br���������w���S��9&b�G�<G����q�p���;�|��	����������z�y��A��qg����b<�v)�`r���M�zP9�4 �R�A�Pf	y1������1f�?��^��j�Q4��O�y$;�F�w\����	d��/X���q-o�h�����q�9�3�u��H���G�����BT��=������o��F9���yTDKrM�5<���wqg�F_�D{W��j��~�x�`��u���d
�Z<�1��|Z� ���0# ��������(=�Yr�s&1n���&� mS�#
��"6G��8{T�^��c�Gd����	��G@8���Do�����`�t4�����
'��4�0�o,����o�z9\�Y�If6-t1n5�+-q1>����>�k���b�?6��O��r�u,��������'�����=�Rm��qU���������v{1l#����G�/�o�����;�;p���y���������;���i��a
D�)�=����r-n�g��# 9�8��������-��O"�_����2��*�UO��F�C?�<B�������OW�O?1*�������ox��AJ�2G���z��F������.X�x����A���i�N����<�<������ZA��7���wP
#3Nathan Boley
npboley@gmail.com
In reply to: Alexander Korotkov (#2)
Re: Collect frequency statistics for arrays

Rebased with head.

FYI, I've added myself as the reviewer for the current commitfest.

Best,
Nathan Boley

#4Alexander Korotkov
aekorotkov@gmail.com
In reply to: Nathan Boley (#3)
Re: Collect frequency statistics for arrays

Hi!

On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley <npboley@gmail.com> wrote:

FYI, I've added myself as the reviewer for the current commitfest.

How is the review going?

------
With best regards,
Alexander Korotkov.

#5Noah Misch
noah@leadboat.com
In reply to: Alexander Korotkov (#4)
Re: Collect frequency statistics for arrays

On Tue, Dec 20, 2011 at 04:37:37PM +0400, Alexander Korotkov wrote:

On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley <npboley@gmail.com> wrote:

FYI, I've added myself as the reviewer for the current commitfest.

How is the review going?

I will examine this patch within the week.

#6Noah Misch
noah@leadboat.com
In reply to: Alexander Korotkov (#2)
1 attachment(s)
Re: Collect frequency statistics for arrays

On Wed, Nov 09, 2011 at 08:49:35PM +0400, Alexander Korotkov wrote:

Rebased with head.

I took a look at this patch. The basic approach seems sensible. For array
columns, ANALYZE will now determine the elements most often present in the
column's arrays. It will also calculate a histogram from the counts of
distinct elements in each array. That is to say, both newly-collected
statistics treat {1,1,2,3,1} and {3,1,2} identically. New selectivity
machinery uses the new statistics to optimize these operations:

column @> const
column <@ const
column && const
const = ANY (column)
const = ALL (column)

Concrete estimates look mostly sane, with a few corner cases I note below.
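
To make the "treated identically" point concrete, here is a minimal standalone sketch, assuming plain int elements in place of Datums. The name distinct_elements() is hypothetical and does not come from the patch. It sorts an array, squeezes out duplicates, and returns the distinct-element count, which is the quantity the new histogram is described as being built from.

```c
#include <assert.h>
#include <stdlib.h>

/* Three-way comparison for qsort(); avoids overflow from plain subtraction. */
static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sort elems[] in place, squeeze out duplicates, and
 * return the number of distinct values left at the front of the array.
 */
int
distinct_elements(int *elems, int n)
{
	int			i;
	int			ndistinct;

	assert(n >= 0);
	if (n == 0)
		return 0;

	qsort(elems, n, sizeof(int), cmp_int);

	ndistinct = 1;
	for (i = 1; i < n; i++)
	{
		/* keep elems[i] only if it differs from the last value kept */
		if (elems[i] != elems[ndistinct - 1])
			elems[ndistinct++] = elems[i];
	}
	return ndistinct;
}
```

With this helper, {1,1,2,3,1} and {3,1,2} both reduce to the same three distinct elements, so they would contribute the same element occurrences and the same count to the statistics.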

We have many implementation details to nail down.

With the patch applied, I get the attached regression.diffs from "make check".
The palloc errors indicate bugs, but rules.out just needs a refresh.

During ANALYZE, we'll now detoast all array column values regardless of size,
just as we already do for tsvector columns. That may be reasonable enough for
tsvector table/index columns, whose very presence is a hint that the user has
planned to use the value for searching. Since arrays make no such
implication, should we skip large arrays to constrain TOAST I/O? Say, skip
arrays larger than 100 KiB or 1 MiB?

I find distressing the thought of having two copies of the lossy sampling
code, each implementing the algorithm with different variable names and levels
of generality. We might someday extend this to hstore, and then we'd have yet
another copy. Tom commented[1] that ts_typanalyze() and array_typanalyze()
should remain distinct, and I agree. However, they could call a shared
counting module. Is that practical? Possible API:

typedef struct LossyCountCtl LossyCountCtl;
LossyCountCtl *LossyCountStart(float s,
                               float epsilon,
                               int2 typlen,
                               bool typbyval,
                               Oid eqfunc); /* + hash func, a few others */
void LossyCountAdd(LossyCountCtl *ctl, Datum elem);
TrackItem **LossyCountGetAll(LossyCountCtl *ctl);
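
As an illustration of what such a shared module would encapsulate, the following is a rough standalone sketch of the lossy counting idea over plain int elements. It is not the code in compute_tsvector_stats() or in the patch. The names LossyCounter, lossy_start() and lossy_add() are hypothetical, the tracking table is a linear-search array rather than a hash table, and the caller passes the bucket width w = ceil(1/epsilon) directly.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct TrackItem
{
	int			elem;			/* tracked element value */
	int			count;			/* occurrences seen since tracking began */
	int			delta;			/* maximum undercount: bucket at insert - 1 */
} TrackItem;

typedef struct LossyCounter
{
	TrackItem  *items;
	int			nitems;
	int			capacity;
	int			bucket_width;	/* w = ceil(1 / epsilon), chosen by caller */
	int			bucket;			/* current bucket number, starting at 1 */
	long		total;			/* elements processed so far */
} LossyCounter;

void
lossy_start(LossyCounter *lc, int bucket_width)
{
	assert(bucket_width > 0);
	lc->items = NULL;
	lc->nitems = 0;
	lc->capacity = 0;
	lc->bucket_width = bucket_width;
	lc->bucket = 1;
	lc->total = 0;
}

void
lossy_add(LossyCounter *lc, int elem)
{
	int			i;

	/* linear lookup; a real module would use a hash table here */
	for (i = 0; i < lc->nitems; i++)
		if (lc->items[i].elem == elem)
			break;

	if (i < lc->nitems)
		lc->items[i].count++;
	else
	{
		if (lc->nitems == lc->capacity)
		{
			lc->capacity = lc->capacity ? lc->capacity * 2 : 16;
			lc->items = realloc(lc->items, lc->capacity * sizeof(TrackItem));
			assert(lc->items != NULL);
		}
		lc->items[lc->nitems].elem = elem;
		lc->items[lc->nitems].count = 1;
		lc->items[lc->nitems].delta = lc->bucket - 1;
		lc->nitems++;
	}

	lc->total++;
	if (lc->total % lc->bucket_width == 0)
	{
		/* bucket boundary: drop entries that cannot be frequent so far */
		int			keep = 0;

		for (i = 0; i < lc->nitems; i++)
			if (lc->items[i].count + lc->items[i].delta > lc->bucket)
				lc->items[keep++] = lc->items[i];
		lc->nitems = keep;
		lc->bucket++;
	}
}
```

A caller would feed every element of every sampled value through lossy_add() and afterwards report the surviving items whose count is at least (s - epsilon) times the number of elements seen. The type-specific parts (equality, hashing, by-value handling) are what the proposed LossyCountStart() arguments would parameterize.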

[1]: http://archives.postgresql.org/message-id/12406.1298055475@sss.pgh.pa.us

*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 8253,8260 ****
<entry>
A list of the most common values in the column. (Null if
no values seem to be more common than any others.)
!        For some data types such as <type>tsvector</>, this is a list of
!        the most common element values rather than values of the type itself.
</entry>
</row>
--- 8253,8261 ----
<entry>
A list of the most common values in the column. (Null if
no values seem to be more common than any others.)
!        For some data types such as <type>arrays</> and <type>tsvector</>,
!        this is a list of the most common element values rather than values of
!        the type itself.
</entry>
</row>

***************
*** 8266,8274 ****
A list of the frequencies of the most common values or elements,
i.e., number of occurrences of each divided by total number of rows.
(Null when <structfield>most_common_vals</structfield> is.)
! For some data types such as <type>tsvector</>, it can also store some
! additional information, making it longer than the
! <structfield>most_common_vals</> array.
</entry>
</row>

--- 8267,8275 ----
A list of the frequencies of the most common values or elements,
i.e., number of occurrences of each divided by total number of rows.
(Null when <structfield>most_common_vals</structfield> is.)
!        For some data types such as <type>arrays</> and <type>tsvector</>,
!        it can also store some additional information, making it longer than
!        the <structfield>most_common_vals</> array.
</entry>
</row>

We're falsifying the above by splitting out that case into new columns
most_common_elems and most_common_elem_freqs.

***************
*** 8284,8289 ****
--- 8285,8291 ----
does not have a <literal>&lt;</> operator or if the
<structfield>most_common_vals</> list accounts for the entire
population.)
+        For <type>arrays</>, it holds histogram bounds of array lengths. 
</entry>
</row>

Likewise: that's now in the new column length_histogram_bounds.

We need documentation for the new pg_stats columns. Also, in particular,
let's document the special entries at the end of most_common_freqs.

*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***************
*** 117,145 **** CREATE VIEW pg_stats AS
stawidth AS avg_width,
stadistinct AS n_distinct,
CASE
!             WHEN stakind1 IN (1, 4) THEN stavalues1
!             WHEN stakind2 IN (1, 4) THEN stavalues2
!             WHEN stakind3 IN (1, 4) THEN stavalues3
!             WHEN stakind4 IN (1, 4) THEN stavalues4
END AS most_common_vals,
CASE
!             WHEN stakind1 IN (1, 4) THEN stanumbers1
!             WHEN stakind2 IN (1, 4) THEN stanumbers2
!             WHEN stakind3 IN (1, 4) THEN stanumbers3
!             WHEN stakind4 IN (1, 4) THEN stanumbers4
END AS most_common_freqs,
CASE
WHEN stakind1 = 2 THEN stavalues1
WHEN stakind2 = 2 THEN stavalues2
WHEN stakind3 = 2 THEN stavalues3
WHEN stakind4 = 2 THEN stavalues4
END AS histogram_bounds,
CASE
WHEN stakind1 = 3 THEN stanumbers1[1]
WHEN stakind2 = 3 THEN stanumbers2[1]
WHEN stakind3 = 3 THEN stanumbers3[1]
WHEN stakind4 = 3 THEN stanumbers4[1]
!         END AS correlation
FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
--- 117,170 ----
stawidth AS avg_width,
stadistinct AS n_distinct,
CASE
!             WHEN stakind1 = 1 THEN stavalues1
!             WHEN stakind2 = 1 THEN stavalues2
!             WHEN stakind3 = 1 THEN stavalues3
!             WHEN stakind4 = 1 THEN stavalues4
!             WHEN stakind5 = 1 THEN stavalues5
END AS most_common_vals,
CASE
!             WHEN stakind1 = 1 THEN stanumbers1
!             WHEN stakind2 = 1 THEN stanumbers2
!             WHEN stakind3 = 1 THEN stanumbers3
!             WHEN stakind4 = 1 THEN stanumbers4
!             WHEN stakind5 = 1 THEN stanumbers5
END AS most_common_freqs,
CASE
WHEN stakind1 = 2 THEN stavalues1
WHEN stakind2 = 2 THEN stavalues2
WHEN stakind3 = 2 THEN stavalues3
WHEN stakind4 = 2 THEN stavalues4
+             WHEN stakind5 = 2 THEN stavalues5
END AS histogram_bounds,
CASE
WHEN stakind1 = 3 THEN stanumbers1[1]
WHEN stakind2 = 3 THEN stanumbers2[1]
WHEN stakind3 = 3 THEN stanumbers3[1]
WHEN stakind4 = 3 THEN stanumbers4[1]
!             WHEN stakind5 = 3 THEN stanumbers5[1]
!         END AS correlation,
!         CASE
!             WHEN stakind1 = 4 THEN stavalues1
!             WHEN stakind2 = 4 THEN stavalues2
!             WHEN stakind3 = 4 THEN stavalues3
!             WHEN stakind4 = 4 THEN stavalues4
!             WHEN stakind5 = 4 THEN stavalues5
!         END AS most_common_elems,
!         CASE
!             WHEN stakind1 = 4 THEN stanumbers1
!             WHEN stakind2 = 4 THEN stanumbers2
!             WHEN stakind3 = 4 THEN stanumbers3
!             WHEN stakind4 = 4 THEN stanumbers4
!             WHEN stakind5 = 4 THEN stanumbers5
!         END AS most_common_elem_freqs,

I think this is an improvement, but some code out there may rely on the
ability to get stakind = 4 data from the most_common_vals column. We'll need
to mention this in the release notes as an incompatibility.

! CASE
! WHEN stakind1 = 5 THEN stavalues1
! WHEN stakind2 = 5 THEN stavalues2
! WHEN stakind3 = 5 THEN stavalues3
! WHEN stakind4 = 5 THEN stavalues4
! WHEN stakind5 = 5 THEN stavalues5
! END AS length_histogram_bounds
FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)

*** a/src/backend/commands/typecmds.c
--- b/src/backend/commands/typecmds.c
***************
*** 610,616 **** DefineType(List *names, List *parameters)
F_ARRAY_SEND,	/* send procedure */
typmodinOid,		/* typmodin procedure */
typmodoutOid,	/* typmodout procedure */
! 			   InvalidOid,		/* analyze procedure - default */
typoid,			/* element type ID */
true,			/* yes this is an array type */
InvalidOid,		/* no further array type */
--- 610,616 ----
F_ARRAY_SEND,	/* send procedure */
typmodinOid,		/* typmodin procedure */
typmodoutOid,	/* typmodout procedure */
! 			   ArrayTypanalyzeOid,		/* special analyze procedure for arrays */
typoid,			/* element type ID */
true,			/* yes this is an array type */
InvalidOid,		/* no further array type */

The recently-added function DefineRange() needs the same change.

*** /dev/null
--- b/src/backend/utils/adt/array_sel.c

"array_selfuncs.c" would better match our informal convention.

***************
*** 0 ****
--- 1,948 ----
+ /*-------------------------------------------------------------------------
+  *
+  * array_sel.c
+  *	  Functions for selectivity estimation of array operators.
+  *
+  * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+  *
+  *
+  * IDENTIFICATION
+  *	  src/backend/tsearch/array_sel.c

Fix file location.

+  *
+  *-------------------------------------------------------------------------
+  */
+ 
+ #include "postgres.h"
+ #include "access/hash.h"
+ #include "catalog/pg_operator.h"
+ #include "commands/vacuum.h"
+ #include "utils/builtins.h"
+ #include "utils/typcache.h"
+ #include "utils/array.h"
+ #include "catalog/pg_am.h"
+ #include "catalog/pg_collation.h"
+ #include "commands/defrem.h"
+ #include "utils/lsyscache.h"
+ #include "utils/selfuncs.h"

Sort the includes alphabetically, except for postgres.h coming first.

+ 
+ /* Default selectivity constant */
+ #define DEFAULT_CONT_SEL 0.005
+ 
+ /* Macro for selectivity estimation to be used if we have no statistics */
+ #define array_selec_no_stats(array,nitems,op) \
+ 	mcelem_array_selec(array, nitems, typentry, NULL, 0, NULL, 0, NULL, 0, op)
+ 
+ Datum		arraysel(PG_FUNCTION_ARGS);

extern prototypes go in header files, even when it's not strictly necessary.

+ 
+ static Selectivity calc_arraysel(VariableStatData *vardata, Datum constval,
+ 			  Oid operator);
+ static Selectivity mcelem_array_selec(ArrayType *array, int nitems, TypeCacheEntry *typentry,
+ 				   Datum *mcelem, int nmcelem, float4 *numbers, int nnumbers,
+ 				   Datum *hist, int nhist, Oid operator);
+ static int	element_compare(const void *key1, const void *key2);
+ bool		find_next_mcelem(Datum *mcelem, int nmcelem, Datum value, int *index);
+ static Selectivity mcelem_array_contain_overlap_selec(Datum *mcelem,
+   int nmcelem, float4 *numbers, Datum *array_data, int nitems, Oid operator);
+ static float calc_hist(Datum *hist, int nhist, float *hist_part, int n);
+ static Selectivity mcelem_array_contained_selec(Datum *mcelem, int nmcelem,
+ 	  float4 *numbers, Datum *array_data, int nitems, Datum *hist, int nhist,
+ 							 Oid operator);
+ static float *calc_distr(float *p, int n, int m, float rest);
+ 
+ /* Compare function of element data type */
+ static FunctionCallInfoData cmpfunc;
+ 
+ /*
+  * selectivity for "const = ANY(column)" and "const = ALL(column)"
+  */
+ Selectivity
+ calc_scalararraysel(VariableStatData *vardata, Datum constval, bool orClause)

You have scalararraysel() calling this function for any operator (consider
"const < ANY(column)"), but it only handles a single operator: the "=" of the
default btree opclass used at ANALYZE time. We could start by just returning
a constant selectivity for other operators, but we may be able to do better.
If the actual operator is the equality operator we used at ANALYZE time
(staop), use binary search on the mcelem array. Otherwise, use linear search,
applying the operator to each MCE. (If the benefits justify the trouble, you
could also use this strategy to support types with no default btree opclass.)
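
The two lookup strategies could look roughly like the sketch below, which uses sorted int MCEs in place of Datums. All names here are hypothetical. How the per-element matches are combined in the linear case is not specified above; the sketch assumes independent element occurrences and returns the probability that at least one matching MCE is present, which is only one possible choice.

```c
#include <assert.h>

/* Hypothetical operator callback: nonzero if "constval OP mce" holds. */
typedef int (*elem_op) (int mce, int constval);

/* Example operator for "const < ANY (column)". */
int
const_less_than(int mce, int constval)
{
	return constval < mce;
}

/*
 * Equality case: mcelem[] is sorted by the same ordering used at ANALYZE
 * time, so binary search finds the frequency (0.0 if the value is no MCE).
 */
double
mce_freq_binary(const int *mcelem, const double *freq, int nmcelem,
				int constval)
{
	int			lo = 0;
	int			hi = nmcelem - 1;

	assert(nmcelem >= 0);
	while (lo <= hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (mcelem[mid] == constval)
			return freq[mid];
		if (mcelem[mid] < constval)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return 0.0;
}

/*
 * Any other operator: apply it to each MCE in turn.  Returns the chance
 * that at least one matching MCE occurs, assuming independence (an
 * assumption of this sketch, not something the patch does).
 */
double
mce_any_linear(const int *mcelem, const double *freq, int nmcelem,
			   int constval, elem_op op)
{
	double		none = 1.0;
	int			i;

	assert(nmcelem >= 0);
	for (i = 0; i < nmcelem; i++)
		if (op(mcelem[i], constval))
			none *= 1.0 - freq[i];
	return 1.0 - none;
}
```

The binary search costs O(log n) comparisons but is only valid for the equality operator that defined the sort order; the linear scan costs O(n) operator calls but works for any operator, including those of types without a default btree opclass.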

+ {
+ 	Oid			elemtype;
+ 	Selectivity selec;
+ 	TypeCacheEntry *typentry;
+ 	Datum	   *hist;
+ 	int			nhist;
+ 
+ 	elemtype = get_base_element_type(vardata->vartype);
+ 
+ 	/* Get default comparison function */
+ 	typentry = lookup_type_cache(elemtype,
+ 							  TYPECACHE_CMP_PROC | TYPECACHE_CMP_PROC_FINFO);
+ 	InitFunctionCallInfoData(cmpfunc, &typentry->cmp_proc_finfo, 2,
+ 							 DEFAULT_COLLATION_OID, NULL, NULL);

The default btree operator class that existed at ANALYZE time may no longer
exist. If we don't find a cmp function here, punt to avoid a crash.

+ /*
+  * arraysel -- restriction selectivity for "column @> const", "column && const"
+  * and "column <@ const"
+  */
+ Datum
+ arraysel(PG_FUNCTION_ARGS)
+ {
+ 	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ 
+ 	Oid			operator = PG_GETARG_OID(1);
+ 	List	   *args = (List *) PG_GETARG_POINTER(2);
+ 	int			varRelid = PG_GETARG_INT32(3);
+ 	VariableStatData vardata;
+ 	Node	   *other;
+ 	bool		varonleft;
+ 	Selectivity selec;
+ 	Oid			element_typeid;
+ 
+ 	/*
+ 	 * If expression is not variable = something or something = variable, then
+ 	 * punt and return a default estimate.
+ 	 */

The operator is never "=".

+ 	if (!get_restriction_variable(root, args, varRelid,
+ 								  &vardata, &other, &varonleft))
+ 		PG_RETURN_FLOAT8(DEFAULT_CONT_SEL);
+ 
+ 	/*
+ 	 * Can't do anything useful if the something is not a constant, either.
+ 	 */
+ 	if (!IsA(other, Const))
+ 	{
+ 		ReleaseVariableStats(vardata);
+ 		PG_RETURN_FLOAT8(DEFAULT_CONT_SEL);
+ 	}

Previously, we defaulted to 0.005 for operator "&&" (via areasel()) and 0.001
for operators "@>" and "<@" (via contsel()). Surely some difference between
those cases remains appropriate.

+ 		if (res == 0)
+ 		{
+ 			*index = i;
+ 			return true;
+ 		}
+ 		else if (res < 0)
+ 		{
+ 			l = i + 1;
+ 		}
+ 		else
+ 		{
+ 			r = i - 1;
+ 		}

Throughout this patch, omit braces around single statements.

+ 	}
+ 	*index = l;
+ 	return false;
+ }
+ 
+ /*
+  * Array selectivity estimation based on most common elements statistics.
+  */
+ static Selectivity
+ mcelem_array_selec(ArrayType *array, int nitems, TypeCacheEntry *typentry,
+ 	  Datum *mcelem, int nmcelem, float4 *numbers, int nnumbers, Datum *hist,
+ 				   int nhist, Oid operator)
+ {
+ 	/* "column @> const" and "column && const" cases */
+ 	if (operator == OID_ARRAY_CONTAIN_OP || operator == OID_ARRAY_OVERLAP_OP)
+ 		return mcelem_array_contain_overlap_selec(mcelem, nmcelem, numbers,
+ 									   array_data, nonnull_nitems, operator);
+ 
+ 	/* "column <@ const" case */
+ 	if (operator == OID_ARRAY_CONTAINED_OP)
+ 		return mcelem_array_contained_selec(mcelem, nmcelem, numbers,
+ 						  array_data, nonnull_nitems, hist, nhist, operator);
+ 	return DEFAULT_CONT_SEL;

Returning a fixed selectivity when this gets attached to an unexpected
operator seems counterproductive. Let's just elog(ERROR) in that case.

+ /*
+  * Array selectivity estimation based on most common elements statistics for
+  * "column @> const" and "column && const" cases. This estimation assumes
+  * element occurences to be independent.
+  */
+ static Selectivity
+ mcelem_array_contain_overlap_selec(Datum *mcelem, int nmcelem,
+ 				float4 *numbers, Datum *array_data, int nitems, Oid operator)

We could take some advantage of the unique element count histogram for "@>".
Any column value with fewer distinct elements than the constant array cannot
match. We perhaps can't use both that fact and element frequency stats at the
same time, but we could use the lesser of the two probabilities.

For operator "&&" or operator "@>" with a nonempty constant array, no rows
having empty arrays will match. Excepting the special case of "@>" with an
empty constant array, we can multiply the MCE-derived selectivity by the
fraction, based on the histogram, of nonempty arrays in the column. (For
"<@", rows having empty arrays always match.)

I don't think it's mandatory that the initial commit implement the above, but
it's something to mention in the comments as a future direction.
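
A possible shape for that future direction, sketched standalone under stated assumptions: the histogram is taken to be equi-depth over distinct-element counts, counts are plain ints, and interpolation inside a bucket is linear, which is crude for small integer counts. The function names are hypothetical.

```c
#include <assert.h>

/*
 * Estimate the fraction of rows whose distinct-element count is >= k from
 * equi-depth histogram bounds[0..nhist-1], i.e. nhist - 1 buckets of equal
 * weight.  Linear interpolation inside a bucket is a simplifying assumption.
 */
double
frac_count_at_least(const int *bounds, int nhist, int k)
{
	double		sum = 0.0;
	int			i;

	assert(nhist >= 2);
	for (i = 0; i < nhist - 1; i++)
	{
		int			lo = bounds[i];
		int			hi = bounds[i + 1];

		if (k <= lo)
			sum += 1.0;			/* whole bucket qualifies */
		else if (k <= hi)
			sum += (double) (hi - k) / (double) (hi - lo);
		/* else the whole bucket lies below k and contributes nothing */
	}
	return sum / (nhist - 1);
}

/*
 * "column @> const": a row with fewer distinct elements than the constant
 * cannot match, so take the lesser of the two estimates.
 */
double
contain_selec_capped(double mce_selec, const int *bounds, int nhist,
					 int nconst_distinct)
{
	double		cap = frac_count_at_least(bounds, nhist, nconst_distinct);

	return (mce_selec < cap) ? mce_selec : cap;
}

/*
 * "&&", or "@>" with a nonempty constant: empty arrays never match, so
 * scale the MCE-derived estimate by the fraction of nonempty arrays.
 */
double
scale_by_nonempty(double mce_selec, const int *bounds, int nhist)
{
	return mce_selec * frac_count_at_least(bounds, nhist, 1);
}
```

Taking the minimum sidesteps the question of how to combine the two sources of evidence: each one alone gives an upper bound that is valid under its own assumptions, so the smaller of the two is used as the estimate.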

+ /*
+  * Let be n independent events with probabilities p. This function calculates
+  * probabilities of exact k of events occurence for k in [0;m].
+  * Imagine matrix M of (n + 1) x (m + 1) size. Element M[i,j] denotes
+  * probability that exact j of first i events occurs. Obviously M[0,0] = 1.
+  * Each next event increase total number of occured events if it occurs and
+  * leave last value of that number if it doesn't occur. So, by the law of
+  * total probability: M[i,j] = M[i - 1, j] * (1 - p[i]) + M[i - 1, j - 1] * p[i]
+  * for i > 0, j > 0. M[i,0] = M[i - 1, 0] * (1 - p[i]) for i > 0.
+  * Also there could be some events with low probabilities. Their summary
+  * probability passed in the rest parameter.
+  */
+ static float *
+ calc_distr(float *p, int n, int m, float rest)
+ {
+ 	/* Take care about events with low probabilities. */
+ 	if (rest > 0.0f)
+ 	{
+ 		/*
+ 		 * The probability of no occurence of events which forms "rest"
+ 		 * probability have a limit of exp(-rest) when number of events fo to
+ 		 * infinity. Another simplification is to replace that events with one
+ 		 * event with (1 - exp(-rest)) probability.
+ 		 */
+ 		rest = 1.0f - exp(-rest);

What is the name of the underlying concept in probability theory?
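
For reference, the distribution of the number of successes among independent events with unequal probabilities is usually called the Poisson binomial distribution, and exp(-rest) is the zero-count probability of a Poisson distribution with mean "rest" (the Poisson limit, or law of rare events). The recurrence quoted from the patch comment can be sketched standalone as below. The function name and the in-place single-row layout are choices made for this illustration, not the patch's calc_distr().

```c
#include <assert.h>

/*
 * Fill out[0..m] with the probability that exactly j of the n independent
 * events with probabilities p[0..n-1] occur, for j in [0, m].
 *
 * This applies M[i,j] = M[i-1,j] * (1 - p[i]) + M[i-1,j-1] * p[i] row by
 * row, keeping only the current row.  Iterating j downward ensures that
 * out[j - 1] still holds the previous row's value when it is read.
 */
void
exact_count_distribution(const double *p, int n, int m, double *out)
{
	int			i;
	int			j;

	assert(n >= 0 && m >= 0);

	/* Row 0: with no events considered, zero occurrences is certain. */
	out[0] = 1.0;
	for (j = 1; j <= m; j++)
		out[j] = 0.0;

	for (i = 0; i < n; i++)
	{
		for (j = m; j > 0; j--)
			out[j] = out[j] * (1.0 - p[i]) + out[j - 1] * p[i];
		out[0] *= 1.0 - p[i];
	}
}
```

For two events with probabilities 0.5 and 0.25, this yields 0.375, 0.5 and 0.125 for zero, one and two occurrences respectively. The cost is O(n * m) multiplications, which is the CPU expense that grows with the length of the constant array.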

+ /*
+  * Array selectivity estimation based on most common elements statistics for
+  * "column <@ const" case. Assumption that element occurences are independent
+  * means certain distribution of array lengths. Typically real distribution
+  * of lengths is significantly different from it. For example, if even we
+  * have set of arrays with 1 integer element in range [0;10] each, element
+  * occurences are not independent. Because in the case of independence we

Do you refer to a column where '{1,12,46}' and '{13,7}' may appear, but
'{6,19,4}' cannot appear?

+  * have probabilities of length of 0, 1, 2 etc. In the "column @> const"
+  * and "column && const" cases we usually have "const" with low summary
+  * frequency of elements (otherwise we have selectivity close to 0 or 1
+  * correspondingly). That's why effect of dependence related to lengths
+  * distribution id negligible there. In the "column <@ const" case summary
+  * frequency of elements is high (otherwise we have selectivity close to 0).

What does the term "summary frequency" denote?

+  * That's why we should do correction due to array lengths distribution.
+  */
+ static Selectivity
+ mcelem_array_contained_selec(Datum *mcelem, int nmcelem, float4 *numbers,
+ 		 Datum *array_data, int nitems, Datum *hist, int nhist, Oid operator)

Break up the parameter list to avoid pgindent reverse-indenting the line.

When I tried a constant array with duplicate elements, I saw an inappropriate
report of 100% selectivity:

[local] test=# create table t1 as select array[n % 2, n] as arr from generate_series(1,100000) t(n);
SELECT 100000
[local] test=# analyze t1;
WARNING: problem in alloc set Analyze: detected write past chunk end in block 0x7f189cbbf010, chunk 0x7f189cc1d0d8
ANALYZE
[local] test=# explain select * from t1 where arr <@ '{0,45}';
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..1986.00 rows=186 width=29)
Filter: (arr <@ '{0,45}'::integer[])
(2 rows)

[local] test=# explain select * from t1 where arr <@ '{0,45,45}';
QUERY PLAN
-----------------------------------------------------------
Seq Scan on t1 (cost=0.00..1986.00 rows=100000 width=29)
Filter: (arr <@ '{0,45,45}'::integer[])
(2 rows)

By contrast, the estimate in the non-duplicate case looks sane considering
these estimates for the individual elements:

[local] test=# explain select * from t1 where 0 = any (arr);
QUERY PLAN
----------------------------------------------------------
Seq Scan on t1 (cost=0.00..2986.00 rows=49967 width=29)
Filter: (0 = ANY (arr))
(2 rows)

[local] test=# explain select * from t1 where 45 = any (arr);
QUERY PLAN
--------------------------------------------------------
Seq Scan on t1 (cost=0.00..2986.00 rows=500 width=29)
Filter: (45 = ANY (arr))
(2 rows)

Incidentally, "const = ALL (column)" should be equivalent to "column <@
array[const]". (Assuming the "=" operator chosen in the first statement is
the "=" operator of the array type's default btree opclass). However, I get
significantly different estimates, with the latter getting a better estimate:

[local] test=# explain select * from t1 where 1 = all (arr);
QUERY PLAN
----------------------------------------------------------
Seq Scan on t1 (cost=0.00..2986.00 rows=18407 width=29)
Filter: (1 = ALL (arr))
(2 rows)

[local] test=# explain select * from t1 where arr <@ array[1];
QUERY PLAN
------------------------------------------------------
Seq Scan on t1 (cost=0.00..1986.00 rows=1 width=29)
Filter: (arr <@ '{1}'::integer[])
(2 rows)
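
The claimed equivalence can be checked with a small standalone sketch over int arrays. It ignores NULL elements, and the function names are hypothetical. In t1 every row is array[n % 2, n], so only n = 1 produces {1,1}; exactly one row satisfies either predicate, which is why rows=1 is the better estimate.

```c
#include <assert.h>
#include <stdbool.h>

/* "c = ALL (arr)": every element equals c; vacuously true when n == 0. */
bool
const_eq_all(const int *arr, int n, int c)
{
	int			i;

	assert(n >= 0);
	for (i = 0; i < n; i++)
		if (arr[i] != c)
			return false;
	return true;
}

/* "arr <@ set": every element of arr occurs somewhere in set. */
bool
contained_in(const int *arr, int n, const int *set, int nset)
{
	int			i;
	int			j;

	assert(n >= 0 && nset >= 0);
	for (i = 0; i < n; i++)
	{
		bool		found = false;

		for (j = 0; j < nset && !found; j++)
			found = (arr[i] == set[j]);
		if (!found)
			return false;
	}
	return true;
}
```

With a one-element set, contained_in() performs exactly the same per-element test as const_eq_all(), including the empty-array case, so an estimator could route both predicates through the same code path.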

+ 	/*
+ 	 * Rest is a average length of elements which aren't present in mcelem.
+ 	 */
+ 	rest = avg_length;

You define "rest" here as an array length ...

+ 
+ 	default_freq = Min(DEFAULT_CONT_SEL, minfreq / 2);
+ 
+ 	mcelem_index = 0;
+ 
+ 	/*
+ 	 * mult is the multiplier that presents estimate of probability that each
+ 	 * mcelem which is not present in constant doesn't occur.
+ 	 */
+ 	mult = 1.0f;
+ 
+ 	for (i = 0; i < nitems; i++)
+ 	{
+ 		bool		found = false;
+ 
+ 		/* Comparison with previous value in order to guarantee uniquness */
+ 		if (i > 0)
+ 		{
+ 			if (!element_compare(&array_data[i - 1], &array_data[i]))
+ 				continue;
+ 		}
+ 
+ 		/*
+ 		 * Iterate over mcelem until find mcelem that is greater or equal to
+ 		 * element of constant. Simultaneously taking care about rest and
+ 		 * mult. If that mcelem is found then fill corresponding elem_selec.
+ 		 */
+ 		while (mcelem_index < nmcelem)
+ 		{
+ 			int			cmp = element_compare(&mcelem[mcelem_index], &array_data[i]);
+ 
+ 			if (cmp < 0)
+ 			{
+ 				mult *= (1.0f - numbers[mcelem_index]);
+ 				rest -= numbers[mcelem_index];

... But here, you're subtracting a frequency from an array length?

+ /*
+  * Comparison function for elements. Based on default comparison function for
+  * array element data type.
+  */
+ static int
+ element_compare(const void *key1, const void *key2)
+ {
+ 	const Datum *d1 = (const Datum *) key1;
+ 	const Datum *d2 = (const Datum *) key2;
+ 
+ 	cmpfunc.	arg[0] = *d1;
+ 	cmpfunc.	arg[1] = *d2;
+ 	cmpfunc.	argnull[0] = false;
+ 	cmpfunc.	argnull[1] = false;
+ 	cmpfunc.	isnull = false;
+ 
+ 	return DatumGetInt32(FunctionCallInvoke(&cmpfunc));
+ }

We could easily make this reentrant by passing the fcinfo through an argument
and using qsort_arg(). Please do so.
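
For illustration, a minimal sketch of the reentrant shape suggested here. It is self-contained rather than backend code: CompareContext, element_compare_arg() and sort_with_arg() are hypothetical names, and sort_with_arg() is only a stand-in for qsort_arg(). The point is that the comparator reads all of its state from the extra argument instead of from a file-level variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-call state; plays the role of the fcinfo kept in a static. */
typedef struct CompareContext
{
	int			descending;		/* illustrative knob, not taken from the patch */
	long		ncalls;			/* comparisons made through this context */
} CompareContext;

typedef int (*cmp_arg_fn) (const void *a, const void *b, void *arg);

/* All state comes from "arg", so nested sorts cannot disturb each other. */
static int
element_compare_arg(const void *key1, const void *key2, void *arg)
{
	CompareContext *cxt = (CompareContext *) arg;
	int			v1 = *(const int *) key1;
	int			v2 = *(const int *) key2;
	int			cmp = (v1 > v2) - (v1 < v2);

	cxt->ncalls++;
	return cxt->descending ? -cmp : cmp;
}

/* Stand-in for qsort_arg(): an insertion sort that threads "arg" through. */
static void
sort_with_arg(void *base, size_t nel, size_t elsize, cmp_arg_fn cmp, void *arg)
{
	char	   *a = (char *) base;
	char		tmp[64];
	size_t		i;
	size_t		j;

	assert(elsize <= sizeof(tmp));
	for (i = 1; i < nel; i++)
	{
		memcpy(tmp, a + i * elsize, elsize);
		for (j = i; j > 0 && cmp(a + (j - 1) * elsize, tmp, arg) > 0; j--)
			memcpy(a + j * elsize, a + (j - 1) * elsize, elsize);
		memcpy(a + j * elsize, tmp, elsize);
	}
}
```

Two sorts with different contexts can then run back to back, or one inside the other, without sharing any mutable global.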

*** /dev/null
--- b/src/backend/utils/adt/array_typanalyze.c
***************
*** 0 ****
--- 1,834 ----
+ /*-------------------------------------------------------------------------
+  *
+  * array_typanalyze.c
+  *	  functions for gathering statistics from array columns
+  *
+  * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+  *
+  *
+  * IDENTIFICATION
+  *	  src/backend/tsearch/array_typanalyze.c

Fix file location.

+  *
+  *-------------------------------------------------------------------------
+  */
+ 
+ #include "postgres.h"
+ #include "access/hash.h"
+ #include "catalog/pg_operator.h"
+ #include "commands/vacuum.h"
+ #include "commands/defrem.h"
+ #include "parser/parse_oper.h"
+ #include "utils/builtins.h"
+ #include "utils/hsearch.h"
+ #include "utils/typcache.h"
+ #include "utils/array.h"
+ #include "catalog/pg_am.h"
+ #include "catalog/pg_collation.h"
+ #include "utils/lsyscache.h"
+ #include "utils/selfuncs.h"

Alphabetize the includes after "postgres.h".

+ 
+ #define ARRAY_ANALYZE_CHECK_OID(x) \
+ 	if (!OidIsValid(x)) \
+ 	{ \
+ 		elog(INFO, "Can't collect statistics on %d array type. Array \
+ 					statistics collection requires default hash and btree \
+ 					opclasses for element type.", stats->attrtypid); \
+ 		stats->stats_valid = false; \
+ 		return; \
+ 	}

No message is necessary: the absence of a btree opclass or of both opclasses
degrades normal statistics, and we emit no message about it then.

The decision to abandon the stats at this point causes a minor regression for
types having only a default hash opclass. They currently get scalar minimal
stats; now they'll have none. See below for one way to avoid that.

This macro is used in just one function, and the return statement means one
wouldn't lightly use it anywhere else. Therefore, as a minor style point, I'd
place its definition right with the function that uses it rather than at the
head of the file.

+ Datum array_typanalyze(PG_FUNCTION_ARGS);

extern prototypes go in header files, even when it's not strictly necessary.

+ /*
+  *	array_typanalyze -- a custom typanalyze function for array columns
+  */
+ Datum
+ array_typanalyze(PG_FUNCTION_ARGS)
+ {
+ 	VacAttrStats *stats = (VacAttrStats *) PG_GETARG_POINTER(0);
+ 	Form_pg_attribute attr = stats->attr;
+ 	Oid			ltopr;
+ 	Oid			eqopr;
+ 	StdAnalyzeData *mystats;
+ 
+ 	/* If the attstattarget column is negative, use the default value */
+ 	/* NB: it is okay to scribble on stats->attr since it's a copy */
+ 	if (attr->attstattarget < 0)
+ 		attr->attstattarget = default_statistics_target;
+ 
+ 	/* Look for default "<" and "=" operators for column's type */
+ 	get_sort_group_operators(stats->attrtypid,
+ 							 false, false, false,
+ 							 &ltopr, &eqopr, NULL,
+ 							 NULL);
+ 
+ 	/* If column has no "=" operator, we can't do much of anything */
+ 	if (!OidIsValid(eqopr))
+ 		return false;
+ 
+ 	/* Save the operator info for compute_stats routines */
+ 	mystats = (StdAnalyzeData *) palloc(sizeof(StdAnalyzeData));
+ 	mystats->eqopr = eqopr;
+ 	mystats->eqfunc = get_opcode(eqopr);
+ 	mystats->ltopr = ltopr;
+ 	stats->extra_data = mystats;

Instead of duplicating this initialization and exporting StdAnalyzeData,
compute_scalar_stats() and compute_minimal_stats() from analyze.c, I suggest
structuring things as follows. Export only std_typanalyze() and call it here.
If it returns false, return false here, too. Otherwise, proceed with the
additional lookups you currently do in compute_array_stats(). If you don't
find everything you need (default btree operator class, for example), just
return true; the analysis will continue with the standard scalar approach as
set up by std_typanalyze(). Otherwise, replace compute_stats and extra_data
with your own materials.
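
The control flow described above can be sketched as follows. Everything here is a mock so that the example compiles on its own: MockStats stands in for VacAttrStats, mock_std_typanalyze() for std_typanalyze(), and the two boolean fields replace the real operator and opclass lookups. None of these names exist in the backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct MockStats MockStats;
typedef void (*compute_fn) (MockStats *stats);

/* Hypothetical stand-in for VacAttrStats; only the fields the sketch needs. */
struct MockStats
{
	bool		has_eq_operator;	/* would the standard setup succeed? */
	bool		has_element_support;	/* default btree and hash opclasses found? */
	compute_fn	compute_stats;
	void	   *extra_data;
	int			std_calls;		/* how often the standard routine ran */
};

/* Per-column state for the array routine, wrapping the standard state. */
typedef struct ArrayAnalyzeExtra
{
	compute_fn	std_compute_stats;
	void	   *std_extra_data;
} ArrayAnalyzeExtra;

static void
mock_std_compute(MockStats *stats)
{
	stats->std_calls++;
}

/* Stand-in for std_typanalyze(): fails only without an "=" operator. */
static bool
mock_std_typanalyze(MockStats *stats)
{
	if (!stats->has_eq_operator)
		return false;
	stats->compute_stats = mock_std_compute;
	stats->extra_data = NULL;
	return true;
}

static void
mock_array_compute(MockStats *stats)
{
	ArrayAnalyzeExtra *extra = (ArrayAnalyzeExtra *) stats->extra_data;

	assert(extra != NULL);
	/* Run the standard routine with its own extra_data, then restore ours. */
	stats->extra_data = extra->std_extra_data;
	extra->std_compute_stats(stats);
	stats->extra_data = extra;
	/* Array-specific statistics would be gathered here. */
}

static bool
mock_array_typanalyze(MockStats *stats)
{
	ArrayAnalyzeExtra *extra;

	if (!mock_std_typanalyze(stats))
		return false;			/* nothing can be analyzed */
	if (!stats->has_element_support)
		return true;			/* keep the standard scalar setup */

	extra = malloc(sizeof(ArrayAnalyzeExtra));
	if (extra == NULL)
		return true;			/* sketch only: fall back to standard stats */
	extra->std_compute_stats = stats->compute_stats;
	extra->std_extra_data = stats->extra_data;
	stats->compute_stats = mock_array_compute;
	stats->extra_data = extra;
	return true;
}
```

With this shape, a type that has only a default hash opclass keeps whatever the standard setup installed, which avoids the regression noted earlier.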

+ 
+ 	stats->compute_stats = compute_array_stats;
+ 	/* see comment about the choice of minrows in commands/analyze.c */
+ 	stats->minrows = 300 * attr->attstattarget;
+ 
+ 	PG_RETURN_BOOL(true);
+ }
+ 
+ /*
+  *	compute_array_stats() -- compute statistics for a array column
+  *
+  *	This functions computes statistics that are useful for determining <@, &&,
+  *	@> operations selectivity, along with the fraction of non-null rows and
+  *	average width.
+  *
+  *	As an addition to the the most common values, as we do for most datatypes,
+  *	we're looking for the most common elements and length histogram. In the
+  *	case of relatively long arrays it might be more useful, because there most
+  *	probably won't be any two rows with the same array and thus MCV has no
+  *	much sense. With a list of the most common elements we can do a better job
+  *	at figuring out <@, &&, @> selectivity. Arrays length histogram helps to
+  *	"column <@ const" to be more precise.

The histogram addresses array distinct element counts, not array lengths.
That's exactly what we need for the selectivity calculations in question.
Please update the terminology throughout the patch, though.
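
To make the distinction concrete, here is a small standalone sketch that computes the distinct element count of an integer array. count_distinct_elements() is a hypothetical helper for illustration only; the patch derives the same quantity from its occurrence tracking rather than by sorting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int
int_cmp(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/* Number of distinct values in elems[0..n-1]; the input is left untouched. */
static int
count_distinct_elements(const int *elems, int n)
{
	int		   *copy;
	int			i;
	int			distinct;

	if (n <= 0)
		return 0;
	copy = malloc(sizeof(int) * (size_t) n);
	assert(copy != NULL);
	memcpy(copy, elems, sizeof(int) * (size_t) n);
	qsort(copy, (size_t) n, sizeof(int), int_cmp);

	/* After sorting, each change of value starts a new distinct element. */
	distinct = 1;
	for (i = 1; i < n; i++)
	{
		if (copy[i] != copy[i - 1])
			distinct++;
	}
	free(copy);
	return distinct;
}
```

For the array {1,1,2,3,3,3} the length is 6 but the distinct element count is 3, and it is the latter that the histogram records.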

+  *
+  *	The algorithm used is Lossy Counting, as proposed in the paper "Approximate
+  *	frequency counts over data streams" by G. S. Manku and R. Motwani, in
+  *	Proceedings of the 28th International Conference on Very Large Data Bases,
+  *	Hong Kong, China, August 2002, section 4.2. The paper is available at
+  *	http://www.vldb.org/conf/2002/S10P03.pdf
+  *
+  *	The Lossy Counting (aka LC) algorithm goes like this:
+  *	Let s be the threshold frequency for an item (the minimum frequency we
+  *	are interested in) and epsilon the error margin for the frequency. Let D
+  *	be a set of triples (e, f, delta), where e is an element value, f is that
+  *	element's frequency (actually, its current occurrence count) and delta is
+  *	the maximum error in f. We start with D empty and process the elements in
+  *	batches of size w. (The batch size is also known as "bucket size" and is
+  *	equal to 1/epsilon.) Let the current batch number be b_current, starting
+  *	with 1. For each element e we either increment its f count, if it's
+  *	already in D, or insert a new triple into D with values (e, 1, b_current
+  *	- 1). After processing each batch we prune D, by removing from it all
+  *	elements with f + delta <= b_current.  After the algorithm finishes we
+  *	suppress all elements from D that do not satisfy f >= (s - epsilon) * N,
+  *	where N is the total number of elements in the input.  We emit the
+  *	remaining elements with estimated frequency f/N.  The LC paper proves
+  *	that this algorithm finds all elements with true frequency at least s,
+  *	and that no frequency is overestimated or is underestimated by more than
+  *	epsilon.  Furthermore, given reasonable assumptions about the input
+  *	distribution, the required table size is no more than about 7 times w.
+  *
+  *	We set s to be the estimated frequency of the K'th element in a natural
+  *	language's frequency table, where K is the target number of entries in
+  *	the MCELEM array. We assume that the distribution of element frequencies
+  *	follows Zipf's law with an exponent of 1.
+  *
+  *	Assuming Zipfian distribution, the frequency of the K'th element is equal
+  *	to 1/(K * H(W)) where H(n) is 1/2 + 1/3 + ... + 1/n and W is the number of
+  *	elements in the language.	Putting W as one million, we get roughly
+  *	0.07/K. This gives s = 0.07/K.	We set epsilon = s/10, which gives bucket
+  *	width w = K/0.007 and maximum expected hashtable size of about 1000 * K.

These last two paragraphs, adapted from ts_typanalyze.c, assume natural
language documents. To what extent do these parameter choices remain sensible
for arbitrary data such as users may place in arrays? In any event, we need a
different justification, even if it's just a hand-wavy justification.

If I'm following this correctly, this choice of "s" makes the algorithm
guaranteed to find only elements constituting >= 7% of the input elements.
Incidentally, isn't that far too high for natural language documents? If the
English word "the" only constitutes 7% of typical documents, then this "s"
value would permit us to discard practically every word; we'd be left with
words read while filling the last bucket and words that happened to repeat
exceedingly often in the column. I haven't tried to make a test case to
observe this problem; am I missing something? (This question is largely
orthogonal to your patch.)
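
As an aid to checking these numbers, the arithmetic in the quoted comment can be reproduced with a short standalone sketch. The function names are hypothetical. harmonic_from_two() follows the comment's definition of H literally (the sum starts at 1/2); if the full harmonic number including the leading 1 was intended, the constant comes out slightly lower, and either reading rounds to about 0.07.

```c
#include <assert.h>

/* H as written in the quoted comment: 1/2 + 1/3 + ... + 1/n. */
static double
harmonic_from_two(long n)
{
	double		sum = 0.0;
	long		i;

	assert(n >= 2);
	for (i = 2; i <= n; i++)
		sum += 1.0 / (double) i;
	return sum;
}

/* s = 1/(K * H(W)): assumed Zipf frequency of the K'th most common element. */
static double
threshold_s(int k, long w)
{
	return 1.0 / ((double) k * harmonic_from_two(w));
}

/* Bucket width as the quoted code computes it: K / 0.007 in integer math. */
static int
bucket_width_for(int num_mcelem)
{
	return num_mcelem * 1000 / 7;
}
```

With K = 1000 (a statistics target of 100 times the patch's multiplier of 10) and W = one million, s * K lands between 0.07 and 0.08, the bucket width is 142857, and seven buckets give 999999 entries, which matches the comment's "about 1000 * K". Note that under this reading s itself scales with 1/K.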

+  *
+  *	Note: in the above discussion, s, epsilon, and f/N are in terms of a
+  *	element's frequency as a fraction of all elements seen in the input.
+  *	However, what we actually want to store in the finished pg_statistic
+  *	entry is each element's frequency as a fraction of all rows that it occurs
+  *	in. Elements might be repeated in the same array. Since operators
+  *	<@, &&, @> takes care only about element occurence itself and not about
+  *	occurence count, function takes additional care about uniqueness of
+  *	counting. Also we need to change the divisor from N to nonnull_cnt to get
+  *	the number we want.

On the same tangent, why does ts_typanalyze() not deduplicate the same way?
The @@ operator has the same property.

+  */
+ static void
+ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
+ 					int samplerows, double totalrows)
+ {
+ 	int			num_mcelem;
+ 	int			null_cnt = 0;
+ 
+ 	/*
+ 	 * We should count not only null array values, but also null array
+ 	 * elements
+ 	 */
+ 	int			null_elem_cnt = 0.0;
+ 
+ 	double		total_width = 0.0;
+ 	double		total_length = 0.0;
+ 
+ 	/* This is D from the LC algorithm. */
+ 	HTAB	   *elements_tab;
+ 	HASHCTL		elem_hash_ctl;
+ 	HASH_SEQ_STATUS scan_status;
+ 
+ 	/* This is the current bucket number from the LC algorithm */
+ 	int			b_current;
+ 
+ 	/* This is 'w' from the LC algorithm */
+ 	int			bucket_width;
+ 	int			array_no,
+ 				element_no;
+ 	Datum		hash_key;
+ 	TrackItem  *item;
+ 
+ 	int			lengths_count;
+ 	int			length_index;
+ 	int			slot_idx = 0;
+ 	HTAB	   *length_tab;
+ 	HASHCTL		length_hash_ctl;
+ 	LengthItem *length_item;
+ 	LengthItem *sorted_length_tab;
+ 
+ 	/*
+ 	 * Most part of array operators, which selectivity is estimated by this
+ 	 * statistics, takes care only one occurence of element in array. That's
+ 	 * why we should take care about count element occurence only once per
+ 	 * array. To clean occurence flag for each array by iterating over all
+ 	 * hash table can be too expensive. That's why we store pointers to hash
+ 	 * items contain elements which occur in last array.
+ 	 */
+ 	TrackItem **occurences = NULL;
+ 	int			occurences_count = 0,
+ 				occurence_index;
+ 
+ 	TypeCacheEntry *typentry;
+ 	Oid			hash_opclass,
+ 				hash_opfamily,
+ 				element_typeid,
+ 				hash_oroc;
+ 	FmgrInfo	hash_func_info;
+ 
+ 	StdAnalyzeData *mystats = (StdAnalyzeData *) stats->extra_data;
+ 
+ 	/* Compute standard statistics */
+ 	if (OidIsValid(mystats->ltopr))
+ 		compute_scalar_stats(stats, fetchfunc, samplerows, totalrows);
+ 	else
+ 		compute_minimal_stats(stats, fetchfunc, samplerows, totalrows);
+ 
+ 
+ 	/* Gathering all necessary information about element data type. */
+ 
+ 	element_typeid = stats->attrtype->typelem;
+ 
+ 	if (!OidIsValid(element_typeid))
+ 		elog(ERROR, "array_typanalyze was invoked with %d non-array type",
+ 			 stats->attrtypid);
+ 
+ 	typentry = lookup_type_cache(element_typeid, TYPECACHE_EQ_OPR |
+ 	 TYPECACHE_CMP_PROC | TYPECACHE_EQ_OPR_FINFO | TYPECACHE_CMP_PROC_FINFO);
+ 	ARRAY_ANALYZE_CHECK_OID(typentry->cmp_proc);
+ 	ARRAY_ANALYZE_CHECK_OID(typentry->eq_opr);
+ 
+ 	hash_opclass = GetDefaultOpClass(element_typeid, HASH_AM_OID);
+ 	ARRAY_ANALYZE_CHECK_OID(hash_opclass);
+ 
+ 	hash_opfamily = get_opclass_family(hash_opclass);
+ 	ARRAY_ANALYZE_CHECK_OID(hash_opfamily);
+ 
+ 	hash_oroc = get_opfamily_proc(hash_opfamily, element_typeid,
+ 								  element_typeid, HASHPROC);
+ 	ARRAY_ANALYZE_CHECK_OID(hash_oroc);
+ 
+ 	fmgr_info(hash_oroc, &hash_func_info);
+ 
+ 	InitFunctionCallInfoData(element_type_info.cmp, &typentry->cmp_proc_finfo,
+ 							 2, DEFAULT_COLLATION_OID, NULL, NULL);
+ 	InitFunctionCallInfoData(element_type_info.eq, &typentry->eq_opr_finfo,
+ 							 2, DEFAULT_COLLATION_OID, NULL, NULL);
+ 	InitFunctionCallInfoData(element_type_info.hash, &hash_func_info,
+ 							 1, DEFAULT_COLLATION_OID, NULL, NULL);
+ 	element_type_info.typbyval = typentry->typbyval;

As I mentioned above in passing, all of the above setup only needs to happen
once per analyzed column, not once per tuple. It belongs in array_typanalyze;
define a struct to hold all the looked-up state, including a pointer to any
state for std_typanalyze(), and store that in stats->extra_data.

This code should probably get around to using the new SortSupport
infrastructure like analyze.c now uses. Not sure whether that's mandatory for
initial commit.

typanalyze functions that call arbitrary user code must be reentrant. It only
matters in corner cases, but crashing in those corner cases is not acceptable.
See our use of CurTupleHashTable in execGrouping.c and defend this global
state in a similar fashion.
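
One common way to defend such a global is to save it on entry and restore it on exit, so that a nested invocation cannot leave the outer one pointing at the wrong state. The sketch below shows only that pattern; ElementTypeInfoSketch, cur_element_type_info and hash_with_info() are hypothetical names, and the assumption that execGrouping.c follows the same save-and-restore shape should be checked against the source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the looked-up element type state. */
typedef struct ElementTypeInfoSketch
{
	int			multiplier;		/* illustrative payload used by the "hash" */
} ElementTypeInfoSketch;

/* Global consulted by callbacks that cannot take an extra argument. */
static ElementTypeInfoSketch *cur_element_type_info = NULL;

static int
element_hash_sketch(int key)
{
	assert(cur_element_type_info != NULL);
	return key * cur_element_type_info->multiplier;
}

/*
 * Set the global for the duration of the call, restoring the previous value
 * afterwards.  nested_calls > 0 simulates user code re-entering the routine
 * with different state before the outer call has finished.
 */
static int
hash_with_info(ElementTypeInfoSketch *info, int key, int nested_calls)
{
	ElementTypeInfoSketch *save = cur_element_type_info;
	int			result;

	cur_element_type_info = info;
	if (nested_calls > 0)
	{
		ElementTypeInfoSketch inner = {100};

		(void) hash_with_info(&inner, key, nested_calls - 1);
	}
	result = element_hash_sketch(key);
	cur_element_type_info = save;
	return result;
}
```

Because the nested call restores the pointer it found, the outer call still hashes with its own state after the inner call returns.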

+ 
+ 	/*
+ 	 * We want statistics_target * 10 elements in the MCELEM array. This
+ 	 * multiplier is pretty arbitrary, but is meant to reflect the fact that
+ 	 * the number of individual elements tracked in pg_statistic ought to be
+ 	 * more than the number of values for a simple scalar column.
+ 	 */
+ 	num_mcelem = stats->attr->attstattarget * 10;
+ 
+ 	/*
+ 	 * We set bucket width equal to (num_mcelem + 10) / 0.007 as per the
+ 	 * comment above.
+ 	 */
+ 	bucket_width = num_mcelem * 1000 / 7;

The addend mentioned is not present in the code or discussed in "the comment
above". (I see the comment is copied verbatim from ts_typanalyze(), where the
addend *is* present, though again the preceding comment says nothing of it.)

+ 
+ 	/*
+ 	 * Create the hashtable. It will be in local memory, so we don't need to
+ 	 * worry about overflowing the initial size. Also we don't need to pay any
+ 	 * attention to locking and memory management.
+ 	 */
+ 	MemSet(&elem_hash_ctl, 0, sizeof(elem_hash_ctl));
+ 	elem_hash_ctl.keysize = sizeof(Datum);
+ 	elem_hash_ctl.entrysize = sizeof(TrackItem);
+ 	elem_hash_ctl.hash = element_hash;
+ 	elem_hash_ctl.match = element_match;
+ 	elem_hash_ctl.hcxt = CurrentMemoryContext;
+ 	elements_tab = hash_create("Analyzed elements table",
+ 							   bucket_width * 7,

Though it's copied from compute_tsvector_stats(), why "7"?

+ 							   &elem_hash_ctl,
+ 					HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
+ 
+ 	/*
+ 	 * hashtable for arrays lengths.
+ 	 */
+ 	MemSet(&length_hash_ctl, 0, sizeof(length_hash_ctl));
+ 	length_hash_ctl.keysize = sizeof(int);
+ 	length_hash_ctl.entrysize = sizeof(LengthItem);
+ 	length_hash_ctl.hash = tag_hash;

You need to pass the HASH_FUNCTION flag for this setting to take effect.

+ 	length_hash_ctl.match = memcmp;

Omit this. You would need to pass HASH_COMPARE for it to take effect, but
it's also implicit once you override the hash function.

+ 	length_hash_ctl.hcxt = CurrentMemoryContext;
+ 	length_tab = hash_create("Array length table",
+ 							 64,
+ 							 &length_hash_ctl,
+ 							 HASH_ELEM | HASH_CONTEXT);
+ 
+ 	/* Initialize counters. */
+ 	b_current = 1;
+ 	element_no = 0;
+ 
+ 	/* Loop over the arrays. */
+ 	for (array_no = 0; array_no < samplerows; array_no++)
+ 	{
+ 		Datum		value;
+ 		bool		isnull;
+ 		bool		null_present;
+ 		ArrayType  *array;
+ 		char	   *ptr;
+ 		bits8	   *bitmap;
+ 		int			bitmask;
+ 		int			j;
+ 		int			ndims;
+ 		int		   *dims;
+ 		int			nitems;
+ 		bool		length_found;
+ 
+ 		vacuum_delay_point();
+ 
+ 		value = fetchfunc(stats, array_no, &isnull);
+ 
+ 		/*
+ 		 * Check for null/nonnull.
+ 		 */
+ 		if (isnull)
+ 		{
+ 			null_cnt++;
+ 			continue;
+ 		}
+ 
+ 		/*
+ 		 * Add up widths for average-width calculation.  Since it's a array,
+ 		 * we know it's varlena.  As in the regular compute_minimal_stats
+ 		 * function, we use the toasted width for this calculation.
+ 		 */
+ 		total_width += VARSIZE_ANY(DatumGetPointer(value));
+ 
+ 		/*
+ 		 * Now detoast the array if needed.
+ 		 */
+ 		array = DatumGetArrayTypeP(value);
+ 		ptr = ARR_DATA_PTR(array);
+ 		bitmap = ARR_NULLBITMAP(array);
+ 		bitmask = 1;
+ 		ndims = ARR_NDIM(array);
+ 		dims = ARR_DIMS(array);
+ 		nitems = ArrayGetNItems(ndims, dims);
+ 
+ 		/*
+ 		 * Check if we have enough of memory to store element occurences in
+ 		 * one array.
+ 		 */
+ 		if (nitems > occurences_count)
+ 		{
+ 			occurences_count = 2 * nitems;
+ 			if (occurences)
+ 				occurences = (TrackItem **) repalloc(occurences,
+ 									 sizeof(TrackItem *) * occurences_count);
+ 			else
+ 				occurences = (TrackItem **) palloc(
+ 									 sizeof(TrackItem *) * occurences_count);
+ 		}
+ 		occurence_index = 0;
+ 
+ 		null_present = false;
+ 
+ 		/*
+ 		 * We loop through the elements in the array and add them to our
+ 		 * tracking hashtable.	Note: the hashtable entries will point into
+ 		 * the (detoasted) array value, therefore we cannot free that storage
+ 		 * until we're done.
+ 		 */

The second sentence of this comment is obsolete.

+ 		for (j = 0; j < nitems; j++)
+ 		{
+ 			bool		found;
+ 			bool		isnull;
+ 
+ 			/* Get elements, checking for NULL */
+ 			if (bitmap && (*bitmap & bitmask) == 0)
+ 			{
+ 				hash_key = (Datum) 0;
+ 				isnull = true;
+ 				null_present = true;
+ 			}
+ 			else
+ 			{
+ 				/* Get element value */
+ 				hash_key = fetch_att(ptr, typentry->typbyval, typentry->typlen);
+ 				isnull = false;
+ 
+ 				/*
+ 				 * We should allocate memory for element if it isn't passing
+ 				 * by value, because array will be freed after that loop.
+ 				 */
+ 				if (!typentry->typbyval)
+ 				{
+ 					Datum		tmp;
+ 					int			length;
+ 
+ 					length = (typentry->typlen == -1) ?
+ 						VARSIZE(hash_key) : typentry->typlen;
+ 					tmp = (Datum) MemoryContextAlloc(stats->anl_context, length);
+ 					memcpy((void *) tmp, (void *) hash_key, length);
+ 					hash_key = tmp;
+ 				}

Please use datumCopy() or add a comment about why it's insufficient.

+ 				ptr = att_addlength_pointer(ptr, typentry->typlen, ptr);
+ 				ptr = (char *) att_align_nominal(ptr, typentry->typalign);
+ 			}
+ 
+ 			/* Advance bitmap pointers if any */
+ 			bitmask <<= 1;
+ 			if (bitmask == 0x100)
+ 			{
+ 				if (bitmap)
+ 					bitmap++;
+ 				bitmask = 1;
+ 			}
+ 
+ 			/* No null element processing other then flag setting here */
+ 			if (isnull)
+ 				continue;
+ 
+ 			/* Lookup current element in hashtable, adding it if new */
+ 			item = (TrackItem *) hash_search(elements_tab,
+ 											 (const void *) &hash_key,
+ 											 HASH_ENTER, &found);
+ 
+ 			if (found)
+ 			{
+ 				/*
+ 				 * The element is already on the tracking list. If it is the
+ 				 * first occurence in array then update element frequency.
+ 				 */
+ 				if (!item->occurence)
+ 				{
+ 					item->frequency++;
+ 					item->occurence = true;
+ 					occurences[occurence_index++] = item;
+ 				}
+ 			}
+ 			else
+ 			{
+ 				/* Initialize new tracking list element */
+ 				item->frequency = 1;
+ 				item->delta = b_current - 1;
+ 				item->occurence = true;
+ 				occurences[occurence_index++] = item;
+ 			}
+ 
+ 			/* element_no is the number of elements processed (ie N) */
+ 			element_no++;

We should not bump element_no when we skipped the element as a duplicate
within the same array.
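
A compact standalone sketch of Lossy Counting with that change applied: N advances only when an element is actually counted. The fixed table indexed by small integer values, and the names LCState, lc_add_array() and lc_prune(), are simplifications for illustration; the patch uses a dynahash table keyed by Datum.

```c
#include <assert.h>
#include <string.h>

#define NVALUES 8				/* hypothetical domain: element values 0..7 */

typedef struct LCEntry
{
	int			present;		/* is the element currently in D? */
	int			f;				/* arrays counted as containing it */
	int			delta;			/* maximum undercount of f */
	int			last_array;		/* last array in which it was counted */
} LCEntry;

typedef struct LCState
{
	LCEntry		d[NVALUES];		/* D from the LC algorithm */
	int			bucket_width;	/* w */
	int			b_current;		/* current bucket number */
	long		n;				/* N: counted elements, duplicates excluded */
} LCState;

static void
lc_init(LCState *st, int w)
{
	memset(st, 0, sizeof(*st));
	st->bucket_width = w;
	st->b_current = 1;
}

/* Drop entries with f + delta <= b_current. */
static void
lc_prune(LCState *st)
{
	int			i;

	for (i = 0; i < NVALUES; i++)
	{
		if (st->d[i].present && st->d[i].f + st->d[i].delta <= st->b_current)
			st->d[i].present = 0;
	}
}

static void
lc_add_array(LCState *st, int array_no, const int *elems, int nitems)
{
	int			j;

	for (j = 0; j < nitems; j++)
	{
		LCEntry    *e;

		assert(elems[j] >= 0 && elems[j] < NVALUES);
		e = &st->d[elems[j]];

		/* Duplicate within this array: neither f nor N changes. */
		if (e->present && e->last_array == array_no)
			continue;

		if (e->present)
			e->f++;
		else
		{
			e->present = 1;
			e->f = 1;
			e->delta = st->b_current - 1;
		}
		e->last_array = array_no;

		st->n++;
		if (st->n % st->bucket_width == 0)
		{
			lc_prune(st);
			st->b_current++;
		}
	}
}
```

Feeding the arrays {1,1,2}, {1,3} and {1,2,2} with w = 4 counts six elements rather than eight, so the single prune happens after the sixth raw element instead of the fourth.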

+ 
+ 			/* We prune the D structure after processing each bucket */
+ 			if (element_no % bucket_width == 0)
+ 			{
+ 				prune_element_hashtable(elements_tab, b_current);
+ 				b_current++;
+ 			}
+ 		}
+ 		/* Count null element only once per array */
+ 		if (null_present)
+ 			null_elem_cnt++;
+ 
+ 		/* Update frequency of particular array length. */
+ 		length_item = (LengthItem *) hash_search(length_tab,
+ 												 &occurence_index,
+ 												 HASH_ENTER, &length_found);
+ 		if (length_found)
+ 		{
+ 			length_item->frequency++;
+ 		}
+ 		else
+ 		{
+ 			length_item->length = occurence_index;
+ 			length_item->frequency = 1;
+ 		}
+ 		total_length += occurence_index;
+ 
+ 		/*
+ 		 * When we end processing of particular array we should clean the
+ 		 * occurence flag.
+ 		 */
+ 		for (j = 0; j < occurence_index; j++)
+ 			occurences[j]->occurence = false;

Could you usefully simplify this away by storing the last-containing array_no,
instead of a bool, in each hash entry?
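
A minimal sketch of that idea, assuming small non-negative integer elements so that a plain array can stand in for the hash table. TrackItemSketch and count_array() are hypothetical names. Each entry remembers the last array that counted it, so there is no per-array list of touched entries and no cleanup pass.

```c
#include <assert.h>

#define MAX_ELEMENT 16			/* hypothetical: elements index the table directly */

typedef struct TrackItemSketch
{
	int			frequency;		/* number of arrays containing the element */
	int			last_array_no;	/* last array that counted it, or -1 */
} TrackItemSketch;

static void
init_items(TrackItemSketch *items)
{
	int			i;

	for (i = 0; i < MAX_ELEMENT; i++)
	{
		items[i].frequency = 0;
		items[i].last_array_no = -1;
	}
}

/* Count one array; returns its number of distinct elements. */
static int
count_array(TrackItemSketch *items, int array_no, const int *elems, int nitems)
{
	int			j;
	int			distinct = 0;

	for (j = 0; j < nitems; j++)
	{
		TrackItemSketch *item;

		assert(elems[j] >= 0 && elems[j] < MAX_ELEMENT);
		item = &items[elems[j]];

		/* Already counted for this array: skip without any flag to reset. */
		if (item->last_array_no == array_no)
			continue;

		item->last_array_no = array_no;
		item->frequency++;
		distinct++;
	}
	return distinct;
}
```

The return value is the same distinct element count that the patch currently takes from occurence_index, so the occurences array would no longer be needed for that purpose either.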

+ 
+ 		/* We should free memory from array if it was copied during detoast. */
+ 		if ((Datum) array != value)
+ 			pfree((void *) array);

No need for casts to "void *".

+ 	}
+ 
+ 	/* Skip slots occupied by standard statistics */
+ 	while (OidIsValid(stats->stakind[slot_idx]))
+ 		slot_idx++;
+ 
+ 	/* Fill histogram of arrays lengths. */
+ 	lengths_count = hash_get_num_entries(length_tab);
+ 	if (lengths_count > 0)
+ 	{
+ 		int			num_hist = stats->attr->attstattarget;
+ 		int			delta;
+ 		int			frac;
+ 		int			i;
+ 		Datum	   *hist_values;
+ 
+ 		/* Copy lengths statistics from hashtab to array and sort them. */
+ 		length_index = 0;
+ 		sorted_length_tab = (LengthItem *) palloc(sizeof(LengthItem) * lengths_count);
+ 		hash_seq_init(&scan_status, length_tab);
+ 		while ((length_item = (LengthItem *) hash_seq_search(&scan_status)) != NULL)
+ 		{
+ 			memcpy(&sorted_length_tab[length_index], length_item,
+ 				   sizeof(LengthItem));
+ 			length_index++;
+ 		}
+ 		qsort(sorted_length_tab, lengths_count, sizeof(LengthItem),
+ 			  lengthitem_compare_element);
+ 
+ 		/* Histogram should be stored in anl_context. */
+ 		hist_values = (Datum *) MemoryContextAlloc(stats->anl_context,
+ 												   sizeof(Datum) * num_hist);
+ 		/* Fill histogram by hashtab. */
+ 		delta = samplerows - null_cnt - 1;
+ 		length_index = 0;
+ 		frac = sorted_length_tab[0].frequency * (num_hist - 1);
+ 		for (i = 0; i < num_hist; i++)
+ 		{
+ 			hist_values[i] =
+ 				Int32GetDatum(sorted_length_tab[length_index].length);
+ 			frac -= delta;
+ 			while (frac <= 0)
+ 			{
+ 				length_index++;
+ 				frac += sorted_length_tab[length_index].frequency *
+ 					(num_hist - 1);
+ 			}
+ 		}
+ 
+ 		stats->stakind[slot_idx] = STATISTIC_KIND_LENGTH_HISTOGRAM;
+ 		stats->staop[slot_idx] = Int4EqualOperator;
+ 		stats->stavalues[slot_idx] = hist_values;
+ 		stats->numvalues[slot_idx] = num_hist;
+ 		/* We are storing values of element type */
+ 		stats->statypid[slot_idx] = INT4OID;
+ 		stats->statyplen[slot_idx] = 4;
+ 		stats->statypbyval[slot_idx] = true;
+ 		stats->statypalign[slot_idx] = 'i';
+ 		slot_idx++;
+ 	}
+ 
+ 	/* We can only compute real stats if we found some non-null values. */
+ 	if (null_cnt < samplerows)
+ 	{
+ 		int			nonnull_cnt = samplerows - null_cnt;
+ 		int			i;
+ 		TrackItem **sort_table;
+ 		int			track_len;
+ 		int			cutoff_freq;
+ 		int			minfreq,
+ 					maxfreq;
+ 
+ 		stats->stats_valid = true;
+ 		/* Do the simple null-frac and average width stats */
+ 		stats->stanullfrac = (double) null_cnt / (double) samplerows;
+ 		stats->stawidth = total_width / (double) nonnull_cnt;

Isn't this redundant with the calculations made in compute_scalar_stats()?

+ 
+ 		/* Assume it's a unique column (see notes above) */
+ 		stats->stadistinct = -1.0;

The comment, copied from ts_typanalyze(), refers to notes not likewise copied.
We should probably instead leave whatever compute_scalar_stats() calculated.

*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
***************
*** 1705,1710 **** scalararraysel(PlannerInfo *root,
--- 1705,1736 ----
RegProcedure oprsel;
FmgrInfo	oprselproc;
Selectivity s1;
+ 	bool		varonleft;
+ 	Node	   *other;
+ 	VariableStatData vardata;
+ 	
+ 	/*
+ 	 * Handle "const = qual(column)" case using array column statistics.
+ 	 */
+ 	if (get_restriction_variable(root, clause->args, varRelid,
+ 								  &vardata, &other, &varonleft))
+ 	{
+ 		Oid elemtype;
+ 		elemtype = get_base_element_type(vardata.vartype);
+ 		if (elemtype != InvalidOid && IsA(other, Const))
+ 		{
+ 			if (((Const *) other)->constisnull)
+ 			{
+ 				/* qual can't succeed if null array */
+ 				ReleaseVariableStats(vardata);
+ 				return (Selectivity) 0.0;
+ 			}
+ 			s1 = calc_scalararraysel(&vardata, ((Const *) other)->constvalue, useOr);
+ 			ReleaseVariableStats(vardata);
+ 			return s1;
+ 		}
+ 		ReleaseVariableStats(vardata);
+ 	}

If we're going to add a new file for array selectivity functions (which seems
reasonable), scalararraysel() should also move there. (To do this and still
keep the patch easy to read, you could do the move in a pre-patch.)

*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***************
*** 849,854 **** DATA(insert OID = 2334 (  array_agg_finalfn   PGNSP PGUID 12 1 0 0 0 f f f f f i
--- 849,859 ----
DESCR("aggregate final function");
DATA(insert OID = 2335 (  array_agg		   PGNSP PGUID 12 1 0 0 0 t f f f f i 1 0 2277 "2283" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
DESCR("concatenate aggregate input into an array");
+ DATA(insert OID = 3816 (  array_typanalyze PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 16 "2281" _null_ _null_ _null_ _null_ array_typanalyze _null_ _null_ _null_ ));
+ DESCR("array statistics collector");
+ #define ArrayTypanalyzeOid 3816

Use the fmgroids.h symbol, F_ARRAY_TYPANALYZE, instead.

+ DATA(insert OID = 3817 (  arraysel		   PGNSP PGUID 12 1 0 0 0 f f f t f s 4 0 701 "2281 26 2281 23" _null_ _null_ _null_ _null_ arraysel _null_ _null_ _null_ ));
+ DESCR("array selectivity estimation functions");
DATA(insert OID = 760 (  smgrin			   PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 210 "2275" _null_ _null_ _null_ _null_	smgrin _null_ _null_ _null_ ));
DESCR("I/O");
*** a/src/include/catalog/pg_statistic.h
--- b/src/include/catalog/pg_statistic.h

/*
* Currently, three statistical slot "kinds" are defined: most common values,

Here's a larger quote of the comment starting here:

/*
* Currently, three statistical slot "kinds" are defined: most common values,
* histogram, and correlation. Additional "kinds" will probably appear in
* future to help cope with non-scalar datatypes. Also, custom data types
* can define their own "kind" codes by mutual agreement between a custom
* typanalyze routine and the selectivity estimation functions of the type's
* operators.

That needs an update. (It already needs an update for STATISTIC_KIND_MCELEM,
but this patch would invalidate it yet again.)

***************
*** 260,263 **** typedef FormData_pg_statistic *Form_pg_statistic;
--- 268,274 ----
*/
#define STATISTIC_KIND_MCELEM  4
+ 
+ #define STATISTIC_KIND_LENGTH_HISTOGRAM  5

The other kinds have long explanatory comments; this one should, too.

*** a/src/include/catalog/pg_type.h
--- b/src/include/catalog/pg_type.h

! DATA(insert OID = 1021 ( _float4 PGNSP PGUID -1 f b A f t \054 0 700 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ ));

With this patch, a fresh database sets typanalyze = array_typanalyze for 27
array types and leaves typanalyze = NULL for the other 38 array types. What
is the rationale for the split? For example, why does real[] use the new
typanalyze but numeric[] does not?

Thanks,
nm

Attachments:

regression.diffs (text/plain; charset=us-ascii)
*** /home/nm/src/pg/postgresql/src/test/regress/expected/arrays.out	2011-12-29 00:59:49.000000000 -0500
--- /home/nm/src/pg/postgresql/src/test/regress/results/arrays.out	2011-12-29 01:04:42.000000000 -0500
***************
*** 422,427 ****
--- 422,429 ----
  (1 row)
  
  ANALYZE array_op_test;
+ WARNING:  problem in alloc set Analyze: detected write past chunk end in block 0xcfacd0, chunk 0xd07ed8
+ WARNING:  problem in alloc set Analyze: detected write past chunk end in block 0xcecc90, chunk 0xcf3d68
  SELECT * FROM array_op_test WHERE i @> '{32}' ORDER BY seqno;
   seqno |                i                |                                                                 t                                                                  
  -------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------

======================================================================

*** /home/nm/src/pg/postgresql/src/test/regress/expected/rules.out	2011-11-05 13:58:21.000000000 -0400
--- /home/nm/src/pg/postgresql/src/test/regress/results/rules.out	2011-12-29 01:04:49.000000000 -0500
***************
*** 1317,1323 ****
   pg_statio_user_indexes          | SELECT pg_statio_all_indexes.relid, pg_statio_all_indexes.indexrelid, pg_statio_all_indexes.schemaname, pg_statio_all_indexes.relname, pg_statio_all_indexes.indexrelname, pg_statio_all_indexes.idx_blks_read, pg_statio_all_indexes.idx_blks_hit FROM pg_statio_all_indexes WHERE ((pg_statio_all_indexes.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_indexes.schemaname !~ '^pg_toast'::text));
   pg_statio_user_sequences        | SELECT pg_statio_all_sequences.relid, pg_statio_all_sequences.schemaname, pg_statio_all_sequences.relname, pg_statio_all_sequences.blks_read, pg_statio_all_sequences.blks_hit FROM pg_statio_all_sequences WHERE ((pg_statio_all_sequences.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_sequences.schemaname !~ '^pg_toast'::text));
   pg_statio_user_tables           | SELECT pg_statio_all_tables.relid, pg_statio_all_tables.schemaname, pg_statio_all_tables.relname, pg_statio_all_tables.heap_blks_read, pg_statio_all_tables.heap_blks_hit, pg_statio_all_tables.idx_blks_read, pg_statio_all_tables.idx_blks_hit, pg_statio_all_tables.toast_blks_read, pg_statio_all_tables.toast_blks_hit, pg_statio_all_tables.tidx_blks_read, pg_statio_all_tables.tidx_blks_hit FROM pg_statio_all_tables WHERE ((pg_statio_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_tables.schemaname !~ '^pg_toast'::text));
!  pg_stats                        | SELECT n.nspname AS schemaname, c.relname AS tablename, a.attname, s.stainherit AS inherited, s.stanullfrac AS null_frac, s.stawidth AS avg_width, s.stadistinct AS n_distinct, CASE WHEN (s.stakind1 = ANY (ARRAY[1, 4])) THEN s.stavalues1 WHEN (s.stakind2 = ANY (ARRAY[1, 4])) THEN s.stavalues2 WHEN (s.stakind3 = ANY (ARRAY[1, 4])) THEN s.stavalues3 WHEN (s.stakind4 = ANY (ARRAY[1, 4])) THEN s.stavalues4 ELSE NULL::anyarray END AS most_common_vals, CASE WHEN (s.stakind1 = ANY (ARRAY[1, 4])) THEN s.stanumbers1 WHEN (s.stakind2 = ANY (ARRAY[1, 4])) THEN s.stanumbers2 WHEN (s.stakind3 = ANY (ARRAY[1, 4])) THEN s.stanumbers3 WHEN (s.stakind4 = ANY (ARRAY[1, 4])) THEN s.stanumbers4 ELSE NULL::real[] END AS most_common_freqs, CASE WHEN (s.stakind1 = 2) THEN s.stavalues1 WHEN (s.stakind2 = 2) THEN s.stavalues2 WHEN (s.stakind3 = 2) THEN s.stavalues3 WHEN (s.stakind4 = 2) THEN s.stavalues4 ELSE NULL::anyarray END AS histogram_bounds, CASE WHEN (s.stakind1 = 3) THEN s.stanumbers1[1] WHEN (s.stakind2 = 3) THEN s.stanumbers2[1] WHEN (s.stakind3 = 3) THEN s.stanumbers3[1] WHEN (s.stakind4 = 3) THEN s.stanumbers4[1] ELSE NULL::real END AS correlation FROM (((pg_statistic s JOIN pg_class c ON ((c.oid = s.starelid))) JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum)))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE ((NOT a.attisdropped) AND has_column_privilege(c.oid, a.attnum, 'select'::text));
   pg_tables                       | SELECT n.nspname AS schemaname, c.relname AS tablename, pg_get_userbyid(c.relowner) AS tableowner, t.spcname AS tablespace, c.relhasindex AS hasindexes, c.relhasrules AS hasrules, c.relhastriggers AS hastriggers FROM ((pg_class c LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace))) WHERE (c.relkind = 'r'::"char");
   pg_timezone_abbrevs             | SELECT pg_timezone_abbrevs.abbrev, pg_timezone_abbrevs.utc_offset, pg_timezone_abbrevs.is_dst FROM pg_timezone_abbrevs() pg_timezone_abbrevs(abbrev, utc_offset, is_dst);
   pg_timezone_names               | SELECT pg_timezone_names.name, pg_timezone_names.abbrev, pg_timezone_names.utc_offset, pg_timezone_names.is_dst FROM pg_timezone_names() pg_timezone_names(name, abbrev, utc_offset, is_dst);
--- 1317,1323 ----
   pg_statio_user_indexes          | SELECT pg_statio_all_indexes.relid, pg_statio_all_indexes.indexrelid, pg_statio_all_indexes.schemaname, pg_statio_all_indexes.relname, pg_statio_all_indexes.indexrelname, pg_statio_all_indexes.idx_blks_read, pg_statio_all_indexes.idx_blks_hit FROM pg_statio_all_indexes WHERE ((pg_statio_all_indexes.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_indexes.schemaname !~ '^pg_toast'::text));
   pg_statio_user_sequences        | SELECT pg_statio_all_sequences.relid, pg_statio_all_sequences.schemaname, pg_statio_all_sequences.relname, pg_statio_all_sequences.blks_read, pg_statio_all_sequences.blks_hit FROM pg_statio_all_sequences WHERE ((pg_statio_all_sequences.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_sequences.schemaname !~ '^pg_toast'::text));
   pg_statio_user_tables           | SELECT pg_statio_all_tables.relid, pg_statio_all_tables.schemaname, pg_statio_all_tables.relname, pg_statio_all_tables.heap_blks_read, pg_statio_all_tables.heap_blks_hit, pg_statio_all_tables.idx_blks_read, pg_statio_all_tables.idx_blks_hit, pg_statio_all_tables.toast_blks_read, pg_statio_all_tables.toast_blks_hit, pg_statio_all_tables.tidx_blks_read, pg_statio_all_tables.tidx_blks_hit FROM pg_statio_all_tables WHERE ((pg_statio_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_tables.schemaname !~ '^pg_toast'::text));
!  pg_stats                        | SELECT n.nspname AS schemaname, c.relname AS tablename, a.attname, s.stainherit AS inherited, s.stanullfrac AS null_frac, s.stawidth AS avg_width, s.stadistinct AS n_distinct, CASE WHEN (s.stakind1 = 1) THEN s.stavalues1 WHEN (s.stakind2 = 1) THEN s.stavalues2 WHEN (s.stakind3 = 1) THEN s.stavalues3 WHEN (s.stakind4 = 1) THEN s.stavalues4 WHEN (s.stakind5 = 1) THEN s.stavalues5 ELSE NULL::anyarray END AS most_common_vals, CASE WHEN (s.stakind1 = 1) THEN s.stanumbers1 WHEN (s.stakind2 = 1) THEN s.stanumbers2 WHEN (s.stakind3 = 1) THEN s.stanumbers3 WHEN (s.stakind4 = 1) THEN s.stanumbers4 WHEN (s.stakind5 = 1) THEN s.stanumbers5 ELSE NULL::real[] END AS most_common_freqs, CASE WHEN (s.stakind1 = 2) THEN s.stavalues1 WHEN (s.stakind2 = 2) THEN s.stavalues2 WHEN (s.stakind3 = 2) THEN s.stavalues3 WHEN (s.stakind4 = 2) THEN s.stavalues4 WHEN (s.stakind5 = 2) THEN s.stavalues5 ELSE NULL::anyarray END AS histogram_bounds, CASE WHEN (s.stakind1 = 3) THEN s.stanumbers1[1] WHEN (s.stakind2 = 3) THEN s.stanumbers2[1] WHEN (s.stakind3 = 3) THEN s.stanumbers3[1] WHEN (s.stakind4 = 3) THEN s.stanumbers4[1] WHEN (s.stakind5 = 3) THEN s.stanumbers5[1] ELSE NULL::real END AS correlation, CASE WHEN (s.stakind1 = 4) THEN s.stavalues1 WHEN (s.stakind2 = 4) THEN s.stavalues2 WHEN (s.stakind3 = 4) THEN s.stavalues3 WHEN (s.stakind4 = 4) THEN s.stavalues4 WHEN (s.stakind5 = 4) THEN s.stavalues5 ELSE NULL::anyarray END AS most_common_elems, CASE WHEN (s.stakind1 = 4) THEN s.stanumbers1 WHEN (s.stakind2 = 4) THEN s.stanumbers2 WHEN (s.stakind3 = 4) THEN s.stanumbers3 WHEN (s.stakind4 = 4) THEN s.stanumbers4 WHEN (s.stakind5 = 4) THEN s.stanumbers5 ELSE NULL::real[] END AS most_common_elem_freqs, CASE WHEN (s.stakind1 = 5) THEN s.stavalues1 WHEN (s.stakind2 = 5) THEN s.stavalues2 WHEN (s.stakind3 = 5) THEN s.stavalues3 WHEN (s.stakind4 = 5) THEN s.stavalues4 WHEN (s.stakind5 = 5) THEN s.stavalues5 ELSE NULL::anyarray END AS length_histogram_bounds FROM 
(((pg_statistic s JOIN pg_class c ON ((c.oid = s.starelid))) JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum)))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE ((NOT a.attisdropped) AND has_column_privilege(c.oid, a.attnum, 'select'::text));
   pg_tables                       | SELECT n.nspname AS schemaname, c.relname AS tablename, pg_get_userbyid(c.relowner) AS tableowner, t.spcname AS tablespace, c.relhasindex AS hasindexes, c.relhasrules AS hasrules, c.relhastriggers AS hastriggers FROM ((pg_class c LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace))) WHERE (c.relkind = 'r'::"char");
   pg_timezone_abbrevs             | SELECT pg_timezone_abbrevs.abbrev, pg_timezone_abbrevs.utc_offset, pg_timezone_abbrevs.is_dst FROM pg_timezone_abbrevs() pg_timezone_abbrevs(abbrev, utc_offset, is_dst);
   pg_timezone_names               | SELECT pg_timezone_names.name, pg_timezone_names.abbrev, pg_timezone_names.utc_offset, pg_timezone_names.is_dst FROM pg_timezone_names() pg_timezone_names(name, abbrev, utc_offset, is_dst);

======================================================================

#7Alexander Korotkov
aekorotkov@gmail.com
In reply to: Noah Misch (#6)
1 attachment(s)
Re: Collect frequency statistics for arrays

Hi!

Thanks for your great work on reviewing this patch. Now I'm trying to find
the memory corruption bug. Unfortunately, it doesn't appear on my system. Can
you check whether this bug remains in the attached version of the patch? If so,
please send me information about the system you're running (processor, OS, etc.).

------
With best regards,
Alexander Korotkov.

Attachments:

arrayanalyze-0.8.patch.gz (application/x-gzip)
#8Noah Misch
noah@leadboat.com
In reply to: Alexander Korotkov (#7)
Re: Collect frequency statistics for arrays

On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote:

Thanks for your great work on reviewing this patch. Now I'm trying to find
the memory corruption bug. Unfortunately, it doesn't appear on my system. Can
you check whether this bug remains in the attached version of the patch? If so,
please send me information about the system you're running (processor, OS, etc.).

I get the same diagnostic from this version. Opteron processor, operating
system is Ubuntu 8.04 (64-bit). You're using --enable-cassert, right?

#9Alexander Korotkov
aekorotkov@gmail.com
In reply to: Noah Misch (#8)
Re: Collect frequency statistics for arrays

On Wed, Jan 4, 2012 at 12:33 AM, Noah Misch <noah@leadboat.com> wrote:

On Wed, Jan 04, 2012 at 12:09:16AM +0400, Alexander Korotkov wrote:

Thanks for your great work on reviewing this patch. Now I'm trying to find
the memory corruption bug. Unfortunately, it doesn't appear on my system. Can
you check whether this bug remains in the attached version of the patch? If so,
please send me information about the system you're running (processor, OS, etc.).

I get the same diagnostic from this version. Opteron processor, operating
system is Ubuntu 8.04 (64-bit). You're using --enable-cassert, right?

Oh, actually no. Thanks for pointing that out.

------
With best regards,
Alexander Korotkov.

#10Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#6)
Re: Collect frequency statistics for arrays

Corrections:

On Thu, Dec 29, 2011 at 11:35:00AM -0500, Noah Misch wrote:

On Wed, Nov 09, 2011 at 08:49:35PM +0400, Alexander Korotkov wrote:

+  *	We set s to be the estimated frequency of the K'th element in a natural
+  *	language's frequency table, where K is the target number of entries in
+  *	the MCELEM array. We assume that the distribution of element frequencies
+  *	follows Zipf's law with an exponent of 1.
+  *
+  *	Assuming Zipfian distribution, the frequency of the K'th element is equal
+  *	to 1/(K * H(W)) where H(n) is 1/2 + 1/3 + ... + 1/n and W is the number of
+  *	elements in the language.	Putting W as one million, we get roughly
+  *	0.07/K. This gives s = 0.07/K.	We set epsilon = s/10, which gives bucket
+  *	width w = K/0.007 and maximum expected hashtable size of about 1000 * K.

These last two paragraphs, adapted from ts_typanalyze.c, assume natural
language documents. To what extent do these parameter choices remain sensible
for arbitrary data such as users may place in arrays? In any event, we need a
different justification, even if it's just a hand-wavy justification.

If I'm following this correctly, this choice of "s" makes the algorithm
guaranteed to find only elements constituting >= 7% of the input elements.
Incidentally, isn't that far too high for natural language documents? If the
English word "the" only constitutes 7% of typical documents, then this "s"
value would permit us to discard practically every word; we'd be left with
words read while filling the last bucket and words that happened to repeat
exceedingly often in the column. I haven't tried to make a test case to
observe this problem; am I missing something? (This question is largely
orthogonal to your patch.)

No, we'll find elements of frequency at least 0.07/(default_statistics_target
* 10) -- in the default configuration, 0.007%. Also, ts_typanalyze() counts
the number of documents that contain one or more instances of each lexeme,
ignoring the number of appearances within each document. The word "the" may
constitute 7% of a typical document, but it will appear at least once in
nearly 100% of documents. Therefore, this "s" value is adequate even for the
pathological case of each "document" containing just one lexeme.
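
The arithmetic behind this correction can be checked with a short sketch. The helper names below are hypothetical; only the constants (0.07, s/10, and the `* 1000 / 7` bucket width) come from the quoted comment and code, and K is taken to be default_statistics_target * 10 as stated above.

```c
#include <assert.h>

/* Hypothetical helpers; only the constants come from the quoted patch. */

/* Support threshold s = 0.07 / K, where K is the MCELEM target. */
static double
lc_support(int num_mcelem)
{
	assert(num_mcelem > 0);
	return 0.07 / num_mcelem;
}

/* Error bound epsilon = s / 10. */
static double
lc_epsilon(int num_mcelem)
{
	return lc_support(num_mcelem) / 10.0;
}

/* Bucket width w = 1 / epsilon = K / 0.007, in integer arithmetic as in the patch. */
static int
lc_bucket_width(int num_mcelem)
{
	assert(num_mcelem > 0);
	return num_mcelem * 1000 / 7;
}
```

With the default statistics target of 100, K is 1000, so lc_support(1000) is 0.00007 (the 0.007% figure above) and lc_bucket_width(1000) is 142857.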

+  *
+  *	Note: in the above discussion, s, epsilon, and f/N are in terms of a
+  *	element's frequency as a fraction of all elements seen in the input.
+  *	However, what we actually want to store in the finished pg_statistic
+  *	entry is each element's frequency as a fraction of all rows that it occurs
+  *	in. Elements might be repeated in the same array. Since operators
+  *	<@, &&, @> takes care only about element occurence itself and not about
+  *	occurence count, function takes additional care about uniqueness of
+  *	counting. Also we need to change the divisor from N to nonnull_cnt to get
+  *	the number we want.

On the same tangent, why does ts_typanalyze() not deduplicate the same way?
The @@ operator has the same property.

Answer: to_tsvector() will have already done so.

+ 	/*
+ 	 * We set bucket width equal to (num_mcelem + 10) / 0.007 as per the
+ 	 * comment above.
+ 	 */
+ 	bucket_width = num_mcelem * 1000 / 7;

The addend mentioned is not present in the code or discussed in "the comment
above". (I see the comment is copied verbatim from ts_typanalyze(), where the
addend *is* present, though again the preceding comment says nothing of it.)

The addend rationale in fact does appear in the ts_typanalyze() comment.

Thanks,
nm

#11Alexander Korotkov
aekorotkov@gmail.com
In reply to: Noah Misch (#10)
1 attachment(s)
Re: Collect frequency statistics for arrays

Hi!

A patch that fixes most of the issues is attached.

On Thu, Dec 29, 2011 at 8:35 PM, Noah Misch <noah@leadboat.com> wrote:

I find distressing the thought of having two copies of the lossy sampling
code, each implementing the algorithm with different variable names and levels
of generality. We might someday extend this to hstore, and then we'd have yet
another copy. Tom commented[1] that ts_typanalyze() and array_typanalyze()
should remain distinct, and I agree. However, they could call a shared
counting module. Is that practical? Possible API:

typedef struct LossyCountCtl;
LossyCountCtl *LossyCountStart(float s,
                               float epsilon,
                               int2 typlen,
                               bool typbyval,
                               Oid eqfunc); /* + hash func, a few others */
void LossyCountAdd(LossyCountCtl *ctl, Datum elem);
TrackItem **LossyCountGetAll(LossyCountCtl *ctl);

[1] http://archives.postgresql.org/message-id/12406.1298055475@sss.pgh.pa.us

I'm not sure about a shared lossy counting module, because the shared part of
the code would be relatively small. The part of the compute_array_stats
function that takes care of array decompression, distinct occurrence
calculation, the distinct element count histogram, packing statistics slots,
etc. is much larger than the lossy counting algorithm itself. Maybe there are
other opinions in the community?

I think this is an improvement, but some code out there may rely on the
ability to get stakind = 4 data from the most_common_vals column. We'll need
to mention this in the release notes as an incompatibility.

I'm not sure I understand how the release notes work. Do they require
something in the patch itself?

+ /*
+  * Let be n independent events with probabilities p. This function calculates
+  * probabilities of exact k of events occurence for k in [0;m].
+  * Imagine matrix M of (n + 1) x (m + 1) size. Element M[i,j] denotes
+  * probability that exact j of first i events occurs. Obviously M[0,0] = 1.
+  * Each next event increase total number of occured events if it occurs and
+  * leave last value of that number if it doesn't occur. So, by the law of
+  * total probability: M[i,j] = M[i - 1, j] * (1 - p[i]) + M[i - 1, j - 1] * p[i]
+  * for i > 0, j > 0. M[i,0] = M[i - 1, 0] * (1 - p[i]) for i > 0.
+  * Also there could be some events with low probabilities. Their summary
+  * probability passed in the rest parameter.
+  */
+ static float *
+ calc_distr(float *p, int n, int m, float rest)
+ {
+     /* Take care about events with low probabilities. */
+     if (rest > 0.0f)
+     {
+             /*
+              * The probability of no occurence of events which forms "rest"
+              * probability have a limit of exp(-rest) when number of events fo to
+              * infinity. Another simplification is to replace that events with one
+              * event with (1 - exp(-rest)) probability.
+              */
+             rest = 1.0f - exp(-rest);

What is the name of the underlying concept in probability theory?

The closest concept to the calculated distribution is the multinomial
distribution. But it's not exactly the same, because the multinomial
distribution gives the probability of a particular count of each event's
occurrences, not the probability of the total number of occurrences. Actually,
the distribution is calculated just from the assumption that the events are
independent. The closest concept for the rest probability is approximation by
an exponential distribution. It's quite a rough approximation, but I can't
invent anything better with low computational complexity.

+ /*
+  * Array selectivity estimation based on most common elements statistics for
+  * "column <@ const" case. Assumption that element occurences are independent
+  * means certain distribution of array lengths. Typically real distribution
+  * of lengths is significantly different from it. For example, if even we
+  * have set of arrays with 1 integer element in range [0;10] each, element
+  * occurences are not independent. Because in the case of independence we

Do you refer to a column where '{1,12,46}' and '{13,7}' may appear, but
'{6,19,4}' cannot appear?

I refer to a column where every array holds exactly one element, i.e. the only
possible values are '{0}', '{1}', '{2}', '{3}', '{4}', '{5}', '{6}', '{7}',
'{8}', '{9}', '{10}'. That is a corner case. But a similar situation occurs
when, for example, the distinct element count is distributed between 1 and 3.
That differs significantly from the distribution implied by independent
occurrence.

+  * have probabilities of length of 0, 1, 2 etc. In the "column @> const"
+  * and "column && const" cases we usually have "const" with low summary
+  * frequency of elements (otherwise we have selectivity close to 0 or 1
+  * correspondingly). That's why effect of dependence related to lengths
+  * distribution id negligible there. In the "column <@ const" case summary
+  * frequency of elements is high (otherwise we have selectivity close to 0).

What does the term "summary frequency" denote?

I meant the sum of the frequencies of the "const" array elements.

+     /*
+      * Rest is a average length of elements which aren't present in mcelem.
+      */
+     rest = avg_length;

You define "rest" here as an array length ...

+
+     default_freq = Min(DEFAULT_CONT_SEL, minfreq / 2);
+
+     mcelem_index = 0;
+
+     /*
+      * mult is the multiplier that presents estimate of probability that each
+      * mcelem which is not present in constant doesn't occur.
+      */
+     mult = 1.0f;
+
+     for (i = 0; i < nitems; i++)
+     {
+             bool            found = false;
+
+             /* Comparison with previous value in order to guarantee uniquness */
+             if (i > 0)
+             {
+                     if (!element_compare(&array_data[i - 1], &array_data[i]))
+                             continue;
+             }
+
+             /*
+              * Iterate over mcelem until find mcelem that is greater or equal to
+              * element of constant. Simultaneously taking care about rest and
+              * mult. If that mcelem is found then fill corresponding elem_selec.
+              */
+             while (mcelem_index < nmcelem)
+             {
+                     int                     cmp = element_compare(&mcelem[mcelem_index], &array_data[i]);
+
+                     if (cmp < 0)
+                     {
+                             mult *= (1.0f - numbers[mcelem_index]);
+                             rest -= numbers[mcelem_index];

... But here, you're subtracting a frequency from an array length?

Yes, because the average distinct element count is the sum of the frequencies
of the elements. Subtracting the mcelem frequencies from avg_length gives the
sum of the frequencies of the non-mcelem elements.

------
With best regards,
Alexander Korotkov.

Attachments:

arrayanalyze-0.9.patch.gz (application/x-gzip)
#12Noah Misch
noah@leadboat.com
In reply to: Alexander Korotkov (#11)
1 attachment(s)
Re: Collect frequency statistics for arrays

On Sat, Jan 07, 2012 at 09:36:42PM +0400, Alexander Korotkov wrote:

A patch that fixes most of the issues is attached.

Thanks. I've made several, largely cosmetic, edits. See attached version
0.10. Please use it as the basis for your next version, and feel free to
revert any changes you deem inappropriate. Where I made non-cosmetic edits, I
attempt to point that out below. I've left unfixed a few more-substantive
problems, also described below.

When you post another update, could you add it to the open CF? Given the
timing, I think we might as well consider any further activity to have
happened under the aegis of the 2012-01 CF. I'm marking the current entry
Returned with Feedback.

On Thu, Dec 29, 2011 at 8:35 PM, Noah Misch <noah@leadboat.com> wrote:

I find distressing the thought of having two copies of the lossy sampling
code, each implementing the algorithm with different variable names and levels
of generality. We might someday extend this to hstore, and then we'd have yet
another copy. Tom commented[1] that ts_typanalyze() and array_typanalyze()
should remain distinct, and I agree. However, they could call a shared
counting module. Is that practical? Possible API:

typedef struct LossyCountCtl;
LossyCountCtl *LossyCountStart(float s,
                               float epsilon,
                               int2 typlen,
                               bool typbyval,
                               Oid eqfunc); /* + hash func, a few others */
void LossyCountAdd(LossyCountCtl *ctl, Datum elem);
TrackItem **LossyCountGetAll(LossyCountCtl *ctl);

[1] http://archives.postgresql.org/message-id/12406.1298055475@sss.pgh.pa.us

I'm not sure about a shared lossy counting module, because the shared part of
the code would be relatively small. The part of the compute_array_stats
function that takes care of array decompression, distinct occurrence
calculation, the distinct element count histogram, packing statistics slots,
etc. is much larger than the lossy counting algorithm itself. Maybe there are
other opinions in the community?

True; it would probably increase total lines of code. The benefit, if any,
lies in separation of concerns; the business of implementing this algorithm is
quite different from the other roles of these typanalyze functions. I won't
insist that you try it, though.

I think this is an improvement, but some code out there may rely on the
ability to get stakind = 4 data from the most_common_vals column. We'll need
to mention this in the release notes as an incompatibility.

I'm not sure I understand how the release notes work. Do they require
something in the patch itself?

No. I just wanted to call attention to the fact in the hope that someone
remembers as the release notes get drafted.

+             /*
+              * The probability of no occurence of events which forms "rest"
+              * probability have a limit of exp(-rest) when number of events fo to
+              * infinity. Another simplification is to replace that events with one
+              * event with (1 - exp(-rest)) probability.
+              */
+             rest = 1.0f - exp(-rest);

What is the name of the underlying concept in probability theory?

The closest concept to the calculated distribution is the multinomial
distribution. But it's not exactly the same, because the multinomial
distribution gives the probability of a particular count of each event's
occurrences, not the probability of the total number of occurrences. Actually,
the distribution is calculated just from the assumption that the events are
independent. The closest concept for the rest probability is approximation by
an exponential distribution. It's quite a rough approximation, but I can't
invent anything better with low computational complexity.

Do you have a URL of a tutorial or paper that explains the method in more
detail? If, rather, this is a novel synthesis, could you write a proof to
include in the comments?

+ /*
+  * Array selectivity estimation based on most common elements statistics for
+  * "column <@ const" case. Assumption that element occurences are independent
+  * means certain distribution of array lengths. Typically real distribution
+  * of lengths is significantly different from it. For example, if even we
+  * have set of arrays with 1 integer element in range [0;10] each, element
+  * occurences are not independent. Because in the case of independence we

Do you refer to a column where '{1,12,46}' and '{13,7}' may appear, but
'{6,19,4}' cannot appear?

I refer to a column where every array holds exactly one element, i.e. the only
possible values are '{0}', '{1}', '{2}', '{3}', '{4}', '{5}', '{6}', '{7}',
'{8}', '{9}', '{10}'. That is a corner case. But a similar situation occurs
when, for example, the distinct element count is distributed between 1 and 3.
That differs significantly from the distribution implied by independent
occurrence.

Oh, I think I see now. If each element 1..10 had frequency 0.1 independently,
column values would have exactly one distinct element just 39% of the time?

If probability theory has a prototypical problem resembling this, it would be
nice to include a URL to a thorough discussion thereof. I could not think of
the search terms to find one, though.
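
The 39% figure can be reproduced as a binomial probability: ten elements, each present independently with probability 0.1, and exactly one of them present. The sketch below uses a hypothetical helper name and is only meant to check the arithmetic.

```c
#include <assert.h>

/*
 * Probability that exactly k of n independent events occur when each has
 * the same probability p (binomial distribution).
 */
static double
prob_exactly_k(int n, int k, double p)
{
	double		result = 1.0;
	int			i;

	assert(n >= 0 && k >= 0 && k <= n);

	/* Binomial coefficient C(n, k), built up as a product of ratios. */
	for (i = 1; i <= k; i++)
		result *= (double) (n - k + i) / i;

	/* Multiply by p^k * (1 - p)^(n - k). */
	for (i = 0; i < k; i++)
		result *= p;
	for (i = 0; i < n - k; i++)
		result *= 1.0 - p;

	return result;
}
```

prob_exactly_k(10, 1, 0.1) is 10 * 0.1 * 0.9^9, about 0.387, while prob_exactly_k(10, 0, 0.1) is about 0.349. Under independence, roughly a third of the arrays would therefore be empty, which never happens in the one-element-per-array column.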

+  * have probabilities of length of 0, 1, 2 etc. In the "column @> const"
+  * and "column && const" cases we usually have "const" with low summary
+  * frequency of elements (otherwise we have selectivity close to 0 or 1
+  * correspondingly). That's why effect of dependence related to lengths
+  * distribution id negligible there. In the "column <@ const" case summary
+  * frequency of elements is high (otherwise we have selectivity close to 0).

What does the term "summary frequency" denote?

I meant the sum of the frequencies of the "const" array elements.

Do you mean literally P_0 + P_1 ... + P_N? If so, I can follow the above
argument for "column && const" and "column <@ const", but not for "column @>
const". For "column @> const", selectivity cannot exceed the smallest
frequency among elements of "const". Several high-frequency elements together
will drive up the sum of frequencies without increasing the true selectivity.
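
A small numeric sketch of this objection follows. The helper names are hypothetical, and the product is only the selectivity one would get under an assumed independence of elements; the point is that the sum of frequencies can be large while "column @> const" selectivity stays bounded by the smallest frequency.

```c
#include <assert.h>

/* Sum of the element frequencies: the "summary frequency" under discussion. */
static double
freq_sum(const double *freq, int n)
{
	double		sum = 0.0;
	int			i;

	for (i = 0; i < n; i++)
		sum += freq[i];
	return sum;
}

/* Smallest frequency: an upper bound on "column @> const" selectivity. */
static double
freq_min(const double *freq, int n)
{
	double		min;
	int			i;

	assert(n > 0);
	min = freq[0];
	for (i = 1; i < n; i++)
		if (freq[i] < min)
			min = freq[i];
	return min;
}

/* Product of frequencies: "column @> const" selectivity assuming independence. */
static double
freq_product(const double *freq, int n)
{
	double		prod = 1.0;
	int			i;

	for (i = 0; i < n; i++)
		prod *= freq[i];
	return prod;
}
```

For frequencies {0.9, 0.8, 0.7} the sum is 2.4, yet the bound is 0.7 and the independence estimate is about 0.504.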

+     /*
+      * Rest is a average length of elements which aren't present in mcelem.
+      */
+     rest = avg_length;

You define "rest" here as an array length ...

+ rest -= numbers[mcelem_index];

... But here, you're subtracting a frequency from an array length?

Yes, because the average distinct element count is the sum of the frequencies
of the elements. Subtracting the mcelem frequencies from avg_length gives the
sum of the frequencies of the non-mcelem elements.

I see now; thanks. I updated the comments so that this would have been
clearer to me.
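
The identity behind "rest" can be seen on toy data: the mean number of distinct elements per array equals the sum of the per-element frequencies. The data and function names below are made up for illustration and do not come from the patch.

```c
#include <assert.h>

#define NROWS	4
#define NELEMS	3

/* present[r][e] is 1 when array r contains element e at least once (toy data). */
static const int present[NROWS][NELEMS] = {
	{1, 1, 0},
	{1, 0, 0},
	{1, 1, 1},
	{0, 1, 0},
};

/* Fraction of rows that contain element e. */
static double
element_frequency(int e)
{
	int			r,
				hits = 0;

	assert(e >= 0 && e < NELEMS);
	for (r = 0; r < NROWS; r++)
		hits += present[r][e];
	return (double) hits / NROWS;
}

/* Mean number of distinct elements per row. */
static double
avg_distinct_count(void)
{
	int			r,
				e,
				total = 0;

	for (r = 0; r < NROWS; r++)
		for (e = 0; e < NELEMS; e++)
			total += present[r][e];
	return (double) total / NROWS;
}

/* Frequency mass left for elements outside the first nmcelem ones. */
static double
rest_frequency(int nmcelem)
{
	double		rest = avg_distinct_count();
	int			e;

	assert(nmcelem >= 0 && nmcelem <= NELEMS);
	for (e = 0; e < nmcelem; e++)
		rest -= element_frequency(e);
	return rest;
}
```

Here the frequencies are 0.75, 0.75 and 0.25, the average distinct count is 1.75, and rest_frequency(2) is 0.25, which is exactly the frequency of the one element not treated as an mcelem.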

*** /dev/null
--- b/src/backend/utils/adt/array_selfuncs.c
+ Selectivity
+ calc_scalararraysel(VariableStatData *vardata, Datum constval, bool orClause,
+ 					Oid operator)
+ {
+ 	Oid			elemtype;
+ 	Selectivity selec;
+ 	TypeCacheEntry *typentry;
+ 	Datum	   *hist;
+ 	int			nhist;
+ 	FunctionCallInfoData cmpfunc;
+ 
+ 	elemtype = get_base_element_type(vardata->vartype);
+ 
+ 
+ 	/* Get default comparison function */
+ 	typentry = lookup_type_cache(elemtype,
+ 		   TYPECACHE_CMP_PROC | TYPECACHE_CMP_PROC_FINFO | TYPECACHE_EQ_OPR);
+ 
+ 	/* Handle only "=" operator. Return default selectivity in other cases. */
+ 	if (operator != typentry->eq_opr)
+ 		return (Selectivity) 0.5;

Punting on other operators this way creates a plan quality regression for
operations like "const < ANY (column)". Please do it some way that falls
back on the somewhat-better existing scalararraysel() treatment for this.

+ 
+ 	/* Without comparison function return default selectivity estimation */
+ 	if (!OidIsValid(typentry->cmp_proc))
+ 	{
+ 		if (orClause)
+ 			return DEFAULT_OVERLAP_SEL;
+ 		else
+ 			return DEFAULT_CONTAIN_SEL;
+ 	}

Since "const = ANY (column)" is equivalent to "column @> array[const]" and
"const = ALL (column)" is equivalent to "column <@ array[const]",
DEFAULT_CONTAIN_SEL is always correct here. I've made that change.

+ /*
+  * Calculate first n distinct element counts probabilities by histogram. We
+  * assume that any interval between a and b histogram values gives
+  * 1 / ((b - a + 1) * (nhist - 1)) probability to values between a and b and
+  * half of that to a and b. Returns total probability that distinct element
+  * count is less of equal to n.
+  */
+ static float
+ calc_hist(Datum *hist, int nhist, float *hist_part, int n)

To test this function, I ran the following test case:

set default_statistics_target = 4;
create table t3 as select array(select * from generate_series(1, v)) as arr
from (values (2),(2),(2),(3),(5),(5),(5)) v(v), generate_series(1,100);
analyze t3; -- length_histogram_bounds = {2,2,5,5}
select * from t3 where arr <@ array[6,7,8,9,10,11];

Using gdb to observe calc_hist()'s result during the last command:

(gdb) p calc_hist(hist, nhist, hist_part, unique_nitems)
$23 = 0.666666687
(gdb) x/6f hist_part
0xcd4bc8: 0 0 0.333333343 0
0xcd4bd8: 0 0.333333343

I expected an equal, nonzero probability in hist_part[3] and hist_part[4] and
a total probability of 1.0.

+ {
+ 	int			k,
+ 				i = 0,
+ 				prev_interval = 0,
+ 				next_interval = 0;
+ 	float		frac,
+ 				total = 0.0f;
+ 
+ 	/*
+ 	 * frac is a probability contribution by each interval between histogram
+ 	 * values. We have nhist - 1 intervals. Contribution of one will be 1 /
+ 	 * (nhist - 1).
+ 	 */
+ 	frac = 1.0f / ((float) (nhist - 1));
+ 	for (k = 0; k <= n; k++)
+ 	{
+ 		int			count = 0;
+ 
+ 		/* Count occurences of k distinct element counts in histogram. */
+ 		while (i < nhist && DatumGetInt32(hist[i]) <= k)
+ 		{
+ 			if (DatumGetInt32(hist[i]) == k)
+ 				count++;
+ 			i++;
+ 		}
+ 
+ 		if (count > 0)
+ 		{
+ 			float		val;
+ 
+ 			/* Find length between current histogram value and the next one */
+ 			if (i < nhist)
+ 				next_interval = DatumGetInt32(hist[i + 1]) -

Doesn't this read past the array end when i == nhist - 1?

+ /*
+  * Let be n independent events with probabilities p. This function calculates
+  * probabilities of exact k of events occurence for k in [0;m].
+  * Imagine matrix M of (n + 1) x (m + 1) size. Element M[i,j] denotes
+  * probability that exact j of first i events occurs. Obviously M[0,0] = 1.
+  * Each next event increase total number of occured events if it occurs and
+  * leave last value of that number if it doesn't occur. So, by the law of
+  * total probability: M[i,j] = M[i - 1, j] * (1 - p[i]) + M[i - 1, j - 1] * p[i]
+  * for i > 0, j > 0. M[i,0] = M[i - 1, 0] * (1 - p[i]) for i > 0.
+  * Also there could be some events with low probabilities. Their summary
+  * probability passed in the rest parameter.
+  */
+ static float *
+ calc_distr(float *p, int n, int m, float rest)

I attempted to clarify this comment; please see if I preserved its accuracy.
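
The recurrence in the quoted comment can be sketched as follows. This is not the patch's calc_distr(): it uses doubles, keeps only one row of M at a time, uses malloc rather than palloc, and leaves out the "rest" parameter. The function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Returns a malloc'd array d[0..m] where d[j] is the probability that
 * exactly j of the n independent events with probabilities p[0..n-1]
 * occur. Returns NULL on allocation failure; the caller frees the result.
 */
static double *
distinct_count_distr(const double *p, int n, int m)
{
	double	   *row;
	int			i,
				j;

	assert(n >= 0 && m >= 0);

	row = malloc((size_t) (m + 1) * sizeof(double));
	if (row == NULL)
		return NULL;

	/* M[0,0] = 1 and M[0,j] = 0 for j > 0. */
	row[0] = 1.0;
	for (j = 1; j <= m; j++)
		row[j] = 0.0;

	for (i = 0; i < n; i++)
	{
		/* Walk j downwards so row[j - 1] still holds the previous row's value. */
		for (j = m; j > 0; j--)
			row[j] = row[j] * (1.0 - p[i]) + row[j - 1] * p[i];
		row[0] *= 1.0 - p[i];
	}

	return row;
}
```

For two events with probability 0.5 each and m = 2, the result is {0.25, 0.5, 0.25}.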

+ 	/*
+ 	 * Using of distinct element counts histogram requires O(nitems * (nmcelem
+ 	 * + nitems)) operations. It's reasonable to limit the number of required
+ 	 * operation and give less accurate answer when this limit exceed.
+ 	 */
+ 	if (nhist > 0 && unique_nitems <=
+ 		300 * default_statistics_target / (nmcelem + unique_nitems))

I benchmarked the quadratic complexity here. With default settings, this
cutoff skips the algorithm beginning around a 170-element constant array,
which would nonetheless take single-digit milliseconds to plan. When I
temporarily removed the cutoff, I needed much larger scales to get poor plan
times. 5000 elements took 180ms to plan, and 10000 elements took 620 ms.
Bottom line, the cutoff you've chosen is plenty conservative.
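
The cutoff condition can be explored numerically. The sketch below assumes that all operands are plain ints, so the division truncates, and it omits the nhist > 0 check; the function names are hypothetical.

```c
#include <assert.h>

/* The size condition quoted from the patch, assuming integer arithmetic. */
static int
uses_length_histogram(int unique_nitems, int nmcelem, int stats_target)
{
	assert(unique_nitems > 0 && nmcelem >= 0 && stats_target >= 0);
	return unique_nitems <= 300 * stats_target / (nmcelem + unique_nitems);
}

/* Largest number of distinct constant elements that still passes the cutoff. */
static int
max_nitems_under_cutoff(int nmcelem, int stats_target)
{
	int			n = 0;

	/* The right-hand side shrinks as n grows, so the first failure is final. */
	while (uses_length_histogram(n + 1, nmcelem, stats_target))
		n++;
	return n;
}
```

Under these assumptions and a statistics target of 100, the largest passing size is 173 when the column has no most-common elements and 29 when it has 1000 of them. The first value appears consistent with the roughly 170-element figure above, provided the benchmark column had few most-common elements.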

+ static int
+ element_compare(const void *key1, const void *key2, void *arg)
+ {
+ 	const Datum *d1 = (const Datum *) key1;
+ 	const Datum *d2 = (const Datum *) key2;
+ 	FunctionCallInfo cmpfunc = (FunctionCallInfo) arg;
+ 
+ 	cmpfunc   ->arg[0] = *d1;
+ 	cmpfunc   ->arg[1] = *d2;
+ 	cmpfunc   ->argnull[0] = false;
+ 	cmpfunc   ->argnull[1] = false;
+ 	cmpfunc   ->isnull = false;

This indented poorly due to "cmpfunc" having a place in our typedefs list. I
changed the identifier.

*** /dev/null
--- b/src/backend/utils/adt/array_typanalyze.c
+  *	We set s to be the estimated frequency of the K'th element in a natural
+  *	language's frequency table, where K is the target number of entries in
+  *	the MCELEM array. We assume that the distribution of element frequencies
+  *	follows Zipf's law with an exponent of 1.
+  *
+  *	Assuming Zipfian distribution, the frequency of the K'th element is equal
+  *	to 1/(K * H(W)) where H(n) is 1/2 + 1/3 + ... + 1/n and W is the number of
+  *	elements in the language.	Putting W as one million, we get roughly
+  *	0.07/K. This gives s = 0.07/K.	We set epsilon = s/10, which gives bucket
+  *	width w = K/0.007 and maximum expected hashtable size of about 1000 * K.

Given the lack of applicability to arrays, I replaced these last two
paragraphs with some weasel words. My gut feeling is that we're priming the
algorithm to deliver answers far more precise than needed. However, I haven't
attempted a principled replacement.

+ 	/* This is 'w' from the LC algorithm */
+ 	int			bucket_width;
+ 	int			array_no,
+ 				element_no;

I think it's possible for element_no to overflow. Consider rows with 2000
distinct elements apiece at a statistics target of 10000 (3M sample rows).
So, I made it a uint64.
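
The overflow scenario works out as follows. The sketch relies on ANALYZE sampling 300 rows per unit of statistics target, which matches the "3M sample rows" figure above, and assumes the original counter was a 32-bit int; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sample size: 300 rows per unit of statistics target. */
static uint64_t
sample_rows(int stats_target)
{
	assert(stats_target >= 0);
	return (uint64_t) 300 * (uint64_t) stats_target;
}

/* Elements counted if every sampled row holds this many distinct elements. */
static uint64_t
total_elements(int stats_target, int distinct_per_row)
{
	assert(distinct_per_row >= 0);
	return sample_rows(stats_target) * (uint64_t) distinct_per_row;
}

/* True when the count no longer fits in a signed 32-bit counter. */
static int
overflows_int32(uint64_t count)
{
	return count > (uint64_t) INT32_MAX;
}
```

At a target of 10000 with 2000 distinct elements per row, the total is 6,000,000,000, well above the 2,147,483,647 limit of a signed 32-bit int.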

+ extra_data = (ArrayAnalyzeExtraData *) stats->extra_data;

This still isn't reentrant; you'd need to save the existing static extra_data
and restore it on function exit. However, it turns out that do_analyze_rel()
itself isn't reentrant on account of its similar management of "anl_context";
any nested ANALYZE crashes the backend. So, I don't think we need further
change here. It will be easy to make reentrant later if necessary, though I'd
probably fix do_analyze_rel() by just throwing an error on recursive ANALYZE.
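
The save-and-restore pattern described here might look like the sketch below. ExtraData, current_extra and compute_stats_reentrant are hypothetical stand-ins rather than names from the patch, and the sketch ignores error paths (in the backend, an error raised in between would skip the restore unless it is handled explicitly).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-call analyze state. */
typedef struct ExtraData
{
	int			depth;
} ExtraData;

/* File-level pointer, analogous to the static extra_data discussed above. */
static ExtraData *current_extra = NULL;

/*
 * Installs "mine" as the current state, optionally recurses, and restores
 * the caller's state on exit. Returns the depth seen after any nested call.
 */
static int
compute_stats_reentrant(ExtraData *mine, int nested_levels)
{
	ExtraData  *saved = current_extra;	/* save the caller's state */
	int			seen;

	assert(mine != NULL);
	current_extra = mine;

	if (nested_levels > 0)
	{
		ExtraData	inner;

		inner.depth = mine->depth + 1;
		(void) compute_stats_reentrant(&inner, nested_levels - 1);
	}

	/* After a nested call returns, the pointer refers to our data again. */
	seen = current_extra->depth;

	current_extra = saved;		/* restore on exit */
	return seen;
}
```

Without the restore step, a nested call would leave the pointer referring to the inner call's data, which has already gone out of scope.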

+ 	stats->extra_data = extra_data->std_extra_data;
+ 	old_context = CurrentMemoryContext;
+ 	extra_data->std_compute_stats(stats, fetchfunc, samplerows, totalrows);
+ 	MemoryContextSwitchTo(old_context);

Is the callee known to change CurrentMemoryContext and not restore it?
Offhand, I'm not seeing how it could do so.

+ 	/*
+ 	 * hashtable for arrays distinct element count.
+ 	 */
+ 	MemSet(&count_hash_ctl, 0, sizeof(count_hash_ctl));
+ 	count_hash_ctl.keysize = sizeof(int);
+ 	count_hash_ctl.entrysize = sizeof(DistinctElementCountItem);
+ 	count_hash_ctl.hash = tag_hash;
+ 	count_hash_ctl.match = memcmp;
+ 	count_hash_ctl.hcxt = CurrentMemoryContext;
+ 	count_tab = hash_create("Array distinct element count table",
+ 							64,
+ 							&count_hash_ctl,
+ 					HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);

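This HASH_COMPARE setting is redundant, since hash_create() already falls back
to memcmp when no match function is given for a non-string hash, so I've
removed it.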

+ 		/* Skip too large values. */
+ 		if (toast_raw_datum_size(value) > WIDTH_THRESHOLD)

Fixed this warning:

array_typanalyze.c: In function `compute_array_stats':
array_typanalyze.c:361: warning: implicit declaration of function `toast_raw_datum_size'

+ 			continue;
+ 		else
+ 			analyzed_rows++;
+ 
+ 		/*
+ 		 * Add up widths for average-width calculation.  Since it's an array,
+ 		 * we know it's varlena.  As in the regular compute_minimal_stats
+ 		 * function, we use the toasted width for this calculation.
+ 		 */
+ 		total_width += VARSIZE_ANY(DatumGetPointer(value));

Since this is now unused, I removed it.

+ 			/* Lookup current element in hashtable, adding it if new */
+ 			item = (TrackItem *) hash_search(elements_tab,
+ 											 (const void *) &hash_key,
+ 											 HASH_ENTER, &found);
+ 
+ 			if (found)
+ 			{

I added a pfree(hash_key) here. In one of my default_statistics_target=3000
tests on a table with few possible elements, this saved hundreds of megabytes
of memory.

+ 				int			i;
+ 
+ 				/*
+ 				 * The element is already on the tracking list. Check if it's
+ 				 * the first occurrence of this element in the array.
+ 				 */
+ 				for (i = 0; i < occurence_index; i++)
+ 				{
+ 					if (occurences[i] == item)
+ 						break;
+ 				}

This wasn't what I had in mind when I suggested the different approach last
time. See how I changed it in this version, and let me know if you see any
essential disadvantages.
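
The revised approach is not shown in this message. As an illustration only, one common way to avoid the linear scan is to stamp each tracked item with the number of the last array that counted it. The struct and function below are hypothetical and should not be read as the code in the attached patch.

```c
#include <assert.h>

/* Hypothetical tracking entry; field names are made up for this sketch. */
typedef struct TrackItemSketch
{
	int			frequency;		/* number of arrays containing the element */
	int			last_array_no;	/* last array that bumped frequency, -1 if none */
} TrackItemSketch;

/*
 * Count an element at most once per array.  Returns 1 if this call
 * incremented the frequency, 0 if the element was already counted for
 * the array numbered array_no.
 */
int
note_occurrence(TrackItemSketch *item, int array_no)
{
	assert(array_no >= 0);
	if (item->last_array_no == array_no)
		return 0;				/* duplicate within the same array */
	item->last_array_no = array_no;
	item->frequency++;
	return 1;
}
```

The check is O(1) per element, at the cost of one extra integer per tracked item, whereas the loop quoted above is linear in the number of distinct elements already seen in the current array.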

+ 		/* Update frequency of particular array distinct element count. */
+ 		count_item = (DistinctElementCountItem *) hash_search(count_tab,
+ 															&occurence_index,
+ 											  HASH_ENTER, &count_item_found);
+ 		if (count_item_found)
+ 			count_item->frequency++;
+ 		else
+ 		{
+ 			count_item->count = occurence_index;

The key gets initialized automatically, so I removed this line.

+ 			count_item->frequency = 1;
+ 		}
+ 		total_distinct_count += occurence_index;

total_distinct_count seemed to follow element_no exactly, so I removed it.

*** a/src/include/catalog/pg_statistic.h
--- b/src/include/catalog/pg_statistic.h
***************
*** 260,263 **** typedef FormData_pg_statistic *Form_pg_statistic;
--- 268,285 ----
*/
#define STATISTIC_KIND_MCELEM  4
+ /*
+  * A "length histogram" slot describes the distribution of lengths of data for
+  * datatypes where length is important for selectivity estimation. stavalues
+  * contains M (>=2) non-null values that divide the non-null column data values
+  * into M-1 bins of approximately equal population. The first stavalues item
+  * is the minimum length and the last is the maximum length. Depending on the
+  * datatype, this slot may hold the distribution of a length-like quantity
+  * rather than the exact length. For instance, for arrays it holds the
+  * distribution of distinct element counts, because array comparison
+  * operators ignore repeated occurrences of an element.
+  *
+  */
+ #define STATISTIC_KIND_LENGTH_HISTOGRAM  5

I changed this text to say that we always store distinct element counts. We
can always update the comment later if we diversify its applications.
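
To make the slot layout concrete, here is a minimal sketch of picking M bounds that split sorted distinct-element counts into M-1 bins of roughly equal population. The function name and the index formula are assumptions made for this illustration; the patch may select the bounds differently.

```c
#include <assert.h>

/*
 * Hypothetical helper: fill bounds[0..m-1] from sorted[0..n-1], which
 * holds per-array distinct element counts in ascending order.  The first
 * bound is the minimum, the last is the maximum, and the others are
 * spaced evenly through the sorted sample.
 */
void
length_histogram_bounds(const int *sorted, int n, int *bounds, int m)
{
	int			i;

	assert(n >= 1 && m >= 2);
	for (i = 0; i < m; i++)
	{
		/* spread m positions evenly over indexes 0 .. n-1 */
		long		pos = (long) i * (n - 1) / (m - 1);

		bounds[i] = sorted[pos];
	}
}
```

For the sorted counts {1, 1, 2, 3, 3, 4, 5, 8, 9} and M = 3, the bounds are {1, 3, 9}: half of the arrays have between 1 and 3 distinct elements, the other half between 3 and 9.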

*** a/src/include/catalog/pg_type.h
--- b/src/include/catalog/pg_type.h

This now updates all array types except record[]. I don't know offhand how
to even make a non-empty value of type record[], let alone get it into a
context where ANALYZE would see it. However, is there a particular reason to
make that one different?

Thanks,
nm

Attachments:

arrayanalyze-0.10.patch (text/plain; charset=us-ascii)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index be4bbc7..46121a7 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 8279,8286 ****
        <entry>
         A list of the most common values in the column. (Null if
         no values seem to be more common than any others.)
-        For some data types such as <type>tsvector</>, this is a list of
-        the most common element values rather than values of the type itself.
        </entry>
       </row>
  
--- 8279,8284 ----
***************
*** 8289,8300 ****
        <entry><type>real[]</type></entry>
        <entry></entry>
        <entry>
!        A list of the frequencies of the most common values or elements,
         i.e., number of occurrences of each divided by total number of rows.
         (Null when <structfield>most_common_vals</structfield> is.)
-        For some data types such as <type>tsvector</>, it can also store some
-        additional information, making it longer than the
-        <structfield>most_common_vals</> array.
        </entry>
       </row>
  
--- 8287,8295 ----
        <entry><type>real[]</type></entry>
        <entry></entry>
        <entry>
!        A list of the frequencies of the most common values,
         i.e., number of occurrences of each divided by total number of rows.
         (Null when <structfield>most_common_vals</structfield> is.)
        </entry>
       </row>
  
***************
*** 8326,8331 ****
--- 8321,8358 ----
         type does not have a <literal>&lt;</> operator.)
        </entry>
       </row>
+ 
+      <row>
+       <entry><structfield>most_common_elems</structfield></entry>
+       <entry><type>anyarray</type></entry>
+       <entry></entry>
+       <entry>
+        A list of element values most often appearing within values of the
+        column. (Null for scalar types.)
+       </entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>most_common_elem_freqs</structfield></entry>
+       <entry><type>real[]</type></entry>
+       <entry></entry>
+       <entry>
+        A list of the frequencies of the most common element values, i.e., the
+        fraction of rows containing at least one of the given element.  Two or
+        four additional values follow those; they bear type-specific summary
+        information.
+       </entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>length_histogram_bounds</structfield></entry>
+       <entry><type>int[]</type></entry>
+       <entry></entry>
+       <entry>
+        For array types, a list of histogram bounds for the number of
+        distinct elements in each array value. (Null for other data types.)
+       </entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index dc801ae..cdc4317 100644
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
***************
*** 45,50 ****
--- 45,51 ----
  #include "catalog/pg_namespace.h"
  #include "catalog/pg_statistic.h"
  #include "catalog/pg_tablespace.h"
+ #include "catalog/pg_proc.h"
  #include "catalog/pg_type.h"
  #include "catalog/pg_type_fn.h"
  #include "catalog/storage.h"
***************
*** 1182,1188 **** heap_create_with_catalog(const char *relname,
  				   F_ARRAY_SEND,	/* array send (bin) proc */
  				   InvalidOid,	/* typmodin procedure - none */
  				   InvalidOid,	/* typmodout procedure - none */
! 				   InvalidOid,	/* analyze procedure - default */
  				   new_type_oid,	/* array element type - the rowtype */
  				   true,		/* yes, this is an array type */
  				   InvalidOid,	/* this has no array type */
--- 1183,1189 ----
  				   F_ARRAY_SEND,	/* array send (bin) proc */
  				   InvalidOid,	/* typmodin procedure - none */
  				   InvalidOid,	/* typmodout procedure - none */
! 				   F_ARRAY_TYPANALYZE,	/* special analyze procedure for arrays */
  				   new_type_oid,	/* array element type - the rowtype */
  				   true,		/* yes, this is an array type */
  				   InvalidOid,	/* this has no array type */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 50ba20c..3fea98f 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***************
*** 117,145 **** CREATE VIEW pg_stats AS
          stawidth AS avg_width,
          stadistinct AS n_distinct,
          CASE
!             WHEN stakind1 IN (1, 4) THEN stavalues1
!             WHEN stakind2 IN (1, 4) THEN stavalues2
!             WHEN stakind3 IN (1, 4) THEN stavalues3
!             WHEN stakind4 IN (1, 4) THEN stavalues4
          END AS most_common_vals,
          CASE
!             WHEN stakind1 IN (1, 4) THEN stanumbers1
!             WHEN stakind2 IN (1, 4) THEN stanumbers2
!             WHEN stakind3 IN (1, 4) THEN stanumbers3
!             WHEN stakind4 IN (1, 4) THEN stanumbers4
          END AS most_common_freqs,
          CASE
              WHEN stakind1 = 2 THEN stavalues1
              WHEN stakind2 = 2 THEN stavalues2
              WHEN stakind3 = 2 THEN stavalues3
              WHEN stakind4 = 2 THEN stavalues4
          END AS histogram_bounds,
          CASE
              WHEN stakind1 = 3 THEN stanumbers1[1]
              WHEN stakind2 = 3 THEN stanumbers2[1]
              WHEN stakind3 = 3 THEN stanumbers3[1]
              WHEN stakind4 = 3 THEN stanumbers4[1]
!         END AS correlation
      FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
           JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
           LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
--- 117,170 ----
          stawidth AS avg_width,
          stadistinct AS n_distinct,
          CASE
!             WHEN stakind1 = 1 THEN stavalues1
!             WHEN stakind2 = 1 THEN stavalues2
!             WHEN stakind3 = 1 THEN stavalues3
!             WHEN stakind4 = 1 THEN stavalues4
!             WHEN stakind5 = 1 THEN stavalues5
          END AS most_common_vals,
          CASE
!             WHEN stakind1 = 1 THEN stanumbers1
!             WHEN stakind2 = 1 THEN stanumbers2
!             WHEN stakind3 = 1 THEN stanumbers3
!             WHEN stakind4 = 1 THEN stanumbers4
!             WHEN stakind5 = 1 THEN stanumbers5
          END AS most_common_freqs,
          CASE
              WHEN stakind1 = 2 THEN stavalues1
              WHEN stakind2 = 2 THEN stavalues2
              WHEN stakind3 = 2 THEN stavalues3
              WHEN stakind4 = 2 THEN stavalues4
+             WHEN stakind5 = 2 THEN stavalues5
          END AS histogram_bounds,
          CASE
              WHEN stakind1 = 3 THEN stanumbers1[1]
              WHEN stakind2 = 3 THEN stanumbers2[1]
              WHEN stakind3 = 3 THEN stanumbers3[1]
              WHEN stakind4 = 3 THEN stanumbers4[1]
!             WHEN stakind5 = 3 THEN stanumbers5[1]
!         END AS correlation,
!         CASE
!             WHEN stakind1 = 4 THEN stavalues1
!             WHEN stakind2 = 4 THEN stavalues2
!             WHEN stakind3 = 4 THEN stavalues3
!             WHEN stakind4 = 4 THEN stavalues4
!             WHEN stakind5 = 4 THEN stavalues5
!         END AS most_common_elems,
!         CASE
!             WHEN stakind1 = 4 THEN stanumbers1
!             WHEN stakind2 = 4 THEN stanumbers2
!             WHEN stakind3 = 4 THEN stanumbers3
!             WHEN stakind4 = 4 THEN stanumbers4
!             WHEN stakind5 = 4 THEN stanumbers5
!         END AS most_common_elem_freqs,
!         CASE
!             WHEN stakind1 = 5 THEN stavalues1
!             WHEN stakind2 = 5 THEN stavalues2
!             WHEN stakind3 = 5 THEN stavalues3
!             WHEN stakind4 = 5 THEN stavalues4
!             WHEN stakind5 = 5 THEN stavalues5
!         END AS length_histogram_bounds
      FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
           JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
           LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index b40e57b..bfe5683 100644
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***************
*** 110,117 **** static void update_attstats(Oid relid, bool inh,
  static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
  static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
  
- static bool std_typanalyze(VacAttrStats *stats);
- 
  
  /*
   *	analyze_rel() -- analyze one relation
--- 110,115 ----
***************
*** 1794,1800 **** static int	compare_mcvs(const void *a, const void *b);
  /*
   * std_typanalyze -- the default type-specific typanalyze function
   */
! static bool
  std_typanalyze(VacAttrStats *stats)
  {
  	Form_pg_attribute attr = stats->attr;
--- 1792,1798 ----
  /*
   * std_typanalyze -- the default type-specific typanalyze function
   */
! bool
  std_typanalyze(VacAttrStats *stats)
  {
  	Form_pg_attribute attr = stats->attr;
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 0f8af31..49ea30f 100644
*** a/src/backend/commands/typecmds.c
--- b/src/backend/commands/typecmds.c
***************
*** 609,615 **** DefineType(List *names, List *parameters)
  			   F_ARRAY_SEND,	/* send procedure */
  			   typmodinOid,		/* typmodin procedure */
  			   typmodoutOid,	/* typmodout procedure */
! 			   InvalidOid,		/* analyze procedure - default */
  			   typoid,			/* element type ID */
  			   true,			/* yes this is an array type */
  			   InvalidOid,		/* no further array type */
--- 609,615 ----
  			   F_ARRAY_SEND,	/* send procedure */
  			   typmodinOid,		/* typmodin procedure */
  			   typmodoutOid,	/* typmodout procedure */
! 			   F_ARRAY_TYPANALYZE,	/* special analyze procedure for arrays */
  			   typoid,			/* element type ID */
  			   true,			/* yes this is an array type */
  			   InvalidOid,		/* no further array type */
***************
*** 1140,1146 **** DefineEnum(CreateEnumStmt *stmt)
  			   F_ARRAY_SEND,	/* send procedure */
  			   InvalidOid,		/* typmodin procedure - none */
  			   InvalidOid,		/* typmodout procedure - none */
! 			   InvalidOid,		/* analyze procedure - default */
  			   enumTypeOid,		/* element type ID */
  			   true,			/* yes this is an array type */
  			   InvalidOid,		/* no further array type */
--- 1140,1146 ----
  			   F_ARRAY_SEND,	/* send procedure */
  			   InvalidOid,		/* typmodin procedure - none */
  			   InvalidOid,		/* typmodout procedure - none */
! 			   F_ARRAY_TYPANALYZE,	/* special analyze procedure for arrays */
  			   enumTypeOid,		/* element type ID */
  			   true,			/* yes this is an array type */
  			   InvalidOid,		/* no further array type */
***************
*** 1450,1456 **** DefineRange(CreateRangeStmt *stmt)
  			   F_ARRAY_SEND,	/* send procedure */
  			   InvalidOid,		/* typmodin procedure - none */
  			   InvalidOid,		/* typmodout procedure - none */
! 			   InvalidOid,		/* analyze procedure - default */
  			   typoid,			/* element type ID */
  			   true,			/* yes this is an array type */
  			   InvalidOid,		/* no further array type */
--- 1450,1456 ----
  			   F_ARRAY_SEND,	/* send procedure */
  			   InvalidOid,		/* typmodin procedure - none */
  			   InvalidOid,		/* typmodout procedure - none */
! 			   F_ARRAY_TYPANALYZE,	/* special analyze procedure for arrays */
  			   typoid,			/* element type ID */
  			   true,			/* yes this is an array type */
  			   InvalidOid,		/* no further array type */
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 5f968b0..0c13d75 100644
*** a/src/backend/utils/adt/Makefile
--- b/src/backend/utils/adt/Makefile
***************
*** 15,21 **** override CFLAGS+= -mieee
  endif
  endif
  
! OBJS = acl.o arrayfuncs.o array_userfuncs.o arrayutils.o bool.o \
  	cash.o char.o date.o datetime.o datum.o domains.o \
  	enum.o float.o format_type.o \
  	geo_ops.o geo_selfuncs.o int.o int8.o like.o lockfuncs.o \
--- 15,22 ----
  endif
  endif
  
! OBJS = acl.o arrayfuncs.o array_userfuncs.o arrayutils.o \
! 	array_selfuncs.o array_typanalyze.o bool.o \
  	cash.o char.o date.o datetime.o datum.o domains.o \
  	enum.o float.o format_type.o \
  	geo_ops.o geo_selfuncs.o int.o int8.o like.o lockfuncs.o \
diff --git a/src/backend/utils/adt/array_selfuncs.c b/src/backend/utils/adt/array_selfuncs.c
new file mode 100644
index 0000000..886516b
*** /dev/null
--- b/src/backend/utils/adt/array_selfuncs.c
***************
*** 0 ****
--- 1,970 ----
+ /*-------------------------------------------------------------------------
+  *
+  * array_selfuncs.c
+  *	  Functions for selectivity estimation of array operators.
+  *
+  * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+  *
+  *
+  * IDENTIFICATION
+  *	  src/backend/utils/adt/array_selfuncs.c
+  *
+  *-------------------------------------------------------------------------
+  */
+ 
+ #include "postgres.h"
+ 
+ #include "access/hash.h"
+ #include "catalog/pg_am.h"
+ #include "catalog/pg_collation.h"
+ #include "catalog/pg_operator.h"
+ #include "commands/defrem.h"
+ #include "commands/vacuum.h"
+ #include "utils/array.h"
+ #include "utils/builtins.h"
+ #include "utils/lsyscache.h"
+ #include "utils/selfuncs.h"
+ #include "utils/typcache.h"
+ 
+ /* Default selectivity constant for "@>" and "<@" operators */
+ #define DEFAULT_CONTAIN_SEL 0.005
+ 
+ /* Default selectivity constant for "&&" operator */
+ #define DEFAULT_OVERLAP_SEL 0.01
+ 
+ /* Default selectivity for given operator */
+ #define DEFAULT_SEL(operator) \
+ 	((operator) == OID_ARRAY_OVERLAP_OP ? \
+ 		DEFAULT_OVERLAP_SEL : DEFAULT_CONTAIN_SEL)
+ 
+ /* Macro for selectivity estimation to be used if we have no statistics */
+ #define array_selec_no_stats(array,nitems,op,cmpfunc) \
+ 	mcelem_array_selec(array, nitems, typentry, NULL, 0, NULL, 0, NULL, 0, op, cmpfunc)
+ 
+ static Selectivity calc_arraysel(VariableStatData *vardata, Datum constval,
+ 			  Oid operator);
+ static Selectivity mcelem_array_selec(ArrayType *array, int nitems,
+ 				   TypeCacheEntry *typentry, Datum *mcelem, int nmcelem,
+ 				   float4 *numbers, int nnumbers, Datum *hist, int nhist,
+ 				   Oid operator, FunctionCallInfo cmpfunc);
+ static int	element_compare(const void *key1, const void *key2, void *arg);
+ static bool find_next_mcelem(Datum *mcelem, int nmcelem, Datum value,
+ 				 int *index, FunctionCallInfo cmpfunc);
+ static Selectivity mcelem_array_contain_overlap_selec(Datum *mcelem,
+    int nmcelem, float4 *numbers, Datum *array_data, int nitems, Oid operator,
+ 								   FunctionCallInfo cmpfunc);
+ static float calc_hist(Datum *hist, int nhist, float *hist_part, int n);
+ static Selectivity mcelem_array_contained_selec(Datum *mcelem, int nmcelem,
+ 							 float4 *numbers, Datum *array_data, int nitems,
+ 							 Datum *hist, int nhist, Oid operator,
+ 							 FunctionCallInfo cmpfunc);
+ static float *calc_distr(float *p, int n, int m, float rest);
+ 
+ /* selectivity for "const op ANY(column)" and "const op ALL(column)" */
+ Selectivity
+ calc_scalararraysel(VariableStatData *vardata, Datum constval, bool orClause,
+ 					Oid operator)
+ {
+ 	Oid			elemtype;
+ 	Selectivity selec;
+ 	TypeCacheEntry *typentry;
+ 	Datum	   *hist;
+ 	int			nhist;
+ 	FunctionCallInfoData cmpfunc;
+ 
+ 	elemtype = get_base_element_type(vardata->vartype);
+ 
+ 
+ 	/* Get default comparison function */
+ 	typentry = lookup_type_cache(elemtype,
+ 		   TYPECACHE_CMP_PROC | TYPECACHE_CMP_PROC_FINFO | TYPECACHE_EQ_OPR);
+ 
+ 	/* Handle only "=" operator. Return default selectivity in other cases. */
+ 	if (operator != typentry->eq_opr)
+ 		return (Selectivity) 0.5;
+ 
+ 	/* Without a comparison function, return default selectivity estimation */
+ 	if (!OidIsValid(typentry->cmp_proc))
+ 		return DEFAULT_CONTAIN_SEL;
+ 
+ 	InitFunctionCallInfoData(cmpfunc, &typentry->cmp_proc_finfo, 2,
+ 							 DEFAULT_COLLATION_OID, NULL, NULL);
+ 
+ 	if (HeapTupleIsValid(vardata->statsTuple))
+ 	{
+ 		Form_pg_statistic stats;
+ 		Datum	   *values;
+ 		int			nvalues;
+ 		float4	   *numbers;
+ 		int			nnumbers;
+ 
+ 		stats = (Form_pg_statistic) GETSTRUCT(vardata->statsTuple);
+ 
+ 		/* MCELEM will be an array of same type as element */
+ 		if (get_attstatsslot(vardata->statsTuple,
+ 							 elemtype, vardata->atttypmod,
+ 							 STATISTIC_KIND_MCELEM, InvalidOid,
+ 							 NULL,
+ 							 &values, &nvalues,
+ 							 &numbers, &nnumbers))
+ 		{
+ 			/* For const = ALL(column) get histogram of distinct element count */
+ 			if (orClause
+ 				|| !get_attstatsslot(vardata->statsTuple,
+ 									 INT4OID, -1,
+ 								 STATISTIC_KIND_LENGTH_HISTOGRAM, InvalidOid,
+ 									 NULL,
+ 									 &hist, &nhist,
+ 									 NULL, NULL))
+ 			{
+ 				hist = NULL;
+ 				nhist = 0;
+ 			}
+ 
+ 			/* Use the most-common-elements slot for the array Var. */
+ 			if (orClause)
+ 				selec = mcelem_array_contain_overlap_selec(values, nvalues,
+ 					  numbers, &constval, 1, OID_ARRAY_CONTAIN_OP, &cmpfunc);
+ 			else
+ 				selec = mcelem_array_contained_selec(values, nvalues, numbers,
+ 												   &constval, 1, hist, nhist,
+ 										   OID_ARRAY_CONTAINED_OP, &cmpfunc);
+ 			if (hist)
+ 				free_attstatsslot(INT4OID, hist, nhist, NULL, 0);
+ 			free_attstatsslot(elemtype, values, nvalues, numbers, nnumbers);
+ 		}
+ 		else
+ 		{
+ 			/* No most-common-elements info, so do without */
+ 			if (orClause)
+ 				selec = mcelem_array_contain_overlap_selec(NULL, 0,
+ 						 NULL, &constval, 1, OID_ARRAY_CONTAIN_OP, &cmpfunc);
+ 			else
+ 				selec = mcelem_array_contained_selec(NULL, 0, NULL, &constval,
+ 							   1, NULL, 0, OID_ARRAY_CONTAINED_OP, &cmpfunc);
+ 		}
+ 
+ 		/*
+ 		 * MCE stats count only non-null rows, so adjust for null rows.
+ 		 */
+ 		selec *= (1.0 - stats->stanullfrac);
+ 	}
+ 	else
+ 	{
+ 		/* No stats at all, so do without */
+ 		selec = mcelem_array_contain_overlap_selec(NULL, 0, NULL, &constval,
+ 										  1, OID_ARRAY_CONTAIN_OP, &cmpfunc);
+ 		/* we assume no nulls here, so no stanullfrac correction */
+ 	}
+ 
+ 	return selec;
+ }
+ 
+ /*
+  * arraysel -- restriction selectivity for "column @> const", "column && const"
+  * and "column <@ const"
+  */
+ Datum
+ arraysel(PG_FUNCTION_ARGS)
+ {
+ 	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+ 
+ 	Oid			operator = PG_GETARG_OID(1);
+ 	List	   *args = (List *) PG_GETARG_POINTER(2);
+ 	int			varRelid = PG_GETARG_INT32(3);
+ 	VariableStatData vardata;
+ 	Node	   *other;
+ 	bool		varonleft;
+ 	Selectivity selec;
+ 	Oid			element_typeid;
+ 
+ 	/*
+ 	 * If expression is not (variable op pseudoconstant) or (pseudoconstant op
+ 	 * variable), then punt and return a default estimate.
+ 	 */
+ 	if (!get_restriction_variable(root, args, varRelid,
+ 								  &vardata, &other, &varonleft))
+ 		PG_RETURN_FLOAT8(DEFAULT_SEL(operator));
+ 
+ 	/*
+ 	 * Can't do anything useful if the other operand is not a constant, either.
+ 	 */
+ 	if (!IsA(other, Const))
+ 	{
+ 		ReleaseVariableStats(vardata);
+ 		PG_RETURN_FLOAT8(DEFAULT_SEL(operator));
+ 	}
+ 
+ 	/*
+ 	 * The "&&", "@>" and "<@" operators are strict, so we can cope with NULL
+ 	 * right away.
+ 	 */
+ 	if (((Const *) other)->constisnull)
+ 	{
+ 		ReleaseVariableStats(vardata);
+ 		PG_RETURN_FLOAT8(0.0);
+ 	}
+ 
+ 	if (!varonleft && operator == OID_ARRAY_CONTAIN_OP)
+ 		operator = OID_ARRAY_CONTAINED_OP;
+ 
+ 	/*
+ 	 * OK, there's a Var and a Const we're dealing with here.  We need the
+ 	 * Const to be an array with same element type as column, else we can't do
+ 	 * anything useful.
+ 	 */
+ 	element_typeid = get_base_element_type(((Const *) other)->consttype);
+ 	if (element_typeid != InvalidOid &&
+ 		element_typeid == get_base_element_type(vardata.vartype))
+ 	{
+ 		selec = calc_arraysel(&vardata, ((Const *) other)->constvalue,
+ 							  operator);
+ 	}
+ 	else
+ 	{
+ 		/* Element types don't match, so we must punt */
+ 		selec = DEFAULT_SEL(operator);
+ 	}
+ 
+ 	ReleaseVariableStats(vardata);
+ 
+ 	CLAMP_PROBABILITY(selec);
+ 
+ 	PG_RETURN_FLOAT8((float8) selec);
+ }
+ 
+ /*
+  * Calculate selectivity for "column @> const", "column && const" and
+  * "column <@ const" based on the statistics.
+  */
+ static Selectivity
+ calc_arraysel(VariableStatData *vardata, Datum constval, Oid operator)
+ {
+ 	Selectivity selec;
+ 	ArrayType  *array;
+ 	int			ndims;
+ 	int		   *dims;
+ 	int			nitems;
+ 	TypeCacheEntry *typentry;
+ 	FunctionCallInfoData cmpfunc;
+ 
+ 	/*
+ 	 * The caller made sure the const is an array with same element type, so
+ 	 * get it now
+ 	 */
+ 	array = DatumGetArrayTypeP(constval);
+ 	ndims = ARR_NDIM(array);
+ 	dims = ARR_DIMS(array);
+ 	nitems = ArrayGetNItems(ndims, dims);
+ 
+ 	/* Get default comparison function */
+ 	typentry = lookup_type_cache(array->elemtype,
+ 							  TYPECACHE_CMP_PROC | TYPECACHE_CMP_PROC_FINFO);
+ 
+ 	if (!OidIsValid(typentry->cmp_proc))
+ 		return DEFAULT_SEL(operator);
+ 
+ 	InitFunctionCallInfoData(cmpfunc, &typentry->cmp_proc_finfo, 2,
+ 							 DEFAULT_COLLATION_OID, NULL, NULL);
+ 
+ 	if (HeapTupleIsValid(vardata->statsTuple))
+ 	{
+ 		Form_pg_statistic stats;
+ 		Datum	   *values;
+ 		int			nvalues;
+ 		Datum	   *hist;
+ 		int			nhist;
+ 		float4	   *numbers;
+ 		int			nnumbers;
+ 
+ 		stats = (Form_pg_statistic) GETSTRUCT(vardata->statsTuple);
+ 
+ 		/* MCELEM will be an array of same type as column */
+ 		if (get_attstatsslot(vardata->statsTuple,
+ 							 array->elemtype, vardata->atttypmod,
+ 							 STATISTIC_KIND_MCELEM, InvalidOid,
+ 							 NULL,
+ 							 &values, &nvalues,
+ 							 &numbers, &nnumbers))
+ 		{
+ 			/*
+ 			 * For "array <@ const" case we also need histogram of distinct
+ 			 * element counts.
+ 			 */
+ 			if (operator != OID_ARRAY_CONTAINED_OP
+ 				|| !get_attstatsslot(vardata->statsTuple,
+ 									 INT4OID, -1,
+ 									 STATISTIC_KIND_LENGTH_HISTOGRAM,
+ 									 InvalidOid,
+ 									 NULL,
+ 									 &hist, &nhist,
+ 									 NULL, NULL))
+ 			{
+ 				hist = NULL;
+ 				nhist = 0;
+ 			}
+ 
+ 			/* Use the most-common-elements slot for the array Var. */
+ 			selec = mcelem_array_selec(array, nitems, typentry, values, nvalues,
+ 						 numbers, nnumbers, hist, nhist, operator, &cmpfunc);
+ 			free_attstatsslot(array->elemtype, values, nvalues, numbers,
+ 							  nnumbers);
+ 		}
+ 		else
+ 		{
+ 			/* No most-common-elements info, so do without */
+ 			selec = array_selec_no_stats(array, nitems, operator, &cmpfunc);
+ 		}
+ 
+ 		/*
+ 		 * MCE stats count only non-null rows, so adjust for null rows.
+ 		 */
+ 		selec *= (1.0 - stats->stanullfrac);
+ 	}
+ 	else
+ 	{
+ 		/* No stats at all, so do without */
+ 		selec = array_selec_no_stats(array, nitems, operator, &cmpfunc);
+ 		/* we assume no nulls here, so no stanullfrac correction */
+ 	}
+ 
+ 	return selec;
+ }
+ 
+ /*
+  * find_next_mcelem binary-searches a most common elements array, starting
+  * from *index, for the first member >= value.	It saves the position of the
+  * match into *index and returns true if it's an exact match.
+  */
+ static bool
+ find_next_mcelem(Datum *mcelem, int nmcelem, Datum value, int *index,
+ 				 FunctionCallInfo cmpfunc)
+ {
+ 	int			l = *index,
+ 				r = nmcelem - 1,
+ 				i,
+ 				res;
+ 
+ 	while (l <= r)
+ 	{
+ 		i = (l + r) / 2;
+ 		res = element_compare(&mcelem[i], &value, cmpfunc);
+ 		if (res == 0)
+ 		{
+ 			*index = i;
+ 			return true;
+ 		}
+ 		else if (res < 0)
+ 			l = i + 1;
+ 		else
+ 			r = i - 1;
+ 	}
+ 	*index = l;
+ 	return false;
+ }
+ 
+ /* Array selectivity estimation based on most common elements statistics. */
+ static Selectivity
+ mcelem_array_selec(ArrayType *array, int nitems, TypeCacheEntry *typentry,
+ 	  Datum *mcelem, int nmcelem, float4 *numbers, int nnumbers, Datum *hist,
+ 				   int nhist, Oid operator, FunctionCallInfo cmpfunc)
+ {
+ 	int			i;
+ 	char	   *ptr;
+ 	bits8	   *bitmap;
+ 	int			bitmask;
+ 	Datum	   *array_data;
+ 	bool		null_present;
+ 	int			nonnull_nitems;
+ 
+ 	/*
+ 	 * There should be four more Numbers than Values, because the last four
+ 	 * cells are taken for nulls, minimal frequency, maximal frequency, and
+ 	 * average distinct element count.	Punt if not.
+ 	 */
+ 	if (nnumbers != nmcelem + 4)
+ 		mcelem = NULL;
+ 
+ 	if (!mcelem)
+ 		nmcelem = 0;
+ 
+ 	/*
+ 	 * Prepare constant array data for sorting.  Sorting lets us find unique
+ 	 * elements and efficiently merge with the MCELEM array.
+ 	 */
+ 	array_data = (Datum *) palloc(sizeof(Datum) * nitems);
+ 	bitmap = ARR_NULLBITMAP(array);
+ 	ptr = ARR_DATA_PTR(array);
+ 	bitmask = 1;
+ 	nonnull_nitems = 0;
+ 	null_present = false;
+ 	for (i = 0; i < nitems; i++)
+ 	{
+ 		if (bitmap && (*bitmap & bitmask) == 0)
+ 			null_present = true;
+ 		else
+ 		{
+ 			/* Extract array data */
+ 			array_data[nonnull_nitems] = fetch_att(ptr, typentry->typbyval,
+ 												   typentry->typlen);
+ 			ptr = att_addlength_pointer(ptr, typentry->typlen, ptr);
+ 			ptr = (char *) att_align_nominal(ptr, typentry->typalign);
+ 			nonnull_nitems++;
+ 		}
+ 		/* Adjust bitmask and bitmap pointer */
+ 		bitmask <<= 1;
+ 		if (bitmask == 0x100)
+ 		{
+ 			if (bitmap)
+ 				bitmap++;
+ 			bitmask = 1;
+ 		}
+ 	}
+ 
+ 	/* Query "column @> '{smth., null}'" matches nothing. */
+ 	if (null_present && operator == OID_ARRAY_CONTAIN_OP)
+ 		return 0.0;
+ 
+ 	/* Sort extracted elements using their default comparison function. */
+ 	qsort_arg(array_data, nonnull_nitems, sizeof(Datum), element_compare, cmpfunc);
+ 
+ 	/* "column @> const" and "column && const" cases */
+ 	if (operator == OID_ARRAY_CONTAIN_OP || operator == OID_ARRAY_OVERLAP_OP)
+ 		return mcelem_array_contain_overlap_selec(mcelem, nmcelem, numbers,
+ 							  array_data, nonnull_nitems, operator, cmpfunc);
+ 
+ 	/* "column <@ const" case */
+ 	if (operator == OID_ARRAY_CONTAINED_OP)
+ 		return mcelem_array_contained_selec(mcelem, nmcelem, numbers,
+ 				 array_data, nonnull_nitems, hist, nhist, operator, cmpfunc);
+ 
+ 	elog(ERROR, "arraysel call for invalid operator (oid = %u)", operator);
+ 	return 0.0;					/* keep compiler quiet */
+ }
+ 
+ /* Fast function for floor value of 2 based logarithm calculation. */
+ static int
+ floor_log2(uint32 n)
+ {
+ 	int			pos = 0;
+ 
+ 	if (n == 0)
+ 		return -1;
+ 	if (n >= 1 << 16)
+ 	{
+ 		n >>= 16;
+ 		pos += 16;
+ 	}
+ 	if (n >= 1 << 8)
+ 	{
+ 		n >>= 8;
+ 		pos += 8;
+ 	}
+ 	if (n >= 1 << 4)
+ 	{
+ 		n >>= 4;
+ 		pos += 4;
+ 	}
+ 	if (n >= 1 << 2)
+ 	{
+ 		n >>= 2;
+ 		pos += 2;
+ 	}
+ 	if (n >= 1 << 1)
+ 	{
+ 		pos += 1;
+ 	}
+ 	return pos;
+ }
+ 
+ /*
+  * Estimate selectivity of "column @> const" and "column && const" based on
+  * most common element statistics.	This estimation assumes element
+  * occurrences are independent.
+  *
+  * TODO: this estimation probably could be improved by using the distinct
+  * element count histogram.  For example, excepting the special case of
+  * "column @> '{}'", we can multiply the calculated selectivity by the
+  * fraction of nonempty arrays in the column.
+  */
+ static Selectivity
+ mcelem_array_contain_overlap_selec(Datum *mcelem, int nmcelem,
+ 							  float4 *numbers, Datum *array_data, int nitems,
+ 								   Oid operator, FunctionCallInfo cmpfunc)
+ {
+ 	Selectivity selec,
+ 				elem_selec;
+ 	int			mcelem_index,
+ 				i;
+ 	bool		use_bsearch;
+ 	float4		minfreq;
+ 
+ 	if (mcelem)
+ 	{
+ 		/*
+ 		 * Grab the lowest frequency.  compute_array_stats() stored it as the
+ 		 * second trailing number.
+ 		 */
+ 		minfreq = numbers[nmcelem + 1];
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 * Without statistics set minfreq so that minfreq / 2 =
+ 		 * DEFAULT_CONTAIN_SEL
+ 		 */
+ 		minfreq = 2 * DEFAULT_CONTAIN_SEL;
+ 	}
+ 
+ 	/* Decide whether it is faster to use binary search or not. */
+ 	if (nitems * floor_log2((unsigned int) nmcelem) < nmcelem + nitems)
+ 		use_bsearch = true;
+ 	else
+ 		use_bsearch = false;
+ 
+ 	if (operator == OID_ARRAY_CONTAIN_OP)
+ 	{
+ 		/*
+ 		 * Initial selectivity for "column @> const" query is 1.0, and it will
+ 		 * be decreased with each element of constant array.
+ 		 */
+ 		selec = 1.0;
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 * Initial selectivity for "column && const" query is 0.0, and it will
+ 		 * be increased with each element of constant array.
+ 		 */
+ 		selec = 0.0;
+ 	}
+ 	mcelem_index = 0;
+ 	for (i = 0; i < nitems; i++)
+ 	{
+ 		bool		found = false;
+ 
+ 		/* Compare with previous value and skip duplicates. */
+ 		if (i > 0 &&
+ 			!element_compare(&array_data[i - 1], &array_data[i], cmpfunc))
+ 			continue;
+ 
+ 		/* Find the smallest MCELEM >= this. */
+ 		if (use_bsearch)
+ 		{
+ 			found = find_next_mcelem(mcelem, nmcelem, array_data[i],
+ 									 &mcelem_index, cmpfunc);
+ 		}
+ 		else
+ 		{
+ 			while (mcelem_index < nmcelem)
+ 			{
+ 				int			cmp = element_compare(&mcelem[mcelem_index],
+ 												  &array_data[i], cmpfunc);
+ 
+ 				if (cmp < 0)
+ 					mcelem_index++;
+ 				else
+ 				{
+ 					/* mcelem is found */
+ 					if (cmp == 0)
+ 						found = true;
+ 					break;
+ 				}
+ 			}
+ 		}
+ 
+ 		if (found)
+ 		{
+ 			/* MCELEM is found; use its frequency. */
+ 			elem_selec = numbers[mcelem_index];
+ 			mcelem_index++;
+ 		}
+ 		else
+ 		{
+ 			/*
+ 			 * The element is not in MCELEM.  Punt, but assume that the
+ 			 * selectivity cannot be more than minfreq / 2.
+ 			 */
+ 			elem_selec = Min(DEFAULT_CONTAIN_SEL, minfreq / 2);
+ 		}
+ 
+ 		/*
+ 		 * Adjust overall selectivity using the current element's selectivity
+ 		 * and an assumption of element occurrence independence.
+ 		 */
+ 		if (operator == OID_ARRAY_CONTAIN_OP)
+ 			selec *= elem_selec;
+ 		else
+ 			selec = selec + elem_selec - selec * elem_selec;
+ 	}
+ 
+ 	/* Clamp intermediate results to stay sane despite roundoff error */
+ 	CLAMP_PROBABILITY(selec);
+ 
+ 	return selec;
+ }
+ 
+ /*
+  * Calculate the first n distinct element count probabilities from a
+  * histogram.  We assume that a histogram box with bounds a and b gives 1 /
+  * ((b - a + 1) * (nhist - 1)) probability to each value in (a,b) and an
+  * additional half of that to a and b themselves.  Returns the probability
+  * that the distinct element count is <= n.
+  */
+ static float
+ calc_hist(Datum *hist, int nhist, float *hist_part, int n)
+ {
+ 	int			k,
+ 				i = 0,
+ 				prev_interval = 0,
+ 				next_interval = 0;
+ 	float		frac,
+ 				total = 0.0f;
+ 
+ 	/*
+ 	 * frac is a probability contribution by each interval between histogram
+ 	 * values. We have nhist - 1 intervals. Contribution of one will be 1 /
+ 	 * (nhist - 1).
+ 	 */
+ 	frac = 1.0f / ((float) (nhist - 1));
+ 	for (k = 0; k <= n; k++)
+ 	{
+ 		int			count = 0;
+ 
+ 		/* Count the histogram boundaries precisely equal to k. */
+ 		while (i < nhist && DatumGetInt32(hist[i]) <= k)
+ 		{
+ 			if (DatumGetInt32(hist[i]) == k)
+ 				count++;
+ 			i++;
+ 		}
+ 
+ 		if (count > 0)
+ 		{
+ 			/* k is an exact bound for at least one histogram box. */
+ 			float		val;
+ 
+ 			/* hist[i - 1] == k here; find the distance to the next larger bound */
+ 			if (i < nhist)
+ 				next_interval = DatumGetInt32(hist[i]) -
+ 					DatumGetInt32(hist[i - 1]);
+ 			else
+ 				next_interval = 0;
+ 
+ 			/*
+ 			 * count - 1 histogram boxes contain k exclusively.  They
+ 			 * contribute a total of (count - 1) * frac probability.  Also
+ 			 * factor in the partial histogram boxes on either side.
+ 			 */
+ 			val = (float) (count - 1);
+ 			if (next_interval > 0)
+ 				val += 0.5f / ((float) next_interval);
+ 			if (prev_interval > 0)
+ 				val += 0.5f / ((float) prev_interval);
+ 			hist_part[k] = frac * val;
+ 			prev_interval = next_interval;
+ 		}
+ 		else
+ 		{
+ 			/* k does not appear as an exact histogram bound. */
+ 			if (prev_interval == 0)
+ 				hist_part[k] = 0.0f;
+ 			else
+ 				hist_part[k] = frac / ((float) prev_interval);
+ 		}
+ 		/* Accumulate total probability. */
+ 		total += hist_part[k];
+ 	}
+ 	return total;
+ }
+ 
+ /*
+  * Consider n independent events with probabilities p[].  This function
+  * calculates the probability that exactly k of the events occur, for each k
+  * in [0;m].  Imagine a matrix M of size (n + 1) x (m + 1).  Element M[i,j]
+  * denotes the probability that exactly j of the first i events occur.
+  * Obviously M[0,0] = 1.  Exactly j of the first i events occur either when j
+  * of the first i - 1 occur and event i does not, or when j - 1 of them occur
+  * and event i does.  So, by the law of total probability:
+  * M[i,j] = M[i - 1, j] * (1 - p[i]) + M[i - 1, j - 1] * p[i] for i > 0, j > 0,
+  * and M[i,0] = M[i - 1, 0] * (1 - p[i]) for i > 0.  "rest" is the sum of the
+  * probabilities of all low-probability events not included in p.
+  */
+ static float *
+ calc_distr(float *p, int n, int m, float rest)
+ {
+ 	float	   *row,
+ 			   *prev_row,
+ 			   *tmp;
+ 	int			i,
+ 				j;
+ 
+ 	/*
+ 	 * Since we return only the last row of the matrix and need only the
+ 	 * current and previous row for calculations, allocate two rows.
+ 	 */
+ 	row = (float *) palloc(2 * (m + 1) * sizeof(float));
+ 	prev_row = row + (m + 1);
+ 
+ 	/* M[0,0] = 1 */
+ 	row[0] = 1.0f;
+ 	for (i = 1; i <= n; i++)
+ 	{
+ 		float		t = p[i - 1];
+ 
+ 		/* Swap rows */
+ 		tmp = row;
+ 		row = prev_row;
+ 		prev_row = tmp;
+ 		/* Calculate next row */
+ 		for (j = 0; j <= i && j <= m; j++)
+ 		{
+ 			float		val = 0.0f;
+ 
+ 			if (j < i)
+ 				val += prev_row[j] * (1.0f - t);
+ 			if (j > 0)
+ 				val += prev_row[j - 1] * t;
+ 			row[j] = val;
+ 
+ 		}
+ 	}
+ 
+ 	/* Account for the events with low probabilities. */
+ 	if (rest > 0.0f)
+ 	{
+ 		/*
+ 		 * The probability of no occurrence of events contributing to the
+ 		 * "rest" probability has a limit of exp(-rest) when the number of
+ 		 * events is high.	Another simplification is to replace those events
+ 		 * with one event having (1 - exp(-rest)) probability.
+ 		 */
+ 		rest = 1.0f - exp(-rest);
+ 		for (i = 0; i <= m; i++)
+ 		{
+ 			if (i < m)
+ 				row[i + 1] += row[i] * rest;
+ 			row[i] *= (1.0f - rest);
+ 		}
+ 	}
+ 	return row;
+ }
+ 
+ /*
+  * Estimate selectivity of "column <@ const" based on most common element
+  * statistics.	Independent element occurrence would imply a particular
+  * distribution of distinct element counts among matching rows.  Real data
+  * usually falsifies that assumption.  For example, in a set of 1-element
+  * integer arrays having elements in the range [0;10], element occurrences are
+  * not independent.  If they were, a sufficiently-large set would include all
+  * distinct element counts 0 through 11.  We correct for this using the
+  * histogram of distinct element counts.
+  *
+  * In the "column @> const" and "column && const" cases, we usually have a
+  * "const" whose elements have low total frequency (otherwise selectivity is
+  * close to 0 or 1, respectively).  That's why the effect of dependence
+  * related to the distribution of distinct element counts is negligible
+  * there.  In the "column <@ const" case, the total frequency of elements is
+  * high (otherwise selectivity is close to 0).  That's why we must correct
+  * for the distribution of array distinct element counts.
+  */
+ static Selectivity
+ mcelem_array_contained_selec(Datum *mcelem, int nmcelem,
+ 							 float4 *numbers, Datum *array_data, int nitems,
+ 							 Datum *hist, int nhist, Oid operator,
+ 							 FunctionCallInfo cmpfunc)
+ {
+ 	int			mcelem_index,
+ 				i,
+ 				unique_nitems = 0;
+ 	float		selec,
+ 				minfreq,
+ 				default_freq,
+ 				nullelem_freq;
+ 	float	   *dist,
+ 			   *mcelem_dist,
+ 			   *hist_part;
+ 	float		avg_count,
+ 				mult,
+ 				rest;
+ 	float	   *elem_selec;
+ 
+ 	/*
+ 	 * elem_selec is array of estimated frequencies for elements in the
+ 	 * constant.
+ 	 */
+ 	elem_selec = (float *) palloc(sizeof(float) * nitems);
+ 
+ 	if (mcelem)
+ 	{
+ 		/*
+ 		 * Grab some of the summary statistics that compute_array_stats()
+ 		 * stores: frequency of the null elements, lowest frequency, and
+ 		 * average distinct element count.
+ 		 */
+ 		nullelem_freq = numbers[nmcelem];
+ 		minfreq = numbers[nmcelem + 1];
+ 		avg_count = numbers[nmcelem + 3];
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 * Without statistics set minfreq so that minfreq / 2 =
+ 		 * DEFAULT_CONTAIN_SEL
+ 		 */
+ 		nullelem_freq = 0.0f;
+ 		minfreq = 2 * DEFAULT_CONTAIN_SEL;
+ 		avg_count = 10.0f;
+ 	}
+ 
+ 	/*
+ 	 * "rest" will be the sum of the frequencies of all elements not
+ 	 * represented in MCELEM.  The average distinct element count is the sum
+ 	 * of the frequencies of *all* elements.  Begin with that; we will proceed
+ 	 * to subtract the MCELEM frequencies.
+ 	 */
+ 	rest = avg_count;
+ 
+ 	default_freq = Min(DEFAULT_CONTAIN_SEL, minfreq / 2);
+ 
+ 	mcelem_index = 0;
+ 
+ 	/*
+ 	 * mult is the estimated probability that none of the MCELEM entries
+ 	 * absent from the constant occurs in the array.
+ 	 */
+ 	mult = 1.0f;
+ 
+ 	for (i = 0; i < nitems; i++)
+ 	{
+ 		bool		found = false;
+ 
+ 		/* Compare with previous value and skip duplicates. */
+ 		if (i > 0 &&
+ 			!element_compare(&array_data[i - 1], &array_data[i], cmpfunc))
+ 			continue;
+ 		unique_nitems++;
+ 
+ 		/*
+ 		 * Iterate over MCELEM until we find an entry greater than or equal to
+ 		 * this element of the constant.  Simultaneously update "rest" and
+ 		 * "mult".	If we find an exact match, update elem_selec.
+ 		 */
+ 		while (mcelem_index < nmcelem)
+ 		{
+ 			int			cmp = element_compare(&mcelem[mcelem_index], &array_data[i],
+ 											  cmpfunc);
+ 
+ 			if (cmp < 0)
+ 			{
+ 				mult *= (1.0f - numbers[mcelem_index]);
+ 				rest -= numbers[mcelem_index];
+ 				mcelem_index++;
+ 			}
+ 			else
+ 			{
+ 				if (cmp == 0)
+ 				{
+ 					elem_selec[unique_nitems - 1] = numbers[mcelem_index];
+ 					rest -= numbers[mcelem_index];
+ 					found = true;
+ 				}
+ 				break;
+ 			}
+ 		}
+ 
+ 		if (found)
+ 		{
+ 			mcelem_index++;
+ 		}
+ 		else
+ 		{
+ 			/*
+ 			 * The element is not in MCELEM.  Punt, but assume that the
+ 			 * selectivity cannot be more than minfreq / 2.
+ 			 */
+ 			elem_selec[unique_nitems - 1] = Min(DEFAULT_CONTAIN_SEL,
+ 												minfreq / 2);
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * If we handled all constant elements without exhausting the MCELEM
+ 	 * array, finish walking it to complete "rest" and "mult".
+ 	 */
+ 	while (mcelem_index < nmcelem)
+ 	{
+ 		mult *= (1.0f - numbers[mcelem_index]);
+ 		rest -= numbers[mcelem_index];
+ 		mcelem_index++;
+ 	}
+ 
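+ 	/*
+ 	 * Account for elements that are not in MCELEM.  "rest" is the sum of
+ 	 * their frequencies; approximate the probability that none of them occurs
+ 	 * as exp(-rest), as calc_distr() does.
+ 	 */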
+ 	mult *= exp(-rest);
+ 
+ 	/*
+ 	 * Using the distinct element count histogram requires O(nitems * (nmcelem
+ 	 * + nitems)) operations.  Beyond a certain computational cost threshold,
+ 	 * it's reasonable to sacrifice accuracy for decreased plan time.
+ 	 */
+ 	if (nhist > 0 && unique_nitems <=
+ 		300 * default_statistics_target / (nmcelem + unique_nitems))
+ 	{
+ 		/*
+ 		 * Calculate probabilities of each distinct element count for both
+ 		 * mcelems and constant elements.  At this point, assume independent
+ 		 * element occurrence.
+ 		 */
+ 		dist = calc_distr(elem_selec, unique_nitems, unique_nitems, 0.0f);
+ 		mcelem_dist = calc_distr(numbers, nmcelem, unique_nitems, rest);
+ 
+ 		hist_part = (float *) palloc((unique_nitems + 1) * sizeof(float));
+ 		calc_hist(hist, nhist, hist_part, unique_nitems);
+ 
+ 		selec = 0.0f;
+ 
+ 		for (i = 0; i <= unique_nitems; i++)
+ 		{
+ 			/*
+ 			 * mult * dist[i] / mcelem_dist[i] gives us probability of qual
+ 			 * matching from assumption of independent element occurrence with
+ 			 * the condition that distinct element count = i.
+ 			 */
+ 			if (hist_part[i] > 0)
+ 				selec += hist_part[i] * mult * dist[i] / mcelem_dist[i];
+ 		}
+ 	}
+ 	else
+ 	{
+ 		/* No histogram, or using it is too expensive.  Use a rough estimate. */
+ 		selec = mult;
+ 	}
+ 
+ 	/* Take into account occurrence of NULL element. */
+ 	selec *= (1.0f - nullelem_freq);
+ 
+ 	CLAMP_PROBABILITY(selec);
+ 
+ 	return selec;
+ }
+ 
+ /*
+  * Comparison function for elements. Based on default comparison function for
+  * array element data type.
+  */
+ static int
+ element_compare(const void *key1, const void *key2, void *arg)
+ {
+ 	const Datum *d1 = (const Datum *) key1;
+ 	const Datum *d2 = (const Datum *) key2;
+ 	FunctionCallInfo cmpf = (FunctionCallInfo) arg;
+ 
+ 	cmpf->arg[0] = *d1;
+ 	cmpf->arg[1] = *d2;
+ 	cmpf->argnull[0] = false;
+ 	cmpf->argnull[1] = false;
+ 	cmpf->isnull = false;
+ 
+ 	return DatumGetInt32(FunctionCallInvoke(cmpf));
+ }
diff --git a/src/backend/utils/adt/array_typanalyze.c b/src/backend/utils/adt/array_typanalyze.c
new file mode 100644
index 0000000..8b9a1a8
*** /dev/null
--- b/src/backend/utils/adt/array_typanalyze.c
***************
*** 0 ****
--- 1,759 ----
+ /*-------------------------------------------------------------------------
+  *
+  * array_typanalyze.c
+  *	  functions for gathering statistics from array columns
+  *
+  * Portions Copyright (c) 1996-2011, PostgreSQL Global Development Group
+  *
+  *
+  * IDENTIFICATION
+  *	  src/backend/utils/adt/array_typanalyze.c
+  *
+  *-------------------------------------------------------------------------
+  */
+ 
+ #include "postgres.h"
+ 
+ #include "access/hash.h"
+ #include "access/tuptoaster.h"
+ #include "catalog/pg_am.h"
+ #include "catalog/pg_collation.h"
+ #include "catalog/pg_operator.h"
+ #include "commands/vacuum.h"
+ #include "commands/defrem.h"
+ #include "parser/parse_oper.h"
+ #include "utils/array.h"
+ #include "utils/builtins.h"
+ #include "utils/datum.h"
+ #include "utils/hsearch.h"
+ #include "utils/lsyscache.h"
+ #include "utils/selfuncs.h"
+ #include "utils/typcache.h"
+ 
+ /*
+  * To avoid consuming too much memory, IO and CPU load during analysis, we
+  * ignore arrays that are wider than WIDTH_THRESHOLD (after detoasting!).
+  */
+ #define WIDTH_THRESHOLD 0x10000
+ 
+ /* Extra data for compute_array_stats function */
+ typedef struct
+ {
+ 	/* Information about element type */
+ 	Oid			type_id;
+ 	Oid			eq_opr;
+ 	bool		typbyval;
+ 	int16		typlen;
+ 	char		typalign;
+ 	FunctionCallInfoData cmp,
+ 				eq,
+ 				hash;
+ 	FmgrInfo	hash_func_info;
+ 
+ 	/* std_typanalyze() state */
+ 	void	   *std_extra_data;
+ 	void		(*std_compute_stats) (VacAttrStatsP stats,
+ 											  AnalyzeAttrFetchFunc fetchfunc,
+ 												  int samplerows,
+ 												  double totalrows);
+ } ArrayAnalyzeExtraData;
+ 
+ static ArrayAnalyzeExtraData *extra_data;
+ 
+ /* A hash table entry for the Lossy Counting algorithm */
+ typedef struct
+ {
+ 	Datum		key;			/* This is 'e' from the LC algorithm. */
+ 	int			last_container; /* Supports deduplication. */
+ 	int			frequency;		/* This is 'f'. */
+ 	int			delta;			/* And this is 'delta'. */
+ } TrackItem;
+ 
+ /* An entry for the distinct element count hash table */
+ typedef struct
+ {
+ 	int			count;
+ 	int			frequency;
+ }	DistinctElementCountItem;
+ 
+ static void compute_array_stats(VacAttrStats *stats,
+ 		   AnalyzeAttrFetchFunc fetchfunc, int samplerows, double totalrows);
+ static void prune_element_hashtable(HTAB *elements_tab, int b_current);
+ static uint32 element_hash(const void *key, Size keysize);
+ static int	element_match(const void *key1, const void *key2, Size keysize);
+ static int	element_compare(const void *key1, const void *key2);
+ static int	trackitem_compare_frequencies_desc(const void *e1, const void *e2);
+ static int	trackitem_compare_element(const void *e1, const void *e2);
+ static int	countitem_compare_element(const void *e1, const void *e2);
+ 
+ /*
+  *	array_typanalyze -- a custom typanalyze function for array columns
+  */
+ Datum
+ array_typanalyze(PG_FUNCTION_ARGS)
+ {
+ 	VacAttrStats *stats = (VacAttrStats *) PG_GETARG_POINTER(0);
+ 	TypeCacheEntry *typentry;
+ 	Oid			hash_opclass,
+ 				hash_opfamily,
+ 				element_typeid,
+ 				hash_proc;
+ 	ArrayAnalyzeExtraData *extra_data;
+ 
+ 	/*
+ 	 * Call the standard typanalyze function.  It may fail to find needed
+ 	 * operators, in which case we also can't do anything.
+ 	 */
+ 	if (!std_typanalyze(stats))
+ 		PG_RETURN_BOOL(false);
+ 
+ 	/*
+ 	 * Gather information about the element type.  If we fail to find
+ 	 * something, leave the state from std_typanalyze() in place.
+ 	 */
+ 	element_typeid = stats->attrtype->typelem;
+ 
+ 	if (!OidIsValid(element_typeid))
+ 		elog(ERROR, "array_typanalyze was invoked for non-array type %u",
+ 			 stats->attrtypid);
+ 
+ 	typentry = lookup_type_cache(element_typeid, TYPECACHE_EQ_OPR |
+ 	 TYPECACHE_CMP_PROC | TYPECACHE_EQ_OPR_FINFO | TYPECACHE_CMP_PROC_FINFO);
+ 
+ 	if (!OidIsValid(typentry->cmp_proc) || !OidIsValid(typentry->eq_opr))
+ 		PG_RETURN_BOOL(true);
+ 
+ 	hash_opclass = GetDefaultOpClass(element_typeid, HASH_AM_OID);
+ 	if (!OidIsValid(hash_opclass))
+ 		PG_RETURN_BOOL(true);
+ 
+ 	hash_opfamily = get_opclass_family(hash_opclass);
+ 	if (!OidIsValid(hash_opfamily))
+ 		PG_RETURN_BOOL(true);
+ 
+ 	hash_proc = get_opfamily_proc(hash_opfamily, element_typeid,
+ 								  element_typeid, HASHPROC);
+ 	if (!OidIsValid(hash_proc))
+ 		PG_RETURN_BOOL(true);
+ 
+ 	/* Store our findings for use by compute_array_stats() */
+ 	extra_data = (ArrayAnalyzeExtraData *) palloc(sizeof(ArrayAnalyzeExtraData));
+ 	fmgr_info(hash_proc, &extra_data->hash_func_info);
+ 	InitFunctionCallInfoData(extra_data->cmp, &typentry->cmp_proc_finfo,
+ 							 2, DEFAULT_COLLATION_OID, NULL, NULL);
+ 	InitFunctionCallInfoData(extra_data->eq, &typentry->eq_opr_finfo,
+ 							 2, DEFAULT_COLLATION_OID, NULL, NULL);
+ 	InitFunctionCallInfoData(extra_data->hash, &extra_data->hash_func_info,
+ 							 1, DEFAULT_COLLATION_OID, NULL, NULL);
+ 	extra_data->type_id = typentry->type_id;
+ 	extra_data->typbyval = typentry->typbyval;
+ 	extra_data->typlen = typentry->typlen;
+ 	extra_data->typalign = typentry->typalign;
+ 	extra_data->eq_opr = typentry->eq_opr;
+ 	extra_data->std_extra_data = stats->extra_data;
+ 	extra_data->std_compute_stats = stats->compute_stats;
+ 
+ 	/* Install our own hooks; the standard ones were saved above. */
+ 	stats->compute_stats = compute_array_stats;
+ 	stats->extra_data = extra_data;
+ 
+ 	PG_RETURN_BOOL(true);
+ }
+ 
+ /*
+  *	compute_array_stats() -- compute statistics for an array column
+  *
+  *	This function computes statistics useful for determining selectivity for
+  *	operators <@, &&, and @>.
+  *
+  *	In addition to finding the most common values, as we do for most
+  *	datatypes, find the most common array elements and compute a histogram of
+  *	distinct element counts.  Exact duplicates of an entire array may be rare
+  *	despite many arrays sharing individual elements.  This especially afflicts
+  *	long arrays, which are also liable to lack all scalar statistics due to
+  *	the analyze.c WIDTH_THRESHOLD.
+  *
+  *	The algorithm used is Lossy Counting, as proposed in the paper "Approximate
+  *	frequency counts over data streams" by G. S. Manku and R. Motwani, in
+  *	Proceedings of the 28th International Conference on Very Large Data Bases,
+  *	Hong Kong, China, August 2002, section 4.2. The paper is available at
+  *	http://www.vldb.org/conf/2002/S10P03.pdf
+  *
+  *	The Lossy Counting (aka LC) algorithm goes like this:
+  *	Let s be the threshold frequency for an item (the minimum frequency we
+  *	are interested in) and epsilon the error margin for the frequency. Let D
+  *	be a set of triples (e, f, delta), where e is an element value, f is that
+  *	element's frequency (actually, its current occurrence count) and delta is
+  *	the maximum error in f. We start with D empty and process the elements in
+  *	batches of size w. (The batch size is also known as "bucket size" and is
+  *	equal to 1/epsilon.) Let the current batch number be b_current, starting
+  *	with 1. For each element e we either increment its f count, if it's
+  *	already in D, or insert a new triple into D with values (e, 1, b_current
+  *	- 1). After processing each batch we prune D, by removing from it all
+  *	elements with f + delta <= b_current.  After the algorithm finishes we
+  *	suppress all elements from D that do not satisfy f >= (s - epsilon) * N,
+  *	where N is the total number of elements in the input.  We emit the
+  *	remaining elements with estimated frequency f/N.  The LC paper proves
+  *	that this algorithm finds all elements with true frequency at least s,
+  *	and that no frequency is overestimated or is underestimated by more than
+  *	epsilon.  Furthermore, given reasonable assumptions about the input
+  *	distribution, the required table size is no more than about 7 times w.
+  *
+  *	In the absence of a principled basis for other particular values, we
+  *	follow ts_typanalyze() and use parameters s = 0.07/K, epsilon = s/10.  We
+  *	merely leave out the correction for stopwords, which do not apply to
+  *	arrays.  These parameters give bucket width w = K/0.007 and maximum
+  *	expected hashtable size of about 1000 * K.
+  *
+  *	Elements may repeat within an array.  Since duplicates do not change the
+  *	behavior of <@, && or @>, take measures to count each element only once
+  *	per array.	Therefore, we store in the finished pg_statistic entry each
+  *	element's frequency as the fraction of all non-null rows that bear it.
+  *	Divide the raw counts by nonnull_cnt to get those figures.
+  */
+ static void
+ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
+ 					int samplerows, double totalrows)
+ {
+ 	int			num_mcelem;
+ 	int			null_cnt = 0;
+ 	int			analyzed_rows = 0;
+ 
+ 	/*
+ 	 * We should count not only null array values, but also null array
+ 	 * elements
+ 	 */
+ 	int			null_elem_cnt = 0;
+ 
+ 	/* This is D from the LC algorithm. */
+ 	HTAB	   *elements_tab;
+ 	HASHCTL		elem_hash_ctl;
+ 	HASH_SEQ_STATUS scan_status;
+ 
+ 	/* This is the current bucket number from the LC algorithm */
+ 	int			b_current;
+ 
+ 	/* This is 'w' from the LC algorithm */
+ 	int			bucket_width;
+ 	int			array_no;
+ 	uint64		element_no;
+ 	Datum		hash_key;
+ 	TrackItem  *item;
+ 
+ 	int			count_items_count;
+ 	int			count_item_index;
+ 	int			slot_idx = 0;
+ 	HTAB	   *count_tab;
+ 	HASHCTL		count_hash_ctl;
+ 	DistinctElementCountItem *count_item;
+ 	DistinctElementCountItem *sorted_count_items_tab;
+ 	MemoryContext old_context;
+ 
+ 	extra_data = (ArrayAnalyzeExtraData *) stats->extra_data;
+ 	stats->extra_data = extra_data->std_extra_data;
+ 	old_context = CurrentMemoryContext;
+ 	extra_data->std_compute_stats(stats, fetchfunc, samplerows, totalrows);
+ 	MemoryContextSwitchTo(old_context);
+ 
+ 	/*
+ 	 * We want statistics_target * 10 elements in the MCELEM array. This
+ 	 * multiplier is pretty arbitrary, but is meant to reflect the fact that
+ 	 * the number of individual elements tracked in pg_statistic ought to be
+ 	 * more than the number of values for a simple scalar column.
+ 	 */
+ 	num_mcelem = stats->attr->attstattarget * 10;
+ 
+ 	/*
+ 	 * We set bucket width equal to num_mcelem / 0.007 as per the comment
+ 	 * above.
+ 	 */
+ 	bucket_width = num_mcelem * 1000 / 7;
+ 
+ 	/*
+ 	 * Create the hashtable. It will be in local memory, so we don't need to
+ 	 * worry about overflowing the initial size. Also we don't need to pay any
+ 	 * attention to locking and memory management.
+ 	 */
+ 	MemSet(&elem_hash_ctl, 0, sizeof(elem_hash_ctl));
+ 	elem_hash_ctl.keysize = sizeof(Datum);
+ 	elem_hash_ctl.entrysize = sizeof(TrackItem);
+ 	elem_hash_ctl.hash = element_hash;
+ 	elem_hash_ctl.match = element_match;
+ 	elem_hash_ctl.hcxt = CurrentMemoryContext;
+ 	elements_tab = hash_create("Analyzed elements table",
+ 							   bucket_width * 7,
+ 							   &elem_hash_ctl,
+ 					HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
+ 
+ 	/* Hashtable for per-array distinct element counts */
+ 	MemSet(&count_hash_ctl, 0, sizeof(count_hash_ctl));
+ 	count_hash_ctl.keysize = sizeof(int);
+ 	count_hash_ctl.entrysize = sizeof(DistinctElementCountItem);
+ 	count_hash_ctl.hash = tag_hash;
+ 	count_hash_ctl.hcxt = CurrentMemoryContext;
+ 	count_tab = hash_create("Array distinct element count table",
+ 							64,
+ 							&count_hash_ctl,
+ 							HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+ 
+ 	/* Initialize counters. */
+ 	b_current = 1;
+ 	element_no = 0;
+ 
+ 	/* Loop over the arrays. */
+ 	for (array_no = 0; array_no < samplerows; array_no++)
+ 	{
+ 		Datum		value;
+ 		bool		isnull;
+ 		bool		null_present;
+ 		ArrayType  *array;
+ 		char	   *ptr;
+ 		bits8	   *bitmap;
+ 		int			bitmask;
+ 		int			j;
+ 		int			ndims;
+ 		int		   *dims;
+ 		int			nitems;
+ 		uint64		prev_element_no = element_no;
+ 		int			distinct_count;
+ 		bool		count_item_found;
+ 
+ 		vacuum_delay_point();
+ 
+ 		value = fetchfunc(stats, array_no, &isnull);
+ 		if (isnull)
+ 		{
+ 			null_cnt++;
+ 			continue;
+ 		}
+ 
+ 		/* Skip too-large values. */
+ 		if (toast_raw_datum_size(value) > WIDTH_THRESHOLD)
+ 			continue;
+ 		else
+ 			analyzed_rows++;
+ 
+ 		/*
+ 		 * Now detoast the array if needed.
+ 		 */
+ 		array = DatumGetArrayTypeP(value);
+ 		ptr = ARR_DATA_PTR(array);
+ 		bitmap = ARR_NULLBITMAP(array);
+ 		bitmask = 1;
+ 		ndims = ARR_NDIM(array);
+ 		dims = ARR_DIMS(array);
+ 		nitems = ArrayGetNItems(ndims, dims);
+ 
+ 		null_present = false;
+ 
+ 		/*
+ 		 * We loop through the elements in the array and add them to our
+ 		 * tracking hashtable.
+ 		 */
+ 		for (j = 0; j < nitems; j++)
+ 		{
+ 			bool		found;
+ 			bool		isnull;
+ 
+ 			/* Get elements, checking for NULL */
+ 			if (bitmap && (*bitmap & bitmask) == 0)
+ 			{
+ 				hash_key = (Datum) 0;
+ 				isnull = true;
+ 				null_present = true;
+ 			}
+ 			else
+ 			{
+ 				/* Must copy the target values into anl_context */
+ 				old_context = MemoryContextSwitchTo(stats->anl_context);
+ 
+ 				/* Get element value */
+ 				hash_key = datumCopy(fetch_att(ptr, extra_data->typbyval,
+ 											   extra_data->typlen),
+ 									 extra_data->typbyval,
+ 									 extra_data->typlen);
+ 				isnull = false;
+ 				ptr = att_addlength_pointer(ptr, extra_data->typlen, ptr);
+ 				ptr = (char *) att_align_nominal(ptr, extra_data->typalign);
+ 
+ 				MemoryContextSwitchTo(old_context);
+ 			}
+ 
+ 			/* Advance bitmap pointers if any */
+ 			bitmask <<= 1;
+ 			if (bitmask == 0x100)
+ 			{
+ 				if (bitmap)
+ 					bitmap++;
+ 				bitmask = 1;
+ 			}
+ 
+ 			/* No null element processing other than flag setting here */
+ 			if (isnull)
+ 				continue;
+ 
+ 			/* Lookup current element in hashtable, adding it if new */
+ 			item = (TrackItem *) hash_search(elements_tab,
+ 											 (const void *) &hash_key,
+ 											 HASH_ENTER, &found);
+ 
+ 			if (found)
+ 			{
+ 				if (!extra_data->typbyval)
+ 					pfree(DatumGetPointer(hash_key));
+ 
+ 				/*
+ 				 * The operators we assist ignore duplicate array elements.
+ 				 * Count a given distinct element once per array.
+ 				 */
+ 				if (item->last_container != array_no)
+ 				{
+ 					item->last_container = array_no;
+ 					item->frequency++;
+ 					element_no++;
+ 				}
+ 			}
+ 			else
+ 			{
+ 				/* Initialize new tracking list element */
+ 				item->last_container = array_no;
+ 				item->frequency = 1;
+ 				item->delta = b_current - 1;
+ 				element_no++;
+ 			}
+ 
+ 			/* We prune the D structure after processing each bucket */
+ 			if (element_no % bucket_width == 0)
+ 			{
+ 				prune_element_hashtable(elements_tab, b_current);
+ 				b_current++;
+ 			}
+ 		}
+ 
+ 		/* Count null element presence once per array. */
+ 		if (null_present)
+ 			null_elem_cnt++;
+ 
+ 		/* Update frequency of the particular array distinct element count. */
+ 		distinct_count = element_no - prev_element_no;
+ 		count_item = (DistinctElementCountItem *)
+ 			hash_search(count_tab, &distinct_count,
+ 						HASH_ENTER, &count_item_found);
+ 
+ 		if (count_item_found)
+ 			count_item->frequency++;
+ 		else
+ 			count_item->frequency = 1;
+ 
+ 		/* Free memory allocated while detoasting. */
+ 		if (PointerGetDatum(array) != value)
+ 			pfree(array);
+ 	}
+ 
+ 	/* Skip slots occupied by standard statistics */
+ 	while (OidIsValid(stats->stakind[slot_idx]))
+ 		slot_idx++;
+ 
+ 	/* Fill histogram of distinct element counts. */
+ 	count_items_count = hash_get_num_entries(count_tab);
+ 	if (count_items_count > 0)
+ 	{
+ 		int			num_hist = stats->attr->attstattarget;
+ 		int			delta;
+ 		int			frac;
+ 		int			i;
+ 		Datum	   *hist_values;
+ 
+ 		/*
+ 		 * Copy distinct elements count statistics from hashtab to array and
+ 		 * sort them.
+ 		 */
+ 		count_item_index = 0;
+ 		sorted_count_items_tab = (DistinctElementCountItem *)
+ 			palloc(sizeof(DistinctElementCountItem) * count_items_count);
+ 		hash_seq_init(&scan_status, count_tab);
+ 		while ((count_item =
+ 		 (DistinctElementCountItem *) hash_seq_search(&scan_status)) != NULL)
+ 		{
+ 			memcpy(&sorted_count_items_tab[count_item_index], count_item,
+ 				   sizeof(DistinctElementCountItem));
+ 			count_item_index++;
+ 		}
+ 		qsort(sorted_count_items_tab, count_items_count,
+ 			  sizeof(DistinctElementCountItem), countitem_compare_element);
+ 
+ 		/* Histogram should be stored in anl_context. */
+ 		hist_values = (Datum *) MemoryContextAlloc(stats->anl_context,
+ 												   sizeof(Datum) * num_hist);
+ 		/* Fill the histogram from the sorted counts. */
+ 		/* analyzed_rows already excludes null arrays; don't subtract them again. */
+ 		delta = analyzed_rows - 1;
+ 		count_item_index = 0;
+ 		frac = sorted_count_items_tab[0].frequency * (num_hist - 1);
+ 		for (i = 0; i < num_hist; i++)
+ 		{
+ 			hist_values[i] =
+ 				Int32GetDatum(sorted_count_items_tab[count_item_index].count);
+ 			frac -= delta;
+ 			while (frac <= 0)
+ 			{
+ 				count_item_index++;
+ 				frac += sorted_count_items_tab[count_item_index].frequency *
+ 					(num_hist - 1);
+ 			}
+ 		}
+ 
+ 		stats->stakind[slot_idx] = STATISTIC_KIND_LENGTH_HISTOGRAM;
+ 		stats->staop[slot_idx] = Int4LessOperator;
+ 		stats->stavalues[slot_idx] = hist_values;
+ 		stats->numvalues[slot_idx] = num_hist;
+ 		stats->statypid[slot_idx] = INT4OID;
+ 		stats->statyplen[slot_idx] = 4;
+ 		stats->statypbyval[slot_idx] = true;
+ 		stats->statypalign[slot_idx] = 'i';
+ 		slot_idx++;
+ 	}
+ 
+ 	/*
+ 	 * We can only compute real stats if we analyzed some non-null values.
+ 	 * analyzed_rows counts only non-null arrays within WIDTH_THRESHOLD.
+ 	 */
+ 	if (analyzed_rows > 0)
+ 	{
+ 		int			nonnull_cnt = analyzed_rows;
+ 		int			i;
+ 		TrackItem **sort_table;
+ 		int			track_len;
+ 		int			cutoff_freq;
+ 		int			minfreq,
+ 					maxfreq;
+ 
+ 		/*
+ 		 * Construct an array of the interesting hashtable items, that is,
+ 		 * those meeting the cutoff frequency (s - epsilon)*N.	Also identify
+ 		 * the minimum and maximum frequencies among these items.
+ 		 *
+ 		 * Since epsilon = s/10 and bucket_width = 1/epsilon, the cutoff
+ 		 * frequency is 9*N / bucket_width.
+ 		 */
+ 		cutoff_freq = 9 * element_no / bucket_width;
+ 
+ 		i = hash_get_num_entries(elements_tab); /* surely enough space */
+ 		sort_table = (TrackItem **) palloc(sizeof(TrackItem *) * i);
+ 
+ 		hash_seq_init(&scan_status, elements_tab);
+ 		track_len = 0;
+ 		minfreq = element_no;
+ 		maxfreq = 0;
+ 		while ((item = (TrackItem *) hash_seq_search(&scan_status)) != NULL)
+ 		{
+ 			if (item->frequency > cutoff_freq)
+ 			{
+ 				sort_table[track_len++] = item;
+ 				minfreq = Min(minfreq, item->frequency);
+ 				maxfreq = Max(maxfreq, item->frequency);
+ 			}
+ 		}
+ 		Assert(track_len <= i);
+ 
+ 		/* emit some statistics for debug purposes */
+ 		elog(DEBUG3, "array: target # mces = %d, bucket width = %d, "
+ 			 "# elements = " UINT64_FORMAT ", hashtable size = %d, "
+ 			 "usable entries = %d",
+ 			 num_mcelem, bucket_width, element_no, i, track_len);
+ 
+ 		/*
+ 		 * If we obtained more elements than we really want, get rid of those
+ 		 * with least frequencies.	The easiest way is to qsort the array into
+ 		 * descending frequency order and truncate the array.
+ 		 */
+ 		if (num_mcelem < track_len)
+ 		{
+ 			qsort(sort_table, track_len, sizeof(TrackItem *),
+ 				  trackitem_compare_frequencies_desc);
+ 			/* reset minfreq to the smallest frequency we're keeping */
+ 			minfreq = sort_table[num_mcelem - 1]->frequency;
+ 		}
+ 		else
+ 			num_mcelem = track_len;
+ 
+ 		/* Generate MCELEM slot entry */
+ 		if (num_mcelem > 0)
+ 		{
+ 			MemoryContext old_context;
+ 			Datum	   *mcelem_values;
+ 			float4	   *mcelem_freqs;
+ 
+ 			/*
+ 			 * We want to store statistics sorted on the element value using
+ 			 * the element type's default comparison function.  This permits
+ 			 * fast binary searches in selectivity estimation functions.
+ 			 */
+ 			qsort(sort_table, num_mcelem, sizeof(TrackItem *),
+ 				  trackitem_compare_element);
+ 
+ 			/* Must copy the target values into anl_context */
+ 			old_context = MemoryContextSwitchTo(stats->anl_context);
+ 
+ 			/*
+ 			 * We sorted statistics on the element value, but we want to be
+ 			 * able to find the minimal and maximal frequencies without going
+ 			 * through all the values.	We also want the frequency of the null
+ 			 * element and the average distinct element count.	Store those
+ 			 * four values at the end of mcelem_freqs.
+ 			 */
+ 			mcelem_values = (Datum *) palloc(num_mcelem * sizeof(Datum));
+ 			mcelem_freqs = (float4 *) palloc((num_mcelem + 4) * sizeof(float4));
+ 
+ 			/*
+ 			 * See comments above about use of nonnull_cnt as the divisor for
+ 			 * the final frequency estimates.
+ 			 */
+ 			for (i = 0; i < num_mcelem; i++)
+ 			{
+ 				TrackItem  *item = sort_table[i];
+ 
+ 				mcelem_values[i] = item->key;
+ 				mcelem_freqs[i] = (double) item->frequency /
+ 					(double) nonnull_cnt;
+ 			}
+ 			mcelem_freqs[i++] = (double) null_elem_cnt / (double) nonnull_cnt;
+ 			mcelem_freqs[i++] = (double) minfreq / (double) nonnull_cnt;
+ 			mcelem_freqs[i++] = (double) maxfreq / (double) nonnull_cnt;
+ 			mcelem_freqs[i++] = (double) element_no / (double) nonnull_cnt;
+ 			MemoryContextSwitchTo(old_context);
+ 
+ 			stats->stakind[slot_idx] = STATISTIC_KIND_MCELEM;
+ 			stats->staop[slot_idx] = extra_data->eq_opr;
+ 			stats->stanumbers[slot_idx] = mcelem_freqs;
+ 			/* See above comment about extra fields */
+ 			stats->numnumbers[slot_idx] = num_mcelem + 4;
+ 			stats->stavalues[slot_idx] = mcelem_values;
+ 			stats->numvalues[slot_idx] = num_mcelem;
+ 			/* We are storing values of element type */
+ 			stats->statypid[slot_idx] = extra_data->type_id;
+ 			stats->statyplen[slot_idx] = extra_data->typlen;
+ 			stats->statypbyval[slot_idx] = extra_data->typbyval;
+ 			stats->statypalign[slot_idx] = extra_data->typalign;
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * We don't need to bother cleaning up any of our temporary palloc's. The
+ 	 * hashtable should also go away, as it used a child memory context.
+ 	 */
+ }
+ 
+ /*
+  *	A function to prune the D structure from the Lossy Counting algorithm.
+  *	Consult compute_tsvector_stats() for wider explanation.
+  */
+ static void
+ prune_element_hashtable(HTAB *elements_tab, int b_current)
+ {
+ 	HASH_SEQ_STATUS scan_status;
+ 	TrackItem  *item;
+ 
+ 	hash_seq_init(&scan_status, elements_tab);
+ 	while ((item = (TrackItem *) hash_seq_search(&scan_status)) != NULL)
+ 	{
+ 		if (item->frequency + item->delta <= b_current)
+ 		{
+ 			Datum		value = item->key;
+ 
+ 			if (hash_search(elements_tab, (const void *) item,
+ 							HASH_REMOVE, NULL) == NULL)
+ 				elog(ERROR, "hash table corrupted");
+ 			/* We should free memory if element is not passed by value */
+ 			if (!extra_data->typbyval)
+ 				pfree(DatumGetPointer(value));
+ 		}
+ 	}
+ }
+ 
+ /*
+  * Hash functions for elements. Based on default hash opclass.
+  */
+ static uint32
+ element_hash(const void *key, Size keysize)
+ {
+ 	const Datum *l = (const Datum *) key;
+ 
+ 	extra_data->hash.arg[0] = *l;
+ 	extra_data->hash.argnull[0] = false;
+ 	extra_data->hash.isnull = false;
+ 	return DatumGetInt32(FunctionCallInvoke(&extra_data->hash));
+ }
+ 
+ /*
+  * Matching function for elements, to be used in hashtable lookups.
+  */
+ static int
+ element_match(const void *key1, const void *key2, Size keysize)
+ {
+ 	const Datum *d1 = (const Datum *) key1;
+ 	const Datum *d2 = (const Datum *) key2;
+ 
+ 	extra_data->eq.arg[0] = *d1;
+ 	extra_data->eq.arg[1] = *d2;
+ 	extra_data->eq.argnull[0] = false;
+ 	extra_data->eq.argnull[1] = false;
+ 	extra_data->eq.isnull = false;
+ 	return !DatumGetBool(FunctionCallInvoke(&extra_data->eq));
+ }
+ 
+ /*
+  * Comparison function for elements, based on default comparison function for
+  * element data type.
+  *
+  * XXX this may as well use SortSupport
+  */
+ static int
+ element_compare(const void *key1, const void *key2)
+ {
+ 	const Datum *d1 = (const Datum *) key1;
+ 	const Datum *d2 = (const Datum *) key2;
+ 
+ 	extra_data->cmp.arg[0] = *d1;
+ 	extra_data->cmp.arg[1] = *d2;
+ 	extra_data->cmp.argnull[0] = false;
+ 	extra_data->cmp.argnull[1] = false;
+ 	extra_data->cmp.isnull = false;
+ 	return DatumGetInt32(FunctionCallInvoke(&extra_data->cmp));
+ }
+ 
+ /*
+  *	qsort() comparator for sorting TrackItems on frequencies (descending sort)
+  */
+ static int
+ trackitem_compare_frequencies_desc(const void *e1, const void *e2)
+ {
+ 	const TrackItem *const * t1 = (const TrackItem *const *) e1;
+ 	const TrackItem *const * t2 = (const TrackItem *const *) e2;
+ 
+ 	return (*t2)->frequency - (*t1)->frequency;
+ }
+ 
+ /*
+  *	qsort() comparator for sorting TrackItems on elements
+  */
+ static int
+ trackitem_compare_element(const void *e1, const void *e2)
+ {
+ 	const TrackItem *const * t1 = (const TrackItem *const *) e1;
+ 	const TrackItem *const * t2 = (const TrackItem *const *) e2;
+ 
+ 	return element_compare(&(*t1)->key, &(*t2)->key);
+ }
+ 
+ /*
+  *	qsort() comparator for sorting DistinctElementCountItem on counts
+  */
+ static int
+ countitem_compare_element(const void *e1, const void *e2)
+ {
+ 	const DistinctElementCountItem *t1 = (const DistinctElementCountItem *) e1;
+ 	const DistinctElementCountItem *t2 = (const DistinctElementCountItem *) e2;
+ 
+ 	if (t1->count < t2->count)
+ 		return -1;
+ 	else if (t1->count == t2->count)
+ 		return 0;
+ 	else
+ 		return 1;
+ }
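
For readers following the pruning and cutoff logic in the array_typanalyze.c part above, here is a minimal standalone sketch of the same Lossy Counting bookkeeping. It is not part of the patch: Sketch, SketchItem and the sketch_* functions are hypothetical names, keys are plain ints, and a small fixed array with linear search stands in for the dynahash table. The pruning test (frequency + delta <= b_current) and the cutoff (9 * N / bucket_width) mirror the code above; setting delta to b_current - 1 on insertion follows the usual description of the algorithm.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_CAPACITY 64		/* assumption: tiny fixed table for the demo */

typedef struct
{
	int			key;
	int			frequency;		/* occurrences counted since insertion */
	int			delta;			/* maximum undercount at insertion time */
	int			used;			/* is this slot occupied? */
} SketchItem;

typedef struct
{
	SketchItem	items[SKETCH_CAPACITY];
	int			bucket_width;	/* w = 1/epsilon */
	int			element_no;		/* N, elements processed so far */
	int			b_current;		/* current bucket number, starts at 1 */
} Sketch;

static void
sketch_init(Sketch *s, int bucket_width)
{
	memset(s, 0, sizeof(*s));
	s->bucket_width = bucket_width;
	s->b_current = 1;
}

/* Drop every entry with frequency + delta <= b_current. */
static void
sketch_prune(Sketch *s)
{
	for (int i = 0; i < SKETCH_CAPACITY; i++)
	{
		SketchItem *it = &s->items[i];

		if (it->used && it->frequency + it->delta <= s->b_current)
			it->used = 0;
	}
}

/* Count one element; returns 0 on success, -1 if the demo table is full. */
static int
sketch_add(Sketch *s, int key)
{
	SketchItem *found = NULL;
	SketchItem *free_slot = NULL;

	for (int i = 0; i < SKETCH_CAPACITY; i++)
	{
		SketchItem *it = &s->items[i];

		if (it->used && it->key == key)
		{
			found = it;
			break;
		}
		if (!it->used && free_slot == NULL)
			free_slot = it;
	}

	if (found != NULL)
		found->frequency++;
	else
	{
		if (free_slot == NULL)
			return -1;
		free_slot->used = 1;
		free_slot->key = key;
		free_slot->frequency = 1;
		free_slot->delta = s->b_current - 1;
	}

	/* Prune at every bucket boundary, then move to the next bucket. */
	s->element_no++;
	if (s->element_no % s->bucket_width == 0)
	{
		sketch_prune(s);
		s->b_current++;
	}
	return 0;
}

/* Tracked frequency of key if it exceeds the cutoff, else 0. */
static int
sketch_common_frequency(const Sketch *s, int key)
{
	int			cutoff_freq = 9 * s->element_no / s->bucket_width;

	for (int i = 0; i < SKETCH_CAPACITY; i++)
	{
		const SketchItem *it = &s->items[i];

		if (it->used && it->key == key && it->frequency > cutoff_freq)
			return it->frequency;
	}
	return 0;
}

/* Feed 1000 elements: key 1 on every other step, unique keys otherwise. */
static int
sketch_demo(void)
{
	Sketch		s;

	sketch_init(&s, 100);
	for (int i = 0; i < 1000; i++)
	{
		int			key = (i % 2 == 0) ? 1 : 1000 + i;

		if (sketch_add(&s, key) != 0)
			return -1;
	}
	return sketch_common_frequency(&s, 1);
}
```

In this demo stream the one-off keys are dropped at each bucket boundary, so the table never holds more than about one bucket's worth of entries, while the repeated key survives every prune and clears the cutoff at the end.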
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index da638f8..e6ab0f1 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
***************
*** 1705,1710 **** scalararraysel(PlannerInfo *root,
--- 1705,1735 ----
  	RegProcedure oprsel;
  	FmgrInfo	oprselproc;
  	Selectivity s1;
+ 	bool		varonleft;
+ 	Node	   *other;
+ 	VariableStatData vardata;
+ 
+ 	/* Handle "const = qual(column)" case using array column statistics. */
+ 	if (get_restriction_variable(root, clause->args, varRelid,
+ 								 &vardata, &other, &varonleft))
+ 	{
+ 		Oid			elemtype = get_base_element_type(vardata.vartype);
+ 
+ 		if (OidIsValid(elemtype) && IsA(other, Const))
+ 		{
+ 			if (((Const *) other)->constisnull)
+ 			{
+ 				/* qual can't succeed if null array */
+ 				ReleaseVariableStats(vardata);
+ 				return (Selectivity) 0.0;
+ 			}
+ 			s1 = calc_scalararraysel(&vardata, ((Const *) other)->constvalue,
+ 									 useOr, operator);
+ 			ReleaseVariableStats(vardata);
+ 			return s1;
+ 		}
+ 		ReleaseVariableStats(vardata);
+ 	}
  
  	/*
  	 * First, look up the underlying operator's selectivity estimator. Punt if
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index f19865d..ad88dc3 100644
*** a/src/include/catalog/pg_operator.h
--- b/src/include/catalog/pg_operator.h
***************
*** 130,135 **** DATA(insert OID =  96 ( "="		   PGNSP PGUID b t t	23	23	16	96 518 int4eq eqsel e
--- 130,136 ----
  DESCR("equal");
  DATA(insert OID =  97 ( "<"		   PGNSP PGUID b f f	23	23	16 521 525 int4lt scalarltsel scalarltjoinsel ));
  DESCR("less than");
+ #define Int4LessOperator	97
  DATA(insert OID =  98 ( "="		   PGNSP PGUID b t t	25	25	16	98 531 texteq eqsel eqjoinsel ));
  DESCR("equal");
  #define TextEqualOperator	98
***************
*** 1513,1524 **** DATA(insert OID = 2590 (  "|&>"    PGNSP PGUID b f f 718 718	16	 0	 0 circle_ove
  DESCR("overlaps or is above");
  
  /* overlap/contains/contained for arrays */
! DATA(insert OID = 2750 (  "&&"	   PGNSP PGUID b f f 2277 2277	16 2750  0 arrayoverlap areasel areajoinsel ));
  DESCR("overlaps");
! DATA(insert OID = 2751 (  "@>"	   PGNSP PGUID b f f 2277 2277	16 2752  0 arraycontains contsel contjoinsel ));
  DESCR("contains");
! DATA(insert OID = 2752 (  "<@"	   PGNSP PGUID b f f 2277 2277	16 2751  0 arraycontained contsel contjoinsel ));
  DESCR("is contained by");
  
  /* capturing operators to preserve pre-8.3 behavior of text concatenation */
  DATA(insert OID = 2779 (  "||"	   PGNSP PGUID b f f 25 2776	25	 0 0 textanycat - - ));
--- 1514,1528 ----
  DESCR("overlaps or is above");
  
  /* overlap/contains/contained for arrays */
! DATA(insert OID = 2750 (  "&&"	   PGNSP PGUID b f f 2277 2277	16 2750  0 arrayoverlap arraysel areajoinsel ));
  DESCR("overlaps");
! #define OID_ARRAY_OVERLAP_OP	2750
! DATA(insert OID = 2751 (  "@>"	   PGNSP PGUID b f f 2277 2277	16 2752  0 arraycontains arraysel contjoinsel ));
  DESCR("contains");
! #define OID_ARRAY_CONTAIN_OP	2751
! DATA(insert OID = 2752 (  "<@"	   PGNSP PGUID b f f 2277 2277	16 2751  0 arraycontained arraysel contjoinsel ));
  DESCR("is contained by");
+ #define OID_ARRAY_CONTAINED_OP	2752
  
  /* capturing operators to preserve pre-8.3 behavior of text concatenation */
  DATA(insert OID = 2779 (  "||"	   PGNSP PGUID b f f 25 2776	25	 0 0 textanycat - - ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 355c61a..623e749 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***************
*** 865,870 **** DATA(insert OID = 2334 (  array_agg_finalfn   PGNSP PGUID 12 1 0 0 0 f f f f f i
--- 865,874 ----
  DESCR("aggregate final function");
  DATA(insert OID = 2335 (  array_agg		   PGNSP PGUID 12 1 0 0 0 t f f f f i 1 0 2277 "2283" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
  DESCR("concatenate aggregate input into an array");
+ DATA(insert OID = 3816 (  array_typanalyze PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 16 "2281" _null_ _null_ _null_ _null_ array_typanalyze _null_ _null_ _null_ ));
+ DESCR("array statistics collector");
+ DATA(insert OID = 3817 (  arraysel		   PGNSP PGUID 12 1 0 0 0 f f f t f s 4 0 701 "2281 26 2281 23" _null_ _null_ _null_ _null_ arraysel _null_ _null_ _null_ ));
+ DESCR("array selectivity estimation functions");
  
  DATA(insert OID = 760 (  smgrin			   PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 210 "2275" _null_ _null_ _null_ _null_	smgrin _null_ _null_ _null_ ));
  DESCR("I/O");
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 7d1d127..cab2826 100644
*** a/src/include/catalog/pg_statistic.h
--- b/src/include/catalog/pg_statistic.h
***************
*** 98,108 **** CATALOG(pg_statistic,2619) BKI_WITHOUT_OIDS
--- 98,110 ----
  	int2		stakind2;
  	int2		stakind3;
  	int2		stakind4;
+ 	int2		stakind5;
  
  	Oid			staop1;
  	Oid			staop2;
  	Oid			staop3;
  	Oid			staop4;
+ 	Oid			staop5;
  
  	/*
  	 * THE REST OF THESE ARE VARIABLE LENGTH FIELDS, and may even be absent
***************
*** 115,120 **** CATALOG(pg_statistic,2619) BKI_WITHOUT_OIDS
--- 117,123 ----
  	float4		stanumbers2[1];
  	float4		stanumbers3[1];
  	float4		stanumbers4[1];
+ 	float4		stanumbers5[1];
  
  	/*
  	 * Values in these arrays are values of the column's data type.  We
***************
*** 125,133 **** CATALOG(pg_statistic,2619) BKI_WITHOUT_OIDS
  	anyarray	stavalues2;
  	anyarray	stavalues3;
  	anyarray	stavalues4;
  } FormData_pg_statistic;
  
! #define STATISTIC_NUM_SLOTS  4
  
  #undef anyarray
  
--- 128,137 ----
  	anyarray	stavalues2;
  	anyarray	stavalues3;
  	anyarray	stavalues4;
+ 	anyarray	stavalues5;
  } FormData_pg_statistic;
  
! #define STATISTIC_NUM_SLOTS  5
  
  #undef anyarray
  
***************
*** 143,149 **** typedef FormData_pg_statistic *Form_pg_statistic;
   *		compiler constants for pg_statistic
   * ----------------
   */
! #define Natts_pg_statistic				22
  #define Anum_pg_statistic_starelid		1
  #define Anum_pg_statistic_staattnum		2
  #define Anum_pg_statistic_stainherit	3
--- 147,153 ----
   *		compiler constants for pg_statistic
   * ----------------
   */
! #define Natts_pg_statistic				26
  #define Anum_pg_statistic_starelid		1
  #define Anum_pg_statistic_staattnum		2
  #define Anum_pg_statistic_stainherit	3
***************
*** 154,179 **** typedef FormData_pg_statistic *Form_pg_statistic;
  #define Anum_pg_statistic_stakind2		8
  #define Anum_pg_statistic_stakind3		9
  #define Anum_pg_statistic_stakind4		10
! #define Anum_pg_statistic_staop1		11
! #define Anum_pg_statistic_staop2		12
! #define Anum_pg_statistic_staop3		13
! #define Anum_pg_statistic_staop4		14
! #define Anum_pg_statistic_stanumbers1	15
! #define Anum_pg_statistic_stanumbers2	16
! #define Anum_pg_statistic_stanumbers3	17
! #define Anum_pg_statistic_stanumbers4	18
! #define Anum_pg_statistic_stavalues1	19
! #define Anum_pg_statistic_stavalues2	20
! #define Anum_pg_statistic_stavalues3	21
! #define Anum_pg_statistic_stavalues4	22
  
  /*
!  * Currently, three statistical slot "kinds" are defined: most common values,
!  * histogram, and correlation.	Additional "kinds" will probably appear in
!  * future to help cope with non-scalar datatypes.  Also, custom data types
!  * can define their own "kind" codes by mutual agreement between a custom
!  * typanalyze routine and the selectivity estimation functions of the type's
!  * operators.
   *
   * Code reading the pg_statistic relation should not assume that a particular
   * data "kind" will appear in any particular slot.	Instead, search the
--- 158,186 ----
  #define Anum_pg_statistic_stakind2		8
  #define Anum_pg_statistic_stakind3		9
  #define Anum_pg_statistic_stakind4		10
! #define Anum_pg_statistic_stakind5		11
! #define Anum_pg_statistic_staop1		12
! #define Anum_pg_statistic_staop2		13
! #define Anum_pg_statistic_staop3		14
! #define Anum_pg_statistic_staop4		15
! #define Anum_pg_statistic_staop5		16
! #define Anum_pg_statistic_stanumbers1	17
! #define Anum_pg_statistic_stanumbers2	18
! #define Anum_pg_statistic_stanumbers3	19
! #define Anum_pg_statistic_stanumbers4	20
! #define Anum_pg_statistic_stanumbers5	21
! #define Anum_pg_statistic_stavalues1	22
! #define Anum_pg_statistic_stavalues2	23
! #define Anum_pg_statistic_stavalues3	24
! #define Anum_pg_statistic_stavalues4	25
! #define Anum_pg_statistic_stavalues5	26
  
  /*
!  * Currently, five statistical slot "kinds" are defined: most common values,
!  * histogram, correlation, most common elements and histogram of distinct
!  * element count.  Also, custom data types can define their own "kind" codes
!  * by mutual agreement between a custom typanalyze routine and the selectivity
!  * estimation functions of the type's operators.
   *
   * Code reading the pg_statistic relation should not assume that a particular
   * data "kind" will appear in any particular slot.	Instead, search the
***************
*** 260,263 **** typedef FormData_pg_statistic *Form_pg_statistic;
--- 267,280 ----
   */
  #define STATISTIC_KIND_MCELEM  4
  
+ /*
+  * A "length histogram" slot resembles a "histogram" slot in structure.
+  * Instead of actual column values, the population consists of counts of
+  * distinct elements found within the column values.  stavalues contains M
+  * (>=2) non-null values that divide the non-null column data values into M-1
+  * bins of approximately equal population.  The first stavalues item is the
+  * minimum count and the last is the maximum count.
+  */
+ #define STATISTIC_KIND_LENGTH_HISTOGRAM  5
+ 
  #endif   /* PG_STATISTIC_H */
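
The length histogram slot described in the pg_statistic.h comment above stores M bounds that split the non-null arrays into M-1 bins of roughly equal population. As an illustration only, the function below shows one plausible way to turn such bounds into an estimate of the fraction of arrays whose distinct element count is at most k. The function name and the linear interpolation inside a bin are assumptions of this sketch; it is not taken from the patch's estimation code.

```c
#include <assert.h>

/*
 * Estimate the fraction of non-null arrays whose distinct element count
 * is <= k, given nhist ascending histogram bounds (nhist >= 2).
 * Each bin is assumed to hold 1/(nhist - 1) of the population, and counts
 * are assumed to be spread evenly inside a bin.
 */
static double
length_hist_fraction_le(const int *hist, int nhist, int k)
{
	if (nhist < 2 || k < hist[0])
		return 0.0;
	if (k >= hist[nhist - 1])
		return 1.0;

	for (int i = 0; i < nhist - 1; i++)
	{
		/* The first bin with k < upper bound satisfies hist[i] <= k. */
		if (k < hist[i + 1])
		{
			double		within = (double) (k - hist[i]) /
				(hist[i + 1] - hist[i]);

			return (i + within) / (nhist - 1);
		}
	}
	return 1.0;					/* not reached */
}
```

For example, with bounds {1, 3, 5, 9, 20} and k = 4, the value falls halfway through the second of four bins, so the estimate is (1 + 0.5) / 4.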
diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index e12efe4..2580a38 100644
*** a/src/include/catalog/pg_type.h
--- b/src/include/catalog/pg_type.h
***************
*** 353,359 **** DATA(insert OID = 83 (	pg_class		PGNSP PGUID -1 f c C f t \054 1259 0 0 record_i
  DATA(insert OID = 142 ( xml		   PGNSP PGUID -1 f b U f t \054 0 0 143 xml_in xml_out xml_recv xml_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("XML content");
  #define XMLOID 142
! DATA(insert OID = 143 ( _xml	   PGNSP PGUID -1 f b A f t \054 0 142 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  DATA(insert OID = 194 ( pg_node_tree	PGNSP PGUID -1 f b S f t \054 0 0 0 pg_node_tree_in pg_node_tree_out pg_node_tree_recv pg_node_tree_send - - - i x f 0 -1 0 100 _null_ _null_ _null_ ));
  DESCR("string representing an internal node tree");
--- 353,359 ----
  DATA(insert OID = 142 ( xml		   PGNSP PGUID -1 f b U f t \054 0 0 143 xml_in xml_out xml_recv xml_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("XML content");
  #define XMLOID 142
! DATA(insert OID = 143 ( _xml	   PGNSP PGUID -1 f b A f t \054 0 142 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  DATA(insert OID = 194 ( pg_node_tree	PGNSP PGUID -1 f b S f t \054 0 0 0 pg_node_tree_in pg_node_tree_out pg_node_tree_recv pg_node_tree_send - - - i x f 0 -1 0 100 _null_ _null_ _null_ ));
  DESCR("string representing an internal node tree");
***************
*** 390,396 **** DESCR("geometric polygon '(pt1,...)'");
  DATA(insert OID = 628 (  line	   PGNSP PGUID 32 f b G f t \054 0 701 629 line_in line_out line_recv line_send - - - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("geometric line (not implemented)");
  #define LINEOID			628
! DATA(insert OID = 629 (  _line	   PGNSP PGUID	-1 f b A f t \054 0 628 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("");
  
  /* OIDS 700 - 799 */
--- 390,396 ----
  DATA(insert OID = 628 (  line	   PGNSP PGUID 32 f b G f t \054 0 701 629 line_in line_out line_recv line_send - - - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("geometric line (not implemented)");
  #define LINEOID			628
! DATA(insert OID = 629 (  _line	   PGNSP PGUID	-1 f b A f t \054 0 628 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("");
  
  /* OIDS 700 - 799 */
***************
*** 417,427 **** DESCR("");
  DATA(insert OID = 718 (  circle    PGNSP PGUID	24 f b G f t \054 0 0 719 circle_in circle_out circle_recv circle_send - - - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("geometric circle '(center,radius)'");
  #define CIRCLEOID		718
! DATA(insert OID = 719 (  _circle   PGNSP PGUID	-1 f b A f t \054 0  718 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 790 (  money	   PGNSP PGUID	 8 FLOAT8PASSBYVAL b N f t \054 0 0 791 cash_in cash_out cash_recv cash_send - - - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("monetary amounts, $d,ddd.cc");
  #define CASHOID 790
! DATA(insert OID = 791 (  _money    PGNSP PGUID	-1 f b A f t \054 0  790 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* OIDS 800 - 899 */
  DATA(insert OID = 829 ( macaddr    PGNSP PGUID	6 f b U f t \054 0 0 1040 macaddr_in macaddr_out macaddr_recv macaddr_send - - - i p f 0 -1 0 0 _null_ _null_ _null_ ));
--- 417,427 ----
  DATA(insert OID = 718 (  circle    PGNSP PGUID	24 f b G f t \054 0 0 719 circle_in circle_out circle_recv circle_send - - - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("geometric circle '(center,radius)'");
  #define CIRCLEOID		718
! DATA(insert OID = 719 (  _circle   PGNSP PGUID	-1 f b A f t \054 0  718 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 790 (  money	   PGNSP PGUID	 8 FLOAT8PASSBYVAL b N f t \054 0 0 791 cash_in cash_out cash_recv cash_send - - - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("monetary amounts, $d,ddd.cc");
  #define CASHOID 790
! DATA(insert OID = 791 (  _money    PGNSP PGUID	-1 f b A f t \054 0  790 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* OIDS 800 - 899 */
  DATA(insert OID = 829 ( macaddr    PGNSP PGUID	6 f b U f t \054 0 0 1040 macaddr_in macaddr_out macaddr_recv macaddr_send - - - i p f 0 -1 0 0 _null_ _null_ _null_ ));
***************
*** 437,480 **** DESCR("network IP address/netmask, network address");
  /* OIDS 900 - 999 */
  
  /* OIDS 1000 - 1099 */
! DATA(insert OID = 1000 (  _bool		 PGNSP PGUID -1 f b A f t \054 0	16 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1001 (  _bytea	 PGNSP PGUID -1 f b A f t \054 0	17 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1002 (  _char		 PGNSP PGUID -1 f b A f t \054 0	18 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1003 (  _name		 PGNSP PGUID -1 f b A f t \054 0	19 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1005 (  _int2		 PGNSP PGUID -1 f b A f t \054 0	21 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1006 (  _int2vector PGNSP PGUID -1 f b A f t \054 0	22 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1007 (  _int4		 PGNSP PGUID -1 f b A f t \054 0	23 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  #define INT4ARRAYOID		1007
! DATA(insert OID = 1008 (  _regproc	 PGNSP PGUID -1 f b A f t \054 0	24 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1009 (  _text		 PGNSP PGUID -1 f b A f t \054 0	25 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 100 _null_ _null_ _null_ ));
  #define TEXTARRAYOID		1009
! DATA(insert OID = 1028 (  _oid		 PGNSP PGUID -1 f b A f t \054 0	26 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1010 (  _tid		 PGNSP PGUID -1 f b A f t \054 0	27 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1011 (  _xid		 PGNSP PGUID -1 f b A f t \054 0	28 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1012 (  _cid		 PGNSP PGUID -1 f b A f t \054 0	29 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1013 (  _oidvector PGNSP PGUID -1 f b A f t \054 0	30 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1014 (  _bpchar	 PGNSP PGUID -1 f b A f t \054 0 1042 0 array_in array_out array_recv array_send bpchartypmodin bpchartypmodout - i x f 0 -1 0 100 _null_ _null_ _null_ ));
! DATA(insert OID = 1015 (  _varchar	 PGNSP PGUID -1 f b A f t \054 0 1043 0 array_in array_out array_recv array_send varchartypmodin varchartypmodout - i x f 0 -1 0 100 _null_ _null_ _null_ ));
! DATA(insert OID = 1016 (  _int8		 PGNSP PGUID -1 f b A f t \054 0	20 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1017 (  _point	 PGNSP PGUID -1 f b A f t \054 0 600 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1018 (  _lseg		 PGNSP PGUID -1 f b A f t \054 0 601 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1019 (  _path		 PGNSP PGUID -1 f b A f t \054 0 602 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1020 (  _box		 PGNSP PGUID -1 f b A f t \073 0 603 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1021 (  _float4	 PGNSP PGUID -1 f b A f t \054 0 700 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  #define FLOAT4ARRAYOID 1021
! DATA(insert OID = 1022 (  _float8	 PGNSP PGUID -1 f b A f t \054 0 701 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1023 (  _abstime	 PGNSP PGUID -1 f b A f t \054 0 702 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1024 (  _reltime	 PGNSP PGUID -1 f b A f t \054 0 703 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1025 (  _tinterval PGNSP PGUID -1 f b A f t \054 0 704 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1027 (  _polygon	 PGNSP PGUID -1 f b A f t \054 0 604 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 1033 (  aclitem	 PGNSP PGUID 12 f b U f t \054 0 0 1034 aclitemin aclitemout - - - - - i p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("access control list");
  #define ACLITEMOID		1033
! DATA(insert OID = 1034 (  _aclitem	 PGNSP PGUID -1 f b A f t \054 0 1033 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1040 (  _macaddr	 PGNSP PGUID -1 f b A f t \054 0  829 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1041 (  _inet		 PGNSP PGUID -1 f b A f t \054 0  869 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 651  (  _cidr		 PGNSP PGUID -1 f b A f t \054 0  650 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1263 (  _cstring	 PGNSP PGUID -1 f b A f t \054 0 2275 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  #define CSTRINGARRAYOID		1263
  
  DATA(insert OID = 1042 ( bpchar		 PGNSP PGUID -1 f b S f t \054 0	0 1014 bpcharin bpcharout bpcharrecv bpcharsend bpchartypmodin bpchartypmodout - i x f 0 -1 0 100 _null_ _null_ _null_ ));
--- 437,480 ----
  /* OIDS 900 - 999 */
  
  /* OIDS 1000 - 1099 */
! DATA(insert OID = 1000 (  _bool		 PGNSP PGUID -1 f b A f t \054 0	16 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1001 (  _bytea	 PGNSP PGUID -1 f b A f t \054 0	17 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1002 (  _char		 PGNSP PGUID -1 f b A f t \054 0	18 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1003 (  _name		 PGNSP PGUID -1 f b A f t \054 0	19 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1005 (  _int2		 PGNSP PGUID -1 f b A f t \054 0	21 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1006 (  _int2vector PGNSP PGUID -1 f b A f t \054 0	22 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1007 (  _int4		 PGNSP PGUID -1 f b A f t \054 0	23 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
  #define INT4ARRAYOID		1007
! DATA(insert OID = 1008 (  _regproc	 PGNSP PGUID -1 f b A f t \054 0	24 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1009 (  _text		 PGNSP PGUID -1 f b A f t \054 0	25 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 100 _null_ _null_  _null_ ));
  #define TEXTARRAYOID		1009
! DATA(insert OID = 1028 (  _oid		 PGNSP PGUID -1 f b A f t \054 0	26 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1010 (  _tid		 PGNSP PGUID -1 f b A f t \054 0	27 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1011 (  _xid		 PGNSP PGUID -1 f b A f t \054 0	28 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1012 (  _cid		 PGNSP PGUID -1 f b A f t \054 0	29 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1013 (  _oidvector PGNSP PGUID -1 f b A f t \054 0	30 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1014 (  _bpchar	 PGNSP PGUID -1 f b A f t \054 0 1042 0 array_in array_out array_recv array_send bpchartypmodin bpchartypmodout array_typanalyze i x f 0 -1 0 100 _null_ _null_  _null_ ));
! DATA(insert OID = 1015 (  _varchar	 PGNSP PGUID -1 f b A f t \054 0 1043 0 array_in array_out array_recv array_send varchartypmodin varchartypmodout array_typanalyze i x f 0 -1 0 100 _null_ _null_  _null_ ));
! DATA(insert OID = 1016 (  _int8		 PGNSP PGUID -1 f b A f t \054 0	20 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1017 (  _point	 PGNSP PGUID -1 f b A f t \054 0 600 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1018 (  _lseg		 PGNSP PGUID -1 f b A f t \054 0 601 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1019 (  _path		 PGNSP PGUID -1 f b A f t \054 0 602 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1020 (  _box		 PGNSP PGUID -1 f b A f t \073 0 603 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1021 (  _float4	 PGNSP PGUID -1 f b A f t \054 0 700 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
  #define FLOAT4ARRAYOID 1021
! DATA(insert OID = 1022 (  _float8	 PGNSP PGUID -1 f b A f t \054 0 701 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1023 (  _abstime	 PGNSP PGUID -1 f b A f t \054 0 702 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1024 (  _reltime	 PGNSP PGUID -1 f b A f t \054 0 703 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1025 (  _tinterval PGNSP PGUID -1 f b A f t \054 0 704 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_  _null_ ));
! DATA(insert OID = 1027 (  _polygon	 PGNSP PGUID -1 f b A f t \054 0 604 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_  _null_ ));
  DATA(insert OID = 1033 (  aclitem	 PGNSP PGUID 12 f b U f t \054 0 0 1034 aclitemin aclitemout - - - - - i p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("access control list");
  #define ACLITEMOID		1033
! DATA(insert OID = 1034 (  _aclitem	 PGNSP PGUID -1 f b A f t \054 0 1033 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1040 (  _macaddr	 PGNSP PGUID -1 f b A f t \054 0  829 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1041 (  _inet		 PGNSP PGUID -1 f b A f t \054 0  869 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 651  (  _cidr		 PGNSP PGUID -1 f b A f t \054 0  650 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1263 (  _cstring	 PGNSP PGUID -1 f b A f t \054 0 2275 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  #define CSTRINGARRAYOID		1263
  
  DATA(insert OID = 1042 ( bpchar		 PGNSP PGUID -1 f b S f t \054 0	0 1014 bpcharin bpcharout bpcharrecv bpcharsend bpchartypmodin bpchartypmodout - i x f 0 -1 0 100 _null_ _null_ _null_ ));
***************
*** 495,528 **** DESCR("time of day");
  DATA(insert OID = 1114 ( timestamp	 PGNSP PGUID	8 FLOAT8PASSBYVAL b D f t \054 0	0 1115 timestamp_in timestamp_out timestamp_recv timestamp_send timestamptypmodin timestamptypmodout - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("date and time");
  #define TIMESTAMPOID	1114
! DATA(insert OID = 1115 ( _timestamp  PGNSP PGUID	-1 f b A f t \054 0 1114 0 array_in array_out array_recv array_send timestamptypmodin timestamptypmodout - d x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1182 ( _date		 PGNSP PGUID	-1 f b A f t \054 0 1082 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1183 ( _time		 PGNSP PGUID	-1 f b A f t \054 0 1083 0 array_in array_out array_recv array_send timetypmodin timetypmodout - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 1184 ( timestamptz PGNSP PGUID	8 FLOAT8PASSBYVAL b D t t \054 0	0 1185 timestamptz_in timestamptz_out timestamptz_recv timestamptz_send timestamptztypmodin timestamptztypmodout - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("date and time with time zone");
  #define TIMESTAMPTZOID	1184
! DATA(insert OID = 1185 ( _timestamptz PGNSP PGUID -1 f b A f t \054 0	1184 0 array_in array_out array_recv array_send timestamptztypmodin timestamptztypmodout - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 1186 ( interval	 PGNSP PGUID 16 f b T t t \054 0	0 1187 interval_in interval_out interval_recv interval_send intervaltypmodin intervaltypmodout - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("@ <number> <units>, time interval");
  #define INTERVALOID		1186
! DATA(insert OID = 1187 ( _interval	 PGNSP PGUID	-1 f b A f t \054 0 1186 0 array_in array_out array_recv array_send intervaltypmodin intervaltypmodout - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* OIDS 1200 - 1299 */
! DATA(insert OID = 1231 (  _numeric	 PGNSP PGUID -1 f b A f t \054 0	1700 0 array_in array_out array_recv array_send numerictypmodin numerictypmodout - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 1266 ( timetz		 PGNSP PGUID 12 f b D f t \054 0	0 1270 timetz_in timetz_out timetz_recv timetz_send timetztypmodin timetztypmodout - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("time of day with time zone");
  #define TIMETZOID		1266
! DATA(insert OID = 1270 ( _timetz	 PGNSP PGUID -1 f b A f t \054 0	1266 0 array_in array_out array_recv array_send timetztypmodin timetztypmodout - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* OIDS 1500 - 1599 */
  DATA(insert OID = 1560 ( bit		 PGNSP PGUID -1 f b V f t \054 0	0 1561 bit_in bit_out bit_recv bit_send bittypmodin bittypmodout - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("fixed-length bit string");
  #define BITOID	 1560
! DATA(insert OID = 1561 ( _bit		 PGNSP PGUID -1 f b A f t \054 0	1560 0 array_in array_out array_recv array_send bittypmodin bittypmodout - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 1562 ( varbit		 PGNSP PGUID -1 f b V t t \054 0	0 1563 varbit_in varbit_out varbit_recv varbit_send varbittypmodin varbittypmodout - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("variable-length bit string");
  #define VARBITOID	  1562
! DATA(insert OID = 1563 ( _varbit	 PGNSP PGUID -1 f b A f t \054 0	1562 0 array_in array_out array_recv array_send varbittypmodin varbittypmodout - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* OIDS 1600 - 1699 */
  
--- 495,528 ----
  DATA(insert OID = 1114 ( timestamp	 PGNSP PGUID	8 FLOAT8PASSBYVAL b D f t \054 0	0 1115 timestamp_in timestamp_out timestamp_recv timestamp_send timestamptypmodin timestamptypmodout - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("date and time");
  #define TIMESTAMPOID	1114
! DATA(insert OID = 1115 ( _timestamp  PGNSP PGUID	-1 f b A f t \054 0 1114 0 array_in array_out array_recv array_send timestamptypmodin timestamptypmodout array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1182 ( _date		 PGNSP PGUID	-1 f b A f t \054 0 1082 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 1183 ( _time		 PGNSP PGUID	-1 f b A f t \054 0 1083 0 array_in array_out array_recv array_send timetypmodin timetypmodout array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 1184 ( timestamptz PGNSP PGUID	8 FLOAT8PASSBYVAL b D t t \054 0	0 1185 timestamptz_in timestamptz_out timestamptz_recv timestamptz_send timestamptztypmodin timestamptztypmodout - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("date and time with time zone");
  #define TIMESTAMPTZOID	1184
! DATA(insert OID = 1185 ( _timestamptz PGNSP PGUID -1 f b A f t \054 0	1184 0 array_in array_out array_recv array_send timestamptztypmodin timestamptztypmodout array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 1186 ( interval	 PGNSP PGUID 16 f b T t t \054 0	0 1187 interval_in interval_out interval_recv interval_send intervaltypmodin intervaltypmodout - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("@ <number> <units>, time interval");
  #define INTERVALOID		1186
! DATA(insert OID = 1187 ( _interval	 PGNSP PGUID	-1 f b A f t \054 0 1186 0 array_in array_out array_recv array_send intervaltypmodin intervaltypmodout array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* OIDS 1200 - 1299 */
! DATA(insert OID = 1231 (  _numeric	 PGNSP PGUID -1 f b A f t \054 0	1700 0 array_in array_out array_recv array_send numerictypmodin numerictypmodout array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 1266 ( timetz		 PGNSP PGUID 12 f b D f t \054 0	0 1270 timetz_in timetz_out timetz_recv timetz_send timetztypmodin timetztypmodout - d p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("time of day with time zone");
  #define TIMETZOID		1266
! DATA(insert OID = 1270 ( _timetz	 PGNSP PGUID -1 f b A f t \054 0	1266 0 array_in array_out array_recv array_send timetztypmodin timetztypmodout array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* OIDS 1500 - 1599 */
  DATA(insert OID = 1560 ( bit		 PGNSP PGUID -1 f b V f t \054 0	0 1561 bit_in bit_out bit_recv bit_send bittypmodin bittypmodout - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("fixed-length bit string");
  #define BITOID	 1560
! DATA(insert OID = 1561 ( _bit		 PGNSP PGUID -1 f b A f t \054 0	1560 0 array_in array_out array_recv array_send bittypmodin bittypmodout array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 1562 ( varbit		 PGNSP PGUID -1 f b V t t \054 0	0 1563 varbit_in varbit_out varbit_recv varbit_send varbittypmodin varbittypmodout - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("variable-length bit string");
  #define VARBITOID	  1562
! DATA(insert OID = 1563 ( _varbit	 PGNSP PGUID -1 f b A f t \054 0	1562 0 array_in array_out array_recv array_send varbittypmodin varbittypmodout array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* OIDS 1600 - 1699 */
  
***************
*** 536,542 **** DESCR("reference to cursor (portal name)");
  #define REFCURSOROID	1790
  
  /* OIDS 2200 - 2299 */
! DATA(insert OID = 2201 ( _refcursor    PGNSP PGUID -1 f b A f t \054 0 1790 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  DATA(insert OID = 2202 ( regprocedure  PGNSP PGUID	4 t b N f t \054 0	 0 2207 regprocedurein regprocedureout regprocedurerecv regproceduresend - - - i p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("registered procedure (with args)");
--- 536,542 ----
  #define REFCURSOROID	1790
  
  /* OIDS 2200 - 2299 */
! DATA(insert OID = 2201 ( _refcursor    PGNSP PGUID -1 f b A f t \054 0 1790 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  DATA(insert OID = 2202 ( regprocedure  PGNSP PGUID	4 t b N f t \054 0	 0 2207 regprocedurein regprocedureout regprocedurerecv regproceduresend - - - i p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("registered procedure (with args)");
***************
*** 558,574 **** DATA(insert OID = 2206 ( regtype	   PGNSP PGUID	4 t b N f t \054 0	 0 2211 regty
  DESCR("registered type");
  #define REGTYPEOID		2206
  
! DATA(insert OID = 2207 ( _regprocedure PGNSP PGUID -1 f b A f t \054 0 2202 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 2208 ( _regoper	   PGNSP PGUID -1 f b A f t \054 0 2203 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 2209 ( _regoperator  PGNSP PGUID -1 f b A f t \054 0 2204 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 2210 ( _regclass	   PGNSP PGUID -1 f b A f t \054 0 2205 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 2211 ( _regtype	   PGNSP PGUID -1 f b A f t \054 0 2206 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  #define REGTYPEARRAYOID 2211
  
  /* uuid */
  DATA(insert OID = 2950 ( uuid			PGNSP PGUID 16 f b U f t \054 0 0 2951 uuid_in uuid_out uuid_recv uuid_send - - - c p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("UUID datatype");
! DATA(insert OID = 2951 ( _uuid			PGNSP PGUID -1 f b A f t \054 0 2950 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* text search */
  DATA(insert OID = 3614 ( tsvector		PGNSP PGUID -1 f b U f t \054 0 0 3643 tsvectorin tsvectorout tsvectorrecv tsvectorsend - - ts_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
--- 558,574 ----
  DESCR("registered type");
  #define REGTYPEOID		2206
  
! DATA(insert OID = 2207 ( _regprocedure PGNSP PGUID -1 f b A f t \054 0 2202 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 2208 ( _regoper	   PGNSP PGUID -1 f b A f t \054 0 2203 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 2209 ( _regoperator  PGNSP PGUID -1 f b A f t \054 0 2204 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 2210 ( _regclass	   PGNSP PGUID -1 f b A f t \054 0 2205 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 2211 ( _regtype	   PGNSP PGUID -1 f b A f t \054 0 2206 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  #define REGTYPEARRAYOID 2211
  
  /* uuid */
  DATA(insert OID = 2950 ( uuid			PGNSP PGUID 16 f b U f t \054 0 0 2951 uuid_in uuid_out uuid_recv uuid_send - - - c p f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("UUID datatype");
! DATA(insert OID = 2951 ( _uuid			PGNSP PGUID -1 f b A f t \054 0 2950 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* text search */
  DATA(insert OID = 3614 ( tsvector		PGNSP PGUID -1 f b U f t \054 0 0 3643 tsvectorin tsvectorout tsvectorrecv tsvectorsend - - ts_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
***************
*** 587,622 **** DATA(insert OID = 3769 ( regdictionary	PGNSP PGUID 4 t b N f t \054 0 0 3770 reg
  DESCR("registered text search dictionary");
  #define REGDICTIONARYOID	3769
  
! DATA(insert OID = 3643 ( _tsvector		PGNSP PGUID -1 f b A f t \054 0 3614 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 3644 ( _gtsvector		PGNSP PGUID -1 f b A f t \054 0 3642 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 3645 ( _tsquery		PGNSP PGUID -1 f b A f t \054 0 3615 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 3735 ( _regconfig		PGNSP PGUID -1 f b A f t \054 0 3734 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 3770 ( _regdictionary PGNSP PGUID -1 f b A f t \054 0 3769 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  DATA(insert OID = 2970 ( txid_snapshot	PGNSP PGUID -1 f b U f t \054 0 0 2949 txid_snapshot_in txid_snapshot_out txid_snapshot_recv txid_snapshot_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("txid snapshot");
! DATA(insert OID = 2949 ( _txid_snapshot PGNSP PGUID -1 f b A f t \054 0 2970 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* range types */
  DATA(insert OID = 3904 ( int4range		PGNSP PGUID  -1 f r R f t \054 0 0 3905 range_in range_out range_recv range_send - - range_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of integers");
  #define INT4RANGEOID		3904
! DATA(insert OID = 3905 ( _int4range		PGNSP PGUID  -1 f b A f t \054 0 3904 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 3906 ( numrange		PGNSP PGUID  -1 f r R f t \054 0 0 3907 range_in range_out range_recv range_send - - range_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of numerics");
! DATA(insert OID = 3907 ( _numrange		PGNSP PGUID  -1 f b A f t \054 0 3906 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 3908 ( tsrange		PGNSP PGUID  -1 f r R f t \054 0 0 3909 range_in range_out range_recv range_send - - range_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of timestamps without time zone");
! DATA(insert OID = 3909 ( _tsrange		PGNSP PGUID  -1 f b A f t \054 0 3908 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 3910 ( tstzrange		PGNSP PGUID  -1 f r R f t \054 0 0 3911 range_in range_out range_recv range_send - - range_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of timestamps with time zone");
! DATA(insert OID = 3911 ( _tstzrange		PGNSP PGUID  -1 f b A f t \054 0 3910 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 3912 ( daterange		PGNSP PGUID  -1 f r R f t \054 0 0 3913 range_in range_out range_recv range_send - - range_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of dates");
! DATA(insert OID = 3913 ( _daterange		PGNSP PGUID  -1 f b A f t \054 0 3912 0 array_in array_out array_recv array_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 3926 ( int8range		PGNSP PGUID  -1 f r R f t \054 0 0 3927 range_in range_out range_recv range_send - - range_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of bigints");
! DATA(insert OID = 3927 ( _int8range		PGNSP PGUID  -1 f b A f t \054 0 3926 0 array_in array_out array_recv array_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /*
   * pseudo-types
--- 587,622 ----
  DESCR("registered text search dictionary");
  #define REGDICTIONARYOID	3769
  
! DATA(insert OID = 3643 ( _tsvector		PGNSP PGUID -1 f b A f t \054 0 3614 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 3644 ( _gtsvector		PGNSP PGUID -1 f b A f t \054 0 3642 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 3645 ( _tsquery		PGNSP PGUID -1 f b A f t \054 0 3615 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 3735 ( _regconfig		PGNSP PGUID -1 f b A f t \054 0 3734 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
! DATA(insert OID = 3770 ( _regdictionary PGNSP PGUID -1 f b A f t \054 0 3769 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  DATA(insert OID = 2970 ( txid_snapshot	PGNSP PGUID -1 f b U f t \054 0 0 2949 txid_snapshot_in txid_snapshot_out txid_snapshot_recv txid_snapshot_send - - - d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("txid snapshot");
! DATA(insert OID = 2949 ( _txid_snapshot PGNSP PGUID -1 f b A f t \054 0 2970 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /* range types */
  DATA(insert OID = 3904 ( int4range		PGNSP PGUID  -1 f r R f t \054 0 0 3905 range_in range_out range_recv range_send - - range_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of integers");
  #define INT4RANGEOID		3904
! DATA(insert OID = 3905 ( _int4range		PGNSP PGUID  -1 f b A f t \054 0 3904 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 3906 ( numrange		PGNSP PGUID  -1 f r R f t \054 0 0 3907 range_in range_out range_recv range_send - - range_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of numerics");
! DATA(insert OID = 3907 ( _numrange		PGNSP PGUID  -1 f b A f t \054 0 3906 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 3908 ( tsrange		PGNSP PGUID  -1 f r R f t \054 0 0 3909 range_in range_out range_recv range_send - - range_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of timestamps without time zone");
! DATA(insert OID = 3909 ( _tsrange		PGNSP PGUID  -1 f b A f t \054 0 3908 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 3910 ( tstzrange		PGNSP PGUID  -1 f r R f t \054 0 0 3911 range_in range_out range_recv range_send - - range_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of timestamps with time zone");
! DATA(insert OID = 3911 ( _tstzrange		PGNSP PGUID  -1 f b A f t \054 0 3910 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 3912 ( daterange		PGNSP PGUID  -1 f r R f t \054 0 0 3913 range_in range_out range_recv range_send - - range_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of dates");
! DATA(insert OID = 3913 ( _daterange		PGNSP PGUID  -1 f b A f t \054 0 3912 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
  DATA(insert OID = 3926 ( int8range		PGNSP PGUID  -1 f r R f t \054 0 0 3927 range_in range_out range_recv range_send - - range_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  DESCR("range of bigints");
! DATA(insert OID = 3927 ( _int8range		PGNSP PGUID  -1 f b A f t \054 0 3926 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
  
  /*
   * pseudo-types
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4526648..e994193 100644
*** a/src/include/commands/vacuum.h
--- b/src/include/commands/vacuum.h
***************
*** 167,171 **** extern void lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
--- 167,172 ----
  /* in commands/analyze.c */
  extern void analyze_rel(Oid relid, VacuumStmt *vacstmt,
  			BufferAccessStrategy bstrategy);
+ extern bool std_typanalyze(VacAttrStats *stats);
  
  #endif   /* VACUUM_H */
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index c6d0ad6..4e51491 100644
*** a/src/include/utils/array.h
--- b/src/include/utils/array.h
***************
*** 289,292 **** extern ArrayType *create_singleton_array(FunctionCallInfo fcinfo,
--- 289,302 ----
  extern Datum array_agg_transfn(PG_FUNCTION_ARGS);
  extern Datum array_agg_finalfn(PG_FUNCTION_ARGS);
  
+ /*
+  * prototypes for functions defined in array_selfuncs.c
+  */
+ extern Datum arraysel(PG_FUNCTION_ARGS);
+ 
+ /*
+  * prototypes for functions defined in array_typanalyze.c
+  */
+ extern Datum array_typanalyze(PG_FUNCTION_ARGS);
+ 
  #endif   /* ARRAY_H */
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 78eda1b..335b2a0 100644
*** a/src/include/utils/selfuncs.h
--- b/src/include/utils/selfuncs.h
***************
*** 165,170 **** extern Datum icregexnejoinsel(PG_FUNCTION_ARGS);
--- 165,172 ----
  extern Datum nlikejoinsel(PG_FUNCTION_ARGS);
  extern Datum icnlikejoinsel(PG_FUNCTION_ARGS);
  
+ extern Selectivity calc_scalararraysel(VariableStatData *vardata, Datum constval,
+ 			bool orClause, Oid operator);
  extern Selectivity booltestsel(PlannerInfo *root, BoolTestType booltesttype,
  			Node *arg, int varRelid,
  			JoinType jointype, SpecialJoinInfo *sjinfo);
diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out
index 6e55349..9865b69 100644
*** a/src/test/regress/expected/arrays.out
--- b/src/test/regress/expected/arrays.out
***************
*** 421,426 **** SELECT 0 || ARRAY[1,2] || 3 AS "{0,1,2,3}";
--- 421,427 ----
   {0,1,2,3}
  (1 row)
  
+ ANALYZE array_op_test;
  SELECT * FROM array_op_test WHERE i @> '{32}' ORDER BY seqno;
   seqno |                i                |                                                                 t                                                                  
  -------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 454e1f9..0a9287f 100644
*** a/src/test/regress/expected/rules.out
--- b/src/test/regress/expected/rules.out
***************
*** 1317,1323 **** SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
   pg_statio_user_indexes          | SELECT pg_statio_all_indexes.relid, pg_statio_all_indexes.indexrelid, pg_statio_all_indexes.schemaname, pg_statio_all_indexes.relname, pg_statio_all_indexes.indexrelname, pg_statio_all_indexes.idx_blks_read, pg_statio_all_indexes.idx_blks_hit FROM pg_statio_all_indexes WHERE ((pg_statio_all_indexes.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_indexes.schemaname !~ '^pg_toast'::text));
   pg_statio_user_sequences        | SELECT pg_statio_all_sequences.relid, pg_statio_all_sequences.schemaname, pg_statio_all_sequences.relname, pg_statio_all_sequences.blks_read, pg_statio_all_sequences.blks_hit FROM pg_statio_all_sequences WHERE ((pg_statio_all_sequences.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_sequences.schemaname !~ '^pg_toast'::text));
   pg_statio_user_tables           | SELECT pg_statio_all_tables.relid, pg_statio_all_tables.schemaname, pg_statio_all_tables.relname, pg_statio_all_tables.heap_blks_read, pg_statio_all_tables.heap_blks_hit, pg_statio_all_tables.idx_blks_read, pg_statio_all_tables.idx_blks_hit, pg_statio_all_tables.toast_blks_read, pg_statio_all_tables.toast_blks_hit, pg_statio_all_tables.tidx_blks_read, pg_statio_all_tables.tidx_blks_hit FROM pg_statio_all_tables WHERE ((pg_statio_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_tables.schemaname !~ '^pg_toast'::text));
!  pg_stats                        | SELECT n.nspname AS schemaname, c.relname AS tablename, a.attname, s.stainherit AS inherited, s.stanullfrac AS null_frac, s.stawidth AS avg_width, s.stadistinct AS n_distinct, CASE WHEN (s.stakind1 = ANY (ARRAY[1, 4])) THEN s.stavalues1 WHEN (s.stakind2 = ANY (ARRAY[1, 4])) THEN s.stavalues2 WHEN (s.stakind3 = ANY (ARRAY[1, 4])) THEN s.stavalues3 WHEN (s.stakind4 = ANY (ARRAY[1, 4])) THEN s.stavalues4 ELSE NULL::anyarray END AS most_common_vals, CASE WHEN (s.stakind1 = ANY (ARRAY[1, 4])) THEN s.stanumbers1 WHEN (s.stakind2 = ANY (ARRAY[1, 4])) THEN s.stanumbers2 WHEN (s.stakind3 = ANY (ARRAY[1, 4])) THEN s.stanumbers3 WHEN (s.stakind4 = ANY (ARRAY[1, 4])) THEN s.stanumbers4 ELSE NULL::real[] END AS most_common_freqs, CASE WHEN (s.stakind1 = 2) THEN s.stavalues1 WHEN (s.stakind2 = 2) THEN s.stavalues2 WHEN (s.stakind3 = 2) THEN s.stavalues3 WHEN (s.stakind4 = 2) THEN s.stavalues4 ELSE NULL::anyarray END AS histogram_bounds, CASE WHEN (s.stakind1 = 3) THEN s.stanumbers1[1] WHEN (s.stakind2 = 3) THEN s.stanumbers2[1] WHEN (s.stakind3 = 3) THEN s.stanumbers3[1] WHEN (s.stakind4 = 3) THEN s.stanumbers4[1] ELSE NULL::real END AS correlation FROM (((pg_statistic s JOIN pg_class c ON ((c.oid = s.starelid))) JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum)))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE ((NOT a.attisdropped) AND has_column_privilege(c.oid, a.attnum, 'select'::text));
   pg_tables                       | SELECT n.nspname AS schemaname, c.relname AS tablename, pg_get_userbyid(c.relowner) AS tableowner, t.spcname AS tablespace, c.relhasindex AS hasindexes, c.relhasrules AS hasrules, c.relhastriggers AS hastriggers FROM ((pg_class c LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace))) WHERE (c.relkind = 'r'::"char");
   pg_timezone_abbrevs             | SELECT pg_timezone_abbrevs.abbrev, pg_timezone_abbrevs.utc_offset, pg_timezone_abbrevs.is_dst FROM pg_timezone_abbrevs() pg_timezone_abbrevs(abbrev, utc_offset, is_dst);
   pg_timezone_names               | SELECT pg_timezone_names.name, pg_timezone_names.abbrev, pg_timezone_names.utc_offset, pg_timezone_names.is_dst FROM pg_timezone_names() pg_timezone_names(name, abbrev, utc_offset, is_dst);
--- 1317,1323 ----
   pg_statio_user_indexes          | SELECT pg_statio_all_indexes.relid, pg_statio_all_indexes.indexrelid, pg_statio_all_indexes.schemaname, pg_statio_all_indexes.relname, pg_statio_all_indexes.indexrelname, pg_statio_all_indexes.idx_blks_read, pg_statio_all_indexes.idx_blks_hit FROM pg_statio_all_indexes WHERE ((pg_statio_all_indexes.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_indexes.schemaname !~ '^pg_toast'::text));
   pg_statio_user_sequences        | SELECT pg_statio_all_sequences.relid, pg_statio_all_sequences.schemaname, pg_statio_all_sequences.relname, pg_statio_all_sequences.blks_read, pg_statio_all_sequences.blks_hit FROM pg_statio_all_sequences WHERE ((pg_statio_all_sequences.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_sequences.schemaname !~ '^pg_toast'::text));
   pg_statio_user_tables           | SELECT pg_statio_all_tables.relid, pg_statio_all_tables.schemaname, pg_statio_all_tables.relname, pg_statio_all_tables.heap_blks_read, pg_statio_all_tables.heap_blks_hit, pg_statio_all_tables.idx_blks_read, pg_statio_all_tables.idx_blks_hit, pg_statio_all_tables.toast_blks_read, pg_statio_all_tables.toast_blks_hit, pg_statio_all_tables.tidx_blks_read, pg_statio_all_tables.tidx_blks_hit FROM pg_statio_all_tables WHERE ((pg_statio_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_statio_all_tables.schemaname !~ '^pg_toast'::text));
!  pg_stats                        | SELECT n.nspname AS schemaname, c.relname AS tablename, a.attname, s.stainherit AS inherited, s.stanullfrac AS null_frac, s.stawidth AS avg_width, s.stadistinct AS n_distinct, CASE WHEN (s.stakind1 = 1) THEN s.stavalues1 WHEN (s.stakind2 = 1) THEN s.stavalues2 WHEN (s.stakind3 = 1) THEN s.stavalues3 WHEN (s.stakind4 = 1) THEN s.stavalues4 WHEN (s.stakind5 = 1) THEN s.stavalues5 ELSE NULL::anyarray END AS most_common_vals, CASE WHEN (s.stakind1 = 1) THEN s.stanumbers1 WHEN (s.stakind2 = 1) THEN s.stanumbers2 WHEN (s.stakind3 = 1) THEN s.stanumbers3 WHEN (s.stakind4 = 1) THEN s.stanumbers4 WHEN (s.stakind5 = 1) THEN s.stanumbers5 ELSE NULL::real[] END AS most_common_freqs, CASE WHEN (s.stakind1 = 2) THEN s.stavalues1 WHEN (s.stakind2 = 2) THEN s.stavalues2 WHEN (s.stakind3 = 2) THEN s.stavalues3 WHEN (s.stakind4 = 2) THEN s.stavalues4 WHEN (s.stakind5 = 2) THEN s.stavalues5 ELSE NULL::anyarray END AS histogram_bounds, CASE WHEN (s.stakind1 = 3) THEN s.stanumbers1[1] WHEN (s.stakind2 = 3) THEN s.stanumbers2[1] WHEN (s.stakind3 = 3) THEN s.stanumbers3[1] WHEN (s.stakind4 = 3) THEN s.stanumbers4[1] WHEN (s.stakind5 = 3) THEN s.stanumbers5[1] ELSE NULL::real END AS correlation, CASE WHEN (s.stakind1 = 4) THEN s.stavalues1 WHEN (s.stakind2 = 4) THEN s.stavalues2 WHEN (s.stakind3 = 4) THEN s.stavalues3 WHEN (s.stakind4 = 4) THEN s.stavalues4 WHEN (s.stakind5 = 4) THEN s.stavalues5 ELSE NULL::anyarray END AS most_common_elems, CASE WHEN (s.stakind1 = 4) THEN s.stanumbers1 WHEN (s.stakind2 = 4) THEN s.stanumbers2 WHEN (s.stakind3 = 4) THEN s.stanumbers3 WHEN (s.stakind4 = 4) THEN s.stanumbers4 WHEN (s.stakind5 = 4) THEN s.stanumbers5 ELSE NULL::real[] END AS most_common_elem_freqs, CASE WHEN (s.stakind1 = 5) THEN s.stavalues1 WHEN (s.stakind2 = 5) THEN s.stavalues2 WHEN (s.stakind3 = 5) THEN s.stavalues3 WHEN (s.stakind4 = 5) THEN s.stavalues4 WHEN (s.stakind5 = 5) THEN s.stavalues5 ELSE NULL::anyarray END AS length_histogram_bounds FROM 
(((pg_statistic s JOIN pg_class c ON ((c.oid = s.starelid))) JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum)))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE ((NOT a.attisdropped) AND has_column_privilege(c.oid, a.attnum, 'select'::text));
   pg_tables                       | SELECT n.nspname AS schemaname, c.relname AS tablename, pg_get_userbyid(c.relowner) AS tableowner, t.spcname AS tablespace, c.relhasindex AS hasindexes, c.relhasrules AS hasrules, c.relhastriggers AS hastriggers FROM ((pg_class c LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace))) WHERE (c.relkind = 'r'::"char");
   pg_timezone_abbrevs             | SELECT pg_timezone_abbrevs.abbrev, pg_timezone_abbrevs.utc_offset, pg_timezone_abbrevs.is_dst FROM pg_timezone_abbrevs() pg_timezone_abbrevs(abbrev, utc_offset, is_dst);
   pg_timezone_names               | SELECT pg_timezone_names.name, pg_timezone_names.abbrev, pg_timezone_names.utc_offset, pg_timezone_names.is_dst FROM pg_timezone_names() pg_timezone_names(name, abbrev, utc_offset, is_dst);
diff --git a/src/test/regress/sql/arrays.sql b/src/test/regress/sql/arrays.sql
index 9ea53b1..294b44e 100644
*** a/src/test/regress/sql/arrays.sql
--- b/src/test/regress/sql/arrays.sql
***************
*** 196,201 **** SELECT ARRAY[[1,2],[3,4]] || ARRAY[5,6] AS "{{1,2},{3,4},{5,6}}";
--- 196,203 ----
  SELECT ARRAY[0,0] || ARRAY[1,1] || ARRAY[2,2] AS "{0,0,1,1,2,2}";
  SELECT 0 || ARRAY[1,2] || 3 AS "{0,1,2,3}";
  
+ ANALYZE array_op_test;
+ 
  SELECT * FROM array_op_test WHERE i @> '{32}' ORDER BY seqno;
  SELECT * FROM array_op_test WHERE i && '{32}' ORDER BY seqno;
  SELECT * FROM array_op_test WHERE i @> '{17}' ORDER BY seqno;
#13Alexander Korotkov
aekorotkov@gmail.com
In reply to: Noah Misch (#12)
1 attachment(s)
Re: Collect frequency statistics for arrays

Hi!

Thanks for your fixes to the patch. They look correct to me. I made some
further fixes in the patch. Proofs of some concepts are still needed. I'm
going to provide them in a few days.

On Thu, Jan 12, 2012 at 3:06 PM, Noah Misch <noah@leadboat.com> wrote:

I'm not sure about a shared lossy counting module, because the shared part
of the code would be relatively small. The part of the compute_array_stats
function that takes care of array decompression, distinct occurrence
calculation, the distinct element count histogram, packing statistics slots,
etc. is much larger than the lossy counting algorithm itself. Maybe there
are some other opinions in the community?

True; it would probably increase total lines of code. The benefit, if any,
lies in separation of concerns; the business of implementing this algorithm is
quite different from the other roles of these typanalyze functions. I won't
insist that you try it, though.

I'd prefer to try it as a separate patch.
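
For reference, the core of lossy counting is small enough to sketch on its
own. The following is a minimal standalone illustration for integer keys, not
the code from the patch or from compute_tsvector_stats. The names LCState,
lc_init, lc_add, lc_count and lc_free are hypothetical, and a linear array
stands in for the hash table a real implementation would likely use.

```c
#include <stdlib.h>

/* One tracked key: observed frequency plus maximum possible undercount. */
typedef struct
{
	int			key;
	int			freq;
	int			delta;
} LCEntry;

/* Hypothetical tracker state; bucket_width plays the role of ceil(1/epsilon). */
typedef struct
{
	LCEntry    *entries;
	int			nentries;
	int			capacity;
	int			bucket_width;
	int			bucket_current;
	long		nseen;
} LCState;

void
lc_init(LCState *st, int bucket_width)
{
	st->entries = NULL;
	st->nentries = 0;
	st->capacity = 0;
	st->bucket_width = bucket_width;
	st->bucket_current = 1;
	st->nseen = 0;
}

/* Record one occurrence of key; prune rare keys at each bucket boundary. */
void
lc_add(LCState *st, int key)
{
	int			i;

	for (i = 0; i < st->nentries; i++)
		if (st->entries[i].key == key)
			break;

	if (i < st->nentries)
		st->entries[i].freq++;
	else
	{
		if (st->nentries == st->capacity)
		{
			int			newcap = st->capacity ? st->capacity * 2 : 16;
			LCEntry    *grown = realloc(st->entries, newcap * sizeof(LCEntry));

			if (grown == NULL)
				abort();
			st->entries = grown;
			st->capacity = newcap;
		}
		st->entries[st->nentries].key = key;
		st->entries[st->nentries].freq = 1;
		st->entries[st->nentries].delta = st->bucket_current - 1;
		st->nentries++;
	}

	st->nseen++;
	if (st->nseen % st->bucket_width == 0)
	{
		int			kept = 0;

		/* Drop entries with freq + delta <= current bucket number. */
		for (i = 0; i < st->nentries; i++)
			if (st->entries[i].freq + st->entries[i].delta > st->bucket_current)
				st->entries[kept++] = st->entries[i];
		st->nentries = kept;
		st->bucket_current++;
	}
}

/* Return the tracked frequency of key, or 0 if it is not (or no longer) tracked. */
int
lc_count(const LCState *st, int key)
{
	int			i;

	for (i = 0; i < st->nentries; i++)
		if (st->entries[i].key == key)
			return st->entries[i].freq;
	return 0;
}

void
lc_free(LCState *st)
{
	free(st->entries);
	st->entries = NULL;
	st->nentries = st->capacity = 0;
}
```

Everything outside these few functions (detoasting, per-array de-duplication,
histograms, slot packing) would stay in the typanalyze function, which is the
size imbalance described above.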

+             /*
+              * The probability of no occurrence of events which form "rest"
+              * probability has a limit of exp(-rest) when the number of events
+              * goes to infinity. Another simplification is to replace those
+              * events with one event with (1 - exp(-rest)) probability.
+              */
+             rest = 1.0f - exp(-rest);

What is the name of the underlying concept in probability theory?

The closest concept to the calculated distribution is the multinomial
distribution. But it's not exactly the same, because the multinomial
distribution gives the probability of a particular count of each event's
occurrences, not the probability of the total occurrence count. Actually,
the distribution is calculated just from the assumption of event
independence. The closest concept to the "rest" probability is approximation
by an exponential distribution. It's quite a rough approximation, but I
can't invent something better with low computational complexity.

Do you have a URL of a tutorial or paper that explains the method in more
detail? If, rather, this is a novel synthesis, could you write a proof to
include in the comments?

Unfortunately, I don't have a relevant paper for it. I'll try to search for
one. Otherwise I'll try to write a proof.

+ /*
+  * Array selectivity estimation based on most common elements statistics for
+  * "column <@ const" case. Assumption that element occurences are independent
+  * means certain distribution of array lengths. Typically real distribution
+  * of lengths is significantly different from it. For example, if even we
+  * have set of arrays with 1 integer element in range [0;10] each, element
+  * occurences are not independent. Because in the case of independence we

Do you refer to a column where '{1,12,46}' and '{13,7}' may appear, but
'{6,19,4}' cannot appear?

I refer to a column where only one element exists, i.e. the only possible
values are
'{0}', '{1}', '{2}', '{3}', '{4}', '{5}', '{6}', '{7}', '{8}', '{9}',
'{10}'. That is a corner case. But a similar situation occurs when, for
example, the distribution of distinct element counts is between 1 and 3. It
differs significantly from the distribution under independent occurrence.

Oh, I think I see now. If each element 1..10 had frequency 0.1 independently,
column values would have exactly one distinct element just 39% of the time?

Yes, it's right.

If probability theory has a prototypical problem resembling this, it would be
nice to include a URL to a thorough discussion thereof. I could not think of
the search terms to find one, though.

Actually, using both the distinct element count histogram and element
occurrence frequencies produces some probability distribution (which is more
complex than just independent element occurrence). If the real distribution
is close to this distribution, then the estimate is accurate. I haven't met
such distributions in papers; actually, I've just invented it in my attempts
to do accurate "column <@ const" estimation at least in simple cases. I'll
try to search for similar things in papers, but I doubt I'll succeed.
Otherwise I'll try to write a more detailed proof.

*** /dev/null
--- b/src/backend/utils/adt/array_selfuncs.c
+ Selectivity
+ calc_scalararraysel(VariableStatData *vardata, Datum constval, bool orClause,
+                                     Oid operator)
+ {
+     Oid                     elemtype;
+     Selectivity selec;
+     TypeCacheEntry *typentry;
+     Datum      *hist;
+     int                     nhist;
+     FunctionCallInfoData cmpfunc;
+
+     elemtype = get_base_element_type(vardata->vartype);
+
+
+     /* Get default comparison function */
+     typentry = lookup_type_cache(elemtype,
+                TYPECACHE_CMP_PROC | TYPECACHE_CMP_PROC_FINFO | TYPECACHE_EQ_OPR);
+
+     /* Handle only "=" operator. Return default selectivity in other cases. */
+     if (operator != typentry->eq_opr)
+             return (Selectivity) 0.5;

Punting on other operators this way creates a plan quality regression for
operations like "const < ANY (column)". Please do it some way that falls
back on the somewhat-better existing scalararraysel() treatment for this.

I've made calc_scalararraysel return -1 in this case, or when there is no
comparison function. scalararraysel continues its work
when calc_scalararraysel returns a negative value.
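
The "negative value means not handled" convention described here can be sketched in isolation. Everything in the block below is a hypothetical stand-in (names, signatures and the 0.5 placeholder are assumptions for illustration); it does not reproduce calc_scalararraysel or scalararraysel.

```c
/* Hypothetical stand-ins; names and bodies are assumptions for illustration. */
typedef double Selectivity;

/* Returns a negative value when it cannot produce an estimate. */
Selectivity
specialized_estimate(int is_equality_op, double elem_freq)
{
    if (!is_equality_op)
        return -1.0;            /* signal "not handled" to the caller */
    return elem_freq;
}

/* Generic estimate used when the specialized path declines. */
Selectivity
generic_estimate(void)
{
    return 0.5;                 /* placeholder value, not PostgreSQL's logic */
}

/* Try the specialized path first, then fall back to the generic one. */
Selectivity
estimate(int is_equality_op, double elem_freq)
{
    Selectivity s = specialized_estimate(is_equality_op, elem_freq);

    if (s < 0.0)
        s = generic_estimate();
    return s;
}
```

A sentinel works here because a valid selectivity is never negative, so the caller can tell "no answer" apart from any real estimate without an extra output parameter.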

+ /*
+  * Calculate first n distinct element counts probabilities by histogram. We
+  * assume that any interval between a and b histogram values gives
+  * 1 / ((b - a + 1) * (nhist - 1)) probability to values between a and b and
+  * half of that to a and b. Returns total probability that distinct element
+  * count is less of equal to n.
+  */
+ static float
+ calc_hist(Datum *hist, int nhist, float *hist_part, int n)

To test this function, I ran the following test case:

set default_statistics_target = 4;
create table t3 as select array(select * from generate_series(1, v)) as arr
from (values (2),(2),(2),(3),(5),(5),(5)) v(v), generate_series(1,100);
analyze t3; -- length_histogram_bounds = {2,2,5,5}
select * from t3 where arr <@ array[6,7,8,9,10,11];

Using gdb to observe calc_hist()'s result during the last command:

(gdb) p calc_hist(hist, nhist, hist_part, unique_nitems)
$23 = 0.666666687
(gdb) x/6f hist_part
0xcd4bc8: 0 0 0.333333343 0
0xcd4bd8: 0 0.333333343

I expected an equal, nonzero probability in hist_part[3] and hist_part[4] and
a total probability of 1.0.
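
For reference, here is one way to spread histogram mass that meets the expectation stated above for the bounds {2,2,5,5}. It is a sketch under stated assumptions, not the patch's calc_hist(): the function name is hypothetical, bounds are plain ints rather than Datums, and each interval's 1 / (nhist - 1) share is divided so that interior counts get full weight and the two endpoints half weight.

```c
/*
 * Spread probability mass over distinct-element counts 0..n from sorted
 * histogram bounds hist[0..nhist-1] (nhist >= 2, all values >= 0).
 * Each of the nhist - 1 intervals carries 1 / (nhist - 1) of the mass.
 * Within an interval [a, b] with a < b, interior counts get full weight
 * and the endpoints half weight; when a == b all mass goes to that count.
 * Returns the total mass assigned to counts <= n.
 */
double
spread_histogram(const int *hist, int nhist, double *part, int n)
{
    double      frac = 1.0 / (double) (nhist - 1);
    double      total = 0.0;
    int         i,
                k;

    for (k = 0; k <= n; k++)
        part[k] = 0.0;

    for (i = 0; i + 1 < nhist; i++)
    {
        int         a = hist[i];
        int         b = hist[i + 1];

        if (a == b)
        {
            if (a <= n)
                part[a] += frac;
            continue;
        }
        for (k = a; k <= b && k <= n; k++)
        {
            double      weight = (k == a || k == b) ? 0.5 : 1.0;

            part[k] += frac * weight / (double) (b - a);
        }
    }

    for (k = 0; k <= n; k++)
        total += part[k];
    return total;
}
```

With hist = {2,2,5,5} and n = 6, counts 3 and 4 each receive 1/9, counts 2 and 5 each receive 7/18, and the total is 1.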

+ {
+     int                     k,
+                             i = 0,
+                             prev_interval = 0,
+                             next_interval = 0;
+     float           frac,
+                             total = 0.0f;
+
+     /*
+      * frac is a probability contribution by each interval between histogram
+      * values. We have nhist - 1 intervals. Contribution of one will be 1 /
+      * (nhist - 1).
+      */
+     frac = 1.0f / ((float) (nhist - 1));
+     for (k = 0; k <= n; k++)
+     {
+             int                     count = 0;
+
+             /* Count occurences of k distinct element counts in histogram. */
+             while (i < nhist && DatumGetInt32(hist[i]) <= k)
+             {
+                     if (DatumGetInt32(hist[i]) == k)
+                             count++;
+                     i++;
+             }
+
+             if (count > 0)
+             {
+                     float           val;
+
+                     /* Find length between current histogram value and the next one */
+                     if (i < nhist)
+                             next_interval = DatumGetInt32(hist[i + 1]) -

Doesn't this read past the array end when i == nhist - 1?

It was a bug. It also caused the wrong histogram calculation you noted above.
Fixed.

+     stats->extra_data = extra_data->std_extra_data;
+     old_context = CurrentMemoryContext;
+     extra_data->std_compute_stats(stats, fetchfunc, samplerows, totalrows);
+     MemoryContextSwitchTo(old_context);

Is the callee known to change CurrentMemoryContext and not restore it?
Offhand, I'm not seeing how it could do so.

Right. Saving of memory context is not needed. Removed.

*** a/src/include/catalog/pg_type.h
--- b/src/include/catalog/pg_type.h

This now updates all array types except record[]. I don't know offhand how
to even make a non-empty value of type record[], let alone get it into a
context where ANALYZE would see it. However, is there a particular reason to
make that one different?

Oh, I didn't update all array types in 2 tries :) Fixed.

------
With best regards,
Alexander Korotkov.

Attachments:

arrayanalyze-0.11.patch.gzapplication/x-gzip; name=arrayanalyze-0.11.patch.gzDownload
#14Noah Misch
noah@leadboat.com
In reply to: Alexander Korotkov (#13)
Re: Collect frequency statistics for arrays

On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote:

Thanks for your fixes to the patch. They look correct to me. I did some
fixes in the patch. The proof of some concepts is still needed. I'm going
to provide it in a few days.

Your further fixes look good. Could you also answer my question about the
header comment of mcelem_array_contained_selec()?

/*
* Estimate selectivity of "column <@ const" based on most common element
* statistics. Independent element occurrence would imply a particular
* distribution of distinct element counts among matching rows. Real data
* usually falsifies that assumption. For example, in a set of 1-element
* integer arrays having elements in the range [0;10], element occurrences are
* not independent. If they were, a sufficiently-large set would include all
* distinct element counts 0 through 11. We correct for this using the
* histogram of distinct element counts.
*
* In the "column @> const" and "column && const" cases, we usually have
* "const" with low summary frequency of elements (otherwise we have
* selectivity close to 0 or 1 correspondingly). That's why the effect of
* dependence related to distinct element counts distribution is negligible
* there. In the "column <@ const" case, summary frequency of elements is
* high (otherwise we have selectivity close to 0). That's why we should do
* correction due to array distinct element counts distribution.
*/

By "summary frequency of elements", do you mean literally P_0 + P_1 ... + P_N?
If so, I can follow the above argument for "column && const" and "column <@
const", but not for "column @> const". For "column @> const", selectivity
cannot exceed the smallest frequency among const elements. A number of
high-frequency elements will drive up the sum of the frequencies without
changing the true selectivity much at all.
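
The point about "column @> const" can be made concrete with a small sketch. It assumes independent element occurrence, under which the probability of containing every const element is the product of their frequencies; the function names are hypothetical and none of this is taken from the patch.

```c
/*
 * Under an independence assumption, the selectivity of "column @> const"
 * is the product of the const elements' frequencies. Since every factor
 * is at most 1, the product never exceeds the smallest frequency.
 */
double
contains_all_selectivity(const double *freq, int n)
{
    double      product = 1.0;
    int         i;

    for (i = 0; i < n; i++)
        product *= freq[i];
    return product;
}

/* Smallest frequency among the n const elements (n >= 1). */
double
min_frequency(const double *freq, int n)
{
    double      lowest = freq[0];
    int         i;

    for (i = 1; i < n; i++)
        if (freq[i] < lowest)
            lowest = freq[i];
    return lowest;
}

/* Sum of the frequencies, i.e. P_0 + P_1 + ... + P_N in the question above. */
double
sum_frequency(const double *freq, int n)
{
    double      sum = 0.0;
    int         i;

    for (i = 0; i < n; i++)
        sum += freq[i];
    return sum;
}
```

With frequencies {0.9, 0.9, 0.9, 0.01}, the sum exceeds 2.7 while the product stays below the minimum of 0.01, so a large summed frequency says little about "column @> const" selectivity.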

Thanks,
nm

#15Alexander Korotkov
aekorotkov@gmail.com
In reply to: Noah Misch (#14)
1 attachment(s)
Re: Collect frequency statistics for arrays

Hi!

Updated patch is attached. I've updated the comment
of mcelem_array_contained_selec with a more detailed description of the
probability distribution assumption. Also, I found that the "rest" behaviour
should be better described by a Poisson distribution; relevant changes were
made.

On Tue, Jan 17, 2012 at 2:33 PM, Noah Misch <noah@leadboat.com> wrote:

By "summary frequency of elements", do you mean literally P_0 + P_1 ... +
P_N?
If so, I can follow the above argument for "column && const" and "column <@
const", but not for "column @> const". For "column @> const", selectivity
cannot exceed the smallest frequency among const elements. A number of
high-frequency elements will drive up the sum of the frequencies without
changing the true selectivity much at all.

Referring to summary frequency is not really correct. It would be more
correct to refer to the number of elements in "const". When there are many
elements in "const", "column @> const" selectivity tends to be close to 0
and "column && const" tends to be close to 1. Surely, it's true when
elements have some kind of middle values of frequencies (not very close to
0 and not very close to 1). I've replaced "summary frequency of elements"
with "number of elements".

------
With best regards,
Alexander Korotkov.

Attachments:

arrayanalyze-0.12.patch.gzapplication/x-gzip; name=arrayanalyze-0.12.patch.gzDownload
#16Noah Misch
noah@leadboat.com
In reply to: Alexander Korotkov (#15)
1 attachment(s)
Re: Collect frequency statistics for arrays

On Mon, Jan 23, 2012 at 01:21:20AM +0400, Alexander Korotkov wrote:

An updated patch is attached. I've updated the comment
of mcelem_array_contained_selec with a more detailed description of the
probability distribution assumption. Also, I found that the "rest" behaviour
is better described by a Poisson distribution; the relevant changes were
made.

Thanks. That makes more of the math clear to me. I do not follow all of it,
but I feel that the comments now have enough information that I could go about
doing so.

+ 	/* Take care about events with low probabilities. */
+ 	if (rest > DEFAULT_CONTAIN_SEL)
+ 	{

Why the change from "rest > 0" to this in the latest version?

+ 		/* emit some statistics for debug purposes */
+ 		elog(DEBUG3, "array: target # mces = %d, bucket width = %d, "
+ 			 "# elements = %llu, hashtable size = %d, usable entries = %d",
+ 			 num_mcelem, bucket_width, element_no, i, track_len);

That should be UINT64_FMT. (I introduced that error in v0.10.)
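
The portability problem with "%llu" can be illustrated outside the PostgreSQL tree. UINT64_FMT is PostgreSQL's own macro, so this minimal sketch substitutes the standard C macro PRIu64 from <inttypes.h>, which serves the same purpose; the function name format_element_count is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit element counter without assuming that uint64_t is
 * "unsigned long long".  PRIu64 expands to the right conversion
 * specifier for the platform (stand-in for PostgreSQL's UINT64_FMT).
 * Returns the value returned by snprintf.
 */
int
format_element_count(char *buf, size_t buflen, uint64_t element_no)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "# elements = %" PRIu64, element_no);
}
```

A caller would pass a buffer and the counter, for example format_element_count(buf, sizeof buf, element_no), and then hand buf to its logging routine.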

I've attached a new version that includes the UINT64_FMT fix, some edits of
your newest comments, and a rerun of pgindent on the new files. I see no
other issues precluding commit, so I am marking the patch Ready for Committer.
If I made any of the comments worse, please post another update.

Thanks,
nm

Attachments:

arrayanalyze-0.13.patch.gzapplication/x-gunzipDownload
#17Alexander Korotkov
aekorotkov@gmail.com
In reply to: Noah Misch (#16)
Re: Collect frequency statistics for arrays

On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch <noah@leadboat.com> wrote:

+     /* Take care about events with low probabilities. */
+     if (rest > DEFAULT_CONTAIN_SEL)
+     {

Why the change from "rest > 0" to this in the latest version?

The earlier handling of the "rest" distribution required O(m) time. Now there
is a more accurate and proven estimate, but it takes O(m^2) time. It doesn't
make the overall asymptotic time worse, but the cost is significant. That's
why I decided to skip it for low values of "rest", which don't change the
distribution significantly.

+             /* emit some statistics for debug purposes */
+             elog(DEBUG3, "array: target # mces = %d, bucket width = %d, "
+                  "# elements = %llu, hashtable size = %d, usable entries = %d",
+                  num_mcelem, bucket_width, element_no, i, track_len);

That should be UINT64_FMT. (I introduced that error in v0.10.)

I've attached a new version that includes the UINT64_FMT fix, some edits of
your newest comments, and a rerun of pgindent on the new files. I see no
other issues precluding commit, so I am marking the patch Ready for
Committer.

Great!

If I made any of the comments worse, please post another update.

The changes look reasonable to me. Thanks!

------
With best regards,
Alexander Korotkov.

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#17)
Re: Collect frequency statistics for arrays

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch <noah@leadboat.com> wrote:

I've attached a new version that includes the UINT64_FMT fix, some edits of
your newest comments, and a rerun of pgindent on the new files. I see no
other issues precluding commit, so I am marking the patch Ready for
Committer.
If I made any of the comments worse, please post another update.

The changes look reasonable to me. Thanks!

I am starting to look at this patch now. I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,
in addition to the new stuff. The pg_statistic rows for array columns
tend to be unreasonably wide already, and as-is this patch will make
them quite a lot wider. I think it requires more than a little bit of
evidence to continue storing stats that seem to have only small
probability of usefulness.

In particular, if we didn't store that stuff, we'd not need to widen the
number of columns in pg_statistic, which would noticeably reduce both
the footprint of the patch and the probability of breaking external
code.

regards, tom lane

#19Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#18)
Re: Collect frequency statistics for arrays

On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I am starting to look at this patch now. I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,
in addition to the new stuff. The pg_statistic rows for array columns
tend to be unreasonably wide already, and as-is this patch will make
them quite a lot wider. I think it requires more than a little bit of
evidence to continue storing stats that seem to have only small
probability of usefulness.

In particular, if we didn't store that stuff, we'd not need to widen the
number of columns in pg_statistic, which would noticeably reduce both
the footprint of the patch and the probability of breaking external
code.

Initially, I used existing slots for new statistics, but I've changed this
after the first review:
http://archives.postgresql.org/pgsql-hackers/2011-07/msg00780.php

Perhaps btree statistics really do matter for some kinds of arrays? For
example, arrays representing paths in a tree: we could request a subtree
with a range query on such arrays.

------
With best regards,
Alexander Korotkov.

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#19)
Re: Collect frequency statistics for arrays

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I am starting to look at this patch now. I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,

Perhaps btree statistics really do matter for some kinds of arrays? For
example, arrays representing paths in a tree: we could request a subtree
with a range query on such arrays.

That seems like a pretty narrow, uncommon use-case. Also, to get
accurate stats for such queries that way, you'd need really enormous
histograms. I doubt that the existing parameters for histogram size
will permit meaningful estimation of more than the first array entry
(since we don't make the histogram any larger than we do for a scalar
column).

The real point here is that the fact that we're storing btree-style
stats for arrays is an accident, backed into by having added btree
comparators for arrays plus analyze.c's habit of applying default
scalar-oriented analysis functions to any type without an explicit
typanalyze entry. I don't recall that we ever thought hard about
it or showed that those stats were worth anything.

regards, tom lane

#21Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#20)
Re: Collect frequency statistics for arrays

On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Thu, Mar 1, 2012 at 12:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I am starting to look at this patch now. I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,

Perhaps btree statistics really do matter for some kinds of arrays? For
example, arrays representing paths in a tree: we could request a subtree
with a range query on such arrays.

That seems like a pretty narrow, uncommon use-case. Also, to get
accurate stats for such queries that way, you'd need really enormous
histograms. I doubt that the existing parameters for histogram size
will permit meaningful estimation of more than the first array entry
(since we don't make the histogram any larger than we do for a scalar
column).

The real point here is that the fact that we're storing btree-style
stats for arrays is an accident, backed into by having added btree
comparators for arrays plus analyze.c's habit of applying default
scalar-oriented analysis functions to any type without an explicit
typanalyze entry. I don't recall that we ever thought hard about
it or showed that those stats were worth anything.

OK. I don't object to removing btree stats from arrays.
What do you think about the pg_stats view in this case? Should it combine
the values histogram and the array length histogram in a single column, as
is done for MCV and MCELEM?

------
With best regards,
Alexander Korotkov.

#22Nathan Boley
npboley@gmail.com
In reply to: Tom Lane (#18)
Re: Collect frequency statistics for arrays

On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Mon, Jan 23, 2012 at 7:58 PM, Noah Misch <noah@leadboat.com> wrote:

I've attached a new version that includes the UINT64_FMT fix, some edits of
your newest comments, and a rerun of pgindent on the new files.  I see no
other issues precluding commit, so I am marking the patch Ready for
Committer.
If I made any of the comments worse, please post another update.

The changes look reasonable to me. Thanks!

I am starting to look at this patch now.  I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,
in addition to the new stuff.

If I understand your suggestion, queries of the form

SELECT * FROM rel
WHERE ARRAY[ 1,2,3,4 ] <= x
AND x <=ARRAY[ 1, 2, 3, 1000];

would no longer use an index. Is that correct?

Are you suggesting removing MCV's in lieu of MCE's as well?

-Nathan

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Boley (#22)
Re: Collect frequency statistics for arrays

Nathan Boley <npboley@gmail.com> writes:

On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I am starting to look at this patch now.  I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,
in addition to the new stuff.

If I understand your suggestion, queries of the form

SELECT * FROM rel
WHERE ARRAY[ 1,2,3,4 ] <= x
AND x <=ARRAY[ 1, 2, 3, 1000];

would no longer use an index. Is that correct?

No, just that we'd no longer have statistics relevant to that, and would
have to fall back on default selectivity assumptions. Do you think that
such applications are so common as to justify bloating pg_statistic for
everybody that uses arrays?

regards, tom lane

#24Nathan Boley
npboley@gmail.com
In reply to: Tom Lane (#23)
Re: Collect frequency statistics for arrays

On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Nathan Boley <npboley@gmail.com> writes:

On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I am starting to look at this patch now.  I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,
in addition to the new stuff.

If I understand your suggestion, queries of the form

SELECT * FROM rel
WHERE ARRAY[ 1,2,3,4 ] <= x
     AND x <=ARRAY[ 1, 2, 3, 1000];

would no longer use an index. Is that correct?

No, just that we'd no longer have statistics relevant to that, and would
have to fall back on default selectivity assumptions.

Which, currently, would mean queries of that form would typically use
a table scan, right?

Do you think that
such applications are so common as to justify bloating pg_statistic for
everybody that uses arrays?

I have no idea, but it seems like it will be a substantial regression
for the people that are.

What about MCV's? Will those be removed as well?

Best,
Nathan

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Boley (#24)
Re: Collect frequency statistics for arrays

Nathan Boley <npboley@gmail.com> writes:

On Wed, Feb 29, 2012 at 2:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Nathan Boley <npboley@gmail.com> writes:

If I understand your suggestion, queries of the form
SELECT * FROM rel
WHERE ARRAY[ 1,2,3,4 ] <= x
     AND x <=ARRAY[ 1, 2, 3, 1000];
would no longer use an index. Is that correct?

No, just that we'd no longer have statistics relevant to that, and would
have to fall back on default selectivity assumptions.

Which, currently, would mean queries of that form would typically use
a table scan, right?

No, it doesn't.

What about MCV's? Will those be removed as well?

Sure. Those seem even less useful.

regards, tom lane

#26Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#21)
1 attachment(s)
Re: Collect frequency statistics for arrays

On Thu, Mar 1, 2012 at 1:19 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Thu, Mar 1, 2012 at 1:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

That seems like a pretty narrow, uncommon use-case. Also, to get
accurate stats for such queries that way, you'd need really enormous
histograms. I doubt that the existing parameters for histogram size
will permit meaningful estimation of more than the first array entry
(since we don't make the histogram any larger than we do for a scalar
column).

The real point here is that the fact that we're storing btree-style
stats for arrays is an accident, backed into by having added btree
comparators for arrays plus analyze.c's habit of applying default
scalar-oriented analysis functions to any type without an explicit
typanalyze entry. I don't recall that we ever thought hard about
it or showed that those stats were worth anything.

OK. I don't object to removing btree stats from arrays.
What do you think about the pg_stats view in this case? Should it combine
the values histogram and the array length histogram in a single column, as
is done for MCV and MCELEM?

Btree statistics for arrays and the additional statistics slot are removed
from the attached version of the patch. The pg_stats view is untouched for
now.

------
With best regards,
Alexander Korotkov.

Attachments:

arrayanalyze-0.13.patch.gzapplication/x-gzip; name=arrayanalyze-0.13.patch.gzDownload
#27Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#23)
Re: Collect frequency statistics for arrays

On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Nathan Boley <npboley@gmail.com> writes:

On Wed, Feb 29, 2012 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I am starting to look at this patch now.  I'm wondering exactly why the
decision was made to continue storing btree-style statistics for arrays,
in addition to the new stuff.

If I understand your suggestion, queries of the form

SELECT * FROM rel
WHERE ARRAY[ 1,2,3,4 ] <= x
     AND x <=ARRAY[ 1, 2, 3, 1000];

would no longer use an index. Is that correct?

No, just that we'd no longer have statistics relevant to that, and would
have to fall back on default selectivity assumptions.  Do you think that
such applications are so common as to justify bloating pg_statistic for
everybody that uses arrays?

I confess I am nervous about ripping this out. I am pretty sure we
will get complaints about it. Performance optimizations that benefit
group A at the expense of group B are always iffy, and I'm not sure
the case of using an array as a path indicator is as uncommon as you
seem to think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#28Alvaro Herrera
alvherre@commandprompt.com
In reply to: Robert Haas (#27)
Re: Collect frequency statistics for arrays

Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012:

On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

No, just that we'd no longer have statistics relevant to that, and would
have to fall back on default selectivity assumptions.  Do you think that
such applications are so common as to justify bloating pg_statistic for
everybody that uses arrays?

I confess I am nervous about ripping this out. I am pretty sure we
will get complaints about it. Performance optimizations that benefit
group A at the expense of group B are always iffy, and I'm not sure
the case of using an array as a path indicator is as uncommon as you
seem to think.

Maybe we should keep it as an option. I do think it's quite uncommon,
but for those rare users, it'd be good to provide the capability while
not bloating everyone else's stat catalog. The thing is, people using
arrays as path indicators and such are likely using relatively small
arrays; people storing real data are likely to store much bigger arrays.
Just a hunch though.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#28)
Re: Collect frequency statistics for arrays

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012:

On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I confess I am nervous about ripping this out. I am pretty sure we
will get complaints about it. Performance optimizations that benefit
group A at the expense of group B are always iffy, and I'm not sure
the case of using an array as a path indicator is as uncommon as you
seem to think.

Maybe we should keep it as an option.

How would we make it optional? There's noplace I can think of to stick
such a knob ...

regards, tom lane

#30Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#29)
Re: Collect frequency statistics for arrays

Excerpts from Tom Lane's message of Thu Mar 01 18:51:38 -0300 2012:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Robert Haas's message of Thu Mar 01 12:00:08 -0300 2012:

On Wed, Feb 29, 2012 at 5:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I confess I am nervous about ripping this out. I am pretty sure we
will get complaints about it. Performance optimizations that benefit
group A at the expense of group B are always iffy, and I'm not sure
the case of using an array as a path indicator is as uncommon as you
seem to think.

Maybe we should keep it as an option.

How would we make it optional? There's noplace I can think of to stick
such a knob ...

Uhm, attoptions?

"alter table foo alter column bar set extended_array_stats to on"
or something like that?

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#31Nathan Boley
npboley@gmail.com
In reply to: Tom Lane (#25)
Re: Collect frequency statistics for arrays

What about MCV's? Will those be removed as well?

Sure.  Those seem even less useful.

Ya, this will destroy the performance of several queries without some
heavy tweaking.

Maybe this is bad design, but I've gotten in the habit of storing
sequences as arrays and I commonly join on them. I looked through my
code this morning, and I only have one 'range' query ( of the form
described up-thread ), but there are tons of the form

SELECT att1, attb2 FROM rela, relb where rela.seq_array_1 = relb.seq_array;

I can provide some examples if that would make my argument more compelling.

Sorry to be difficult,
Nathan

#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Boley (#31)
Re: Collect frequency statistics for arrays

Nathan Boley <npboley@gmail.com> writes:

Maybe this is bad design, but I've gotten in the habit of storing
sequences as arrays and I commonly join on them. I looked through my
code this morning, and I only have one 'range' query ( of the form
described up-thread ), but there are tons of the form

SELECT att1, attb2 FROM rela, relb where rela.seq_array_1 = relb.seq_array;

What do you mean by "storing sequences as arrays"? Can you demonstrate
that the existing stats are relevant at all to the query you're worried
about?

regards, tom lane

#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#30)
Re: Collect frequency statistics for arrays

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Tom Lane's message of Thu Mar 01 18:51:38 -0300 2012:

How would we make it optional? There's noplace I can think of to stick
such a knob ...

Uhm, attoptions?

Oh, I had forgotten we had that mechanism already. Yeah, that might
work. I'm a bit tempted to design the option setting so that you can
select whether to keep the btree stats, the new stats, or both or
neither --- after all, there are probably plenty of databases where
nobody cares about the array-containment operators either.

That leaves the question of which setting should be the default ...

regards, tom lane

#34Nathan Boley
npboley@gmail.com
In reply to: Tom Lane (#33)
Re: Collect frequency statistics for arrays

[ sorry Tom, reply all this time... ]

What do you mean by "storing sequences as arrays"?

So, a simple example is, for transcripts ( sequences of DNA that are
turned into proteins ), we store each of the connected components as
an array of the form:

exon_type in [1,6]
splice_type = [1,3]

and then the array elements are

[ exon_type, splice_type, exon_type ]

~ 99% of the elements are of the form [ [1,3], 1, [1,3] ],

so I almost always get a hash or merge join ( correctly ) but for the
rare junction types ( which are usually more interesting as well ) I
correctly get nest loops with an index scan.

Can you demonstrate
that the existing stats are relevant at all to the query you're worried
about?

Well, if we didn't have mcv's and just relied on ndistinct to estimate
the '=' selectivities, either my low selectivity quals would use the
index, or my high selectivity quals would use a table scan, either of
which would be wrong.

I guess I could wipe out the stats and get some real numbers tonight,
but I can't see how the planner would be able to distinguish *without*
mcv's...
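
The argument about MCVs versus ndistinct can be made concrete with a small calculation. The following is a minimal sketch of the general idea only, not PostgreSQL's actual selectivity code: the function eq_selectivity and its parameters are hypothetical, and the figures used afterwards (one value covering 99% of rows, 101 distinct values) are assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified estimate of the selectivity of "column = const".
 *
 * mcv_freqs[i] is the fraction of rows holding the i-th most common
 * value; mcv_index is the position of the constant in that list, or -1
 * if the constant is not a most common value.  Values outside the MCV
 * list are assumed to share the remaining rows equally.
 */
double
eq_selectivity(const double *mcv_freqs, int n_mcv, int mcv_index,
               double ndistinct)
{
    double  mcv_total = 0.0;
    int     i;

    assert(n_mcv >= 0 && mcv_index < n_mcv && ndistinct >= 1.0);

    if (mcv_index >= 0)
        return mcv_freqs[mcv_index];    /* frequency is known directly */

    for (i = 0; i < n_mcv; i++)
        mcv_total += mcv_freqs[i];

    if (ndistinct <= n_mcv)
        return 0.0;                     /* every value is already an MCV */

    return (1.0 - mcv_total) / (ndistinct - n_mcv);
}
```

With a single MCV of frequency 0.99 and ndistinct = 101, the common value is estimated at 0.99 and a rare value at about 0.0001, which is the kind of split that lets a planner choose different join strategies. With no MCV list at all, both cases collapse to 1/101, roughly 0.0099.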

Best,
Nathan

#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#26)
Re: Collect frequency statistics for arrays

Still working through this patch ... there are some things that bother
me about the entries being made in pg_statistic:

1. You re-used STATISTIC_KIND_MCELEM for something that, while similar
to tsvector's usage, is not the same. In particular, tsvector adds two
extra elements to the stanumbers array, but this patch adds four (and
doesn't bother documenting that in pg_statistic.h). I don't think it's
acceptable to re-use the same stakind value for something that's not
following the same specification. I see Nathan complained of this way,
way upthread, but nothing was done about it.

I think we should either assign a different stakind code for this
definition, or change things around so that tsvector actually is using
the same stats kind definition as arrays are. (We could get away with
redefining the tsvector stats format, because pg_upgrade doesn't try to
copy pg_statistic rows to the new database.) Now, of the two new
elements added by the patch, it seems to me to be perfectly reasonable
to add a null-element frequency to the kind specification; the fact that
it wasn't there to start with is kind of an oversight born of the fact
that tsvectors don't contain any null lexemes. But the other new
element is the average distinct element count, which really does not
belong here at all, as it is *entirely* unrelated to element
frequencies. It seems to me that that more nearly belongs in the
element-count histogram slot. So my preference is to align the two
definitions of STATISTIC_KIND_MCELEM by adding a null-element frequency
to tsvector's usage (where it'll always be zero) and getting rid of the
average distinct element count here.

2. I think STATISTIC_KIND_LENGTH_HISTOGRAM is badly named and
confusingly documented. The stats are not about anything I would call a
"length" --- rather we're considering the counts of numbers of distinct
element values present in each array value. An ideal name perhaps would
be STATISTIC_KIND_DISTINCT_ELEMENTS_COUNT_HISTOGRAM, but of course
that's unreasonably long. Considering the way that the existing stats
kind names are abbreviated, maybe STATISTIC_KIND_DECHIST would do.
Anybody have a better idea?

3. I also find it a bit odd that you chose to store the length (count)
histogram as an integer array in stavalues. Usually we've put such data
in stanumbers. That would make the entries float4 not integer, but that
doesn't seem unreasonable to me --- values would still be exact up to
2^24 or so on typical machines, and if we ever do have values larger
than that, it seems to me that having headroom to go above 2^32 would
be a good thing. In any case, if we're going to move the average
distinct-element count over here, that would have to go into stanumbers.
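
The "exact up to 2^24" remark follows from the 24-bit significand of an IEEE 754 single-precision float. A minimal sketch that checks it, assuming float4 maps to a C float with IEEE 754 single precision; the helper name fits_exactly_in_float4 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if the given count survives a round trip through a 4-byte
 * float unchanged, 0 otherwise.  The volatile qualifier forces the value
 * to be stored at float precision rather than kept in a wider register.
 */
int
fits_exactly_in_float4(int64_t count)
{
    volatile float  f;

    /* keep the input small enough that converting back cannot overflow */
    assert(count >= 0 && count <= ((int64_t) 1 << 40));

    f = (float) count;
    return (int64_t) f == count;
}
```

Every integer up to 16777216 (2^24) passes, 16777217 is the first that does not, and larger values such as 2^32 are still representable as long as they need no more than 24 significant bits.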

Comments?

regards, tom lane

#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#35)
Re: Collect frequency statistics for arrays

I wrote:

... So my preference is to align the two
definitions of STATISTIC_KIND_MCELEM by adding a null-element frequency
to tsvector's usage (where it'll always be zero) and getting rid of the
average distinct element count here.

Actually, there's a way we can do this without code changes in the
tsvector stuff. Since the number of MCELEM stanumber items that provide
frequencies of stavalue items is obviously equal to the length of
stavalues, we could define stanumbers as containing those matching
entries, then two min/max entries, then an *optional* entry for the
frequency of null elements (with the frequency presumed to be zero if
omitted). This'd be non-ambiguous given access to stavalues. I'm not
sure though if making the null frequency optional wouldn't introduce
complexity elsewhere that outweighs not having to touch the tsvector
code.
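
The layout proposed above can be decoded unambiguously because the length of stavalues is known. A minimal sketch of such a reader follows; the struct and function names are hypothetical, and the order "minimum, then maximum" for the two extra entries is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Trailing entries of an MCELEM stanumbers array (hypothetical struct). */
typedef struct McelemExtras
{
    float   min_freq;       /* smallest frequency among stored elements */
    float   max_freq;       /* largest frequency among stored elements */
    float   null_elem_freq; /* frequency of null elements, 0 if omitted */
} McelemExtras;

/*
 * Decode the entries that follow the per-element frequencies.
 * nvalues is the length of stavalues, nnumbers the length of stanumbers.
 * Returns 1 on success, 0 if the lengths do not match either layout.
 */
int
decode_mcelem_extras(const float *numbers, int nnumbers, int nvalues,
                     McelemExtras *out)
{
    int     extra = nnumbers - nvalues;

    assert(numbers != NULL && out != NULL);

    if (nvalues < 0 || (extra != 2 && extra != 3))
        return 0;

    out->min_freq = numbers[nvalues];
    out->max_freq = numbers[nvalues + 1];
    /* the null-element frequency is optional and defaults to zero */
    out->null_elem_freq = (extra == 3) ? numbers[nvalues + 2] : 0.0f;
    return 1;
}
```

The cost alluded to in the message shows up here as the extra branch on the array length, which every reader of the slot would have to repeat.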

regards, tom lane

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#36)
Re: Collect frequency statistics for arrays

... BTW, could you explain exactly how that "Fill histogram by hashtab"
loop works? It's way too magic for my taste, and does in fact have bugs
in the currently submitted patch. I've reworked it to this:

/* Fill histogram by hashtab. */
delta = analyzed_rows - 1;
count_item_index = 0;
frac = sorted_count_items[0]->frequency * (num_hist - 1);
for (i = 0; i < num_hist; i++)
{
while (frac <= 0)
{
count_item_index++;
Assert(count_item_index < count_items_count);
frac += sorted_count_items[count_item_index]->frequency * (num_hist - 1);
}
hist[i] = sorted_count_items[count_item_index]->count;
frac -= delta;
}
Assert(count_item_index == count_items_count - 1);

The asserts don't fire in any test case I've tried, which seems to
indicate that it *does* work in the sense that the first histogram entry
is always the smallest count and the last histogram entry is always
the largest one. But it's extremely unclear why it manages to stop
exactly at the last count_items array entry, or for that matter why it's
generating a representative histogram at all. I'm suspicious that the
"-1" bits represent off-by-one bugs.

I also don't especially like the fact that "frac" is capable of
overflowing (since worst case frequency is 300 * 10000 and worst case
num_hist is 10000, with the current limits on statistics_target).
We could work around that by doing the frac arithmetic in int64, but I
wonder whether that couldn't be avoided. In any case, first I'd like
an explanation why this code works at all.
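
The overflow concern can be checked with the worst-case figures quoted above. This is a minimal sketch; the helper name frac_fits_in_int32 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the initial "frac" value in 64-bit arithmetic and report
 * whether it would have fit into a 32-bit int.
 */
int
frac_fits_in_int32(int frequency, int num_hist)
{
    int64_t frac;

    assert(frequency >= 0 && num_hist >= 2);

    frac = (int64_t) frequency * (int64_t) (num_hist - 1);
    return frac <= INT32_MAX;
}
```

For frequency = 300 * 10000 and num_hist = 10000 the product is 29,997,000,000, well above INT32_MAX (2,147,483,647), so a plain int would overflow.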

regards, tom lane

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#26)
Re: Collect frequency statistics for arrays

Alexander Korotkov <aekorotkov@gmail.com> writes:

[ array statistics patch ]

I've committed this after a fair amount of editorialization. There are
still some loose ends to deal with, but I felt it was ready to go into
the tree for wider testing.

The main thing I changed that wasn't in the nature of cleanup/bugfixing
was that I revised the effort-limiting logic in
mcelem_array_contained_selec. The submitted code basically just punted
if the estimated work was too much, but as was already noted in
http://archives.postgresql.org/pgsql-hackers/2011-10/msg01349.php
that can result in really bad estimates. What I did instead is
something closer to Robert's original suggestion: trim the number of
element values taken into consideration from the array constant to a
value that fits within the desired effort limit. If we consider just
the N most common values from the array constant, we still get a pretty
good estimate (since the trimmed N will still be close to 100 for the
values we're talking about).
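
The trimming step described above can be sketched as follows. This is only an illustration of the idea, not the committed code: the function names are hypothetical, and the limit is passed in as a parameter because the thread does not say how it is derived from the effort bound.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator: sort frequencies in descending order */
int
cmp_freq_desc(const void *a, const void *b)
{
    float   fa = *(const float *) a;
    float   fb = *(const float *) b;

    return (fa < fb) - (fa > fb);
}

/*
 * Reorder the frequencies of the constant array's elements so that the
 * most common ones come first, and return how many of them to consider.
 * Elements beyond the returned count are simply ignored by the caller.
 */
int
trim_to_most_common(float *freqs, int nelems, int limit)
{
    assert(freqs != NULL && nelems >= 0 && limit >= 0);

    qsort(freqs, (size_t) nelems, sizeof(float), cmp_freq_desc);
    return (nelems < limit) ? nelems : limit;
}
```

Keeping the most common elements rather than an arbitrary subset is what preserves most of the information when the constant is long.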

I redid the tests in the above-mentioned message and see no cases where
the estimate is off by more than a factor of 2, and very few where it's
off by more than 20%, so this seems to work pretty well now.

The remaining loose ends IMO are:

1. I'm still unhappy about the loop that fills the count histogram,
as I noted earlier today. It at least needs a decent comment and some
overflow protection, and I'm not entirely convinced that it doesn't have
more bugs than the overflow issue.

2. The tests in the above-mentioned message show that in most cases
where mcelem_array_contained_selec falls through to the "rough
estimate", the resulting rowcount estimate is just 1, ie we are coming
out with very small selectivities. Although that path will now only be
taken when there are no stats, it seems like we'd be better off to
return DEFAULT_CONTAIN_SEL instead of what it's doing. I think there
must be something wrong with the "rough estimate" logic. Could you
recheck that?

3. As I mentioned yesterday, I think it'd be a good idea to make some
provisions to reduce the width of pg_statistic rows for array columns
by not storing the scalar-style and/or array-style stats, if the DBA
knows that they're not going to be useful for a particular column.
I have not done anything about that.

regards, tom lane

#39Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#38)
1 attachment(s)
Re: Collect frequency statistics for arrays

On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

1. I'm still unhappy about the loop that fills the count histogram,
as I noted earlier today. It at least needs a decent comment and some
overflow protection, and I'm not entirely convinced that it doesn't have
more bugs than the overflow issue.

The attached patch is focused on fixing this. The "frac" variable overflow
is avoided by making it int64. I hope the comments clarify things. In
general, this loop copies the behaviour of the histogram-constructing loop
in the compute_scalar_stats function, but instead of an array of values we
have an array of unique DECs and their frequencies.

------
With best regards,
Alexander Korotkov.

Attachments:

histogram_fix.patchtext/x-patch; charset=US-ASCII; name=histogram_fix.patchDownload
*** a/src/backend/utils/adt/array_typanalyze.c
--- b/src/backend/utils/adt/array_typanalyze.c
***************
*** 581,587 **** compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
  			DECountItem **sorted_count_items;
  			int			count_item_index;
  			int			delta;
! 			int			frac;
  			float4	   *hist;
  
  			/* num_hist must be at least 2 for the loop below to work */
--- 581,587 ----
  			DECountItem **sorted_count_items;
  			int			count_item_index;
  			int			delta;
! 			int64		frac;
  			float4	   *hist;
  
  			/* num_hist must be at least 2 for the loop below to work */
***************
*** 612,633 **** compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
  			hist[num_hist] = (double) element_no / (double) nonnull_cnt;
  
  			/*
! 			 * Construct the histogram.
! 			 *
! 			 * XXX this needs work: frac could overflow, and it's not clear
! 			 * how or why the code works.  Even if it does work, it needs
! 			 * documented.
  			 */
  			delta = analyzed_rows - 1;
  			count_item_index = 0;
! 			frac = sorted_count_items[0]->frequency * (num_hist - 1);
  			for (i = 0; i < num_hist; i++)
  			{
  				while (frac <= 0)
  				{
  					count_item_index++;
  					Assert(count_item_index < count_items_count);
! 					frac += sorted_count_items[count_item_index]->frequency * (num_hist - 1);
  				}
  				hist[i] = sorted_count_items[count_item_index]->count;
  				frac -= delta;
--- 612,642 ----
  			hist[num_hist] = (double) element_no / (double) nonnull_cnt;
  
  			/*
! 			 * Construct the histogram of DECs. The object of this loop is to
! 			 * copy the max and min DECs and evenly-spaced DECs in between
! 			 * ("space" here is number of arrays corresponding to DEC). If we
! 			 * imagine ordered array of DECs where each input array have a
! 			 * corresponding DEC item, i'th value of histogram will be 
! 			 * DECs[i * (analyzed_rows - 1) / (num_hist - 1)]. But instead
! 			 * of such array we've sorted_count_items which holds unique DEC
! 			 * values with their frequencies. We can imagine "frac" variable as
! 			 * an (index in DECs corresponding to next sorted_count_items
! 			 * element - index in DECs corresponding to last histogram value) *
! 			 * (num_hist - 1). In this case negative fraction leads us to
! 			 * iterate over sorted_count_items. 
  			 */
  			delta = analyzed_rows - 1;
  			count_item_index = 0;
! 			frac = (int64)sorted_count_items[0]->frequency * 
! 				   (int64)(num_hist - 1);
  			for (i = 0; i < num_hist; i++)
  			{
  				while (frac <= 0)
  				{
  					count_item_index++;
  					Assert(count_item_index < count_items_count);
! 					frac += (int64)sorted_count_items[count_item_index]->frequency * 
! 						    (int64)(num_hist - 1);
  				}
  				hist[i] = sorted_count_items[count_item_index]->count;
  				frac -= delta;
#40Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#38)
Re: Collect frequency statistics for arrays

On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

2. The tests in the above-mentioned message show that in most cases
where mcelem_array_contained_selec falls through to the "rough
estimate", the resulting rowcount estimate is just 1, ie we are coming
out with very small selectivities. Although that path will now only be
taken when there are no stats, it seems like we'd be better off to
return DEFAULT_CONTAIN_SEL instead of what it's doing. I think there
must be something wrong with the "rough estimate" logic. Could you
recheck that?

I think the problem with the "rough estimate" is that the assumption of
independent item occurrences is very unsuitable, even for a "rough
estimate". The following example shows that the "rough estimate" really does
work when item occurrences are independent.

Generate a test table where item occurrences really are independent.

test=# create table test as select ('{'||(select string_agg(s,',') from
(select case when (t*0 + random()) < 0.1 then i::text else null end from
generate_series(1,100) i) as x(s))||'}')::int[] AS val from
generate_series(1,10000) t;

SELECT 10000

test=# analyze test;
ANALYZE

Run a test.

test=# explain analyze select * from test where val <@
array[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60];

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..239.00 rows=151 width=61) (actual
time=0.325..32.556 rows=163 loops=1)
Filter: (val <@
'{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60}'::integer[])
Rows Removed by Filter: 9837
Total runtime: 32.806 ms
(4 rows)

Delete DECHIST statistics.

test=# update pg_statistic set stakind1 = 0, staop1 = 0, stanumbers1 =
null, stavalues1 = null where starelid = (select oid from pg_class where
relname = 'test') and stakind1 = 5;
UPDATE 0
test=# update pg_statistic set stakind2 = 0, staop2 = 0, stanumbers2 =
null, stavalues2 = null where starelid = (select oid from pg_class where
relname = 'test') and stakind2 = 5;
UPDATE 0
test=# update pg_statistic set stakind3 = 0, staop3 = 0, stanumbers3 =
null, stavalues3 = null where starelid = (select oid from pg_class where
relname = 'test') and stakind3 = 5;
UPDATE 0
test=# update pg_statistic set stakind4 = 0, staop4 = 0, stanumbers4 =
null, stavalues4 = null where starelid = (select oid from pg_class where
relname = 'test') and stakind4 = 5;
UPDATE 1
test=# update pg_statistic set stakind5 = 0, staop5 = 0, stanumbers5 =
null, stavalues5 = null where starelid = (select oid from pg_class where
relname = 'test') and stakind5 = 5;
UPDATE 0

Do another test.

test=# explain analyze select * from test where val <@
array[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60];

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..239.00 rows=148 width=61) (actual
time=0.332..32.952 rows=163 loops=1)
Filter: (val <@
'{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60}'::integer[])
Rows Removed by Filter: 9837
Total runtime: 33.225 ms
(4 rows)

In this particular case the "rough estimate" is quite accurate. But in most
cases it behaves really badly. That is why I started to invent
calc_distr etc. So I think returning DEFAULT_CONTAIN_SEL is OK unless
we have some better ideas.
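
To make the arithmetic behind this example concrete, here is a small sketch of the independence assumption on its own. It assumes the estimate is simply the product of (1 - frequency) over the elements missing from the constant; contained_sel_independent is a hypothetical name, and this is not the actual mcelem_array_contained_selec code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Selectivity of "column <@ const" if element occurrences are independent:
 * a row qualifies only when it contains none of the elements that are
 * missing from the constant.  elem_freq[i] is the fraction of rows that
 * contain element i; in_const[i] is nonzero if element i is in the constant.
 */
double
contained_sel_independent(const double *elem_freq, const int *in_const,
                          size_t nelems)
{
    double sel = 1.0;

    for (size_t i = 0; i < nelems; i++)
    {
        assert(elem_freq[i] >= 0.0 && elem_freq[i] <= 1.0);
        if (!in_const[i])
            sel *= 1.0 - elem_freq[i];
    }
    return sel;
}
```

In the test table each of the 100 possible items appears with probability 0.1 and the constant omits 40 of them, so the sketch gives 0.9^40, roughly 0.0148, or about 148 of 10000 rows. That is in line with the rows=148 estimate above.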

------
With best regards,
Alexander Korotkov.

#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#40)
Re: Collect frequency statistics for arrays

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

2. The tests in the above-mentioned message show that in most cases
where mcelem_array_contained_selec falls through to the "rough
estimate", the resulting rowcount estimate is just 1, ie we are coming
out with very small selectivities. Although that path will now only be
taken when there are no stats, it seems like we'd be better off to
return DEFAULT_CONTAIN_SEL instead of what it's doing. I think there
must be something wrong with the "rough estimate" logic. Could you
recheck that?

I think the problem with the "rough estimate" is that the assumption of
independent item occurrences is very unsuitable, even for a "rough
estimate". The following example shows that the "rough estimate" really does
work when item occurrences are independent. ...
In this particular case the "rough estimate" is quite accurate. But in most
cases it behaves really badly. That is why I started to invent
calc_distr etc. So I think returning DEFAULT_CONTAIN_SEL is OK unless
we have some better ideas.

OK. Looking again at that code, I notice that it also punts and returns
DEFAULT_CONTAIN_SEL if it's not given MCELEM stats, which it more or
less has to because without even a minfreq the whole calculation is just
hot air. And there are no plausible scenarios where compute_array_stats
would produce an MCELEM slot but no count histogram. So that says there
is no point in sweating over this case, unless you have an idea how to
produce useful results without MCELEM.

So I think it's sufficient to punt at the top of the function if no
histogram, and take out the various attempts to cope with the case.

regards, tom lane

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#39)
Re: Collect frequency statistics for arrays

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Sun, Mar 4, 2012 at 5:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

1. I'm still unhappy about the loop that fills the count histogram,
as I noted earlier today. It at least needs a decent comment and some
overflow protection, and I'm not entirely convinced that it doesn't have
more bugs than the overflow issue.

The attached patch is focused on fixing this. The "frac" variable overflow is
evaded by making it int64. I hope the comments clarify things. In
general, this loop copies the behaviour of the histogram-construction loop in
compute_scalar_stats. But instead of an array of values, we have an array of
unique DECs and their frequencies.

OK, I reworked this a bit and committed it. Thanks.

regards, tom lane

#43Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#42)
Re: Collect frequency statistics for arrays

BTW, one other thing about the count histogram: seems like we are
frequently generating uselessly large ones. For instance, do ANALYZE
in the regression database and then run

select tablename,attname,elem_count_histogram from pg_stats
where elem_count_histogram is not null;

You get lots of entries that look like this:

pg_proc | proallargtypes | {1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,6,6,6,2.80556}
pg_proc | proargmodes | {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1.61111}
pg_proc | proargnames | {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,7,7,7,7,8,8,8,14,14,15,16,3.8806}
pg_proc | proconfig | {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
pg_class | reloptions | {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}

which seems to me to be a rather useless expenditure of space.
Couldn't we reduce the histogram size when there aren't many
different counts?

It seems fairly obvious to me that we could bound the histogram
size with (max count - min count + 1), but maybe something even
tighter would work; or maybe I'm missing something and this would
sacrifice accuracy.

regards, tom lane

#44Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#43)
Re: Collect frequency statistics for arrays

On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, one other thing about the count histogram: seems like we are
frequently generating uselessly large ones. For instance, do ANALYZE
in the regression database and then run

select tablename,attname,elem_count_histogram from pg_stats
where elem_count_histogram is not null;

You get lots of entries that look like this:

pg_proc | proallargtypes |
{1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,6,6,6,2.80556}
pg_proc | proargmodes |
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1.61111}
pg_proc | proargnames |
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,7,7,7,7,8,8,8,14,14,15,16,3.8806}
pg_proc | proconfig |
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
pg_class | reloptions |
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}

which seems to me to be a rather useless expenditure of space.
Couldn't we reduce the histogram size when there aren't many
different counts?

It seems fairly obvious to me that we could bound the histogram
size with (max count - min count + 1), but maybe something even
tighter would work; or maybe I'm missing something and this would
sacrifice accuracy.

True. If (max count - min count + 1) is small, enumerating the frequencies
is both a more compact and a more precise representation. On the other hand,
if (max count - min count + 1) is large, we can run out of
statistics_target with such a representation. We could use the same
representation of the count distribution as for scalar column values, MCV and
HISTOGRAM, but it would require an additional statkind and statistics slot.
Perhaps you have better ideas?

------
With best regards,
Alexander Korotkov.

#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#44)
Re: Collect frequency statistics for arrays

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Mon, Mar 5, 2012 at 1:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Couldn't we reduce the histogram size when there aren't many
different counts?

It seems fairly obvious to me that we could bound the histogram
size with (max count - min count + 1), but maybe something even
tighter would work; or maybe I'm missing something and this would
sacrifice accuracy.

True. If (max count - min count + 1) is small, enumerating the frequencies
is both a more compact and a more precise representation. On the other hand,
if (max count - min count + 1) is large, we can run out of
statistics_target with such a representation. We could use the same
representation of the count distribution as for scalar column values, MCV and
HISTOGRAM, but it would require an additional statkind and statistics slot.
Perhaps you have better ideas?

I wasn't thinking of introducing two different representations,
but just trimming the histogram length when it's larger than necessary.

On reflection my idea above is wrong; for example assume that we have a
column with 900 arrays of length 1 and 100 arrays of length 2. Going by
what I said, we'd reduce the histogram to {1,2}, which might accurately
capture the set of lengths present but fails to show that 1 is much more
common than 2. However, a histogram {1,1,1,1,1,1,1,1,1,2} (ten entries)
would capture the situation perfectly in one-tenth the space that the
current logic does.

More generally, by limiting the histogram to statistics_target entries,
we are already accepting errors of up to 1/(2*statistics_target) in the
accuracy of the bin-boundary representation. What the above example
shows is that sometimes we could meet the same accuracy requirement with
fewer entries. I'm not sure how this could be mechanized but it seems
worth thinking about.
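
One hypothetical way to mechanize this, offered only as a sketch: try increasing histogram sizes and keep the first one whose cumulative fractions stay within the tolerance. The names CountItem, histogram_cdf_error and smallest_adequate_histogram are invented for illustration, and the error measure (the largest gap between "fraction of histogram entries <= c" and "fraction of sampled rows <= c") is an assumption rather than anything defined in the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
    int count;      /* distinct element count (DEC) */
    int frequency;  /* number of sampled arrays having this DEC */
} CountItem;

/*
 * Largest absolute difference, over all DEC values, between the cumulative
 * fraction implied by a num_hist-entry histogram and the true cumulative
 * fraction in the sample.  items[] must be sorted by count and its
 * frequencies must sum to analyzed_rows; num_hist must be >= 2.
 */
double
histogram_cdf_error(const CountItem *items, int nitems,
                    int analyzed_rows, int num_hist)
{
    int64_t delta = analyzed_rows - 1;
    int64_t frac = (int64_t) items[0].frequency * (num_hist - 1);
    int64_t rows_seen = 0;      /* rows belonging to finished items */
    int     entries_seen = 0;   /* histogram entries emitted so far */
    int     idx = 0;
    double  max_err = 0.0;

    for (int i = 0; i < num_hist; i++)
    {
        while (frac <= 0)
        {
            /* items[idx] is finished: compare the cumulative fractions */
            double err;

            rows_seen += items[idx].frequency;
            err = (double) entries_seen / num_hist -
                  (double) rows_seen / analyzed_rows;
            if (err < 0)
                err = -err;
            if (err > max_err)
                max_err = err;
            idx++;
            assert(idx < nitems);
            frac += (int64_t) items[idx].frequency * (num_hist - 1);
        }
        entries_seen++;
        frac -= delta;
    }
    return max_err;
}

/* Smallest size in [2, statistics_target] meeting the 1/(2*target) bound. */
int
smallest_adequate_histogram(const CountItem *items, int nitems,
                            int analyzed_rows, int statistics_target)
{
    double tolerance = 1.0 / (2.0 * statistics_target);

    for (int n = 2; n < statistics_target; n++)
        if (histogram_cdf_error(items, nitems, analyzed_rows, n) <= tolerance)
            return n;
    return statistics_target;
}
```

For the example of 900 arrays of length 1 and 100 arrays of length 2 with a target of 100, this sketch settles on ten entries, the same size as the {1,1,1,1,1,1,1,1,1,2} histogram above. The linear search is the simplest possible strategy and says nothing about whether the cost would be acceptable inside ANALYZE.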

regards, tom lane

#46Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#45)
Re: Collect frequency statistics for arrays

On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:

Alexander Korotkov <aekorotkov@gmail.com> writes:

True. If (max count - min count + 1) is small, enumerating the frequencies
is both a more compact and a more precise representation. On the other hand,
if (max count - min count + 1) is large, we can run out of
statistics_target with such a representation. We could use the same
representation of the count distribution as for scalar column values, MCV and
HISTOGRAM, but it would require an additional statkind and statistics slot.
Perhaps you have better ideas?

I wasn't thinking of introducing two different representations,
but just trimming the histogram length when it's larger than necessary.

On reflection my idea above is wrong; for example assume that we have a
column with 900 arrays of length 1 and 100 arrays of length 2. Going by
what I said, we'd reduce the histogram to {1,2}, which might accurately
capture the set of lengths present but fails to show that 1 is much more
common than 2. However, a histogram {1,1,1,1,1,1,1,1,1,2} (ten entries)
would capture the situation perfectly in one-tenth the space that the
current logic does.

Granted. When the next sample finds 899/101 instead, though, the optimization
vanishes. You save 90% of the space, perhaps 10% of the time. If you want to
materially narrow typical statistics, Alexander's proposal looks like the way
to go. I'd guess array columns always having DEC <= default_statistics_target
are common enough to make that representation the dominant representation, if
not the only necessary representation.

#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noah Misch (#46)
Re: Collect frequency statistics for arrays

Noah Misch <noah@leadboat.com> writes:

On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:

On reflection my idea above is wrong; for example assume that we have a
column with 900 arrays of length 1 and 100 arrays of length 2. Going by
what I said, we'd reduce the histogram to {1,2}, which might accurately
capture the set of lengths present but fails to show that 1 is much more
common than 2. However, a histogram {1,1,1,1,1,1,1,1,1,2} (ten entries)
would capture the situation perfectly in one-tenth the space that the
current logic does.

Granted. When the next sample finds 899/101 instead, though, the optimization
vanishes.

No, you missed my next point. That example shows that sometimes a
smaller histogram can represent the situation with zero error, but in
all cases a smaller histogram can represent the situation with perhaps
more error than a larger one. Since we already have a defined error
tolerance, we should try to generate a histogram that is as small as
possible while still not exceeding the error tolerance.

Now, it might be that doing that is computationally impractical, or
too complicated to be reasonable. But it seems to me to be worth
looking into.

If you want to materially narrow typical statistics, Alexander's
proposal looks like the way to go. I'd guess array columns always
having DEC <= default_statistics_target are common enough to make that
representation the dominant representation, if not the only necessary
representation.

Well, I don't want to have two representations; I don't think it's worth
the complexity. But certainly we could consider switching to a
different representation if it seems likely to usually be smaller.

regards, tom lane

#48Noah Misch
noah@leadboat.com
In reply to: Tom Lane (#47)
Re: Collect frequency statistics for arrays

On Thu, Mar 08, 2012 at 11:30:52AM -0500, Tom Lane wrote:

Noah Misch <noah@leadboat.com> writes:

On Wed, Mar 07, 2012 at 07:51:42PM -0500, Tom Lane wrote:

On reflection my idea above is wrong; for example assume that we have a
column with 900 arrays of length 1 and 100 arrays of length 2. Going by
what I said, we'd reduce the histogram to {1,2}, which might accurately
capture the set of lengths present but fails to show that 1 is much more
common than 2. However, a histogram {1,1,1,1,1,1,1,1,1,2} (ten entries)
would capture the situation perfectly in one-tenth the space that the
current logic does.

Granted. When the next sample finds 899/101 instead, though, the optimization
vanishes.

No, you missed my next point. That example shows that sometimes a
smaller histogram can represent the situation with zero error, but in
all cases a smaller histogram can represent the situation with perhaps
more error than a larger one. Since we already have a defined error
tolerance, we should try to generate a histogram that is as small as
possible while still not exceeding the error tolerance.

Now, it might be that doing that is computationally impractical, or
too complicated to be reasonable. But it seems to me to be worth
looking into.

Yes, I did miss your point.

One characteristic favoring this approach is its equal applicability to both
STATISTIC_KIND_HISTOGRAM and STATISTIC_KIND_DECHIST.

If you want to materially narrow typical statistics, Alexander's
proposal looks like the way to go. I'd guess array columns always
having DEC <= default_statistics_target are common enough to make that
representation the dominant representation, if not the only necessary
representation.

Well, I don't want to have two representations; I don't think it's worth
the complexity. But certainly we could consider switching to a
different representation if it seems likely to usually be smaller.

Perhaps some heavy array users could provide input: what are some typical
length ranges among arrays in your applications on which you use "arr &&
const", "arr @> const" or "arr <@ const" searches?

#49Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#45)
Re: Collect frequency statistics for arrays

On Thu, Mar 8, 2012 at 4:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <aekorotkov@gmail.com> writes:

True. If (max count - min count + 1) is small, enumerating the frequencies
is both a more compact and a more precise representation. On the other hand,
if (max count - min count + 1) is large, we can run out of
statistics_target with such a representation. We could use the same
representation of the count distribution as for scalar column values, MCV and
HISTOGRAM, but it would require an additional statkind and statistics slot.
Perhaps you have better ideas?

I wasn't thinking of introducing two different representations,
but just trimming the histogram length when it's larger than necessary.

On reflection my idea above is wrong; for example assume that we have a
column with 900 arrays of length 1 and 100 arrays of length 2. Going by
what I said, we'd reduce the histogram to {1,2}, which might accurately
capture the set of lengths present but fails to show that 1 is much more
common than 2. However, a histogram {1,1,1,1,1,1,1,1,1,2} (ten entries)
would capture the situation perfectly in one-tenth the space that the
current logic does.

More generally, by limiting the histogram to statistics_target entries,
we are already accepting errors of up to 1/(2*statistics_target) in the
accuracy of the bin-boundary representation. What the above example
shows is that sometimes we could meet the same accuracy requirement with
fewer entries. I'm not sure how this could be mechanized but it seems
worth thinking about.

I can propose the following representation of the histogram.

If (max_count - min_count + 1) <= statistics_target then
1) store max_count and min_count in stavalues
2) store frequencies from min_count to max_count in numvalues

If (max_count - min_count + 1) > statistics_target then
store the histogram in the current manner. I think in this case there are
unlikely to be many repeated values.
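
A rough sketch of how the choice between the two cases could look, purely for illustration: CountItem and enumerate_count_frequencies are hypothetical names, the output arrays stand in for whatever statistics slots would actually be used, and a return value of 0 is an invented convention meaning "fall back to the current histogram".

```c
#include <assert.h>

typedef struct
{
    int count;      /* distinct element count (DEC) */
    int frequency;  /* number of sampled arrays having this DEC */
} CountItem;

/*
 * If (max_count - min_count + 1) <= statistics_target, write one frequency
 * per count value from min_count to max_count into freqs[] (which must have
 * room for statistics_target entries) and return how many were written.
 * Otherwise return 0 so the caller can build the histogram as it does now.
 * items[] must be sorted by count with distinct count values.
 */
int
enumerate_count_frequencies(const CountItem *items, int nitems,
                            int analyzed_rows, int statistics_target,
                            int *min_count, int *max_count, float *freqs)
{
    int range;

    assert(nitems > 0 && analyzed_rows > 0);
    *min_count = items[0].count;
    *max_count = items[nitems - 1].count;
    range = *max_count - *min_count + 1;

    if (range > statistics_target)
        return 0;

    /* Counts that never occur in the sample keep a frequency of zero. */
    for (int i = 0; i < range; i++)
        freqs[i] = 0.0f;
    for (int i = 0; i < nitems; i++)
        freqs[items[i].count - *min_count] =
            (float) ((double) items[i].frequency / analyzed_rows);

    return range;
}
```

With 900 arrays of length 1 and 100 arrays of length 2, this stores min_count = 1, max_count = 2 and the two frequencies 0.9 and 0.1, compared with a full statistics_target-sized histogram today.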

I can propose a patch that changes the histogram representation to this.

Comments?

------
With best regards,
Alexander Korotkov.