WIP: expression evaluation improvements

Started by Andres Freundabout 6 years ago25 messages
#1Andres Freund
andres@anarazel.de
32 attachment(s)

Hi,

TL;DR: Some performance figures at the end. Lots of details before.

For a while I've been on and off (unfortunately more the latter), been
hacking on improving expression evaluation further.

This is motivated by mainly two factors:
a) Expression evaluation is still often a very significant fraction of
query execution time. Both with and without jit enabled.
b) Currently caching for JITed queries is not possible, as the generated
queries contain pointers that change from query to query

but there are others too (e.g. using less memory, reducing
initialization time).

The main reason why the JITed code is not faster, and why it cannot
really be cached, is that ExprEvalStep's point to memory that's
"outside" of LLVMs view, e.g. via ExprEvalStep->resvalue and the various
FunctionCallInfos. That's currently done by just embedding the raw
pointer value in the generated program (which effectively prevents
caching). LLVM will not really optimize through these memory references,
having difficulty determining aliasing and lifetimes. The fix for that
is to move for on-stack allocations for actually temporary stuff, llvm
can convert that into SSA form, and optimize properly.

In the attached *prototype* patch series there's a lot of incremental
improvements (and some cleanups) (in time, not importance order):

1) A GUC that enables iterating in reverse order over items on a page
during sequential scans. This is mainly to make profiles easier to
read, as the cache misses are otherwise swamping out other effects.

2) A number of optimizations of specific expression evaluation steps:
- reducing the number of aggregate transition steps by "merging"
EEOP_AGG_INIT_TRANS, EEOP_AGG_STRICT_TRANS_CHECK with EEOP_AGG_PLAIN_TRANS{_BYVAL,}
into special case versions for each combination.
- introducing special-case expression steps for common combinations
of steps (EEOP_FUNCEXPR_STRICT_1, EEOP_FUNCEXPR_STRICT_2,
EEOP_AGG_STRICT_INPUT_CHECK_ARGS_1, EEOP_DONE_NO_RETURN).

3) Use NullableDatum for slots and expression evaluation.

This is a small performance win for expression evaluation, and
reduces the number of pointers for each step. The latter is important
for later steps.

4) out-of-line int/float error functions

Right now we have numerous copies of float/int/... error handling
elog()s. That's unnecessary. Instead add functions that issue the
error, not allowing them to be inlined. This is a small win without
jit, and a bigger win with.

5) During expression initialization, compute allocations to be in a
"relative" manner. Each allocation is tracked separately, and
primarily consists out of an 'offset' that initially starts out at
zero, and is increased by the size of each allocation.

For interpreted evaluation, all the memory for these different
allocations is allocated as part of the allocation of the ExprState
itself, following the steps[] array (which now is also
inline). During interpretation it is accessed by basically adding the
offset to a base pointer.

For JIT compiled interpetation the memory is allocated using LLVM's
alloca instruction, which llvm can optimize into SSA form (using the
Mem2Reg or SROA passes). In combination with operator inlining that
enables LLVM to optimize PG function calls away entirely, even
performing common subexpression elimination in some cases.

There's also a few changes that are mainly done as prerequisites:
A) expression eval: Decouple PARAM_CALLBACK interface more strongly from execExpr.c
otherwise too many implementation details are exposed

B) expression eval: Improve ArrayCoerce evaluation implementation.

the recursive evaluation with memory from both the outer and inner
expression step being referenced at the same time makes improvements
harder. And it's not particularly fast either.

C) executor: Move per-call information for aggregates out of AggState.

Right now AggState has elements that we set for each transition
function invocation. That's not particularly fast, requires more
bookkeeping, and is harder for compilers to optimize. Instead
introduce a new AggStatePerCallContext that's passed for each
transition invocation via FunctionCallInfo->context.

D) Add "builder" state objects for ExecInitExpr() and
llvm_compile_expr(). That makes it easier to pass more state around,
and have different representations for the expression currently being
built, and once ready. Also makes it more realistic to break up
llvm_compile_expr() into smaller functions.

E) Improving the naming of JITed basic blocks, including the textual
ExprEvalOp value. Makes it a lot easier to understand the generated
code. Should be used to add a function for some minimal printing of
ExprStates.

F) Add minimal (and very hacky) DWARF output for the JITed
programs. That's useful for debugging, but more importantly makes it
a lot easier to interpret perf profiles.

The patchset leaves a lot of further optimization potential for better
code generation on the floor, but this seems a good enough intermediate
point. The generated code is not *quite* cachable yet,
FunctionCallInfo->{flinfo, context} still point to a pointer constant. I
think this can be solved in the same way as the rest, I just didn't get
to it yet.

Attached is a graph of tpch query times. branch=master/dev is master
(with just the seqscan patch applied), jit=0/1 is jit enabled or not,
seq=0/1 is whether faster seqscan ordering is enabled or not.

This is just tpch, with scale factor 5, on my laptop. I.e. not to be
taken too serious. I've started a scale 10, but I'm not going to wait
for the results.

Obviously the results are nice for some queries, and meh for others.

For Q01 we get:
time time time time time time time time
branch master dev master dev master dev master dev
jit 0 0 0 0 1 1 1 1
seq 0 0 1 1 0 0 1 1
query
q01 11965.224 10434.316 10309.404 8205.922 7918.81 6661.359 5653.64 4573.794

Which imo is pretty nice. And that's with quite some pessimizations in
the code, without those (which can be removed with just a bit of elbow
grease), the benefit is noticably bigger.

FWIW, for q01 the profile after these changes is:
-   94.29%     2.16%  postgres  postgres             [.] ExecAgg
   - 98.97% ExecAgg
      - 35.61% lookup_hash_entries
         - 95.08% LookupTupleHashEntry
            - 60.44% TupleHashTableHash.isra.0
               - 99.91% FunctionCall1Coll
                  + hashchar
            + 23.34% evalexpr_0_4
            + 11.67% ExecStoreMinimalTuple
            + 4.49% MemoryContextReset
           3.64% tts_minimal_clear
           1.22% ExecStoreVirtualTuple
      + 34.17% evalexpr_0_7
      - 29.38% fetch_input_tuple
         - 99.98% ExecSeqScanQual
            - 58.15% heap_getnextslot
               - 72.70% heapgettup_pagemode
                  - 99.25% heapgetpage
                     + 79.08% ReadBufferExtended
                     + 7.08% LockBuffer
                     + 6.78% CheckForSerializableConflictOut
                     + 3.26% UnpinBuffer.constprop.0
                     + 1.94% heap_page_prune_opt
                       1.80% ReleaseBuffer
                  + 0.66% ss_report_location
               + 27.22% ExecStoreBufferHeapTuple
            + 33.00% evalexpr_0_0
            + 5.16% ExecRunCompiledExpr
            + 3.65% MemoryContextReset
      + 0.84% MemoryContextReset

I.e. we spend a significant fraction of the time doing hash computations
(TupleHashTableHash, which is implemented very inefficiently), hash
equality checks (evalexpr_0_4, which is inefficiently done because we do
not cary NOT NULL upwards), the aggregate transition (evalexpr_0_7, now most
bottlenecked by float8_combine()), and fetching/filtering the tuple
(with buffer lookups taking the majority of the time, followed by qual
evaluation (evalexpr_0_0)).

Greetings,

Andres Freund

Attachments:

compare.pngimage/pngDownload
v2-0001-WIP-GUC-for-enabling-heap-seqscans-to-scan-pages-.patch.gzapplication/x-patch-gzipDownload
v2-0002-WIP-jit-makefile-improvements.patch.gzapplication/x-patch-gzipDownload
v2-0003-jit-Minor-code-cleanups.patch.gzapplication/x-patch-gzipDownload
v2-0004-expression-eval-Minor-code-cleanups.patch.gzapplication/x-patch-gzipDownload
v2-0005-expression-eval-Don-t-redundantly-keep-track-of-A.patch.gzapplication/x-patch-gzipDownload
v2-0006-jit-Reference-functions-by-name-in-IOCOERCE-steps.patch.gzapplication/x-patch-gzipDownload
v2-0007-jit-Don-t-unnecessarily-build-constant-pointer-wh.patch.gzapplication/x-patch-gzipDownload
v2-0008-expression-eval-Reduce-number-of-operations-for-a.patch.gzapplication/x-patch-gzipDownload
v2-0009-WIP-expression-eval-Keep-only-necessary-data-in-f.patch.gzapplication/x-patch-gzipDownload
v2-0010-expression-eval-Add-additional-fast-path-expressi.patch.gzapplication/x-patch-gzipDownload
v2-0011-WIP-expression-eval-Decouple-PARAM_CALLBACK-inter.patch.gzapplication/x-patch-gzipDownload
v2-0012-WIP-expression-eval-Improve-ArrayCoerce-evaluatio.patch.gzapplication/x-patch-gzipDownload
���]v2-0012-WIP-expression-eval-Improve-ArrayCoerce-evaluatio.patch�<kwG��������f@ 	��`	'���.B�w}|8�h��a��$+N�������r������tWWw�����:����wz����n��m�1�6�wu���Q���N�X�'�){�{����q�t}@�������f��������s��&}�������/���vaF|��k��N�]��7N�q4���.k���^�Y[�r;�����������a����2�0�y�CV���������l�����S�������j�����{�V��{?�&�����i�z�.��^����U88<\�at�yx�<��2�O+3���V�Uca`Z��=����:��0��m&�����	k�?-�g����`(~gFW���e8~u����aI?�K<?.��Ys3���5x��L��2�,zZ�G�<=��_G�"<4���D��������zG���[x\�^�^B��d���
PEZ(�� ]E.��k,�@?\�)��q�����B|b0�y����
��'��.ZZ�Z�q�s�j��3�	��
��z���S��wd;�G��i���X����cf�z��%t�l5 �3��#k]���&��3h��s#�p������;��������i5�����~���T�1m��/Y���`��L��r�	��^D@�%��Y��]����hZ&(�������"��_}2��������2[?��}���������U���zH�NG��|��O��
0��dr~u1��/�
��/f�����h:�������a0d��_D�L1�sC #l�c$�/���%h#~�v"k0�&�V+ei�~��{�V<hqaMS��,Fa�#���w��t����b�l{��<�����2p\s�FO��/��0!��"2�{b����y�=����
����o&�0C>�!��a��wG�5������l�S(��X�@2��'~I�P�j���Z? A6BmE�H���e��a�f�g.�6i����x��5���1T+���%*��\,|���q.�%�"A�F,��/�H��p(E��E��ib��O��:��X�U������z�59�-��^�d����� ��m�Z�������|x3�MG7SP�F�����a�����h��"'C3"
�Gq�8w@�2�����	*��r�.#q��o������� ��9�q�^*�L�C�������dn��\K�:C�x�r!F{b������@x�L���<H��B��=r�;m6�6����ZAXh2>M�x��
*�C�)���b�	�
'�����F���l2z3|5�i�8m�q���YJ�r����!y��i������a-C����@�����
a]z�}��!��_�� W�~��%���Z��K}qu9"?��Q9��5��\J�/.,�����E�\�N�tm��/�M��{�Q��C)�V��e���V
�����F��[�
��)���`�APx���l��C�X�FL�Q���\I�o&aK��5D���7���+�&�����6;[�%�P���s�Q�^W�>��H�3��VNf/( ������C����r��I��w��(r��*����@�#�c��`w�H/h�"�AxbO���0n������j�b���7�S)T�$n��aB�8b��;56��$+�y����X�1f��%��
�"�S)N�[��<@�w��RO�^��Z_52m6Dqm�s�@)a�����,�uE����5V����(j���f��8@�Na(_P*�)�C8���34�/�g��4��TN9��L�,��E�����M+���D$��O"���g�I��������)��e������j������=��g�Lb�;$NbXJ	�N({�(y��qm&�e����sc�r��|e�p!Y�->�	�d.�k�#������g_���&�W�����OPD��q�V���k���x���+�$�7��8��
�5j��?�E��?�0��w�!�&��t"��"��*�F
����ud8���Fq��!,`_}���Q�4�h|���~�"@F����|��z2��N��S����m��;Cv������Fo�0jS��~S?�DY��"RK�`���S1��A�|!�=4`GzO;M����R��&8���}�����d2��^�FZi�����)e;K{�o��VI��h���Sy��0�#n��M�~)k;�z{=��@,F���_�/��0�\��5���G'ZW����������T%:a+����,%�*{E	�8�BX�>�������q���x�N����9�tR���+���P^�Y?����d�����v�~d<�@_h�gn�)��%K���cj�2�Z"���Bd_��
#�����u�_�~(��D��	���_��u���.w��:)�T����2e�����sG��Y~��d_/���:��v����+c�*��e	���
��ET5���9��0g�^a&7q������:Wg�!f�Lt��\;b�ch�p�v���x����U��^Nln(��c�`?�"����k;������ln���.�i-�,z��JdJ�S_E*o��*��L�G7�g���p`s8��"���,-9���p����dq�m���#0a�-�LC��9f��_�� Lw' 4p(����=����R����s��w���M������) (��m3\,�����
���W^H('-j{�6��c4�v�Mu�6y8�~&?~�����f��v��`������DECRnY�3�,x5`�����F<nn.Dh�����L�	�b%:���X�ON����m4=7d,DA5]���rF����A���.�iG�ze�bS�!6H+���P^�.�?��V]���Q��X���������o��XP������EM�!@9r�����UAIpE�"Z�8ZB�/�M�
9���Q�F�n�����$*�zZ��%��:)�B�T_q'k��s��@@Q�k�KzK�q�O�����%9kY��2,����t����XfO@�-^�m�>��u�vl�p;E��+��>���E.�<��S{�u��c���]������p�x3�J���XP��u�)��N�R�#Hx�w}S�L
y�"�������_�#����Gu�����W�Y��xLk�8\1�6��2O�K�-+r�[��M�>{��]�{b��]'5zV��O�����8<K���f\Elb9��`	���I��F��>�h}w����z�h���f�����^������S=�O����2��^q@*�O����������K`s��`��G�B���v��Mk��d��~���9m�������c�Vu&�U1�]������&�����7����
O��a�v��B;pWx�
����O�H��B���x��pu+��Z?��������B�w�&P����B4$�,�1��h���~�N}"�O��H��#]�F�p����`K�=�
�dL�>�!���;�`T��Y��[��S~D_���p��/(A��i���8n#n��g��6b����+��~�}�]0l��6\Z*2K.E���O��xe���[��������;�����7��v[��s~lt:=��-R��$Iy�~���5�C���
����lE�a��
�o�
�Znk�����
N�Mf|T�T��"
3�I��Z�6�����X+��"u��a&���,9�=���J���e���Z�U+U���p`_���y��C�cR@��� �r�g�NCt������1M���p���;�n/�Jl6��p5��.���u����h��b������ZQ9%E"xwB��C��-HL��KE�j�?����5��w!yzM,^��G�#$=uhOW7��w�8�*v�a]�,���s|�\!Jl{������,DO<�p�n������P�[�����
�\~��?JO�|l��
�OW��\�2�Ni������cvw��
��+������~��:����������ub��p�
��_����O4|]��m
��t�6������ �@��S����Y��C]�j��X�o�d[������U	N�g���xcg�$yy��ZF�2{z�m�u�[K?�T��]���������2���q=�Q=�����X=�b������YK��VN�X
JD�$�������Q�����{�`�Q��"R�
�&2��<�(a��PQ�
�|��K����������
��h������&�b�,�!
�_M��k��<���P���3���2�d�W����;�'��'k�V��;�rb���@����uf�f���]�/����`�w��b`�+nK|�b��3fH$�����9qR#`��R�)c��q����}$���kG}dW�'�c�	~��W@��<��pOk2q��S��$�$��z\/I�I������G���v~��^�E���t��K���R*'�%_��T���d������[��	�*-�9��d�8�K!qB��,��o �|rE�K��r%+H�PI��	���b�&5�@]0���^1��R,~H�d�����)-]wX�R4:0���$8YR��00�zv��q|��wWvu;�g=E<��k���^>yO��O.��L����������#��^&��6�����=���Vs��y�:�X5e2�%�������=����xdJ��4 �S9�	R�t&����L�����+����J~�,.W�Pw���!��a�0S^z �A&�����[��j��L2�8����PG$z'�7���',���*$�%h����;�/��*zJn����D�=bC�<�����"(�n�����h�z�<�!3R���!��lU���Ba������]p�xQ�n
AT�q3�g��Mm���B��K��)Q�E3�X&���(�n�!1��K&e'���5N��9��X��[��V[F��<ieN�qJ**%��x_.��SE��$9	+'*^��ec�c��t��������*y�\�D=_n�������)
�@�v�Rc�E��^���3i��������YnBh��-Mq������Q��Ss��QM����C���*s��4?�o�!Z�j��x<(����t�
$�X����[w�z�u \JsOz��yL��44�3��4���!�J���#j�y�N.}�x@P��F�N��JR����kk\[����1T�s�%A�TN�w��2G�����2B��h���Lk��
���������x���IA��:KIN��5F8��YR��"�Q����������nnI�l�]��?�z�E�.�5���f�D[��Y�����j���7�g�l��(�(���>!�$�$�����z����);�����kdRm?���q�^J�/M����b]�'�L��)����=Z���,�L^�y�7k�U;�J�[����T��e��jm�T,�5���
����V|�s��c�z&�LA.����x��/��H�$��V�C����+���[r���ecX�"U.�#�����eK��$�����������a��*���q�N�mB�Y�E���GeHd��.��K��3(��7���Ho$o���]��W����b8���_7T��,����8�!���[�K���X�P��.$v�f�u]QU���������2�X���,S�2��i��K|~����o����4*�|�?R�In�?0A����������;@�H�
q����f�h��=��l1�����p���)"f�����)�#�c��(��@��\0��������b��x�3���<�JOJ��}T��%[�/Y:*J��I����SR��$��+%\fI��`��>�@x�����H����d6qy�l�!�����t�0��+$��_����w��������`S��	("�D�9��l�G�d��K���[���z��G~y<��h��g�N.5��������x�X��1��7����|D)4S#�J�4:=�z;e�%���)��������$q����
*�B�	���
�5+���'�w����\Ko�@>����(4�0X9T��J=�=D�������&
i�J��<va_`;������egv��dzfG����W~��X���fa��JZ����*6�k���mK����~�j�K��6x�����N��������t�\��&R�������x������$�!��p�5L�ylv����5
;����>TC�	�eY��X,�:�����e�,���N���S�8mB�i$������';#�=`������]���'<������5��h�R�a�Ry�W��ux3}�/���h���P5�B�+����(U����9a M^T]v�%�b�tu.������H��)�i�H��kf���b�6�edvx��I���	Vv��G�.OC��Q&Rg�o���	���QH~m����#G$�D�X8�qZ�!,�,�l�@C� 
�^�lT@�t��03B���?����4��Ge���c=
<�dv� �����PrX�i�����Up��x[�4����|n���>��3���Q�g&�^��I=����/�t��(���E�/�y�!X����,wi��O`Wl�HS=\��u���.x�'4�	'��x��"��@�jU�E~�j����1������_�������Z�q����u�MXH�w����r0����c{�������;0���[�/���T�RT46\����xbN��U.5K��a��-����O��|b)�N~W�`�WAK9�;���e'��U5(����?*F�$Y�_I�.���G�j��3r��[�2#|��0B�Z8u��E
��
�/f7�x���X�X���c�R�4���|R���aZ$��}���|$�"�>����ym�a�m��s�A��aE�fG��\������w��z\s���AlR�+���k��
��c�B�V}j���jQFb����������c�]��>�2���W��?�m�I�f��n��� �����K���Y%iGi�Gw��<YYU.g�����rd
v2-0013-executor-Move-per-call-information-for-aggregates.patch.gzapplication/x-patch-gzipDownload
���]v2-0013-executor-Move-per-call-information-for-aggregates.patch�]}s�6��[�Ho�T�(Y���&W�qR_��c��s��h(��P�����r����I�"%�qrm��6�����b���z�{��{��t65;�N�g�V�w&�Y���C��������&��s�_2}�������^}��#��y�^�|�Z����h��o��;-�?�>7C~�.W\c���LC(����?h�.k��v�b5��O�������'vm0���1�1��OW��@_�9[r�95�����fhC�3��|~�U��B������EOZ���*�C�s~m�n5��T���tP����28��_zAx��x����>3���f8�T��f��t��k�G��'�~k����0���F�����r	����B�����Y��%;=]�l������a���s�����Zc]��L>��8�|8b��-���%��aDm���YY<����KF�5��+�����lq�T��������z�������;P���R���Zo}�6��`3��Y5����4�����Q��Z���Cfq��'�z�j��k6������u���j���F�a��F������'#k�����v�������Z��D�?����h���
�`t<<�g+��@���	���5��I�'-��g�?��
�[��
����,�������a����|���8~�h�����E��:�������I�����w��f��b�NQR�`�N�n����)1� *�|J��\�����R�����
���i��
����4�O���WD�z�8���`�W���=
M�
{���W�h�yU#^�P&�xX�T���4{��Cl�z���c�^���x�����9�z+�$�s6]�>�Dc�-�d���ZI�M��Sw�
�`:5�e+p���k����edM��^��<u�0K	l3k{j���?������j#���h����U
d���k760���,��o��e����d�B�`(��=�����ZXf��f����l{���c���_��
:U�Xld��B�&e��h`X;��H����d�����N���G�&9�n������M'=Q6�J������C���jd�T�JE~�F�{��u-�h�0U��)4�IlX�$���>�
[;��Q�Z/�Y+Ga�*��2Mb�����"������k��m��:���RjH�^*g`:V��������+�9��o�O�|8`Ijh�hB
�^����1hz?Wq�5�N*cF�={��4�a���'e�eE�/0C���l�14��@������c�b��M�g�Ve@��vnS��z�*��O�}��I�z�O��2�T�[��0��'�����w��/��GP��Kv��s&sa������-a���e�^,��Lf�%4���4}
^�f��CCcY	�>�"���r
"�~���9�|9>{ut�z|y~��b����'/h}�$�J\�jA�cC+Q������0����bN����iY~L�|:s���Mq��86���t���0�{�r�h���Z���O�0�}���??9?y.9�����_T%.]����W'��P���gdm	������*����Pf'�b>>�P7K��yJ���!��Ne��
����-�s+_Z�
�)=�ZFK�	��g�`~�eM�	Y����vp��P�G,��`���;�Y�s����hh��CC����5�pcZ4�z��]����#d�=e��A��pw����r/������f�a�uM��\*���	����; 96��  �"]^����*�c��1o���:�D �6�?`���v��?���t��,�hJd������i��Cf4��Q
���P
��_�'�
v�,��jw��I
7w�j%��;.��g����:B(���B���|i���L3t�m��>�&+�Q��������G�H�v����hz�0:���
X�����I-
��>�_\��_2������N��)r�`���)L-��&+��)	�����5o��D%��)�������%��h�Z��!�b��������.Y
��&Tiw"�\��Q��4(;�����vP@�wR�C�n�6�W�B��t&pt|�x-��Bv�C6���^�y���E!��iE��04�0�e\U]
{HC��0	��L��(%3XZ��Y�$v(

�U���E���������,4���>��#�<������Z�C�����_��2!����m�N�
R�W� sW`��s�V�r"������z��(X�2�m`�f�h9�H/%{�/U5�X��X�)�)����w��u������-w�#�Cox��!3��5��k�}*�a}4a��MH��(�yG�0�j���@��
6��|k� �m���0}�}��]����������O^<L�;O�G��������!�d����'���dCc(`�P��|2���5�ZO��������O�`��M���h5�3����}����<���xPS�[���}���Szu�]������
����;Ud� �K��ew�Mg��7<����-�;g+LK	�Y��I0V�M�dk|���L�d<�Kmwn�,�*����[�I��6*75IR,���t�� 3��QV��/<�6���X�q�H���u��;����O��pn�$i���@�x���������6��bi���������m�cOT�3�j��p����:�����{��$�.�8q5O��^-�l�)�~����������Y���������O�P��i�0�Z����o�w-��-�d�[�mD��ye1�f����U��N���2����T�T��gu"
��u�A�cA�#�F�"�:����Dwsj���J*�_�I�G�rwb�E
�s�'v��h�VR2SBf��T"���%b��
)eh��*m�U�V4O�bH��[&*sH	_��gO@g��	��rS����+���q
l7��};��D����7t��sT2/^�Q��I�&{�8�j��N7,�������gB5������Ogq�0To>��4�.<��E8$�.fa8)����BI�4�:����P�h�
�P�7"����:TK�\�rU��v��<�e"����-Z�)����76�N�+2i^������Ve������$+Y�|��@��WrC���:	��6J�{� �"A!��n��O!9^-���0}�`,N�d�w�d�HZO��t�d��]R������S�����n<M����[������"�?�Mh
O��$'~!��B��R+�p�1?k�X�����g�����r&�=CU�$�����'EOf�D`%,hGx�}���4���l�v���I{���;
����p��(@��O��=W��H�m����!�<z�,( 0�Z,&��E�L�4�T&��|/Ng��%a��]i�:C�f}���,�zr?����U����������f���N{��*�Zw���2�`����S����e(�`�%���0�����a���_�t�?��i�����',�*�Uu:�'
�y���AN(�n"�X(U�@�XZ,i��^n���������b
N^���W���Y���|�j�&o��a62K�J��p�+!��>:#�����q�*���*�}�
m'�'��5���jS.X4�r�Y�N*���Q�1f>��?�-��5�u�\�@����K�`���7�`P��V#��
���UFy�H�#gE����Rt�����#p�Nm���MO���t���������>K�^6#>4��#����R"E;���r��2�E{%qG��(�l���My��I�q�2W�R��vt�NW ��M�T�b��\I^,VNhoeEIN������
�0�����8�4����(�Ql�X�tS��h��C>@>�2��� ^���-Z�r��X��a��B�0P$��H��P���h�0B�e������Qo��x��l���lYp
���y���:O�zQ����Ti����������5�z���$�����y��e�"Mmmh\/�r����8�������7�o�r��x���RT��q~���B+6����6!d�$c�c�k>�l�����Xjf;�V��;�zxK���h}��r�<�=�q�97=���%x���6�H.�6���Mpz.���FI����2s6�����ZT8�����~
��������G���/h�w�]7s�]`v�1�"=�+&�j���.VP�,�[��|q��X�P�8� ��nj��m-j���,����y�]DX����w����n�6�Z����}�>�p�f��tV�>an�N5Em�zLo�����z�^���[��n�!��>��0�m�!��2�1��y�4�1@`��}m[+�kc�F���x���U@#,���s�[���������nE(�����%���]c�I�y�,�{+�Bn"k0���_O(P�B�u�(��E������Y^	]����*��+�f�?s�:�<\o�P.7A���^\��>���+k�������������� y���&y3T���jla�
i�{��E%�0Z�WC�Hd
�x��������� �S]������\{GVE�<���S������6���K�!��� .f$�t�"��1�{����}�����g�	l���AZG�=�_k�'Z��1����a�@A�e��	�f,6��w`��0�]������R�R����&��>MAH�������r<��������&����k��]������97�yV���n�\
($SO���:��������n������g��
P��D�{��dIj��B�Me8&�FN��NUr��dB�V:���r�C��Y��w5�]$�"70{rz�S6(�E2����FHP���w6K8���9�P7sKs�,��~aj=}��G~���Qc���Z�-W������9�[a5�}���MX�O������������S�Z��4g�
V}D�q�	�	^�&����Dki���aq-UaUFX��(��4�Z���f�y��^sq6t�s�3-��v����KO�0��t�g������|-��n�qn�V�_�00p���{=��5z����;�:�Za�Y��e|�u�Rl��OJ�EX
m�a��]��%�[1���7�"!*�/�+<���v�I����{y���B���=<dB�j��b�s!���MZ�A���6}>�*Q5"����bpR����(�T`��hD����PQKV �j�q%��R~�ky�������L�^��-�}���@1F=��]1�I-��[d��(��	�[�1=_*���g�w�����t��2�x.n����w�9�'�
&�������?����t��W�GS�JO�x�\������������3����>DI$>W�v��5�R/��vNu������������G�<
	��>������JY���"��=:�yZ0���1w4��pw0�����T&H���Dt���� :o
�T��3N��6�jMp�2�,8�*M@��V{�1���%�����Q�z����QwSuc�s�60�[1z�2�����/�Q_2F����3]~[t-v)��8��p�����_��w�����t7����;��#�N���'���6�$(v"��^`��=�.qM&>���}2���!�����?tS�T�Q<���}�(W�������K�r����k�y�����A�F�cy������(Y�`������"&3�'|6�8
�W���}A�Y-��O��E��X[���$�Jh��C����SN1�47�L���s����-"�t��>h�c��^@��[q�&���k�\�M'���9�cE��a�������L1#4�X����W�d�/�D��y ���=���e$*E*�V���N\�M}������+�j��_���By����7q=JV����r�9�MB��{ki�b��K�h0������������s���6���I5w�8��Ac�=������N1��M�b����N����Yk9F�CLH����$#|���K�sT�F��u�jIb���4�O�\.�2�6QI;�_���i/|'�e7�?+C��,
?�����3�=SQ�$��JI��m�^y�xz����k���^���"�4L�~�("�<<�������w����`�H/��a�/�����k�����F��&��	y�E���#��9'N� .�4����d�+���L��,���,�����FT�&�:9�y������������g�����.E���-�e��N�6�p��O�����r���������g-
h.������x> �8��7����~��O��>�owO�����g�W���io��d�v�U$AR��Cr(q�$e�7�]G�=�����X��LwOuwUuu����]������kb�V;�L��G�� �hM��B�K9�0Wps���{����l����|�X�U�
���k���E�=
Mw��e�F��K�(�0��0/�:�j���@�
/��Xcm������o�aYL�h�#[49��`������$�C�T����O)�R42�����������������E����|���/���ww����:J5}�D��"��C�do�����(������m&g�<�4��a_���
 ��y<�������V�>*���	)�`M�E�<�&����2"#F����*�O���,�4R�3���:9�e�b'��M�u�A��*&:���.�f?�4���^U8��x`�AT�yy������%�j�.��\o��W��������1/�:`g�3�����+��d|�H�P��?{qI���lL7�B��o^��v�[�����H��Q���w�'����-R�j}<G����9C��Z]�c������X��J���l>�}{z2��j��^�g)E)�K���"e�;���r$���s���d������@�!0��EJ3��C[�'�<9���$�(U�
�����
*��q�5���djR���@2�\�U�Ym��5�h4��|j!D�����+�
(�+��hu<Y�?�A�T�_�����0��w��0������9�wZX��?������O��?`dPj]A����!2�
�g�O�.j���%U/|�gb�������d���.���gKu	fi2k���F�|�yU w��gw������^����p�uG����$��������(M��Vq�+�3QOuC����r����OP��u������9�~3��;����������(��K�n�}�>��2(������ zJ=G3����I���k(�pws	��qq������!5q����E�ti)�@��KU���O�F�����z��������e[5�9����9*25 Z��]��W�[��F����.��?�1g�h�-���
��h�Ie����v��2L����$�%EK+Y�3j�F<h5����I��Of��
�)�K[�����I):!i��M�u���<���������VW����PR��j��G�C����^B�V����J��<OP��j^G
�?������y��b�-c����`��O\��g������Z��`�k����RW:��������G������-X�d�����+U�C�u8�}i��O��.�<��(D�H�4���C��|j���J�eX��������3b�����P9#�D(F�����S��v��n
�;Q�%#E6�����]���t%�h�6�J�aE�&��Hu���+{�Or
B�@��'h
� W��r�;=�q�;}.�K�B\�����0�K:Qn����)�t����c���������B|�[��x\�*/�j*6�!�����xzi/���F-�����Bq�N!�q?���PxyQ<����Y�Y����Rj/���n�o@��/,k'�����z�_��J�������9-X��-�����uY�B��,W�'��15��l�$8}�xg����NG�}�lE�^�e[<�=�*�
���'�-<���~�!6)���h�����"�����RM���S`"m��	,��������{GW:�����B0r��	����z�`�{'�v�H/��c6�n����^T�$�J�AP5�	wi.�3[ng���"��^ //���g��������>GZ-��_�_me��s����GO� �0�)^����"�	�)8zR6�z��w�F=������DL1�;�����-����!�P�9ix(�*!ml��Wj_+�dU��G�	����;� �(�������3s��Aj&��LS'���P�T�m���n1�z ���5���'.O���[�H�sq�x�L���X��
#1�,	��C���f����lI@k�gj��GJ����;��������D ����wQ ��-�]Q�2�P���|��[m���<0`Wd��oOjhR�[a�RYN���~pj5[$�F�*>�
<�)�>��q^:���d��uHr����fshJm�u��:x����Ub�7�7�1G������d���Qb?)�{I�
����/{�j�9��0.��������?���bs��,5��'U"�P�:D��n��Q�<����/�Y�w\-����t�Z�f�t*������hg�����mbW��UB9 >��
yU�U�d�������/X�����&>�^�@�}� ������[��=�):��W���B���Z�!,1���sq.�4Y���|����dg�<�5���)~t�l������c�#�{:Tk�x�'.GJe
���`�I�����/^��2�e��lG��O��x�(��3��u��`7��E\���i��D�����������.	c1]�r%�Nn��4�����M
:=��1�&�q��
>�����};x'~�@�D�f��UR���a�����.�|�7�����h�2�x���d���8<,3`�bf���z3���{����np,IN����G��<�Y��s���J��s���D�"�R�vG���6�@������+i��%�E�z��g�r�	9*�3�xA�����i�_�����\�
����H�H@�G9���u$g[N�H~��
K
<zQay@�wz�%�4:�*��h�
4�����H8a[/&��i���s��_�m�������������0oZvo�D�2A�)���fe-_s�����b�J ;��1N��m~���}�?���
�E\p{	��F�0U�L���)Ox_��Pc]�
��CQ�j\6h�[X���l5{���m
z<�������4CiC��=&&Pz����n��a����)�;����n7��Ko�1����;}+|��!�B%J�����y�����p ��P�C	{{�c���^\�� ��$9���CsU���>�3<����7�D�3���n\m��t��<d�f*V�P�JN}���9_g?IV
��W�2Z�q���4`,�7&c���<����N������$��K/��v�qN�*�T�
~��������P�����a��B^^=[x�n�Eci�,��f||l�
����Hi!�7�@i�>���0*:��)GvZ�X�5dW�F���=5��P�R��/I�0��|lC�VXz�StX%S.A�di�`�gj\z����L3�L#E�z��,���"L��)��;�e�xe�
��%j��S8r6������^��HO��Y���u��G�� �&��.H4JR�Ok��_���t�F�N;�Mc_[��V��t�kwZuv�<|���H�����Q�/'YD��f��|N�xv��s���K���U��!�C8�\��-<a]��>����%��-�e�-&�n�A�RA���GD{�>� /�TM���ni����.0/~�v6�l�
�{�}�Ot�.-���;���1�nn��,������|�a��x���\��3�;��>�G�Z���F��d��d���4�����	G//����:�7������#�������V�oo�ID���@�9/�W�������Os�O�9hq�r�-�O�5������d��mi)3T��t���P'@���.]G58�h�8|+>�Faj�������Wn���(���5�x�!�Guzx�h����/V������jVm�����NA���l��x�}�K�n�{����&���l�q�J4��+�������k��m�C%Y��%��	�*k���jevW�NVO�����sb�v�&�N5�L�X��T��Ah��{q+>����4���F;��Z�~������V��H���D�-�5��`�C!�j���{�[��^J�I!$`�������������������������~���������h�*��Dt�����O�N��&�����t���3f��a(��O��u����C���������f8=�����:�Ckv���r���Cv�{��V�i��W��8���N�k������9I���i�#��h_�,9�L���V�(`K���K���m�#�z��7��%f��`��|b�K�!l�OVu�8��E:�e��^%���ZW�>��4�ymmkJ%#�kN����(*��.L<��h���@iq^�5��L\2����	l~�$���������<x�V��p87��/H*���9"
�i�(e�K�����������	S3�0���v�vFOu��f������s]d�����/�m4?(�%�P!#z��wX[���OV����o����!Mf��`g���'����=
�5���3'Y������+���?��./�7��w��?��9�_�Qj�&������!)H���������<b]b^�c��)�]R��5��C���W/��|%�P]k9��
���4Z���lT���2��)���[�j����_�7\-�'����	��h��5f��
�&)�������4�Da��q])�53���C��%�8
bj����AC5�G����>���.���d� �P	���kb��1�0 �C�N����|`/p%\v�� ��)��V]�e���r
��|\ w7p@U��Y���1H���b�r�/F�b�fa�Me��f�aO����+����x{;���\����(&'�d�?D���'r���8`����
�8���������	Q�r��(�oU��	�j��CS���
)�f��{����t���;A���;	]�#k���@��`
h�
j�����)��S�$���&�3I�2q*����o�0��U�/�Z�O��uj�e�!��	��PO������["����E�����\HO�^�����P����
U3h��~���p����[���l�,�P�,��C��.�,{�j�^_�G�d���[~@�rVO����Q�X��;9�n��m8�k��0,���K��'zG�Y��mS�
�ee4�0��A0��Wk!�Bm�h�������qa�'4f�l�U����������u�<�\��JGC`���|�5��?������Iq��8[���c�o
��7���4�:,���2�
��`���������Le����w����v�v��R��l�CV�o���L��u@kL������TiK������_?��XA�����qg��k�f;i���h�����P��h9�
m�/����%~����N�����a;U�eT�yh������)l�
�� P%�6���4����V�\#�i���B�����$}���{��?�K���3��Q���{I����ji�7mv�I'��h�7����VTss_V��*�&��!}NdZ�0T���I��
����t�!��m�_\AvS�~��?����?�]������02�����?���xs9��g|���C��m�^�3,!�L���E�����t9L&�
=�����d`��&������J��UE�U�D��F��M�d;xM�,��Z�n+���n���J�V�Yb�u��~~\�����O�/\�X�F/W�^
r�q[��:u����Hq�ljM��A3��ciFyu�2��y��p�<��J��
��Q".gv�M����%gv2b�h0*����B���+�!�S�V�^��%EN��(��&*����n�?d �I�}�����7���i�U��jI������T���#F�s��F����][L��:|��]TX<� �5� U��d�c
����@"3�
�����e��b�0p�M����1�����F1/�e c��
����������h�4~������-
����(A�@�Mm���#���fU��������l@��4\�
�@gp����[��J��~K�g\�� ��c�1&Y�m�gR9�5�����A�3^+/jN���&]�d��f�����|���Ix	��Ta���]mf��-^G��Yy��(�P��p�drd�`�;��Uq/�g!Upi�
�������������XNs�[K>{��AaT;�]���Z
����_3c��
Hp[�O���:���m/�J�0�#��>@������N��"mI
2�s����@@$rm��l��j��)�����F������d���@l�w4�Cyq�'i��m���Zm4|z����$�8����/���a���CYoE; �#��C���mJt!����t��NZ���QbJ�)�?1LT��V���d})�F�b�����f�)ID�/B��P2�j�R����o�})�C,�����X�1���Q�g1��r,�K�b����0�HlE3E�L���p�v�z'=�Fqg��4���V���N����^6~���hb��[(bK�������feF�kK���([��`)9��;�����Ki^C�u5�����rj�.En'����P��m�)u��0��"�MP6Q����<���3q������;�f�V�6{�����Kn6q�#����<��
,�o��5\/����
��^�Os��<2������~R���}�)�+��L�(�2?�	�'������.�Cfv�3���H��sJQ�D!������;,^�O7U����o���n��#{�c���U[>2*`o��������
u~��D�*%	����PN�:Cn����ww��GO&TMF`�J�[u�p%��[B����=_#��B�L���kr��jv�h)�7��N�3r�� y�J��(��_�CQ�_�iD2������.�l���=�6�����F��s����^8OIE7X��8a|'���� �P�zY��+������n0Q��������F)�J���7-C����JBq��g�E����Kl,�-$:�&~"�������4��l����|>�F����x�;����y�Z[���k;��1c�[w�����������&'�l���{���)��L7D�	�:�o�
\}=��[>"X��Q��_���6/&��K�
:�l���T0R	C�V��o��t�1�Q�������(;�-'���~���;v�����U^��}BO��c;�q������D����$��:�i��%K������2��>%*'k���=� �����9H�	p6����"�-�AG�?v��R��7E�W/��R�2�?{w����n.���������]t�Nv���$���G�jt��5��z��k�F������^�������I�
v2-0014-WIP-allow-pg_detoast_datum-to-be-inlined.patch.gzapplication/x-patch-gzipDownload
v2-0015-WIP-jit-increase-inlining-budget.patch.gzapplication/x-patch-gzipDownload
v2-0016-WIP-Use-NullableDatum-for-slots-and-expression-ev.patch.gzapplication/x-patch-gzipDownload
v2-0017-WIP-expression-eval-ExprStateBuilder.patch.gzapplication/x-patch-gzipDownload
v2-0018-WIP-expression-eval-Move-ExprState-into-execExpr..patch.gzapplication/x-patch-gzipDownload
v2-0019-WIP-out-of-line-int-float-error-functions.patch.gzapplication/x-patch-gzipDownload
v2-0020-WIP-expression-eval-Add-ExprOpToString.patch.gzapplication/x-patch-gzipDownload
v2-0021-WIP-jit-Add-ExprCompileState.patch.gzapplication/x-patch-gzipDownload
v2-0022-WIP-jit-improve-basic-block-naming.patch.gzapplication/x-patch-gzipDownload
v2-0023-WIP-jit-noalias.patch.gzapplication/x-patch-gzipDownload
v2-0024-WIP-jit-Use-ExprCompileState-more-widely.patch.gzapplication/x-patch-gzipDownload
v2-0025-WIP-jit-debuginfo.patch.gzapplication/x-patch-gzipDownload
v2-0026-WIP-expression-eval-relative-pointer-suppport.patch.gzapplication/x-patch-gzipDownload
v2-0027-WIP-jit-Initialize-functions-with-local-target-in.patch.gzapplication/x-patch-gzipDownload
v2-0028-WIP-jit-add-a-few-additional-noinline-markers.patch.gzapplication/x-patch-gzipDownload
v2-0029-jit-initialization-improvements.patch.gzapplication/x-patch-gzipDownload
v2-0030-WIP-faster-isinf-check.patch.gzapplication/x-patch-gzipDownload
v2-0031-executor-replace-most-of-ExecScan-with-custom-rou.patch.gzapplication/x-patch-gzipDownload
#2Soumyadeep Chakraborty
sochakraborty@pivotal.io
In reply to: Andres Freund (#1)
Re: WIP: expression evaluation improvements

Hey Andres,

After looking at
v2-0006-jit-Reference-functions-by-name-in-IOCOERCE-steps.patch, I was
wondering
about other places in the code where we have const pointers to functions
outside
LLVM's purview: specially EEOP_FUNCEXPR* for any function call expressions,
EEOP_DISTINCT and EEOP_NULLIF which involve operator specific comparison
function call invocations, deserialization and trans functions for
aggregates
etc. All of the above cases involve to some degree some server functions
that
can be inlined/optimized.

If we do go down this road, the most immediate solution that comes to mind
would
be to populate referenced_functions[] with these. Also, we can replace all
l_ptr_const() calls taking function addresses with calls to
llvm_function_reference() (this is safe as it falls back to a l_pt_const()
call). We could do the l_ptr_const() -> llvm_function_reference() even if we
don't go down this road.

One con with the approach above would be bloating of llvmjit_types.bc but we
would be introducing @declares instead of @defines in the IR...so I think
that
is fine.

Let me know your thoughts. I would like to submit a patch here in this
thread or
elsewhere.

--
Soumyadeep

#3Andres Freund
andres@anarazel.de
In reply to: Soumyadeep Chakraborty (#2)
Re: WIP: expression evaluation improvements

Hi,

On 2019-10-24 14:59:21 -0700, Soumyadeep Chakraborty wrote:

After looking at
v2-0006-jit-Reference-functions-by-name-in-IOCOERCE-steps.patch, I was
wondering
about other places in the code where we have const pointers to functions
outside
LLVM's purview: specially EEOP_FUNCEXPR* for any function call expressions,
EEOP_DISTINCT and EEOP_NULLIF which involve operator specific comparison
function call invocations, deserialization and trans functions for
aggregates
etc. All of the above cases involve to some degree some server functions
that
can be inlined/optimized.

I don't think there's other cases like this, except when we don't have a
symbol name. In the normal course that's "just" EEOP_PARAM_CALLBACK
IIRC.

For EEOP_PARAM_CALLBACK one solution would be to not use a callback
specified by pointer, but instead use an SQL level function taking an
INTERNAL parameter (to avoid it being called via SQL).

There's also a related edge-case where are unable to figure out a symbol
name in llvm_function_reference(), and then resort to creating a global
variable pointing to the function. This is a somewhat rare case (IIRC
it's mostly if not solely around language PL handlers), so I don't think
it matters *too* much.

We probably should change that to not initialize the global, and instead
resolve the symbol during link time. As long as we generate a symbol
name that llvm_resolve_symbol() can somehow resolve, we'd be good. I
was a bit wary of doing syscache lookups from within
llvm_resolve_symbol(), otherwise we could just look look up the function
address from within there. So if we went this route I'd probably go for
a hashtable of additional symbol resolutions, which
llvm_resolve_symbol() would consult.

If indeed the only case this is being hit is language PL handlers, it
might be better to instead work out the symbol name for that handler -
we should be able to get that via pg_language.lanplcallfoid.

If we do go down this road, the most immediate solution that comes to mind
would
be to populate referenced_functions[] with these. Also, we can replace all
l_ptr_const() calls taking function addresses with calls to
llvm_function_reference() (this is safe as it falls back to a l_pt_const()
call). We could do the l_ptr_const() -> llvm_function_reference() even if we
don't go down this road.

Which cases are you talking about here? Because I don't think there's
any others where would know a symbol name to add to referenced_functions
in the first place?

I'm also not quite clear what adding to referenced_functions would buy
us wrt constants. The benefit of adding a function there is that we get
the correct signature of the function, which makes it much harder to
accidentally screw up and call with the wrong signature. I don't think
there's any benefits around symbol names?

I do want to benefit from getting accurate signatures for patch
[PATCH v2 26/32] WIP: expression eval: relative pointer suppport
I had a number of cases where I passed the wrong parameters, and llvm
couldn't tell me...

Greetings,

Andres Freund

#4Andreas Karlsson
andreas@proxel.se
In reply to: Andres Freund (#1)
Re: WIP: expression evaluation improvements

On 10/23/19 6:38 PM, Andres Freund wrote:

In the attached *prototype* patch series there's a lot of incremental
improvements (and some cleanups) (in time, not importance order):

You may already know this but your patch set seems to require clang 9.

I get the below compilation error which is probably cause by
https://github.com/llvm/llvm-project/commit/90868bb0584f first being
committed for clang 9 (I run "clang version 7.0.1-8
(tags/RELEASE_701/final)").

In file included from gistutil.c:24:
../../../../src/include/utils/float.h:103:7: error: invalid output
constraint '=@ccae' in asm
: "=@ccae"(ret), [clobber_reg]"=&x"(clobber_reg)
^
1 error generated.

#5Andres Freund
andres@anarazel.de
In reply to: Andreas Karlsson (#4)
Re: WIP: expression evaluation improvements

Hi,

On 2019-10-25 00:43:37 +0200, Andreas Karlsson wrote:

On 10/23/19 6:38 PM, Andres Freund wrote:

In the attached *prototype* patch series there's a lot of incremental
improvements (and some cleanups) (in time, not importance order):

You may already know this but your patch set seems to require clang 9.

I didn't, so thanks!

I get the below compilation error which is probably cause by
https://github.com/llvm/llvm-project/commit/90868bb0584f first being
committed for clang 9 (I run "clang version 7.0.1-8
(tags/RELEASE_701/final)").

In file included from gistutil.c:24:
../../../../src/include/utils/float.h:103:7: error: invalid output
constraint '=@ccae' in asm
: "=@ccae"(ret), [clobber_reg]"=&x"(clobber_reg)
^
1 error generated.

I'll probably just drop this patch for now, it's not directly related. I
kind of wanted it on the list, so I have I place I can find it if I
forget :).

I think what really needs to happen instead is to improve the code
generated for __builtin_isinf[_sign]() by gcc/clang. They should proce
the constants like I did, instead of loading from the constant pool
every single time. That adds a fair bit of latency...

Greetings,

Andres Freund

#6Soumyadeep Chakraborty
sochakraborty@pivotal.io
In reply to: Andres Freund (#3)
3 attachment(s)
Re: WIP: expression evaluation improvements

Hi Andres,

Apologies, I realize my understanding of symbol resolution and the
referenced_functions mechanism wasn't correct. Thank you for your very
helpful
explanations.

There's also a related edge-case where are unable to figure out a symbol
name in llvm_function_reference(), and then resort to creating a global
variable pointing to the function.

Indeed.

If indeed the only case this is being hit is language PL handlers, it
might be better to instead work out the symbol name for that handler -
we should be able to get that via pg_language.lanplcallfoid.

I took a stab at this (on top of your patch set):
v1-0001-Resolve-PL-handler-names-for-JITed-code-instead-o.patch

Which cases are you talking about here? Because I don't think there's
any others where would know a symbol name to add to referenced_functions
in the first place?

I had misunderstood the intent of referenced_functions.

I do want to benefit from getting accurate signatures for patch
[PATCH v2 26/32] WIP: expression eval: relative pointer suppport
I had a number of cases where I passed the wrong parameters, and llvm
couldn't tell me...

I took a stab:
v1-0001-Rely-on-llvmjit_types-for-building-EvalFunc-calls.patch

On a separate note, I had submitted a patch earlier to optimize functions
earlier
in accordance to the code comment:
/*
* Do function level optimization. This could be moved to the point where
* functions are emitted, to reduce memory usage a bit.
*/
LLVMInitializeFunctionPassManager(llvm_fpm);
Refer:
/messages/by-id/CAE-ML+_OE4-sHvn0AA_qakc5qkZvQvainxwb1ztuuT67SPMegw@mail.gmail.com
I have rebased that patch on top of your patch set. Here it is:
v2-0001-Optimize-generated-functions-earlier-to-lower-mem.patch

--
Soumyadeep

Attachments:

v2-0001-Optimize-generated-functions-earlier-to-lower-mem.patchapplication/octet-stream; name=v2-0001-Optimize-generated-functions-earlier-to-lower-mem.patchDownload
From 7094935d2dd7334922a782d8ffa8c08b9bcbb7fc Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Sun, 6 Oct 2019 17:47:08 -0700
Subject: [PATCH v2] Optimize generated functions earlier to lower memory usage

We can optimize functions as soon as they are generated as opposed to
optimizing during module optimization, in order to lower memory usage.
This commit introduces `llvm_optimize_function()`, which will execute
function level passes in accordance with the optimization settings
inside `LLVMJitContext`.
---
 src/backend/jit/llvm/llvmjit.c        | 82 ++++++++++++++++-----------
 src/backend/jit/llvm/llvmjit_deform.c |  3 +
 src/backend/jit/llvm/llvmjit_expr.c   |  3 +
 src/include/jit/llvmjit.h             |  2 +
 4 files changed, 57 insertions(+), 33 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 65f8273d30..cdac7b1c5d 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -122,7 +122,7 @@ static void llvm_release_context(JitContext *context);
 static void llvm_session_initialize(void);
 static void llvm_shutdown(int code, Datum arg);
 static void llvm_compile_module(LLVMJitContext *context);
-static void llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module);
+static void llvm_optimize_module(LLVMJitContext *context);
 
 static void llvm_create_types(void);
 static uint64_t llvm_resolve_symbol(const char *name, void *ctx);
@@ -435,14 +435,52 @@ llvm_function_reference(LLVMJitContext *context,
 static LLVMPassManagerBuilderRef llvm_pmb;
 
 /*
- * Optimize code in module using the flags set in context.
+ * Optimize code at the function-level for func using the flags set in context.
+ */
+void
+llvm_optimize_function(LLVMJitContext *context, LLVMValueRef func)
+{
+	LLVMPassManagerRef llvm_fpm;
+	int compile_optlevel = context->base.flags & PGJIT_OPT3;
+	instr_time starttime;
+	instr_time endtime;
+
+	INSTR_TIME_SET_CURRENT(starttime);
+
+	if (llvm_pmb == NULL)
+	{
+		llvm_pmb = LLVMPassManagerBuilderCreate();
+		LLVMPassManagerBuilderUseLibraryInfo(llvm_pmb, llvm_targetlibraryinfo);
+	}
+
+	LLVMPassManagerBuilderSetOptLevel(llvm_pmb, compile_optlevel);
+	llvm_fpm = LLVMCreateFunctionPassManagerForModule(context->module);
+	LLVMAddAnalysisPasses(llvm_opt3_targetmachine, llvm_fpm);
+	if (compile_optlevel == 0)
+	{
+		/* we rely on mem2reg heavily, so emit even in the O0 case */
+		LLVMAddPromoteMemoryToRegisterPass(llvm_fpm);
+	}
+	LLVMPassManagerBuilderPopulateFunctionPassManager(llvm_pmb, llvm_fpm);
+
+	LLVMRunFunctionPassManager(llvm_fpm, func);
+
+	LLVMFinalizeFunctionPassManager(llvm_fpm);
+	LLVMDisposePassManager(llvm_fpm);
+
+	INSTR_TIME_SET_CURRENT(endtime);
+	INSTR_TIME_ACCUM_DIFF(context->base.instr.optimization_counter,
+						  endtime, starttime);
+}
+
+/*
+ * Optimize code at the module-level for the current module using the flags set
+ * in context.
  */
 static void
-llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
+llvm_optimize_module(LLVMJitContext *context)
 {
 	LLVMPassManagerRef llvm_mpm;
-	LLVMPassManagerRef llvm_fpm;
-	LLVMValueRef func;
 	int			compile_optlevel;
 
 	if (context->base.flags & PGJIT_OPT3)
@@ -450,11 +488,11 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
 	else
 		compile_optlevel = 0;
 
-	if (llvm_pmb == NULL)
-	{
-		llvm_pmb = LLVMPassManagerBuilderCreate();
-		LLVMPassManagerBuilderUseLibraryInfo(llvm_pmb, llvm_targetlibraryinfo);
-	}
+	/*
+	 * llvm_pmb must have been initialized by now while
+	 * optimizing generated functions.
+	 */
+	Assert(llvm_pmb != NULL);
 
 	/*
 	 * Have to create a inliner every pass, as it gets "consumed" by
@@ -462,33 +500,11 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
 	 */
 	LLVMPassManagerBuilderSetOptLevel(llvm_pmb, compile_optlevel);
 
-	llvm_fpm = LLVMCreateFunctionPassManagerForModule(module);
-	LLVMAddAnalysisPasses(llvm_opt3_targetmachine, llvm_fpm);
-
 	if (context->base.flags & PGJIT_OPT3)
 	{
 		/* TODO: Unscientifically determined threshold */
 		LLVMPassManagerBuilderUseInlinerWithThreshold(llvm_pmb, 1024);
 	}
-	else
-	{
-		/* we rely on mem2reg heavily, so emit even in the O0 case */
-		LLVMAddPromoteMemoryToRegisterPass(llvm_fpm);
-	}
-
-	LLVMPassManagerBuilderPopulateFunctionPassManager(llvm_pmb, llvm_fpm);
-
-	/*
-	 * Do function level optimization. This could be moved to the point where
-	 * functions are emitted, to reduce memory usage a bit.
-	 */
-	LLVMInitializeFunctionPassManager(llvm_fpm);
-	for (func = LLVMGetFirstFunction(context->module);
-		 func != NULL;
-		 func = LLVMGetNextFunction(func))
-		LLVMRunFunctionPassManager(llvm_fpm, func);
-	LLVMFinalizeFunctionPassManager(llvm_fpm);
-	LLVMDisposePassManager(llvm_fpm);
 
 	/*
 	 * Perform module level optimization. We do so even in the non-optimized
@@ -579,7 +595,7 @@ llvm_compile_module(LLVMJitContext *context)
 
 	/* optimize according to the chosen optimization settings */
 	INSTR_TIME_SET_CURRENT(starttime);
-	llvm_optimize_module(context, context->module);
+	llvm_optimize_module(context);
 	INSTR_TIME_SET_CURRENT(endtime);
 	INSTR_TIME_ACCUM_DIFF(context->base.instr.optimization_counter,
 						  endtime, starttime);
diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c
index 533a4aa117..6ff0821072 100644
--- a/src/backend/jit/llvm/llvmjit_deform.c
+++ b/src/backend/jit/llvm/llvmjit_deform.c
@@ -759,5 +759,8 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc,
 
 	LLVMDisposeBuilder(b);
 
+	/* optimize according to the chosen optimization settings */
+	llvm_optimize_function(context, v_deform_fn);
+
 	return v_deform_fn;
 }
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index d527dd02b0..bbe0fd0cb3 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -2576,6 +2576,9 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 	LLVMDisposeDIBuilder(ecs.b_d);
 	LLVMDisposeBuilder(b);
 
+	/* optimize according to the chosen optimization settings */
+	llvm_optimize_function(ecs.context, ecs.fn);
+
 	/*
 	 * Don't immediately emit function, instead do so the first time the
 	 * expression is actually evaluated. That allows to emit a lot of
diff --git a/src/include/jit/llvmjit.h b/src/include/jit/llvmjit.h
index b7b6ed9b34..751bb69e25 100644
--- a/src/include/jit/llvmjit.h
+++ b/src/include/jit/llvmjit.h
@@ -115,6 +115,8 @@ extern LLVMValueRef llvm_function_reference(LLVMJitContext *context,
 						LLVMBuilderRef builder,
 						LLVMModuleRef mod,
 						FunctionCallInfo fcinfo);
+extern void llvm_optimize_function(LLVMJitContext *context,
+						LLVMValueRef func);
 
 extern void llvm_inline(LLVMModuleRef mod);
 
-- 
2.23.0

v1-0001-Rely-on-llvmjit_types-for-building-EvalFunc-calls.patchapplication/octet-stream; name=v1-0001-Rely-on-llvmjit_types-for-building-EvalFunc-calls.patchDownload
From 07c9833945e3f762f5d5cffc7272b6477bd14291 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Sun, 27 Oct 2019 23:35:45 -0700
Subject: [PATCH v1] Rely on llvmjit_types for building EvalFunc calls

Rely on llvmjit_types's referenced_functions for building calls to EvalFuncs.
This ensures signature checks for the calls constructed.

Discussion:
https://postgr.es/m/20191024224303.jvdx3hq3ak2vbit3%40alap3.anarazel.de:wq
---
 src/backend/jit/llvm/llvmjit.c       | 60 +++++++++++++++++++
 src/backend/jit/llvm/llvmjit_expr.c  | 88 +++++++++++-----------------
 src/backend/jit/llvm/llvmjit_types.c | 31 +++++++++-
 src/include/jit/llvmjit.h            | 30 +++++++++-
 4 files changed, 153 insertions(+), 56 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index cdac7b1c5d..7a66d48645 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -98,6 +98,37 @@ LLVMValueRef FuncExecAggInitGroup;
 LLVMValueRef FuncExecEvalArrayCoerceRelabel;
 LLVMValueRef FuncExecEvalArrayCoerceUnpack;
 LLVMValueRef FuncExecEvalArrayCoercePack;
+LLVMValueRef FuncExecAggInitGroup;
+LLVMValueRef FuncExecAggTransReparent;
+LLVMValueRef FuncExecEvalAggOrderedTransDatum;
+LLVMValueRef FuncExecEvalAggOrderedTransTuple;
+LLVMValueRef FuncExecEvalAlternativeSubPlan;
+LLVMValueRef FuncExecEvalArrayExpr;
+LLVMValueRef FuncExecEvalConstraintCheck;
+LLVMValueRef FuncExecEvalConstraintNotNull;
+LLVMValueRef FuncExecEvalConvertRowtype;
+LLVMValueRef FuncExecEvalCurrentOfExpr;
+LLVMValueRef FuncExecEvalFieldSelect;
+LLVMValueRef FuncExecEvalFieldStoreDeForm;
+LLVMValueRef FuncExecEvalFieldStoreForm;
+LLVMValueRef FuncExecEvalFuncExprFusage;
+LLVMValueRef FuncExecEvalFuncExprStrictFusage;
+LLVMValueRef FuncExecEvalGroupingFunc;
+LLVMValueRef FuncExecEvalMinMax;
+LLVMValueRef FuncExecEvalNextValueExpr;
+LLVMValueRef FuncExecEvalParamExec;
+LLVMValueRef FuncExecEvalParamExtern;
+LLVMValueRef FuncExecEvalRow;
+LLVMValueRef FuncExecEvalRowNotNull;
+LLVMValueRef FuncExecEvalRowNull;
+LLVMValueRef FuncExecEvalScalarArrayOp;
+LLVMValueRef FuncExecEvalSQLValueFunction;
+LLVMValueRef FuncExecEvalSubPlan;
+LLVMValueRef FuncExecEvalSubscriptingRefAssign;
+LLVMValueRef FuncExecEvalSubscriptingRefFetch;
+LLVMValueRef FuncExecEvalSubscriptingRefOld;
+LLVMValueRef FuncExecEvalWholeRowVar;
+LLVMValueRef FuncExecEvalXmlExpr;
 
 
 static bool llvm_session_initialized = false;
@@ -928,6 +959,35 @@ llvm_create_types(void)
 	FuncExecEvalArrayCoerceRelabel = LLVMGetNamedFunction(mod, "ExecEvalArrayCoerceRelabel");
 	FuncExecEvalArrayCoerceUnpack = LLVMGetNamedFunction(mod, "ExecEvalArrayCoerceUnpack");
 	FuncExecEvalArrayCoercePack = LLVMGetNamedFunction(mod, "ExecEvalArrayCoercePack");
+	FuncExecEvalFuncExprFusage = LLVMGetNamedFunction(mod, "ExecEvalFuncExprFusage");
+	FuncExecEvalFuncExprStrictFusage = LLVMGetNamedFunction(mod, "ExecEvalFuncExprStrictFusage");
+	FuncExecEvalRowNull = LLVMGetNamedFunction(mod, "ExecEvalRowNull");
+	FuncExecEvalRowNotNull = LLVMGetNamedFunction(mod, "ExecEvalRowNotNull");
+	FuncExecEvalParamExec = LLVMGetNamedFunction(mod, "ExecEvalParamExec");
+	FuncExecEvalParamExtern = LLVMGetNamedFunction(mod, "ExecEvalParamExtern");
+	FuncExecEvalSubscriptingRefOld = LLVMGetNamedFunction(mod, "ExecEvalSubscriptingRefOld");
+	FuncExecEvalSubscriptingRefAssign = LLVMGetNamedFunction(mod, "ExecEvalSubscriptingRefAssign");
+	FuncExecEvalSubscriptingRefFetch = LLVMGetNamedFunction(mod, "ExecEvalSubscriptingRefFetch");
+	FuncExecEvalSQLValueFunction = LLVMGetNamedFunction(mod, "ExecEvalSQLValueFunction");
+	FuncExecEvalCurrentOfExpr = LLVMGetNamedFunction(mod, "ExecEvalCurrentOfExpr");
+	FuncExecEvalNextValueExpr = LLVMGetNamedFunction(mod, "ExecEvalNextValueExpr");
+	FuncExecEvalArrayExpr = LLVMGetNamedFunction(mod, "ExecEvalArrayExpr");
+	FuncExecEvalRow = LLVMGetNamedFunction(mod, "ExecEvalRow");
+	FuncExecEvalMinMax = LLVMGetNamedFunction(mod, "ExecEvalMinMax");
+	FuncExecEvalFieldSelect = LLVMGetNamedFunction(mod, "ExecEvalFieldSelect");
+	FuncExecEvalFieldStoreDeForm = LLVMGetNamedFunction(mod, "ExecEvalFieldStoreDeForm");
+	FuncExecEvalFieldStoreForm = LLVMGetNamedFunction(mod, "ExecEvalFieldStoreForm");
+	FuncExecEvalConstraintNotNull = LLVMGetNamedFunction(mod, "ExecEvalConstraintNotNull");
+	FuncExecEvalConstraintCheck = LLVMGetNamedFunction(mod, "ExecEvalConstraintCheck");
+	FuncExecEvalConvertRowtype = LLVMGetNamedFunction(mod, "ExecEvalConvertRowtype");
+	FuncExecEvalScalarArrayOp = LLVMGetNamedFunction(mod, "ExecEvalScalarArrayOp");
+	FuncExecEvalXmlExpr = LLVMGetNamedFunction(mod, "ExecEvalXmlExpr");
+	FuncExecEvalGroupingFunc = LLVMGetNamedFunction(mod, "ExecEvalGroupingFunc");
+	FuncExecEvalSubPlan = LLVMGetNamedFunction(mod, "ExecEvalSubPlan");
+	FuncExecEvalAlternativeSubPlan = LLVMGetNamedFunction(mod, "ExecEvalAlternativeSubPlan");
+	FuncExecEvalAggOrderedTransDatum = LLVMGetNamedFunction(mod, "ExecEvalAggOrderedTransDatum");
+	FuncExecEvalAggOrderedTransTuple = LLVMGetNamedFunction(mod, "ExecEvalAggOrderedTransTuple");
+	FuncExecEvalWholeRowVar = LLVMGetNamedFunction(mod, "ExecEvalWholeRowVar");
 
 	/*
 	 * Leave the module alive, otherwise references to function would be
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index bbe0fd0cb3..9284e32bde 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -131,12 +131,12 @@ static LLVMValueRef BuildV1CallFC(ExprCompileState *ecs,
 								  LLVMValueRef v_fcinfo,
 								  LLVMValueRef *v_fcinfo_isnull);
 
-#define build_EvalXFunc(ecs, funcname, opno, ...) \
-	build_EvalXFuncInt(ecs, funcname, opno, \
+#define build_EvalXFunc(ecs, v_fn, opno, ...) \
+	build_EvalXFuncInt(ecs, v_fn, opno, \
 					   lengthof(((LLVMValueRef[]){__VA_ARGS__})), \
 					   ((LLVMValueRef[]){__VA_ARGS__}))
 
-static void build_EvalXFuncInt(ExprCompileState *ecs, const char *funcname,
+static void build_EvalXFuncInt(ExprCompileState *ecs, LLVMValueRef v_fn,
 							   int opno, int natts, LLVMValueRef v_args[]);
 
 /*
@@ -561,7 +561,7 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_WHOLEROW:
-				build_EvalXFunc(&ecs, "ExecEvalWholeRowVar", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalWholeRowVar, opno,
 								ecs.v_econtext,	v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -798,7 +798,7 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_FUNCEXPR_FUSAGE:
-				build_EvalXFunc(&ecs, "ExecEvalFuncExprFusage", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalFuncExprFusage, opno,
 								v_resultp,
 								expr_fcinfo_ref(&ecs, op->d.func.fcinfo_data, NULL));
 				LLVMBuildBr(b, opblocks[opno + 1]);
@@ -806,7 +806,7 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 
 
 			case EEOP_FUNCEXPR_STRICT_FUSAGE:
-				build_EvalXFunc(&ecs, "ExecEvalFuncExprStrictFusage", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalFuncExprStrictFusage, opno,
 								v_resultp,
 								expr_fcinfo_ref(&ecs, op->d.func.fcinfo_data, NULL));
 				LLVMBuildBr(b, opblocks[opno + 1]);
@@ -1190,13 +1190,13 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_NULLTEST_ROWISNULL:
-				build_EvalXFunc(&ecs, "ExecEvalRowNull", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalRowNull, opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_NULLTEST_ROWISNOTNULL:
-				build_EvalXFunc(&ecs, "ExecEvalRowNotNull", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalRowNotNull, opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -1269,13 +1269,13 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_PARAM_EXEC:
-				build_EvalXFunc(&ecs, "ExecEvalParamExec", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalParamExec, opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_PARAM_EXTERN:
-				build_EvalXFunc(&ecs, "ExecEvalParamExtern", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalParamExtern, opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -1322,21 +1322,21 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_SBSREF_OLD:
-				build_EvalXFunc(&ecs, "ExecEvalSubscriptingRefOld", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalSubscriptingRefOld, opno,
 								v_resultp,
 								expr_nullable_datum_ref(&ecs, op->d.sbsref.prev));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_SBSREF_ASSIGN:
-				build_EvalXFunc(&ecs, "ExecEvalSubscriptingRefAssign", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalSubscriptingRefAssign, opno,
 								v_resultp,
 								expr_nullable_datum_ref(&ecs, op->d.sbsref.replace));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_SBSREF_FETCH:
-				build_EvalXFunc(&ecs, "ExecEvalSubscriptingRefFetch", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalSubscriptingRefFetch, opno,
 								v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -1755,24 +1755,24 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_SQLVALUEFUNCTION:
-				build_EvalXFunc(&ecs, "ExecEvalSQLValueFunction", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalSQLValueFunction, opno,
 								v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_CURRENTOFEXPR:
-				build_EvalXFunc(&ecs, "ExecEvalCurrentOfExpr", opno);
+				build_EvalXFunc(&ecs, FuncExecEvalCurrentOfExpr, opno);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_NEXTVALUEEXPR:
-				build_EvalXFunc(&ecs, "ExecEvalNextValueExpr", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalNextValueExpr, opno,
 								v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_ARRAYEXPR:
-				build_EvalXFunc(&ecs, "ExecEvalArrayExpr", opno, v_resultp,
+				build_EvalXFunc(&ecs, FuncExecEvalArrayExpr, opno, v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.arrayexpr.elements));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -1842,7 +1842,7 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_ROW:
-				build_EvalXFunc(&ecs, "ExecEvalRow", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalRow, opno,
 								v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.row.elements));
 				LLVMBuildBr(b, opblocks[opno + 1]);
@@ -2003,7 +2003,7 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_MINMAX:
-				build_EvalXFunc(&ecs, "ExecEvalMinMax", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalMinMax, opno,
 								v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.minmax.arguments),
 								expr_fcinfo_ref(&ecs, op->d.minmax.fcinfo_data, NULL));
@@ -2011,20 +2011,20 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				break;
 
 			case EEOP_FIELDSELECT:
-				build_EvalXFunc(&ecs, "ExecEvalFieldSelect", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalFieldSelect, opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_FIELDSTORE_DEFORM:
-				build_EvalXFunc(&ecs, "ExecEvalFieldStoreDeForm", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalFieldStoreDeForm, opno,
 								ecs.v_econtext, v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.fieldstore.columns));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_FIELDSTORE_FORM:
-				build_EvalXFunc(&ecs, "ExecEvalFieldStoreForm", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalFieldStoreForm, opno,
 								v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.fieldstore.columns));
 				LLVMBuildBr(b, opblocks[opno + 1]);
@@ -2101,32 +2101,32 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_DOMAIN_NOTNULL:
-				build_EvalXFunc(&ecs, "ExecEvalConstraintNotNull", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalConstraintNotNull, opno,
 								v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_DOMAIN_CHECK:
-				build_EvalXFunc(&ecs, "ExecEvalConstraintCheck", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalConstraintCheck, opno,
 								expr_nullable_datum_ref(&ecs, op->d.domaincheck.check));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_CONVERT_ROWTYPE:
-				build_EvalXFunc(&ecs, "ExecEvalConvertRowtype", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalConvertRowtype, opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_SCALARARRAYOP:
-				build_EvalXFunc(&ecs, "ExecEvalScalarArrayOp", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalScalarArrayOp, opno,
 								v_resultp,
 								expr_fcinfo_ref(&ecs, op->d.scalararrayop.fcinfo_data, NULL));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_XMLEXPR:
-				build_EvalXFunc(&ecs, "ExecEvalXmlExpr", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalXmlExpr, opno,
 								v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.xmlexpr.args),
 								expr_nullable_datum_array_ref(&ecs, op->d.xmlexpr.named_args));
@@ -2162,7 +2162,7 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_GROUPING_FUNC:
-				build_EvalXFunc(&ecs, "ExecEvalGroupingFunc", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalGroupingFunc, opno,
 								v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -2196,13 +2196,13 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_SUBPLAN:
-				build_EvalXFunc(&ecs, "ExecEvalSubPlan", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalSubPlan, opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_ALTERNATIVE_SUBPLAN:
-				build_EvalXFunc(&ecs, "ExecEvalAlternativeSubPlan", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalAlternativeSubPlan, opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -2551,13 +2551,13 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_AGG_ORDERED_TRANS_DATUM:
-				build_EvalXFunc(&ecs, "ExecEvalAggOrderedTransDatum", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalAggOrderedTransDatum, opno,
 								expr_nullable_datum_array_ref(&ecs, op->d.agg_trans_ordered.columns));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_AGG_ORDERED_TRANS_TUPLE:
-				build_EvalXFunc(&ecs, "ExecEvalAggOrderedTransTuple", opno,
+				build_EvalXFunc(&ecs, FuncExecEvalAggOrderedTransTuple, opno,
 								expr_nullable_datum_array_ref(&ecs, op->d.agg_trans_ordered.columns));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -3115,37 +3115,17 @@ BuildV1CallFC(ExprCompileState *ecs,
  * Implement an expression step by calling the function funcname.
  */
 static void
-build_EvalXFuncInt(ExprCompileState *ecs, const char *funcname, int opno,
+build_EvalXFuncInt(ExprCompileState *ecs, LLVMValueRef v_fn, int opno,
 				   int nargs, LLVMValueRef v_args[])
 {
 	LLVMValueRef v_opno = l_int32_const(opno);
 	LLVMValueRef v_opp;
-	LLVMTypeRef sig;
-	LLVMValueRef v_fn;
 	LLVMValueRef *params;
-	LLVMTypeRef *param_types;
 
 	params = palloc(sizeof(LLVMValueRef) * (2 + nargs));
-	param_types = palloc(sizeof(LLVMValueRef) * (2 + nargs));
 
 	v_opp = LLVMBuildGEP(ecs->b, ecs->v_steps,
 						 (LLVMValueRef[]){l_int32_const(0), v_opno}, 2, "");
-	v_fn = LLVMGetNamedFunction(ecs->mod, funcname);
-	if (!v_fn)
-	{
-		int argno = 0;
-
-		param_types[argno++] = l_ptr(StructExprState);
-		param_types[argno++] = l_ptr(StructExprEvalStep);
-
-		for (int i = 0; i < nargs; i++)
-			param_types[argno++] = LLVMTypeOf(v_args[i]);
-
-		sig = LLVMFunctionType(LLVMVoidType(),
-							   param_types, argno,
-							   false);
-		v_fn = LLVMAddFunction(ecs->mod, funcname, sig);
-	}
 
 	{
 		int argno = 0;
@@ -3156,6 +3136,6 @@ build_EvalXFuncInt(ExprCompileState *ecs, const char *funcname, int opno,
 		for (int i = 0; i < nargs; i++)
 			params[argno++] = v_args[i];
 
-		LLVMBuildCall(ecs->b, v_fn, params, argno, "");
+		LLVMBuildCall(ecs->b, llvm_get_decl(ecs->mod, v_fn), params, argno, "");
 	}
 }
diff --git a/src/backend/jit/llvm/llvmjit_types.c b/src/backend/jit/llvm/llvmjit_types.c
index 082b312060..d414e60b99 100644
--- a/src/backend/jit/llvm/llvmjit_types.c
+++ b/src/backend/jit/llvm/llvmjit_types.c
@@ -114,5 +114,34 @@ void	   *referenced_functions[] =
 	ExecAggInitGroup,
 	ExecEvalArrayCoerceRelabel,
 	ExecEvalArrayCoerceUnpack,
-	ExecEvalArrayCoercePack
+	ExecEvalArrayCoercePack,
+	ExecEvalFuncExprFusage,
+	ExecEvalFuncExprStrictFusage,
+	ExecEvalRowNull,
+	ExecEvalRowNotNull,
+	ExecEvalParamExec,
+	ExecEvalParamExtern,
+	ExecEvalSubscriptingRefOld,
+	ExecEvalSubscriptingRefAssign,
+	ExecEvalSubscriptingRefFetch,
+	ExecEvalSQLValueFunction,
+	ExecEvalCurrentOfExpr,
+	ExecEvalNextValueExpr,
+	ExecEvalArrayExpr,
+	ExecEvalRow,
+	ExecEvalMinMax,
+	ExecEvalFieldSelect,
+	ExecEvalFieldStoreDeForm,
+	ExecEvalFieldStoreForm,
+	ExecEvalConstraintNotNull,
+	ExecEvalConstraintCheck,
+	ExecEvalConvertRowtype,
+	ExecEvalScalarArrayOp,
+	ExecEvalXmlExpr,
+	ExecEvalGroupingFunc,
+	ExecEvalSubPlan,
+	ExecEvalAlternativeSubPlan,
+	ExecEvalAggOrderedTransDatum,
+	ExecEvalAggOrderedTransTuple,
+	ExecEvalWholeRowVar
 };
diff --git a/src/include/jit/llvmjit.h b/src/include/jit/llvmjit.h
index 751bb69e25..4fd0dc7072 100644
--- a/src/include/jit/llvmjit.h
+++ b/src/include/jit/llvmjit.h
@@ -97,7 +97,35 @@ extern LLVMValueRef FuncExecAggInitGroup;
 extern LLVMValueRef FuncExecEvalArrayCoerceRelabel;
 extern LLVMValueRef FuncExecEvalArrayCoerceUnpack;
 extern LLVMValueRef FuncExecEvalArrayCoercePack;
-
+extern LLVMValueRef FuncExecEvalFuncExprFusage;
+extern LLVMValueRef FuncExecEvalFuncExprStrictFusage;
+extern LLVMValueRef FuncExecEvalRowNull;
+extern LLVMValueRef FuncExecEvalRowNotNull;
+extern LLVMValueRef FuncExecEvalParamExec;
+extern LLVMValueRef FuncExecEvalParamExtern;
+extern LLVMValueRef FuncExecEvalSubscriptingRefOld;
+extern LLVMValueRef FuncExecEvalSubscriptingRefAssign;
+extern LLVMValueRef FuncExecEvalSubscriptingRefFetch;
+extern LLVMValueRef FuncExecEvalSQLValueFunction;
+extern LLVMValueRef FuncExecEvalCurrentOfExpr;
+extern LLVMValueRef FuncExecEvalNextValueExpr;
+extern LLVMValueRef FuncExecEvalArrayExpr;
+extern LLVMValueRef FuncExecEvalRow;
+extern LLVMValueRef FuncExecEvalMinMax;
+extern LLVMValueRef FuncExecEvalFieldSelect;
+extern LLVMValueRef FuncExecEvalFieldStoreDeForm;
+extern LLVMValueRef FuncExecEvalFieldStoreForm;
+extern LLVMValueRef FuncExecEvalConstraintNotNull;
+extern LLVMValueRef FuncExecEvalConstraintCheck;
+extern LLVMValueRef FuncExecEvalConvertRowtype;
+extern LLVMValueRef FuncExecEvalScalarArrayOp;
+extern LLVMValueRef FuncExecEvalXmlExpr;
+extern LLVMValueRef FuncExecEvalGroupingFunc;
+extern LLVMValueRef FuncExecEvalSubPlan;
+extern LLVMValueRef FuncExecEvalAlternativeSubPlan;
+extern LLVMValueRef FuncExecEvalAggOrderedTransDatum;
+extern LLVMValueRef FuncExecEvalAggOrderedTransTuple;
+extern LLVMValueRef FuncExecEvalWholeRowVar;
 
 extern void llvm_enter_fatal_on_oom(void);
 extern void llvm_leave_fatal_on_oom(void);
-- 
2.23.0

v1-0001-Resolve-PL-handler-names-for-JITed-code-instead-o.patchapplication/octet-stream; name=v1-0001-Resolve-PL-handler-names-for-JITed-code-instead-o.patchDownload
From 07c7ff996706c6f71e00d76894845c1f87956472 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Sun, 27 Oct 2019 17:42:53 -0700
Subject: [PATCH v1] Resolve PL handler names for JITed code instead of using
 const pointers

Using const pointers to PL handler functions prevents optimization
opportunities in JITed code. Now fmgr_symbol() resolves PL function
references to the corresponding language's handler.
llvm_function_reference() now no longer needs to create the global to
such a function.

Discussion: https://postgr.es/m/20191024224303.jvdx3hq3ak2vbit3%40alap3.anarazel.de:wq
---
 src/backend/jit/llvm/llvmjit.c | 29 +++--------------------------
 src/backend/utils/fmgr/fmgr.c  | 30 +++++++++++++++++++++++-------
 2 files changed, 26 insertions(+), 33 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 82c4afb701..69a4167ac9 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -369,38 +369,15 @@ llvm_function_reference(LLVMJitContext *context,
 
 	fmgr_symbol(fcinfo->flinfo->fn_oid, &modname, &basename);
 
-	if (modname != NULL && basename != NULL)
+	if (modname != NULL)
 	{
 		/* external function in loadable library */
 		funcname = psprintf("pgextern.%s.%s", modname, basename);
 	}
-	else if (basename != NULL)
-	{
-		/* internal function */
-		funcname = psprintf("%s", basename);
-	}
 	else
 	{
-		/*
-		 * Function we don't know to handle, return pointer. We do so by
-		 * creating a global constant containing a pointer to the function.
-		 * Makes IR more readable.
-		 */
-		LLVMValueRef v_fn_addr;
-
-		funcname = psprintf("pgoidextern.%u",
-							fcinfo->flinfo->fn_oid);
-		v_fn = LLVMGetNamedGlobal(mod, funcname);
-		if (v_fn != 0)
-			return LLVMBuildLoad(builder, v_fn, "");
-
-		v_fn_addr = l_ptr_const(fcinfo->flinfo->fn_addr, TypePGFunction);
-
-		v_fn = LLVMAddGlobal(mod, TypePGFunction, funcname);
-		LLVMSetInitializer(v_fn, v_fn_addr);
-		LLVMSetGlobalConstant(v_fn, true);
-
-		return LLVMBuildLoad(builder, v_fn, "");
+		/* internal function or a PL handler */
+		funcname = psprintf("%s", basename);
 	}
 
 	/* check if function already has been added */
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index 099ebd779b..71398bb3c1 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -265,11 +265,9 @@ fmgr_info_cxt_security(Oid functionId, FmgrInfo *finfo, MemoryContext mcxt,
 /*
  * Return module and C function name providing implementation of functionId.
  *
- * If *mod == NULL and *fn == NULL, no C symbol is known to implement
- * function.
- *
  * If *mod == NULL and *fn != NULL, the function is implemented by a symbol in
- * the main binary.
+ * the main binary. If the function being looked up is not a C language
+ * function, it's language handler name is returned.
  *
  * If *mod != NULL and *fn !=NULL the function is implemented in an extension
  * shared object.
@@ -285,6 +283,11 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
 	bool		isnull;
 	Datum		prosrcattr;
 	Datum		probinattr;
+	Oid			language;
+	HeapTuple	languageTuple;
+	Form_pg_language languageStruct;
+	HeapTuple	plHandlerProcedureTuple;
+	Form_pg_proc plHandlerProcedureStruct;
 
 	/* Otherwise we need the pg_proc entry */
 	procedureTuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(functionId));
@@ -304,8 +307,9 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
 		return;
 	}
 
+	language = procedureStruct->prolang;
 	/* see fmgr_info_cxt_security for the individual cases */
-	switch (procedureStruct->prolang)
+	switch (language)
 	{
 		case INTERNALlanguageId:
 			prosrcattr = SysCacheGetAttr(PROCOID, procedureTuple,
@@ -342,9 +346,21 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
 			break;
 
 		default:
+			languageTuple = SearchSysCache1(LANGOID,
+											 ObjectIdGetDatum(language));
+			if (!HeapTupleIsValid(languageTuple))
+				elog(ERROR, "cache lookup failed for language %u", language);
+			languageStruct = (Form_pg_language) GETSTRUCT(languageTuple);
+			plHandlerProcedureTuple = SearchSysCache1(PROCOID,
+													  ObjectIdGetDatum(
+														  languageStruct->lanplcallfoid));
+			if (!HeapTupleIsValid(plHandlerProcedureTuple))
+				elog(ERROR, "cache lookup failed for function %u", functionId);
+			plHandlerProcedureStruct = (Form_pg_proc) GETSTRUCT(plHandlerProcedureTuple);
 			*mod = NULL;
-			*fn = NULL;			/* unknown, pass pointer */
-			break;
+			*fn = pstrdup(NameStr(plHandlerProcedureStruct->proname));
+			ReleaseSysCache(languageTuple);
+			ReleaseSysCache(plHandlerProcedureTuple);
 	}
 
 	ReleaseSysCache(procedureTuple);
-- 
2.23.0

#7Andres Freund
andres@anarazel.de
In reply to: Soumyadeep Chakraborty (#6)
Re: WIP: expression evaluation improvements

Hi,

On 2019-10-27 23:46:22 -0700, Soumyadeep Chakraborty wrote:

Apologies, I realize my understanding of symbol resolution and the
referenced_functions mechanism wasn't correct. Thank you for your very
helpful
explanations.

No worries! I was just wondering whether I was misunderstanding you.

If indeed the only case this is being hit is language PL handlers, it
might be better to instead work out the symbol name for that handler -
we should be able to get that via pg_language.lanplcallfoid.

I took a stab at this (on top of your patch set):
v1-0001-Resolve-PL-handler-names-for-JITed-code-instead-o.patch

I think I'd probably try to apply this to master independent of the
larger patchset, to avoid a large dependency.

From 07c7ff996706c6f71e00d76894845c1f87956472 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Sun, 27 Oct 2019 17:42:53 -0700
Subject: [PATCH v1] Resolve PL handler names for JITed code instead of using
const pointers

Using const pointers to PL handler functions prevents optimization
opportunities in JITed code. Now fmgr_symbol() resolves PL function
references to the corresponding language's handler.
llvm_function_reference() now no longer needs to create the global to
such a function.

Did you check whether there's any cases this fails in the tree with your
patch applied? The way I usually do that is by running the regression
tests like
PGOPTIONS='-cjit_above_cost=0' make -s -Otarget check-world

(which will take a bit longer if use an optimized LLVM build, and a
*lot* longer if you use a debug llvm build)

Discussion: /messages/by-id/20191024224303.jvdx3hq3ak2vbit3@alap3.anarazel.de:wq
---
src/backend/jit/llvm/llvmjit.c | 29 +++--------------------------
src/backend/utils/fmgr/fmgr.c | 30 +++++++++++++++++++++++-------
2 files changed, 26 insertions(+), 33 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 82c4afb701..69a4167ac9 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -369,38 +369,15 @@ llvm_function_reference(LLVMJitContext *context,

fmgr_symbol(fcinfo->flinfo->fn_oid, &modname, &basename);

-	if (modname != NULL && basename != NULL)
+	if (modname != NULL)
{
/* external function in loadable library */
funcname = psprintf("pgextern.%s.%s", modname, basename);
}
-	else if (basename != NULL)
-	{
-		/* internal function */
-		funcname = psprintf("%s", basename);
-	}
else
{
-		/*
-		 * Function we don't know to handle, return pointer. We do so by
-		 * creating a global constant containing a pointer to the function.
-		 * Makes IR more readable.
-		 */
-		LLVMValueRef v_fn_addr;
-
-		funcname = psprintf("pgoidextern.%u",
-							fcinfo->flinfo->fn_oid);
-		v_fn = LLVMGetNamedGlobal(mod, funcname);
-		if (v_fn != 0)
-			return LLVMBuildLoad(builder, v_fn, "");
-
-		v_fn_addr = l_ptr_const(fcinfo->flinfo->fn_addr, TypePGFunction);
-
-		v_fn = LLVMAddGlobal(mod, TypePGFunction, funcname);
-		LLVMSetInitializer(v_fn, v_fn_addr);
-		LLVMSetGlobalConstant(v_fn, true);
-
-		return LLVMBuildLoad(builder, v_fn, "");
+		/* internal function or a PL handler */
+		funcname = psprintf("%s", basename);
}

Hm. Aren't you breaking things here? If fmgr_symbol returns a basename
of NULL, as is the case for all internal functions, you're going to
print a NULL pointer, no?

/* check if function already has been added */
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index 099ebd779b..71398bb3c1 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -265,11 +265,9 @@ fmgr_info_cxt_security(Oid functionId, FmgrInfo *finfo, MemoryContext mcxt,
/*
* Return module and C function name providing implementation of functionId.
*
- * If *mod == NULL and *fn == NULL, no C symbol is known to implement
- * function.
- *
* If *mod == NULL and *fn != NULL, the function is implemented by a symbol in
- * the main binary.
+ * the main binary. If the function being looked up is not a C language
+ * function, it's language handler name is returned.
*
* If *mod != NULL and *fn !=NULL the function is implemented in an extension
* shared object.
@@ -285,6 +283,11 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
bool		isnull;
Datum		prosrcattr;
Datum		probinattr;
+	Oid			language;
+	HeapTuple	languageTuple;
+	Form_pg_language languageStruct;
+	HeapTuple	plHandlerProcedureTuple;
+	Form_pg_proc plHandlerProcedureStruct;

/* Otherwise we need the pg_proc entry */
procedureTuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(functionId));
@@ -304,8 +307,9 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
return;
}

+ language = procedureStruct->prolang;
/* see fmgr_info_cxt_security for the individual cases */
- switch (procedureStruct->prolang)
+ switch (language)
{
case INTERNALlanguageId:
prosrcattr = SysCacheGetAttr(PROCOID, procedureTuple,
@@ -342,9 +346,21 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
break;

default:
+			languageTuple = SearchSysCache1(LANGOID,
+											 ObjectIdGetDatum(language));
+			if (!HeapTupleIsValid(languageTuple))
+				elog(ERROR, "cache lookup failed for language %u", language);
+			languageStruct = (Form_pg_language) GETSTRUCT(languageTuple);
+			plHandlerProcedureTuple = SearchSysCache1(PROCOID,
+													  ObjectIdGetDatum(
+														  languageStruct->lanplcallfoid));
+			if (!HeapTupleIsValid(plHandlerProcedureTuple))
+				elog(ERROR, "cache lookup failed for function %u", functionId);
+			plHandlerProcedureStruct = (Form_pg_proc) GETSTRUCT(plHandlerProcedureTuple);
*mod = NULL;
-			*fn = NULL;			/* unknown, pass pointer */
-			break;
+			*fn = pstrdup(NameStr(plHandlerProcedureStruct->proname));
+			ReleaseSysCache(languageTuple);
+			ReleaseSysCache(plHandlerProcedureTuple);
}

I do want to benefit from getting accurate signatures for patch
[PATCH v2 26/32] WIP: expression eval: relative pointer suppport
I had a number of cases where I passed the wrong parameters, and llvm
couldn't tell me...

I took a stab:
v1-0001-Rely-on-llvmjit_types-for-building-EvalFunc-calls.patch

Cool! I'll probably merge that into my patch (with attribution of
course).

I wonder if it'd nicer to not have separate C variables for all of
these, and instead look them up on-demand from the module loaded in
llvm_create_types(). Not sure.

On a separate note, I had submitted a patch earlier to optimize functions
earlier
in accordance to the code comment:
/*
* Do function level optimization. This could be moved to the point where
* functions are emitted, to reduce memory usage a bit.
*/
LLVMInitializeFunctionPassManager(llvm_fpm);
Refer:
/messages/by-id/CAE-ML+_OE4-sHvn0AA_qakc5qkZvQvainxwb1ztuuT67SPMegw@mail.gmail.com
I have rebased that patch on top of your patch set. Here it is:
v2-0001-Optimize-generated-functions-earlier-to-lower-mem.patch

Sorry for not replying to that earlier. I'm not quite sure it's
actually worthwhile doing so - did you try to measure any memory / cpu
savings?

The magnitude of wins aside, I also have a local patch that I'm going to
try to publish this or next week, that deduplicates functions more
aggressively, mostly to avoid redundant optimizations. It's quite
possible that we should run that before the function passes - or even
give up entirely on the function pass optimizations. As the module pass
manager does the same optimizations it's not that clear in which cases
it'd be beneficial to run it, especially if it means we can't
deduplicate before optimizations.

Greetings,

Andres Freund

#8Soumyadeep Chakraborty
sochakraborty@pivotal.io
In reply to: Andres Freund (#7)
3 attachment(s)
Re: WIP: expression evaluation improvements

Hi Andres,

I think I'd probably try to apply this to master independent of the
larger patchset, to avoid a large dependency.

Awesome! +1. Attached 2nd version of patch rebased on master.
(v2-0001-Resolve-PL-handler-names-for-JITed-code-instead-o.patch)

Did you check whether there's any cases this fails in the tree with your
patch applied? The way I usually do that is by running the regression
tests like
PGOPTIONS='-cjit_above_cost=0' make -s -Otarget check-world

(which will take a bit longer if use an optimized LLVM build, and a
*lot* longer if you use a debug llvm build)

Great suggestion! I used:
PGOPTIONS='-c jit_above_cost=0' gmake installcheck-world
It all passed except a couple of logical decoding tests that never pass
on my machine for any tree (t/006_logical_decoding.pl and
t/010_logical_decoding_timelines.pl) and point (which seems to be failing
even
on master as of: d80be6f2f) I have attached the regression.diffs which
captures
the point failure.

Hm. Aren't you breaking things here? If fmgr_symbol returns a basename
of NULL, as is the case for all internal functions, you're going to
print a NULL pointer, no?

For internal functions, it is supposed to return modname = NULL but basename
will be non-NULL right? As things stand, fmgr_symbol can never return a
null
basename. I have added an Assert to make that even more explicit.

Cool! I'll probably merge that into my patch (with attribution of
course).

I wonder if it'd nicer to not have separate C variables for all of
these, and instead look them up on-demand from the module loaded in
llvm_create_types(). Not sure.

Great! It is much nicer indeed. Attached version 2 with your suggested
changes.
(v2-0001-Rely-on-llvmjit_types-for-building-EvalFunc-calls.patch)
Used the same testing method as above.

Sorry for not replying to that earlier. I'm not quite sure it's
actually worthwhile doing so - did you try to measure any memory / cpu
savings?

No problem, thanks for the reply! Unfortunately, I did not do anything
significant in terms of mem/cpu measurements. However, I have noticed
non-trivial
differences between optimized and unoptimized .bc files that were dumped
from
time to time.

The magnitude of wins aside, I also have a local patch that I'm going to
try to publish this or next week, that deduplicates functions more
aggressively, mostly to avoid redundant optimizations. It's quite
possible that we should run that before the function passes - or even
give up entirely on the function pass optimizations. As the module pass
manager does the same optimizations it's not that clear in which cases
it'd be beneficial to run it, especially if it means we can't
deduplicate before optimizations.

Agreed, excited to see the patch!

--
Soumyadeep

Attachments:

v2-0001-Rely-on-llvmjit_types-for-building-EvalFunc-calls.patchapplication/octet-stream; name=v2-0001-Rely-on-llvmjit_types-for-building-EvalFunc-calls.patchDownload
From 31e27a06c265abcaf8bc991b691194fc6b7f90a0 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Mon, 28 Oct 2019 22:25:20 -0700
Subject: [PATCH v2] Rely on llvmjit_types for building EvalFunc calls

Rely on llvmjit_types's referenced_functions for building calls to EvalFuncs.
This ensures signature checks for the calls constructed.
---
 src/backend/jit/llvm/llvmjit.c       |  84 +++++++--------
 src/backend/jit/llvm/llvmjit_expr.c  | 146 +++++++++++++++++----------
 src/backend/jit/llvm/llvmjit_types.c |  31 +++++-
 src/include/jit/llvmjit.h            |   3 +-
 4 files changed, 166 insertions(+), 98 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index cdac7b1c5d..be86ef2054 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -49,6 +49,7 @@ typedef struct LLVMJitHandle
 	LLVMOrcModuleHandle orc_handle;
 } LLVMJitHandle;
 
+LLVMModuleRef LLVMJITPGTypesModule = NULL;
 
 /* types & functions commonly needed for JITing */
 LLVMTypeRef TypeSizeT;
@@ -866,7 +867,6 @@ llvm_create_types(void)
 	char		path[MAXPGPATH];
 	LLVMMemoryBufferRef buf;
 	char	   *msg;
-	LLVMModuleRef mod = NULL;
 
 	snprintf(path, MAXPGPATH, "%s/%s", pkglib_path, "llvmjit_types.bc");
 
@@ -878,7 +878,7 @@ llvm_create_types(void)
 	}
 
 	/* eagerly load contents, going to need it all */
-	if (LLVMParseBitcode2(buf, &mod))
+	if (LLVMParseBitcode2(buf, &LLVMJITPGTypesModule))
 	{
 		elog(ERROR, "LLVMParseBitcode2 of %s failed", path);
 	}
@@ -888,46 +888,46 @@ llvm_create_types(void)
 	 * Load triple & layout from clang emitted file so we're guaranteed to be
 	 * compatible.
 	 */
-	llvm_triple = pstrdup(LLVMGetTarget(mod));
-	llvm_layout = pstrdup(LLVMGetDataLayoutStr(mod));
-
-	TypeSizeT = load_type(mod, "TypeSizeT");
-	TypeParamBool = load_return_type(mod, "FunctionReturningBool");
-	TypeStorageBool = load_type(mod, "TypeStorageBool");
-	TypePGFunction = load_type(mod, "TypePGFunction");
-	TypeExecEvalParamCallback = load_type(mod, "TypeExecEvalParamCallback");
-	StructNode = load_type(mod, "StructNode");
-	StructNullableDatum = load_type(mod, "StructNullableDatum");
-	StructExprContext = load_type(mod, "StructExprContext");
-	StructExprEvalStep = load_type(mod, "StructExprEvalStep");
-	StructExprEvalStepParamCallback = load_type(mod, "StructExprEvalStepParamCallback");
-	StructExprState = load_type(mod, "StructExprState");
-	StructFmgrInfo = load_type(mod, "StructFmgrInfo");
-	StructFunctionCallInfoData = load_type(mod, "StructFunctionCallInfoData");
-	StructMemoryContextData = load_type(mod, "StructMemoryContextData");
-	StructTupleTableSlot = load_type(mod, "StructTupleTableSlot");
-	StructHeapTupleTableSlot = load_type(mod, "StructHeapTupleTableSlot");
-	StructMinimalTupleTableSlot = load_type(mod, "StructMinimalTupleTableSlot");
-	StructHeapTupleData = load_type(mod, "StructHeapTupleData");
-	StructTupleDescData = load_type(mod, "StructTupleDescData");
-	StructAggState = load_type(mod, "StructAggState");
-	StructAggStatePerCallContext = load_type(mod, "StructAggStatePerCallContext");
-	StructAggStatePerGroupData = load_type(mod, "StructAggStatePerGroupData");
-	StructAggStatePerTransData = load_type(mod, "StructAggStatePerTransData");
-
-	AttributeTemplate = LLVMGetNamedFunction(mod, "AttributeTemplate");
-	FuncStrlen = LLVMGetNamedFunction(mod, "strlen");
-	FuncVarsizeAny = LLVMGetNamedFunction(mod, "varsize_any");
-	FuncSlotGetsomeattrsInt = LLVMGetNamedFunction(mod, "slot_getsomeattrs_int");
-	FuncSlotGetmissingattrs = LLVMGetNamedFunction(mod, "slot_getmissingattrs");
-	FuncMakeExpandedObjectReadOnlyInternal = LLVMGetNamedFunction(mod, "MakeExpandedObjectReadOnlyInternal");
-	FuncExecEvalSubscriptingRef = LLVMGetNamedFunction(mod, "ExecEvalSubscriptingRef");
-	FuncExecEvalSysVar = LLVMGetNamedFunction(mod, "ExecEvalSysVar");
-	FuncExecAggTransReparent = LLVMGetNamedFunction(mod, "ExecAggTransReparent");
-	FuncExecAggInitGroup = LLVMGetNamedFunction(mod, "ExecAggInitGroup");
-	FuncExecEvalArrayCoerceRelabel = LLVMGetNamedFunction(mod, "ExecEvalArrayCoerceRelabel");
-	FuncExecEvalArrayCoerceUnpack = LLVMGetNamedFunction(mod, "ExecEvalArrayCoerceUnpack");
-	FuncExecEvalArrayCoercePack = LLVMGetNamedFunction(mod, "ExecEvalArrayCoercePack");
+	llvm_triple = pstrdup(LLVMGetTarget(LLVMJITPGTypesModule));
+	llvm_layout = pstrdup(LLVMGetDataLayoutStr(LLVMJITPGTypesModule));
+
+	TypeSizeT = load_type(LLVMJITPGTypesModule, "TypeSizeT");
+	TypeParamBool = load_return_type(LLVMJITPGTypesModule, "FunctionReturningBool");
+	TypeStorageBool = load_type(LLVMJITPGTypesModule, "TypeStorageBool");
+	TypePGFunction = load_type(LLVMJITPGTypesModule, "TypePGFunction");
+	TypeExecEvalParamCallback = load_type(LLVMJITPGTypesModule, "TypeExecEvalParamCallback");
+	StructNode = load_type(LLVMJITPGTypesModule, "StructNode");
+	StructNullableDatum = load_type(LLVMJITPGTypesModule, "StructNullableDatum");
+	StructExprContext = load_type(LLVMJITPGTypesModule, "StructExprContext");
+	StructExprEvalStep = load_type(LLVMJITPGTypesModule, "StructExprEvalStep");
+	StructExprEvalStepParamCallback = load_type(LLVMJITPGTypesModule, "StructExprEvalStepParamCallback");
+	StructExprState = load_type(LLVMJITPGTypesModule, "StructExprState");
+	StructFmgrInfo = load_type(LLVMJITPGTypesModule, "StructFmgrInfo");
+	StructFunctionCallInfoData = load_type(LLVMJITPGTypesModule, "StructFunctionCallInfoData");
+	StructMemoryContextData = load_type(LLVMJITPGTypesModule, "StructMemoryContextData");
+	StructTupleTableSlot = load_type(LLVMJITPGTypesModule, "StructTupleTableSlot");
+	StructHeapTupleTableSlot = load_type(LLVMJITPGTypesModule, "StructHeapTupleTableSlot");
+	StructMinimalTupleTableSlot = load_type(LLVMJITPGTypesModule, "StructMinimalTupleTableSlot");
+	StructHeapTupleData = load_type(LLVMJITPGTypesModule, "StructHeapTupleData");
+	StructTupleDescData = load_type(LLVMJITPGTypesModule, "StructTupleDescData");
+	StructAggState = load_type(LLVMJITPGTypesModule, "StructAggState");
+	StructAggStatePerCallContext = load_type(LLVMJITPGTypesModule, "StructAggStatePerCallContext");
+	StructAggStatePerGroupData = load_type(LLVMJITPGTypesModule, "StructAggStatePerGroupData");
+	StructAggStatePerTransData = load_type(LLVMJITPGTypesModule, "StructAggStatePerTransData");
+
+	AttributeTemplate = LLVMGetNamedFunction(LLVMJITPGTypesModule, "AttributeTemplate");
+	FuncStrlen = LLVMGetNamedFunction(LLVMJITPGTypesModule, "strlen");
+	FuncVarsizeAny = LLVMGetNamedFunction(LLVMJITPGTypesModule, "varsize_any");
+	FuncSlotGetsomeattrsInt = LLVMGetNamedFunction(LLVMJITPGTypesModule, "slot_getsomeattrs_int");
+	FuncSlotGetmissingattrs = LLVMGetNamedFunction(LLVMJITPGTypesModule, "slot_getmissingattrs");
+	FuncMakeExpandedObjectReadOnlyInternal = LLVMGetNamedFunction(LLVMJITPGTypesModule, "MakeExpandedObjectReadOnlyInternal");
+	FuncExecEvalSubscriptingRef = LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalSubscriptingRef");
+	FuncExecEvalSysVar = LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalSysVar");
+	FuncExecAggTransReparent = LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecAggTransReparent");
+	FuncExecAggInitGroup = LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecAggInitGroup");
+	FuncExecEvalArrayCoerceRelabel = LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalArrayCoerceRelabel");
+	FuncExecEvalArrayCoerceUnpack = LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalArrayCoerceUnpack");
+	FuncExecEvalArrayCoercePack = LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalArrayCoercePack");
 
 	/*
 	 * Leave the module alive, otherwise references to function would be
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index bbe0fd0cb3..d7e528f032 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -131,12 +131,12 @@ static LLVMValueRef BuildV1CallFC(ExprCompileState *ecs,
 								  LLVMValueRef v_fcinfo,
 								  LLVMValueRef *v_fcinfo_isnull);
 
-#define build_EvalXFunc(ecs, funcname, opno, ...) \
-	build_EvalXFuncInt(ecs, funcname, opno, \
+#define build_EvalXFunc(ecs, v_fn, opno, ...) \
+	build_EvalXFuncInt(ecs, v_fn, opno, \
 					   lengthof(((LLVMValueRef[]){__VA_ARGS__})), \
 					   ((LLVMValueRef[]){__VA_ARGS__}))
 
-static void build_EvalXFuncInt(ExprCompileState *ecs, const char *funcname,
+static void build_EvalXFuncInt(ExprCompileState *ecs, LLVMValueRef v_fn,
 							   int opno, int natts, LLVMValueRef v_args[]);
 
 /*
@@ -561,7 +561,9 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_WHOLEROW:
-				build_EvalXFunc(&ecs, "ExecEvalWholeRowVar", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalWholeRowVar"),
+								opno,
 								ecs.v_econtext,	v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -798,7 +800,9 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_FUNCEXPR_FUSAGE:
-				build_EvalXFunc(&ecs, "ExecEvalFuncExprFusage", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "EvalFuncExprFusage"),
+								opno,
 								v_resultp,
 								expr_fcinfo_ref(&ecs, op->d.func.fcinfo_data, NULL));
 				LLVMBuildBr(b, opblocks[opno + 1]);
@@ -806,7 +810,9 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 
 
 			case EEOP_FUNCEXPR_STRICT_FUSAGE:
-				build_EvalXFunc(&ecs, "ExecEvalFuncExprStrictFusage", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalFuncExprStrictFusage"),
+								opno,
 								v_resultp,
 								expr_fcinfo_ref(&ecs, op->d.func.fcinfo_data, NULL));
 				LLVMBuildBr(b, opblocks[opno + 1]);
@@ -1190,13 +1196,17 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_NULLTEST_ROWISNULL:
-				build_EvalXFunc(&ecs, "ExecEvalRowNull", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalRowNull"),
+								opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_NULLTEST_ROWISNOTNULL:
-				build_EvalXFunc(&ecs, "ExecEvalRowNotNull", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalRowNotNull"),
+								opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -1269,13 +1279,17 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_PARAM_EXEC:
-				build_EvalXFunc(&ecs, "ExecEvalParamExec", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalParamExec"),
+								opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_PARAM_EXTERN:
-				build_EvalXFunc(&ecs, "ExecEvalParamExtern", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalParamExtern"),
+								opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -1322,21 +1336,27 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_SBSREF_OLD:
-				build_EvalXFunc(&ecs, "ExecEvalSubscriptingRefOld", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalSubscriptingRefOld"),
+								opno,
 								v_resultp,
 								expr_nullable_datum_ref(&ecs, op->d.sbsref.prev));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_SBSREF_ASSIGN:
-				build_EvalXFunc(&ecs, "ExecEvalSubscriptingRefAssign", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalSubscriptingRefAssign"),
+									opno,
 								v_resultp,
 								expr_nullable_datum_ref(&ecs, op->d.sbsref.replace));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_SBSREF_FETCH:
-				build_EvalXFunc(&ecs, "ExecEvalSubscriptingRefFetch", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalSubscriptingRefFetch"),
+								opno,
 								v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -1755,24 +1775,32 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_SQLVALUEFUNCTION:
-				build_EvalXFunc(&ecs, "ExecEvalSQLValueFunction", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalSQLValueFunction"),
+								opno,
 								v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_CURRENTOFEXPR:
-				build_EvalXFunc(&ecs, "ExecEvalCurrentOfExpr", opno);
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalCurrentOfExpr"),
+								opno);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_NEXTVALUEEXPR:
-				build_EvalXFunc(&ecs, "ExecEvalNextValueExpr", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalNextValueExpr"),
+								opno,
 								v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_ARRAYEXPR:
-				build_EvalXFunc(&ecs, "ExecEvalArrayExpr", opno, v_resultp,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalArrayExpr"),
+								opno, v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.arrayexpr.elements));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -1842,7 +1870,9 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_ROW:
-				build_EvalXFunc(&ecs, "ExecEvalRow", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalRow"),
+								opno,
 								v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.row.elements));
 				LLVMBuildBr(b, opblocks[opno + 1]);
@@ -2003,7 +2033,9 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_MINMAX:
-				build_EvalXFunc(&ecs, "ExecEvalMinMax", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalMinMax"),
+								opno,
 								v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.minmax.arguments),
 								expr_fcinfo_ref(&ecs, op->d.minmax.fcinfo_data, NULL));
@@ -2011,20 +2043,26 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				break;
 
 			case EEOP_FIELDSELECT:
-				build_EvalXFunc(&ecs, "ExecEvalFieldSelect", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalFieldSelect"),
+								opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_FIELDSTORE_DEFORM:
-				build_EvalXFunc(&ecs, "ExecEvalFieldStoreDeForm", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalFieldStoreDeForm"),
+								opno,
 								ecs.v_econtext, v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.fieldstore.columns));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_FIELDSTORE_FORM:
-				build_EvalXFunc(&ecs, "ExecEvalFieldStoreForm", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalFieldStoreForm"),
+								opno,
 								v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.fieldstore.columns));
 				LLVMBuildBr(b, opblocks[opno + 1]);
@@ -2101,32 +2139,42 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_DOMAIN_NOTNULL:
-				build_EvalXFunc(&ecs, "ExecEvalConstraintNotNull", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalConstraintNotNull"),
+								opno,
 								v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_DOMAIN_CHECK:
-				build_EvalXFunc(&ecs, "ExecEvalConstraintCheck", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalConstraintCheck"),
+								opno,
 								expr_nullable_datum_ref(&ecs, op->d.domaincheck.check));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_CONVERT_ROWTYPE:
-				build_EvalXFunc(&ecs, "ExecEvalConvertRowtype", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalConvertRowtype"),
+								opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_SCALARARRAYOP:
-				build_EvalXFunc(&ecs, "ExecEvalScalarArrayOp", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalScalarArrayOp"),
+								opno,
 								v_resultp,
 								expr_fcinfo_ref(&ecs, op->d.scalararrayop.fcinfo_data, NULL));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_XMLEXPR:
-				build_EvalXFunc(&ecs, "ExecEvalXmlExpr", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalXmlExpr"),
+								opno,
 								v_resultp,
 								expr_nullable_datum_array_ref(&ecs, op->d.xmlexpr.args),
 								expr_nullable_datum_array_ref(&ecs, op->d.xmlexpr.named_args));
@@ -2162,7 +2210,9 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_GROUPING_FUNC:
-				build_EvalXFunc(&ecs, "ExecEvalGroupingFunc", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalGroupingFunc"),
+								opno,
 								v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -2196,13 +2246,17 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_SUBPLAN:
-				build_EvalXFunc(&ecs, "ExecEvalSubPlan", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalSubPlan"),
+								opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_ALTERNATIVE_SUBPLAN:
-				build_EvalXFunc(&ecs, "ExecEvalAlternativeSubPlan", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalAlternativeSubPlan"),
+								opno,
 								ecs.v_econtext, v_resultp);
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -2551,13 +2605,17 @@ llvm_compile_expr(ExprState *state, ExprStateBuilder *esb)
 				}
 
 			case EEOP_AGG_ORDERED_TRANS_DATUM:
-				build_EvalXFunc(&ecs, "ExecEvalAggOrderedTransDatum", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalAggOrderedTransDatum"),
+								opno,
 								expr_nullable_datum_array_ref(&ecs, op->d.agg_trans_ordered.columns));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
 
 			case EEOP_AGG_ORDERED_TRANS_TUPLE:
-				build_EvalXFunc(&ecs, "ExecEvalAggOrderedTransTuple", opno,
+				build_EvalXFunc(&ecs,
+								LLVMGetNamedFunction(LLVMJITPGTypesModule, "ExecEvalAggOrderedTransTuple"),
+								opno,
 								expr_nullable_datum_array_ref(&ecs, op->d.agg_trans_ordered.columns));
 				LLVMBuildBr(b, opblocks[opno + 1]);
 				break;
@@ -3115,37 +3173,17 @@ BuildV1CallFC(ExprCompileState *ecs,
  * Implement an expression step by calling the function funcname.
  */
 static void
-build_EvalXFuncInt(ExprCompileState *ecs, const char *funcname, int opno,
+build_EvalXFuncInt(ExprCompileState *ecs, LLVMValueRef v_fn, int opno,
 				   int nargs, LLVMValueRef v_args[])
 {
 	LLVMValueRef v_opno = l_int32_const(opno);
 	LLVMValueRef v_opp;
-	LLVMTypeRef sig;
-	LLVMValueRef v_fn;
 	LLVMValueRef *params;
-	LLVMTypeRef *param_types;
 
 	params = palloc(sizeof(LLVMValueRef) * (2 + nargs));
-	param_types = palloc(sizeof(LLVMValueRef) * (2 + nargs));
 
 	v_opp = LLVMBuildGEP(ecs->b, ecs->v_steps,
 						 (LLVMValueRef[]){l_int32_const(0), v_opno}, 2, "");
-	v_fn = LLVMGetNamedFunction(ecs->mod, funcname);
-	if (!v_fn)
-	{
-		int argno = 0;
-
-		param_types[argno++] = l_ptr(StructExprState);
-		param_types[argno++] = l_ptr(StructExprEvalStep);
-
-		for (int i = 0; i < nargs; i++)
-			param_types[argno++] = LLVMTypeOf(v_args[i]);
-
-		sig = LLVMFunctionType(LLVMVoidType(),
-							   param_types, argno,
-							   false);
-		v_fn = LLVMAddFunction(ecs->mod, funcname, sig);
-	}
 
 	{
 		int argno = 0;
@@ -3156,6 +3194,6 @@ build_EvalXFuncInt(ExprCompileState *ecs, const char *funcname, int opno,
 		for (int i = 0; i < nargs; i++)
 			params[argno++] = v_args[i];
 
-		LLVMBuildCall(ecs->b, v_fn, params, argno, "");
+		LLVMBuildCall(ecs->b, llvm_get_decl(ecs->mod, v_fn), params, argno, "");
 	}
 }
diff --git a/src/backend/jit/llvm/llvmjit_types.c b/src/backend/jit/llvm/llvmjit_types.c
index 082b312060..d414e60b99 100644
--- a/src/backend/jit/llvm/llvmjit_types.c
+++ b/src/backend/jit/llvm/llvmjit_types.c
@@ -114,5 +114,34 @@ void	   *referenced_functions[] =
 	ExecAggInitGroup,
 	ExecEvalArrayCoerceRelabel,
 	ExecEvalArrayCoerceUnpack,
-	ExecEvalArrayCoercePack
+	ExecEvalArrayCoercePack,
+	ExecEvalFuncExprFusage,
+	ExecEvalFuncExprStrictFusage,
+	ExecEvalRowNull,
+	ExecEvalRowNotNull,
+	ExecEvalParamExec,
+	ExecEvalParamExtern,
+	ExecEvalSubscriptingRefOld,
+	ExecEvalSubscriptingRefAssign,
+	ExecEvalSubscriptingRefFetch,
+	ExecEvalSQLValueFunction,
+	ExecEvalCurrentOfExpr,
+	ExecEvalNextValueExpr,
+	ExecEvalArrayExpr,
+	ExecEvalRow,
+	ExecEvalMinMax,
+	ExecEvalFieldSelect,
+	ExecEvalFieldStoreDeForm,
+	ExecEvalFieldStoreForm,
+	ExecEvalConstraintNotNull,
+	ExecEvalConstraintCheck,
+	ExecEvalConvertRowtype,
+	ExecEvalScalarArrayOp,
+	ExecEvalXmlExpr,
+	ExecEvalGroupingFunc,
+	ExecEvalSubPlan,
+	ExecEvalAlternativeSubPlan,
+	ExecEvalAggOrderedTransDatum,
+	ExecEvalAggOrderedTransTuple,
+	ExecEvalWholeRowVar
 };
diff --git a/src/include/jit/llvmjit.h b/src/include/jit/llvmjit.h
index 751bb69e25..2fb25eb899 100644
--- a/src/include/jit/llvmjit.h
+++ b/src/include/jit/llvmjit.h
@@ -57,6 +57,8 @@ typedef struct LLVMJitContext
 	List	   *handles;
 } LLVMJitContext;
 
+extern LLVMModuleRef LLVMJITPGTypesModule;
+
 
 /* type and struct definitions */
 extern LLVMTypeRef TypeParamBool;
@@ -98,7 +100,6 @@ extern LLVMValueRef FuncExecEvalArrayCoerceRelabel;
 extern LLVMValueRef FuncExecEvalArrayCoerceUnpack;
 extern LLVMValueRef FuncExecEvalArrayCoercePack;
 
-
 extern void llvm_enter_fatal_on_oom(void);
 extern void llvm_leave_fatal_on_oom(void);
 extern void llvm_reset_after_error(void);
-- 
2.23.0

v2-0001-Resolve-PL-handler-names-for-JITed-code-instead-o.patchapplication/octet-stream; name=v2-0001-Resolve-PL-handler-names-for-JITed-code-instead-o.patchDownload
From 1f7b9b9a448521deb0d28b07ed77bb9ea7b842cd Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Sun, 27 Oct 2019 17:42:53 -0700
Subject: [PATCH v2] Resolve PL handler names for JITed code instead of using
 const pointers

Using const pointers to PL handler functions prevents optimization
opportunities in JITed code. Now fmgr_symbol() resolves PL function
references to the corresponding language's handler.
llvm_function_reference() now no longer needs to create the global to
such a function.

Discussion: https://postgr.es/m/20191024224303.jvdx3hq3ak2vbit3%40alap3.anarazel.de:wq
---
 src/backend/jit/llvm/llvmjit.c | 31 +++++--------------------------
 src/backend/utils/fmgr/fmgr.c  | 30 +++++++++++++++++++++++-------
 2 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 82c4afb701..9da71a2da7 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -369,38 +369,17 @@ llvm_function_reference(LLVMJitContext *context,
 
 	fmgr_symbol(fcinfo->flinfo->fn_oid, &modname, &basename);
 
-	if (modname != NULL && basename != NULL)
+	Assert(basename != NULL);
+
+	if (modname != NULL)
 	{
 		/* external function in loadable library */
 		funcname = psprintf("pgextern.%s.%s", modname, basename);
 	}
-	else if (basename != NULL)
-	{
-		/* internal function */
-		funcname = psprintf("%s", basename);
-	}
 	else
 	{
-		/*
-		 * Function we don't know to handle, return pointer. We do so by
-		 * creating a global constant containing a pointer to the function.
-		 * Makes IR more readable.
-		 */
-		LLVMValueRef v_fn_addr;
-
-		funcname = psprintf("pgoidextern.%u",
-							fcinfo->flinfo->fn_oid);
-		v_fn = LLVMGetNamedGlobal(mod, funcname);
-		if (v_fn != 0)
-			return LLVMBuildLoad(builder, v_fn, "");
-
-		v_fn_addr = l_ptr_const(fcinfo->flinfo->fn_addr, TypePGFunction);
-
-		v_fn = LLVMAddGlobal(mod, TypePGFunction, funcname);
-		LLVMSetInitializer(v_fn, v_fn_addr);
-		LLVMSetGlobalConstant(v_fn, true);
-
-		return LLVMBuildLoad(builder, v_fn, "");
+		/* internal function or a PL handler */
+		funcname = psprintf("%s", basename);
 	}
 
 	/* check if function already has been added */
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index 099ebd779b..71398bb3c1 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -265,11 +265,9 @@ fmgr_info_cxt_security(Oid functionId, FmgrInfo *finfo, MemoryContext mcxt,
 /*
  * Return module and C function name providing implementation of functionId.
  *
- * If *mod == NULL and *fn == NULL, no C symbol is known to implement
- * function.
- *
  * If *mod == NULL and *fn != NULL, the function is implemented by a symbol in
- * the main binary.
+ * the main binary. If the function being looked up is not a C language
+ * function, it's language handler name is returned.
  *
  * If *mod != NULL and *fn !=NULL the function is implemented in an extension
  * shared object.
@@ -285,6 +283,11 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
 	bool		isnull;
 	Datum		prosrcattr;
 	Datum		probinattr;
+	Oid			language;
+	HeapTuple	languageTuple;
+	Form_pg_language languageStruct;
+	HeapTuple	plHandlerProcedureTuple;
+	Form_pg_proc plHandlerProcedureStruct;
 
 	/* Otherwise we need the pg_proc entry */
 	procedureTuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(functionId));
@@ -304,8 +307,9 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
 		return;
 	}
 
+	language = procedureStruct->prolang;
 	/* see fmgr_info_cxt_security for the individual cases */
-	switch (procedureStruct->prolang)
+	switch (language)
 	{
 		case INTERNALlanguageId:
 			prosrcattr = SysCacheGetAttr(PROCOID, procedureTuple,
@@ -342,9 +346,21 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
 			break;
 
 		default:
+			languageTuple = SearchSysCache1(LANGOID,
+											 ObjectIdGetDatum(language));
+			if (!HeapTupleIsValid(languageTuple))
+				elog(ERROR, "cache lookup failed for language %u", language);
+			languageStruct = (Form_pg_language) GETSTRUCT(languageTuple);
+			plHandlerProcedureTuple = SearchSysCache1(PROCOID,
+													  ObjectIdGetDatum(
+														  languageStruct->lanplcallfoid));
+			if (!HeapTupleIsValid(plHandlerProcedureTuple))
+				elog(ERROR, "cache lookup failed for function %u", functionId);
+			plHandlerProcedureStruct = (Form_pg_proc) GETSTRUCT(plHandlerProcedureTuple);
 			*mod = NULL;
-			*fn = NULL;			/* unknown, pass pointer */
-			break;
+			*fn = pstrdup(NameStr(plHandlerProcedureStruct->proname));
+			ReleaseSysCache(languageTuple);
+			ReleaseSysCache(plHandlerProcedureTuple);
 	}
 
 	ReleaseSysCache(procedureTuple);
-- 
2.23.0

point_failure.diffsapplication/octet-stream; name=point_failure.diffsDownload
diff -U3 /Users/pivotal/workspace/postgres/src/test/regress/expected/point.out /Users/pivotal/workspace/postgres/src/test/regress/results/point.out
--- /Users/pivotal/workspace/postgres/src/test/regress/expected/point.out	2019-09-30 20:23:03.000000000 -0700
+++ /Users/pivotal/workspace/postgres/src/test/regress/results/point.out	2019-10-28 23:14:59.000000000 -0700
@@ -158,91 +158,9 @@
 SELECT '' AS thirtysix, p1.f1 AS point1, p2.f1 AS point2, p1.f1 <-> p2.f1 AS dist
    FROM POINT_TBL p1, POINT_TBL p2
    ORDER BY dist, p1.f1[0], p2.f1[0];
- thirtysix |      point1       |      point2       |         dist         
------------+-------------------+-------------------+----------------------
-           | (-10,0)           | (-10,0)           |                    0
-           | (-5,-12)          | (-5,-12)          |                    0
-           | (-3,4)            | (-3,4)            |                    0
-           | (0,0)             | (0,0)             |                    0
-           | (1e-300,-1e-300)  | (1e-300,-1e-300)  |                    0
-           | (5.1,34.5)        | (5.1,34.5)        |                    0
-           | (10,10)           | (10,10)           |                    0
-           | (0,0)             | (1e-300,-1e-300)  | 1.4142135623731e-300
-           | (1e-300,-1e-300)  | (0,0)             | 1.4142135623731e-300
-           | (-3,4)            | (0,0)             |                    5
-           | (-3,4)            | (1e-300,-1e-300)  |                    5
-           | (0,0)             | (-3,4)            |                    5
-           | (1e-300,-1e-300)  | (-3,4)            |                    5
-           | (-10,0)           | (-3,4)            |     8.06225774829855
-           | (-3,4)            | (-10,0)           |     8.06225774829855
-           | (-10,0)           | (0,0)             |                   10
-           | (-10,0)           | (1e-300,-1e-300)  |                   10
-           | (0,0)             | (-10,0)           |                   10
-           | (1e-300,-1e-300)  | (-10,0)           |                   10
-           | (-10,0)           | (-5,-12)          |                   13
-           | (-5,-12)          | (-10,0)           |                   13
-           | (-5,-12)          | (0,0)             |                   13
-           | (-5,-12)          | (1e-300,-1e-300)  |                   13
-           | (0,0)             | (-5,-12)          |                   13
-           | (1e-300,-1e-300)  | (-5,-12)          |                   13
-           | (0,0)             | (10,10)           |      14.142135623731
-           | (1e-300,-1e-300)  | (10,10)           |      14.142135623731
-           | (10,10)           | (0,0)             |      14.142135623731
-           | (10,10)           | (1e-300,-1e-300)  |      14.142135623731
-           | (-3,4)            | (10,10)           |     14.3178210632764
-           | (10,10)           | (-3,4)            |     14.3178210632764
-           | (-5,-12)          | (-3,4)            |     16.1245154965971
-           | (-3,4)            | (-5,-12)          |     16.1245154965971
-           | (-10,0)           | (10,10)           |     22.3606797749979
-           | (10,10)           | (-10,0)           |     22.3606797749979
-           | (5.1,34.5)        | (10,10)           |     24.9851956166046
-           | (10,10)           | (5.1,34.5)        |     24.9851956166046
-           | (-5,-12)          | (10,10)           |     26.6270539113887
-           | (10,10)           | (-5,-12)          |     26.6270539113887
-           | (-3,4)            | (5.1,34.5)        |     31.5572495632937
-           | (5.1,34.5)        | (-3,4)            |     31.5572495632937
-           | (0,0)             | (5.1,34.5)        |     34.8749193547455
-           | (1e-300,-1e-300)  | (5.1,34.5)        |     34.8749193547455
-           | (5.1,34.5)        | (0,0)             |     34.8749193547455
-           | (5.1,34.5)        | (1e-300,-1e-300)  |     34.8749193547455
-           | (-10,0)           | (5.1,34.5)        |     37.6597928831267
-           | (5.1,34.5)        | (-10,0)           |     37.6597928831267
-           | (-5,-12)          | (5.1,34.5)        |     47.5842410888311
-           | (5.1,34.5)        | (-5,-12)          |     47.5842410888311
-           | (-10,0)           | (1e+300,Infinity) |             Infinity
-           | (-5,-12)          | (1e+300,Infinity) |             Infinity
-           | (-3,4)            | (1e+300,Infinity) |             Infinity
-           | (0,0)             | (1e+300,Infinity) |             Infinity
-           | (1e-300,-1e-300)  | (1e+300,Infinity) |             Infinity
-           | (5.1,34.5)        | (1e+300,Infinity) |             Infinity
-           | (10,10)           | (1e+300,Infinity) |             Infinity
-           | (1e+300,Infinity) | (-10,0)           |             Infinity
-           | (1e+300,Infinity) | (-5,-12)          |             Infinity
-           | (1e+300,Infinity) | (-3,4)            |             Infinity
-           | (1e+300,Infinity) | (0,0)             |             Infinity
-           | (1e+300,Infinity) | (1e-300,-1e-300)  |             Infinity
-           | (1e+300,Infinity) | (5.1,34.5)        |             Infinity
-           | (1e+300,Infinity) | (10,10)           |             Infinity
-           | (-10,0)           | (NaN,NaN)         |                  NaN
-           | (-5,-12)          | (NaN,NaN)         |                  NaN
-           | (-3,4)            | (NaN,NaN)         |                  NaN
-           | (0,0)             | (NaN,NaN)         |                  NaN
-           | (1e-300,-1e-300)  | (NaN,NaN)         |                  NaN
-           | (5.1,34.5)        | (NaN,NaN)         |                  NaN
-           | (10,10)           | (NaN,NaN)         |                  NaN
-           | (1e+300,Infinity) | (1e+300,Infinity) |                  NaN
-           | (1e+300,Infinity) | (NaN,NaN)         |                  NaN
-           | (NaN,NaN)         | (-10,0)           |                  NaN
-           | (NaN,NaN)         | (-5,-12)          |                  NaN
-           | (NaN,NaN)         | (-3,4)            |                  NaN
-           | (NaN,NaN)         | (0,0)             |                  NaN
-           | (NaN,NaN)         | (1e-300,-1e-300)  |                  NaN
-           | (NaN,NaN)         | (5.1,34.5)        |                  NaN
-           | (NaN,NaN)         | (10,10)           |                  NaN
-           | (NaN,NaN)         | (1e+300,Infinity) |                  NaN
-           | (NaN,NaN)         | (NaN,NaN)         |                  NaN
-(81 rows)
-
+WARNING:  failed to resolve name float_underflow_error
+WARNING:  failed to resolve name float_overflow_error
+ERROR:  failed to look up symbol "evalexpr_21_0"
 SELECT '' AS thirty, p1.f1 AS point1, p2.f1 AS point2
    FROM POINT_TBL p1, POINT_TBL p2
    WHERE (p1.f1 <-> p2.f1) > 3;

#9Andres Freund
andres@anarazel.de
In reply to: Soumyadeep Chakraborty (#8)
Re: WIP: expression evaluation improvements

Hi,

On 2019-10-28 23:58:11 -0700, Soumyadeep Chakraborty wrote:

Cool! I'll probably merge that into my patch (with attribution of
course).

I wonder if it'd nicer to not have separate C variables for all of
these, and instead look them up on-demand from the module loaded in
llvm_create_types(). Not sure.

Great! It is much nicer indeed. Attached version 2 with your suggested
changes.
(v2-0001-Rely-on-llvmjit_types-for-building-EvalFunc-calls.patch)
Used the same testing method as above.

I've comitted a (somewhat evolved) version of this patch. I think it
really improves the code!

My changes largely were to get rid of the LLVMGetNamedFunction() added
to each opcode implementation, to also convert the ExecEval* functions
we were calling directly, to remove the other functions in llvmjit.h,
and finally to rebase it onto master, from the patch series in this
thread.

I do wonder about adding a variadic wrapper like the one introduced here
more widely, seems like it could simplify a number of places. If we then
redirected all function calls through a common wrapper, for LLVMBuildCall,
that also validated parameter count (and perhaps types), I think it'd be
easier to develop...

Thanks!

Andres

#10Andres Freund
andres@anarazel.de
In reply to: Soumyadeep Chakraborty (#8)
Re: WIP: expression evaluation improvements

Hi,

On 2019-10-28 23:58:11 -0700, Soumyadeep Chakraborty wrote:

Sorry for not replying to that earlier. I'm not quite sure it's
actually worthwhile doing so - did you try to measure any memory / cpu
savings?

No problem, thanks for the reply! Unfortunately, I did not do anything
significant in terms of mem/cpu measurements. However, I have noticed
non-trivial differences between optimized and unoptimized .bc files
that were dumped from time to time.

Could you expand on what you mean here? Are you saying that you got
significantly better optimization results by doing function optimization
early on? That'd be surprising imo?

Greetings,

Andres Freund

#11Soumyadeep Chakraborty
sochakraborty@pivotal.io
In reply to: Andres Freund (#10)
Re: WIP: expression evaluation improvements

Hi Andres,

I've comitted a (somewhat evolved) version of this patch. I think it
really improves the code!

Awesome! Thanks for taking it forward!

I do wonder about adding a variadic wrapper like the one introduced here
more widely, seems like it could simplify a number of places. If we then
redirected all function calls through a common wrapper, for LLVMBuildCall,
that also validated parameter count (and perhaps types), I think it'd be
easier to develop...

+1. I was wondering whether such validations should be Asserts instead of
ERRORs.

Regards,

Soumyadeep Chakraborty
Senior Software Engineer
Pivotal Greenplum
Palo Alto

On Thu, Feb 6, 2020 at 10:35 PM Andres Freund <andres@anarazel.de> wrote:

Show quoted text

Hi,

On 2019-10-28 23:58:11 -0700, Soumyadeep Chakraborty wrote:

Sorry for not replying to that earlier. I'm not quite sure it's
actually worthwhile doing so - did you try to measure any memory / cpu
savings?

No problem, thanks for the reply! Unfortunately, I did not do anything
significant in terms of mem/cpu measurements. However, I have noticed
non-trivial differences between optimized and unoptimized .bc files
that were dumped from time to time.

Could you expand on what you mean here? Are you saying that you got
significantly better optimization results by doing function optimization
early on? That'd be surprising imo?

Greetings,

Andres Freund

#12Soumyadeep Chakraborty
sochakraborty@pivotal.io
In reply to: Andres Freund (#10)
Re: WIP: expression evaluation improvements

Hi Andres,

Could you expand on what you mean here? Are you saying that you got
significantly better optimization results by doing function optimization
early on? That'd be surprising imo?

Sorry for the ambiguity, I meant that I had observed differences in the
sizes
of the bitcode files dumped.

These are the size differences that I observed (for TPCH Q1):
Without my patch:
-rw------- 1 pivotal staff 278K Feb 9 11:59 1021.0.bc
-rw------- 1 pivotal staff 249K Feb 9 11:59 1374.0.bc
-rw------- 1 pivotal staff 249K Feb 9 11:59 1375.0.bc
With my patch:
-rw------- 1 pivotal staff 245K Feb 9 11:43 88514.0.bc
-rw------- 1 pivotal staff 245K Feb 9 11:43 88515.0.bc
-rw------- 1 pivotal staff 270K Feb 9 11:43 79323.0.bc

This means that the sizes of the module when execution encountered:

if (jit_dump_bitcode)
{
char *filename;

filename = psprintf("%u.%zu.bc",
MyProcPid,
context->module_generation);
LLVMWriteBitcodeToFile(context->module, filename);
pfree(filename);
}

were smaller with my patch applied. This means there is less memory
pressure between when the functions were built and when
llvm_compile_module() is called. I don't know if the difference is
practically
significant.

Soumyadeep

#13Soumyadeep Chakraborty
sochakraborty@pivotal.io
In reply to: Soumyadeep Chakraborty (#12)
1 attachment(s)
Re: WIP: expression evaluation improvements

Hey Andres,

Awesome! +1. Attached 2nd version of patch rebased on master.
(v2-0001-Resolve-PL-handler-names-for-JITed-code-instead-o.patch)

Did you check whether there's any cases this fails in the tree with your
patch applied? The way I usually do that is by running the regression
tests like
PGOPTIONS='-cjit_above_cost=0' make -s -Otarget check-world

(which will take a bit longer if use an optimized LLVM build, and a
*lot* longer if you use a debug llvm build)

Great suggestion! I used:
PGOPTIONS='-c jit_above_cost=0' gmake installcheck-world
It all passed except a couple of logical decoding tests that never pass
on my machine for any tree (t/006_logical_decoding.pl and
t/010_logical_decoding_timelines.pl) and point (which seems to be failing
even
on master as of: d80be6f2f) I have attached the regression.diffs which
captures
the point failure.

I have attached the 3rd version of the patch rebased on master. I made one
slight modification to the previous patch. PL handlers, such as that of
plsh,
can be in an external library. So I account for that in modname (earlier
naively I set it to NULL). There are also some minor changes to the comments
and I have rehashed the commit message.

Apart from running the regress tests as you suggested above, I installed
plsh
and forced JIT on the following:

CREATE FUNCTION query_plsh (x int) RETURNS text
LANGUAGE plsh
AS $$
#!/bin/sh
psql -At -c "select 1"
$$;

SELECT query_plsh(5);

and I also ran plsh's make installcheck with jit_above_cost = 0. Everything
looks good. I think this is ready for another round of review. Thanks!!

Soumyadeep

Attachments:

v3-0001-Resolve-PL-handler-names-for-JITed-code-instead-o.patchapplication/octet-stream; name=v3-0001-Resolve-PL-handler-names-for-JITed-code-instead-o.patchDownload
From a90c150c7c3e9dc42902e6e42ec91f6fe2657d44 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Wed, 19 Feb 2020 16:12:39 -0800
Subject: [PATCH v3] Resolve PL handler names for JITed code instead of using
 const pointers

This patch is similar in spirit to 8c2769405f and would similarly
increase the scope for inlining and optimization.

Earlier for references to PL functions, a const pointer to the PL's
handler function was emitted. Specifically, fmgr_symbol() would return
modname=NULL, basename=NULL for such a PL function reference. Then in
llvm_function_reference() a const pointer to fcinfo->finfo->fn_addr
(which would always be the PL handler's address for a PL function
reference) would be emitted.

Now, instead, look up the qualified PL handler's library and C function
name in fmgr_symbol(), so that llvm_resolve_symbol() can resolve the PL
handler's address.
---
 src/backend/jit/llvm/llvmjit.c | 29 +++-------------------
 src/backend/utils/fmgr/fmgr.c  | 45 ++++++++++++++++++++++++++++------
 2 files changed, 42 insertions(+), 32 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index af8b34aaaf..8c547c9090 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -363,39 +363,18 @@ llvm_function_reference(LLVMJitContext *context,
 
 	fmgr_symbol(fcinfo->flinfo->fn_oid, &modname, &basename);
 
-	if (modname != NULL && basename != NULL)
+	Assert(basename != NULL);
+
+	if (modname != NULL)
 	{
 		/* external function in loadable library */
 		funcname = psprintf("pgextern.%s.%s", modname, basename);
 	}
-	else if (basename != NULL)
+	else
 	{
 		/* internal function */
 		funcname = psprintf("%s", basename);
 	}
-	else
-	{
-		/*
-		 * Function we don't know to handle, return pointer. We do so by
-		 * creating a global constant containing a pointer to the function.
-		 * Makes IR more readable.
-		 */
-		LLVMValueRef v_fn_addr;
-
-		funcname = psprintf("pgoidextern.%u",
-							fcinfo->flinfo->fn_oid);
-		v_fn = LLVMGetNamedGlobal(mod, funcname);
-		if (v_fn != 0)
-			return LLVMBuildLoad(builder, v_fn, "");
-
-		v_fn_addr = l_ptr_const(fcinfo->flinfo->fn_addr, TypePGFunction);
-
-		v_fn = LLVMAddGlobal(mod, TypePGFunction, funcname);
-		LLVMSetInitializer(v_fn, v_fn_addr);
-		LLVMSetGlobalConstant(v_fn, true);
-
-		return LLVMBuildLoad(builder, v_fn, "");
-	}
 
 	/* check if function already has been added */
 	v_fn = LLVMGetNamedFunction(mod, funcname);
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index 2b4226d3a8..fd967346d0 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -265,15 +265,15 @@ fmgr_info_cxt_security(Oid functionId, FmgrInfo *finfo, MemoryContext mcxt,
 /*
  * Return module and C function name providing implementation of functionId.
  *
- * If *mod == NULL and *fn == NULL, no C symbol is known to implement
- * function.
- *
  * If *mod == NULL and *fn != NULL, the function is implemented by a symbol in
  * the main binary.
  *
  * If *mod != NULL and *fn !=NULL the function is implemented in an extension
  * shared object.
  *
+ * If functionId references a PL function, mod and fn will point to the PL
+ * handler's shared object and function name.
+ *
  * The returned module and function names are pstrdup'ed into the current
  * memory context.
  */
@@ -285,6 +285,11 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
 	bool		isnull;
 	Datum		prosrcattr;
 	Datum		probinattr;
+	Datum		pronameattr;
+	Oid			language;
+	HeapTuple	languageTuple;
+	Form_pg_language languageStruct;
+	HeapTuple	plhandlerTuple;
 
 	/* Otherwise we need the pg_proc entry */
 	procedureTuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(functionId));
@@ -304,8 +309,9 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
 		return;
 	}
 
+	language = procedureStruct->prolang;
 	/* see fmgr_info_cxt_security for the individual cases */
-	switch (procedureStruct->prolang)
+	switch (language)
 	{
 		case INTERNALlanguageId:
 			prosrcattr = SysCacheGetAttr(PROCOID, procedureTuple,
@@ -342,9 +348,34 @@ fmgr_symbol(Oid functionId, char **mod, char **fn)
 			break;
 
 		default:
-			*mod = NULL;
-			*fn = NULL;			/* unknown, pass pointer */
-			break;
+			/*
+			 * We referenced a PL function. Return PL handler's module and C
+			 * function name.
+			 */
+			languageTuple = SearchSysCache1(LANGOID,
+											ObjectIdGetDatum(language));
+			if (!HeapTupleIsValid(languageTuple))
+				elog(ERROR, "cache lookup failed for language %u", language);
+			languageStruct = (Form_pg_language) GETSTRUCT(languageTuple);
+			plhandlerTuple = SearchSysCache1(PROCOID,
+											 ObjectIdGetDatum(languageStruct->lanplcallfoid));
+			if (!HeapTupleIsValid(plhandlerTuple))
+				elog(ERROR, "cache lookup failed for function %u", languageStruct->lanplcallfoid);
+
+			probinattr = SysCacheGetAttr(PROCOID, plhandlerTuple,
+										 Anum_pg_proc_probin, &isnull);
+			if (isnull)
+				*mod = NULL;
+			else
+				*mod = TextDatumGetCString(probinattr);
+
+			pronameattr = SysCacheGetAttr(PROCOID, plhandlerTuple,
+										  Anum_pg_proc_proname, &isnull);
+			Assert(!isnull);
+			*fn = pstrdup(NameStr(*(DatumGetName(pronameattr))));
+
+			ReleaseSysCache(languageTuple);
+			ReleaseSysCache(plhandlerTuple);
 	}
 
 	ReleaseSysCache(procedureTuple);
-- 
2.24.1

#14Soumyadeep Chakraborty
sochakraborty@pivotal.io
In reply to: Soumyadeep Chakraborty (#13)
1 attachment(s)
Re: WIP: expression evaluation improvements

Hello Andres,

Attached is a patch on top of
v2-0026-WIP-expression-eval-relative-pointer-suppport.patch that eliminates
the
const pointer references to fmgrInfo in the generated code.

FmgrInfos are now allocated like the FunctionCallInfos are
(ExprBuilderAllocFunctionMgrInfo()) and are initialized with
expr_init_fmgri().

Unfortunately, inside expr_init_fmgri(), I had to emit const pointers to set
fn_addr, fn_extra, fn_mcxt and fn_expr.

fn_addr, fn_mcxt should always be the same const pointer value in between
two identical
calls. So this isn't too bad?

fn_extra is NULL most of the time. So not too bad?

fn_expr is very difficult to eliminate because it is allocated way earlier.
Is
it something that will have a const pointer value in between two identical
calls? (don't know enough about plan caching..I ran the same query twice
and it
seemed to have different pointer values). Eliminating this pointer poses
a similar challenge to that of FunctionCallInfo->context. fn_expr is
allocated
quite early on. I had tried writing ExprBuilderAllocNode() to handle the
context
field. The trouble with writing something like expr_init_node() or something
even more specific like expr_init_percall() (for the percall context for
aggs)
as these structs have lots of pointer references to further pointers and so
on
-> so eventually we would have to emit some const pointers.
One naive way to handle this problem may be to emit a call to the _copy*()
functions inside expr_init_node(). It wouldn't be as performant though.

We could decide to live with the const pointers even if our cache key would
be
the generated code. The caching layer could be made smart enough to ignore
such
pointer references OR we could feed the caching layer with generated code
that
has been passed through a custom pass that normalizes all const pointer
values
to some predetermined / sentinel value. To help the custom pass we could
emit
some metadata when we generate a const pointer (that we know won't have the
same
const pointer value) to tell the pass to ignore it.

Soumyadeep

Attachments:

v1-0001-jit-Eliminate-const-pointer-to-fmgrinfo.patchapplication/octet-stream; name=v1-0001-jit-Eliminate-const-pointer-to-fmgrinfo.patchDownload
From dcee28d76d28a396cc6e925af61690afecd9c516 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <sochakraborty@pivotal.io>
Date: Tue, 3 Mar 2020 09:25:52 -0800
Subject: [PATCH v1 1/1] jit: Eliminate const pointer to fmgrinfo

---
 src/backend/executor/execExpr.c     | 96 ++++++++++++++++++++++-------
 src/backend/jit/llvm/llvmjit_expr.c | 84 +++++++++++++++++++++----
 src/include/executor/execExpr.h     |  7 +++
 src/include/jit/llvmjit.h           |  6 +-
 4 files changed, 156 insertions(+), 37 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 4e10f9aea8..a1ac44133c 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -149,8 +149,23 @@ ExprBuilderAllocNullableDatumArrayInit(ExprStateBuilder *esb, int nelems, Nullab
 	return (RelNullableDatumArray){alloc->ptr};
 }
 
+static RelFunctionMgrInfo
+ExprBuilderAllocFunctionMgrInfo(ExprStateBuilder *esb, FmgrInfo **ptr)
+{
+	Size sz = sizeof(FmgrInfo);
+	ExprStateAllocation *alloc;
+
+	alloc = ExprBuilderAllocInternal(esb, ERP_FUNCTIONMGRINFO, sz);
+
+	alloc->zeroed = true;
+	alloc->initial_content = palloc0(sz);
+	*ptr = alloc->initial_content;
+
+	return (RelFunctionMgrInfo){alloc->ptr};
+}
+
 static RelFunctionCallInfo
-ExprBuilderAllocFunctionCallInfo(ExprStateBuilder *esb, int nargs, FunctionCallInfo *ptr)
+ExprBuilderAllocFunctionCallInfo(ExprStateBuilder *esb, int nargs, RelFunctionMgrInfo finfoRel, FunctionCallInfo *ptr)
 {
 	ExprStateAllocation *alloc;
 	Size sz = SizeForFunctionCallInfo(nargs);
@@ -161,7 +176,7 @@ ExprBuilderAllocFunctionCallInfo(ExprStateBuilder *esb, int nargs, FunctionCallI
 	alloc->initial_content = palloc0(sz);
 	*ptr = alloc->initial_content;
 
-	return (RelFunctionCallInfo){alloc->ptr};
+	return (RelFunctionCallInfo){alloc->ptr, finfoRel.ptr};
 }
 
 static RelNullableDatum
@@ -1148,6 +1163,7 @@ ExecInitExprRec(ExprStateBuilder *esb, Expr *node, RelNullableDatum result)
 				ScalarArrayOpExpr *opexpr = (ScalarArrayOpExpr *) node;
 				Expr	   *scalararg;
 				Expr	   *arrayarg;
+				RelFunctionMgrInfo finfo_rel;
 				FmgrInfo   *finfo;
 				RelFunctionCallInfo fcinfo_rel;
 				FunctionCallInfo fcinfo;
@@ -1167,8 +1183,11 @@ ExecInitExprRec(ExprStateBuilder *esb, Expr *node, RelNullableDatum result)
 				InvokeFunctionExecuteHook(opexpr->opfuncid);
 
 				/* Set up the primary fmgr lookup information */
-				finfo = palloc0(sizeof(FmgrInfo));
-				fcinfo_rel = ExprBuilderAllocFunctionCallInfo(esb, 2, &fcinfo);
+				finfo_rel = ExprBuilderAllocFunctionMgrInfo(esb, &finfo);
+				fcinfo_rel = ExprBuilderAllocFunctionCallInfo(esb,
+															  2,
+															  finfo_rel,
+															  &fcinfo);
 
 				fmgr_info(opexpr->opfuncid, finfo);
 				fmgr_info_set_expr((Node *) node, finfo);
@@ -1441,8 +1460,10 @@ ExecInitExprRec(ExprStateBuilder *esb, Expr *node, RelNullableDatum result)
 				Oid			typioparam;
 				FunctionCallInfo fcinfo_in;
 				FunctionCallInfo fcinfo_out;
-				FmgrInfo   *finfo_in = palloc0(sizeof(FmgrInfo));
-				FmgrInfo   *finfo_out = palloc0(sizeof(FmgrInfo));
+				RelFunctionMgrInfo finfo_in_rel;
+				FmgrInfo   *finfo_in;
+				RelFunctionMgrInfo finfo_out_rel;
+				FmgrInfo   *finfo_out;
 
 				/* evaluate argument into step's result area */
 				ExecInitExprRec(esb, iocoerce->arg, result);
@@ -1456,10 +1477,13 @@ ExecInitExprRec(ExprStateBuilder *esb, Expr *node, RelNullableDatum result)
 				 * function are assumed to be executable by everyone.
 				 */
 				scratch.opcode = EEOP_IOCOERCE;
-
 				/* lookup the source type's output function */
+				finfo_out_rel = ExprBuilderAllocFunctionMgrInfo(esb, &finfo_out);
 				scratch.d.iocoerce.fcinfo_data_out =
-					ExprBuilderAllocFunctionCallInfo(esb, 1, &fcinfo_out);
+					ExprBuilderAllocFunctionCallInfo(esb,
+													 1,
+													 finfo_out_rel,
+													 &fcinfo_out);
 
 				getTypeOutputInfo(exprType((Node *) iocoerce->arg),
 								  &iofunc, &typisvarlena);
@@ -1470,8 +1494,12 @@ ExecInitExprRec(ExprStateBuilder *esb, Expr *node, RelNullableDatum result)
 				scratch.d.iocoerce.fn_addr_out = finfo_out->fn_addr;
 
 				/* lookup the result type's input function */
+				finfo_in_rel = ExprBuilderAllocFunctionMgrInfo(esb, &finfo_in);
 				scratch.d.iocoerce.fcinfo_data_in =
-					ExprBuilderAllocFunctionCallInfo(esb, 3, &fcinfo_in);
+					ExprBuilderAllocFunctionCallInfo(esb,
+													 3,
+													 finfo_in_rel,
+													 &fcinfo_in);
 
 				getTypeInputInfo(iocoerce->resulttype,
 								 &iofunc, &typioparam);
@@ -1888,6 +1916,7 @@ ExecInitExprRec(ExprStateBuilder *esb, Expr *node, RelNullableDatum result)
 					Oid			lefttype;
 					Oid			righttype;
 					Oid			proc;
+					RelFunctionMgrInfo finfo_rel;
 					FmgrInfo   *finfo;
 					RelFunctionCallInfo fcinfo_rel;
 					FunctionCallInfo fcinfo;
@@ -1905,9 +1934,12 @@ ExecInitExprRec(ExprStateBuilder *esb, Expr *node, RelNullableDatum result)
 							 BTORDER_PROC, lefttype, righttype, opfamily);
 
 					/* Set up the primary fmgr lookup information */
-					finfo = palloc0(sizeof(FmgrInfo));
+					finfo_rel = ExprBuilderAllocFunctionMgrInfo(esb, &finfo);
 					fcinfo_rel =
-						ExprBuilderAllocFunctionCallInfo(esb, 3, &fcinfo);
+						ExprBuilderAllocFunctionCallInfo(esb,
+														 3,
+														 finfo_rel,
+														 &fcinfo);
 
 					fmgr_info(proc, finfo);
 					fmgr_info_set_expr((Node *) node, finfo);
@@ -2026,6 +2058,7 @@ ExecInitExprRec(ExprStateBuilder *esb, Expr *node, RelNullableDatum result)
 				MinMaxExpr *minmaxexpr = (MinMaxExpr *) node;
 				int			nelems = list_length(minmaxexpr->args);
 				TypeCacheEntry *typentry;
+				RelFunctionMgrInfo finfo_rel;
 				FmgrInfo   *finfo;
 				RelFunctionCallInfo fcinfo_rel;
 				FunctionCallInfo fcinfo;
@@ -2049,9 +2082,12 @@ ExecInitExprRec(ExprStateBuilder *esb, Expr *node, RelNullableDatum result)
 				 */
 
 				/* Perform function lookup */
-				finfo = palloc0(sizeof(FmgrInfo));
+				finfo_rel = ExprBuilderAllocFunctionMgrInfo(esb, &finfo);
 				fcinfo_rel =
-					ExprBuilderAllocFunctionCallInfo(esb, 2, &fcinfo);
+					ExprBuilderAllocFunctionCallInfo(esb,
+													 2,
+													 finfo_rel,
+													 &fcinfo);
 				fmgr_info(typentry->cmp_proc, finfo);
 				fmgr_info_set_expr((Node *) node, finfo);
 				InitFunctionCallInfoData(*fcinfo, finfo, 2,
@@ -2327,6 +2363,7 @@ ExecInitFunc(ExprEvalStep *scratch, Expr *node, List *args, Oid funcid,
 {
 	int			nargs = list_length(args);
 	AclResult	aclresult;
+	RelFunctionMgrInfo flinfo_rel;
 	FmgrInfo   *flinfo;
 	RelFunctionCallInfo fcinfo_rel;
 	FunctionCallInfo fcinfo;
@@ -2354,8 +2391,11 @@ ExecInitFunc(ExprEvalStep *scratch, Expr *node, List *args, Oid funcid,
 							   FUNC_MAX_ARGS)));
 
 	/* Allocate function lookup data and parameter workspace for this call */
-	flinfo = palloc0(sizeof(FmgrInfo));
-	fcinfo_rel = ExprBuilderAllocFunctionCallInfo(esb, nargs, &fcinfo);
+	flinfo_rel = ExprBuilderAllocFunctionMgrInfo(esb, &flinfo);
+	fcinfo_rel = ExprBuilderAllocFunctionCallInfo(esb,
+												  nargs,
+												  flinfo_rel,
+												  &fcinfo);
 	scratch->d.func.fcinfo_data = fcinfo_rel;
 
 	/* Set up the primary fmgr lookup information */
@@ -3145,6 +3185,8 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 	for (int transno = 0; transno < aggstate->numtrans; transno++)
 	{
 		AggStatePerTrans pertrans = &aggstate->pertrans[transno];
+		RelFunctionMgrInfo trans_finfo_rel;
+		FmgrInfo *trans_finfo;
 		RelFunctionCallInfo trans_fcinfo_rel;
 		FunctionCallInfo trans_fcinfo;
 		List	   *adjust_bailout = NIL;
@@ -3154,13 +3196,17 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 		ListCell   *bail;
 		size_t		numTransArgs;
 
+		trans_finfo_rel = ExprBuilderAllocFunctionMgrInfo(&esb, &trans_finfo);
+		*trans_finfo = pertrans->transfn;
 		/* account for the current transition state */
 		numTransArgs = pertrans->numTransInputs + 1;
 		trans_fcinfo_rel =
-			ExprBuilderAllocFunctionCallInfo(&esb, numTransArgs,
+			ExprBuilderAllocFunctionCallInfo(&esb,
+											 numTransArgs,
+											 trans_finfo_rel,
 											 &trans_fcinfo);
 		InitFunctionCallInfoData(*trans_fcinfo,
-								 &pertrans->transfn,
+								 trans_finfo,
 								 numTransArgs,
 								 pertrans->aggCollation,
 								 NULL, /* will get set when calling */
@@ -3218,6 +3264,8 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 			}
 			else
 			{
+				RelFunctionMgrInfo ds_finfo_rel;
+				FmgrInfo *ds_finfo;
 				FunctionCallInfo ds_fcinfo;
 				RelFunctionCallInfo ds_fcinfo_rel;
 				AggStatePerCallContext *percall;
@@ -3226,12 +3274,17 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 				percall->aggstate = aggstate;
 				percall->pertrans = pertrans;
 
+				ds_finfo_rel = ExprBuilderAllocFunctionMgrInfo(&esb, &ds_finfo);
+				*ds_finfo = pertrans->deserialfn;
+
 				ds_fcinfo_rel =
-					ExprBuilderAllocFunctionCallInfo(&esb, numTransArgs,
+					ExprBuilderAllocFunctionCallInfo(&esb,
+													 numTransArgs,
+													 ds_finfo_rel,
 													 &ds_fcinfo);
 
 				InitFunctionCallInfoData(*ds_fcinfo,
-										 &pertrans->deserialfn,
+										 ds_finfo,
 										 2,
 										 InvalidOid,
 										 (void *) percall, NULL);
@@ -3603,6 +3656,7 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 		Form_pg_attribute ratt = TupleDescAttr(rdesc, attno - 1);
 		Oid			foid = eqfunctions[natt];
 		Oid			collid = collations[natt];
+		RelFunctionMgrInfo finfo_rel;
 		FmgrInfo   *finfo;
 		RelFunctionCallInfo fcinfo_rel;
 		FunctionCallInfo fcinfo;
@@ -3616,8 +3670,8 @@ ExecBuildGroupingEqual(TupleDesc ldesc, TupleDesc rdesc,
 		InvokeFunctionExecuteHook(foid);
 
 		/* Set up the primary fmgr lookup information */
-		finfo = palloc0(sizeof(FmgrInfo));
-		fcinfo_rel = ExprBuilderAllocFunctionCallInfo(&esb, 2, &fcinfo);
+		finfo_rel = ExprBuilderAllocFunctionMgrInfo(&esb, &finfo);
+		fcinfo_rel = ExprBuilderAllocFunctionCallInfo(&esb,2, finfo_rel, &fcinfo);
 		fmgr_info(foid, finfo);
 		fmgr_info_set_expr(NULL, finfo);
 		InitFunctionCallInfoData(*fcinfo, finfo, 2,
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index 43501a5bb6..f9af5482f8 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -110,6 +110,7 @@ typedef struct ExprCompileState
 static Datum ExecRunCompiledExpr(ExprState *state, ExprContext *econtext, bool *isNull);
 
 static void expr_emit_allocations(ExprCompileState *ecs);
+static void expr_init_fmgri(ExprCompileState *ecs, FmgrInfo *fmgrinfo, LLVMValueRef v_fmgrinfo);
 static void expr_init_fci(ExprCompileState *ecs, FunctionCallInfo fcinfo, LLVMValueRef v_fcinfo);
 static void expr_init_nullable_datums(ExprCompileState *ecs, int nelems, NullableDatum *datums, LLVMValueRef v_datums);
 
@@ -2672,6 +2673,13 @@ expr_emit_allocations(ExprCompileState *ecs)
 											  v_allocation);
 				break;
 
+			case ERP_FUNCTIONMGRINFO:
+				Assert(allocation->initial_content != NULL);
+				allocname = psprintf("alloc.%u.fmgr_info", i + 1);
+				v_allocation = LLVMBuildAlloca(ecs->b, StructFmgrInfo, allocname);
+				expr_init_fmgri(ecs, (FmgrInfo *) allocation->initial_content, v_allocation);
+				break;
+
 			case ERP_FUNCTIONCALLINFO:
 				Assert(allocation->initial_content != NULL);
 
@@ -2704,22 +2712,49 @@ expr_emit_allocations(ExprCompileState *ecs)
 	}
 }
 
+static void
+expr_init_fmgri(ExprCompileState *ecs, FmgrInfo *fmgrinfo, LLVMValueRef v_fmgrinfo)
+{
+	/*
+	 * There is no need to set fn_addr as the function will be resolved via
+	 * llvm_function_reference() followed by llvm_resolve_symbl().
+	 */
+
+	LLVMBuildStore(ecs->b,
+				   l_int32_const(fmgrinfo->fn_oid),
+				   LLVMBuildStructGEP(ecs->b, v_fmgrinfo, 1, ""));
+	LLVMBuildStore(ecs->b,
+				   l_int16_const(fmgrinfo->fn_nargs),
+				   LLVMBuildStructGEP(ecs->b, v_fmgrinfo, 2, ""));
+	LLVMBuildStore(ecs->b,
+				   l_sbool_const(fmgrinfo->fn_strict),
+				   LLVMBuildStructGEP(ecs->b, v_fmgrinfo, 3, ""));
+	LLVMBuildStore(ecs->b,
+				   l_sbool_const(fmgrinfo->fn_retset),
+				   LLVMBuildStructGEP(ecs->b, v_fmgrinfo, 4, ""));
+	LLVMBuildStore(ecs->b,
+				   l_int8_const(fmgrinfo->fn_stats),
+				   LLVMBuildStructGEP(ecs->b, v_fmgrinfo, 5, ""));
+	LLVMBuildStore(ecs->b,
+				   l_ptr_const(fmgrinfo->fn_extra, LLVMStructGetTypeAtIndex(StructFmgrInfo, 6)),
+				   LLVMBuildStructGEP(ecs->b, v_fmgrinfo, 6, "")); /* TODO */
+	LLVMBuildStore(ecs->b,
+				   l_ptr_const(fmgrinfo->fn_mcxt, l_ptr(StructMemoryContextData)),
+				   LLVMBuildStructGEP(ecs->b, v_fmgrinfo, 7, ""));
+	LLVMBuildStore(ecs->b,
+				   l_ptr_const(fmgrinfo->fn_expr, l_ptr(StructNode)),
+				   LLVMBuildStructGEP(ecs->b, v_fmgrinfo, 8, "")); /* TODO */
+}
+
 static void
 expr_init_fci(ExprCompileState *ecs, FunctionCallInfo fcinfo, LLVMValueRef v_fcinfo)
 {
 	LLVMValueRef v_args;
 
 	/*
-	 * FIXME: should be allocated as part of the ExprState, allowing this to
-	 * be referenced efficiently / without pointer constants.
+	 * We can't set fcinfo->flinfo here. It will be set inside expr_fcinfo_ref()
+	 * where we have access to the flinfo relptr.
 	 */
-	if (fcinfo->flinfo)
-		LLVMBuildStore(ecs->b,
-					   l_ptr_const(fcinfo->flinfo, l_ptr(StructFmgrInfo)),
-					   LLVMBuildStructGEP(ecs->b,
-										  v_fcinfo,
-										  FIELDNO_FUNCTIONCALLINFODATA_FLINFO,
-										  ""));
 
 	/*
 	 * FIXME: should be allocated as part of the ExprState, allowing this to
@@ -2910,13 +2945,36 @@ expr_allocation_ref(ExprCompileState *ecs, ExprRelPtr rel)
 static LLVMValueRef
 expr_fcinfo_ref(ExprCompileState *ecs, RelFunctionCallInfo relfc, FunctionCallInfo *fcinfo)
 {
-	ExprStateAllocation *alloc = list_nth(ecs->esb->allocations, relfc.ptr.allocno - 1);
+	FmgrInfo            *fmgrInfo;
+	LLVMValueRef        v_fmgrInfo;
+	LLVMValueRef        v_fcinfo;
+	ExprStateAllocation *fcinfo_alloc;
+	ExprStateAllocation *fmgr_alloc;
+
+	fcinfo_alloc = list_nth(ecs->esb->allocations, relfc.ptr.allocno - 1);
+	Assert(fcinfo_alloc->initial_content != NULL);
+
+	fmgr_alloc = list_nth(ecs->esb->allocations, relfc.fmgrInfoPtr.allocno - 1);
+	Assert(fmgr_alloc->initial_content != NULL);
 
-	Assert(alloc->initial_content != NULL);
 	if (fcinfo)
-		*fcinfo = (FunctionCallInfo) alloc->initial_content;
+	{
+		*fcinfo = (FunctionCallInfo) fcinfo_alloc->initial_content;
+		fmgrInfo = (FmgrInfo *) fmgr_alloc->initial_content;
+		(*fcinfo)->flinfo = fmgrInfo;
+	}
+	v_fmgrInfo = expr_allocation_ref(ecs, relfc.fmgrInfoPtr);
+
+	v_fcinfo = expr_allocation_ref(ecs, relfc.ptr);
+
+	LLVMBuildStore(ecs->b,
+				   v_fmgrInfo,
+				   LLVMBuildStructGEP(ecs->b,
+										  v_fcinfo,
+										  FIELDNO_FUNCTIONCALLINFODATA_FLINFO,
+										  ""));
 
-   return expr_allocation_ref(ecs, relfc.ptr);
+	return v_fcinfo;
 }
 
 static LLVMValueRef
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index b523e839bb..a3f9e8676f 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -265,9 +265,15 @@ typedef struct RelNullableDatumArray
 	ExprRelPtr ptr;
 } RelNullableDatumArray;
 
+typedef struct RelFunctionMgrInfo
+{
+	ExprRelPtr ptr;
+} RelFunctionMgrInfo;
+
 typedef struct RelFunctionCallInfo
 {
 	ExprRelPtr ptr;
+	ExprRelPtr fmgrInfoPtr;
 } RelFunctionCallInfo;
 
 typedef struct ExprEvalStep
@@ -682,6 +688,7 @@ typedef enum ExprRelPtrKind
 	ERP_BOOL,
 	ERP_NULLABLE_DATUM,
 	ERP_NULLABLE_DATUM_ARRAY,
+	ERP_FUNCTIONMGRINFO,
 	ERP_FUNCTIONCALLINFO
 } ExprRelPtrKind;
 
diff --git a/src/include/jit/llvmjit.h b/src/include/jit/llvmjit.h
index 6250148159..ae874fca03 100644
--- a/src/include/jit/llvmjit.h
+++ b/src/include/jit/llvmjit.h
@@ -112,9 +112,9 @@ extern void llvm_split_symbol_name(const char *name, char **modname, char **func
 extern LLVMValueRef llvm_get_decl(LLVMModuleRef mod, LLVMValueRef f);
 extern void llvm_copy_attributes(LLVMValueRef from, LLVMValueRef to);
 extern LLVMValueRef llvm_function_reference(LLVMJitContext *context,
-						LLVMBuilderRef builder,
-						LLVMModuleRef mod,
-						FunctionCallInfo fcinfo);
+											LLVMBuilderRef builder,
+											LLVMModuleRef mod,
+											FunctionCallInfo fcinfo);
 
 extern void llvm_inline(LLVMModuleRef mod);
 
-- 
2.24.1

#15Daniel Gustafsson
daniel@yesql.se
In reply to: Soumyadeep Chakraborty (#14)
Re: WIP: expression evaluation improvements

On 3 Mar 2020, at 21:21, Soumyadeep Chakraborty <sochakraborty@pivotal.io> wrote:

Attached is a patch on top of
v2-0026-WIP-expression-eval-relative-pointer-suppport.patch that eliminates the
const pointer references to fmgrInfo in the generated code.

Since the CFBot patch tester isn't to apply and test a patchset divided across
multiple emails, can you please submit the full patchset for consideration such
that we can get it to run in the CI?

cheers ./daniel

#16Michael Paquier
michael@paquier.xyz
In reply to: Daniel Gustafsson (#15)
Re: WIP: expression evaluation improvements

On Wed, Jul 01, 2020 at 02:50:14PM +0200, Daniel Gustafsson wrote:

Since the CFBot patch tester isn't to apply and test a patchset divided across
multiple emails, can you please submit the full patchset for consideration such
that we can get it to run in the CI?

This thread seems to have died a couple of weeks ago, so I have marked
it as RwF.
--
Michael

#17Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#1)
Re: WIP: expression evaluation improvements

Andres asked me off-list for comments on 0026, so here goes.

As a general comment, I think the patches could really benefit from
more meaningful commit messages and more comments on individual
functions. It would definitely help me review, and it might help other
people review, or modify the code later. For example, I'm looking at
ExprEvalStep. If the intent here is that we don't want the union
members to point to data that might differ from one execution of the
plan to the next, it's surely important to mention that and explain to
people who are trying to add steps later what they should do instead.
But I'm also not entirely sure that's the intended rule. It kind of
surprises me that the only things that we'd be pointing to here that
would fall into that category would be a bool, a NullableDatum, a
NullableDatum array, and a FunctionCallInfo ... but I've been
surprised by a lot of things that turned out to be true.

I am not a huge fan of the various Rel[Whatever] typedefs. I am not
sure that's really adding any clarity. On the other hand I would be a
big fan of renaming the structure members in some systematic way. This
kind of thing doesn't sit well with me:

- NullableDatum *value; /* value to return */
+ RelNullableDatum value; /* value to return */

Well, if NullableDatum was the value to return, then RelNullableDatum
isn't. It's some kind of thing that lets you find the value to return.
Actually that's not really right either, because before 'value' was a
pointer to the value to return and the corresponding isnull flag, and
now it is a way of finding that stuff. I don't know exactly what to do
here to keep the comment comprehensible and not unreasonably long, but
I don't think not changing at it all is the thing. Nor do I think just
having it be called 'value' when it's clearly not the value, nor even
a pointer to the value, is as clear as I would like to be.

I wonder if ExprBuilderAllocBool ought to be using sizeof(bool) rather
than sizeof(NullableDatum).

Is it true that allocno is only used for, err, some kind of LLVM
thing, and not in the regular interpreted path? As far as I can see,
outside of the LLVM code, we only ever test whether it's 0, and don't
actually care about the specific value.

I hope that the fact that this patch reverses the order of the first
two arguments to ExecInitExprRec is only something you did to make it
so that the compiler would find places you needed to update. Because
otherwise it makes no sense to introduce a new thing called an
ExprStateBuilder in 0017, make it an argument to that function, and
then turn around and change the signature again in 0026. Anyway, a
final patch shouldn't include this kind of churn.

+ offsetof(ExprState, steps) * esb->steps_len * sizeof(ExprEvalStep) +
+ state->mutable_off = offsetof(ExprState, steps) * esb->steps_len *
sizeof(ExprEvalStep);

Well, either I'm confused here, or the first * should be a + in each
case. I wonder how this works at all.

+ /* copy in step data */
+ {
+ ListCell *lc;
+ int off = 0;
+
+ foreach(lc, esb->steps)
+ {
+ memcpy(&state->steps[off], lfirst(lc), sizeof(ExprEvalStep));
+ off++;
+ }
+ }

This seems incredibly pointless to me. Why use a List in the first
place if we're going to have to flatten it using this kind of code?

I think stuff like RelFCIOff() and RelFCIIdx() and RelArrayIdx() is
just pretty much incomprehensible. Now, the executor is full of
badly-named stuff already -- ExecInitExprRec being a fine example of a
name nobody is going to understand on first reading, or maybe ever --
but we ought to try not to make things worse. I also do understand
that anything with relative pointers is bound to involve a bunch of
crappy notation that we're just going to have to accept as the price
of doing business. But it would help to pick names that are not so
heavily abbreviated. Like, if RelFCIIdx() were called
find_function_argument_in_relative_fcinfo() or even
get_fnarg_from_relfcinfo() the casual reader might have a chance of
guessing what it does. Sure, the code might be longer, but if you can
tell what it does without cross-referencing, it's still better.

I would welcome changes that make it clearer which things happen just
once and which things happen at execution time; that said, it seems
like RELPTR_RESOLVE() happens at execution time, and it surprises me a
bit that this is OK from a performance perspective. The pointers can
change from execution to execution, but not within an individual
execution, or so I think. So it doesn't need to be resolved every
time, if somehow that can be avoided. But maybe CPUs are sufficiently
well-optimized for computing a pointer address as a+b*c that it does
not matter.

I'm not sure how helpful any of these comments are, but those are my
initial thoughts.

--
Robert Haas
EDB: http://www.enterprisedb.com

#18Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#17)
Re: WIP: expression evaluation improvements

Hi,

I pushed a rebased (ugh, that was painul) version of the patches to
https://github.com/anarazel/postgres/tree/jit-relative-offsets

Besides rebasing I dropped a few patches and did some *minor* cleanup. Besides
that there's one substantial improvement, namely that I got rid of one more
absolute pointer reference (in the aggregate steps).

The main sources for pointers that remain is FunctionCallInfo->{flinfo,
context}. There's also WindowFuncExprState->wfuncno (which isn't yet known at
"expression compile time"), but that's not too hard to solve differently.

On 2021-11-04 12:30:00 -0400, Robert Haas wrote:

As a general comment, I think the patches could really benefit from
more meaningful commit messages and more comments on individual
functions. It would definitely help me review, and it might help other
people review, or modify the code later.

Sure. I was mostly exploring what would be necessary to to change expression
evaluation so that there's no absolute pointers in it. I still haven't figured
out all the necessary bits.

For example, I'm looking at ExprEvalStep. If the intent here is that we
don't want the union members to point to data that might differ from one
execution of the plan to the next, it's surely important to mention that and
explain to people who are trying to add steps later what they should do
instead. But I'm also not entirely sure that's the intended rule. It kind
of surprises me that the only things that we'd be pointing to here that
would fall into that category would be a bool, a NullableDatum, a
NullableDatum array, and a FunctionCallInfo ... but I've been surprised by a
lot of things that turned out to be true.

The immediate goal is to be able to generate JITed code/LLVM-IR that doesn't
contain any absolute pointer values. If the generated code doesn't change
regardless of any of the other contents of ExprEvalStep, we can still cache
the JIT optimization / code emission steps - which are the expensive bits.

With the exception of what I listed at the top, the types that you listed
really are what's needed to avoid such pointer constants. There are more
contents in the steps, but either they are constants (and thus just can be
embedded into the generated code), the expression step is just passed to
ExecEval*, or the data can just be loaded from the ExprStep at runtime
(although that makes the generated code slower).

There's a "more advanced" version of this where we can avoid recreating
ExprStates for e.g. prepared statements. Then we'd need to make a bit more of
the data use relative pointers. But that's likely a bit further off. A more
moderate version will be to just store the number of steps for expressions
inside the expressions - for simple queries the allocation / growing / copying
of ExprSteps is quite visible.

FWIW interpreted execution does seem to win a bit from the higher density of
memory allocations for variable data this provides.

I am not a huge fan of the various Rel[Whatever] typedefs. I am not
sure that's really adding any clarity. On the other hand I would be a
big fan of renaming the structure members in some systematic way. This
kind of thing doesn't sit well with me:

I initially had all the Rel* use the same type, and it was much more error
prone because the compiler couldn't tell that the types are different.

- NullableDatum *value; /* value to return */
+ RelNullableDatum value; /* value to return */

Well, if NullableDatum was the value to return, then RelNullableDatum
isn't.It's some kind of thing that lets you find the value to return.

I don't really know what you mean? It's essentially just a different type of
pointer?

Is it true that allocno is only used for, err, some kind of LLVM
thing, and not in the regular interpreted path? As far as I can see,
outside of the LLVM code, we only ever test whether it's 0, and don't
actually care about the specific value.

I'd expect it to be useful for a few interpreded cases as well, but right now
it's not.

I hope that the fact that this patch reverses the order of the first
two arguments to ExecInitExprRec is only something you did to make it
so that the compiler would find places you needed to update. Because
otherwise it makes no sense to introduce a new thing called an
ExprStateBuilder in 0017, make it an argument to that function, and
then turn around and change the signature again in 0026. Anyway, a
final patch shouldn't include this kind of churn.

Yes, that definitely needs to go.

+ offsetof(ExprState, steps) * esb->steps_len * sizeof(ExprEvalStep) +
+ state->mutable_off = offsetof(ExprState, steps) * esb->steps_len *
sizeof(ExprEvalStep);

Well, either I'm confused here, or the first * should be a + in each
case. I wonder how this works at all.

Oh. yes, that doesn't look right. I assume it's just always too big, and
that's why it doesn't cause problems...

+ /* copy in step data */
+ {
+ ListCell *lc;
+ int off = 0;
+
+ foreach(lc, esb->steps)
+ {
+ memcpy(&state->steps[off], lfirst(lc), sizeof(ExprEvalStep));
+ off++;
+ }
+ }

This seems incredibly pointless to me. Why use a List in the first
place if we're going to have to flatten it using this kind of code?

We don't know how many steps an expression is going to require. It turns out
that in the current code we spend a good amount of time just growing
->steps. Using a list (even if it's an array of pointers as List now is)
during building makes appending fairly cheap. Building a dense array after all
steps have been computed keeps the execution time benefit.

I think stuff like RelFCIOff() and RelFCIIdx() and RelArrayIdx() is
just pretty much incomprehensible. Now, the executor is full of
badly-named stuff already -- ExecInitExprRec being a fine example of a
name nobody is going to understand on first reading, or maybe ever --
but we ought to try not to make things worse. I also do understand
that anything with relative pointers is bound to involve a bunch of
crappy notation that we're just going to have to accept as the price
of doing business. But it would help to pick names that are not so
heavily abbreviated. Like, if RelFCIIdx() were called
find_function_argument_in_relative_fcinfo() or even
get_fnarg_from_relfcinfo() the casual reader might have a chance of
guessing what it does.

Yea, they're crappily named. If this were C++ it'd be easy to wrap the
relative pointers in something that then makes them behave like normal
pointers, but ...

Sure, the code might be longer, but if you can tell what it does without
cross-referencing, it's still better.

Unfortunately it's really hard to keep the code legible and keep pgindent
happy with long names :(. But I'm sure that we can do better than these.

I would welcome changes that make it clearer which things happen just
once and which things happen at execution time; that said, it seems
like RELPTR_RESOLVE() happens at execution time, and it surprises me a
bit that this is OK from a performance perspective.

It's actually fairly cheap, at least on x86, because every relative pointer
dereference is just an offset from one base pointer. That base address can be
kept in a register. In some initial benchmarking the gains from the higher
allocation density of the variable data is bigger than potential losses.

The pointers can change from execution to execution, but not within an
individual execution, or so I think. So it doesn't need to be resolved every
time, if somehow that can be avoided. But maybe CPUs are sufficiently
well-optimized for computing a pointer address as a+b*c that it does not
matter.

It should just be a + b, right? Well, for arrays it's more complicated, but
it's also more complicated for "normal arrays".

I'm not sure how helpful any of these comments are, but those are my
initial thoughts.

It's helpful.

The biggest issue I see with getting to the point of actually caching JITed
code is the the ->flinfo, ->context thing mentioned above. The best thing I
can come up with is moving the allocation of those into the ExprState as well,
but my gut says there must be a better approach that I'm not quite seeing.

Greetings,

Andres Freund

#19Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#18)
Re: WIP: expression evaluation improvements

On Thu, Nov 4, 2021 at 7:47 PM Andres Freund <andres@anarazel.de> wrote:

The immediate goal is to be able to generate JITed code/LLVM-IR that doesn't
contain any absolute pointer values. If the generated code doesn't change
regardless of any of the other contents of ExprEvalStep, we can still cache
the JIT optimization / code emission steps - which are the expensive bits.

I'm not sure why that requires all of this relative pointer stuff,
honestly. Under that problem statement, we don't need everything to be
one contiguous allocation. We just need it to have the same lifespan
as the JITted code. If you introduced no relative pointers at all,
you could still solve this problem: create a new memory context that
contains all of the EvalExprSteps and all of the allocations upon
which they depend, make sure everything you care about is allocated in
that context, and don't destroy any of it until you destroy it all. Or
another option would be: instead of having one giant allocation in
which we have to place data of every different type, have one
allocation per kind of thing. Figure out how many FunctionCallInfo
objects we need and make an array of them. Figure out how many
NullableDatum objects we need and make a separate array of those. And
so on. Then just use pointers.

I think that part of your motivation here is unrelated caching the JIT
results: you also want to improve performance by increasing memory
locality. That's a good goal as far as it goes, but maybe there's a
way to be a little less ambitious and still get most of the benefit.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#19)
Re: WIP: expression evaluation improvements

Hi,

On 2021-11-05 08:34:26 -0400, Robert Haas wrote:

I'm not sure why that requires all of this relative pointer stuff,
honestly. Under that problem statement, we don't need everything to be
one contiguous allocation. We just need it to have the same lifespan
as the JITted code. If you introduced no relative pointers at all,
you could still solve this problem: create a new memory context that
contains all of the EvalExprSteps and all of the allocations upon
which they depend, make sure everything you care about is allocated in
that context, and don't destroy any of it until you destroy it all.

I don't see how that works - the same expression can be evaluated multiple
times at once, recursively. So you can't have things like FunctionCallInfoData
shared. One key point of separating out the mutable data into something that
can be relocated is precisely so that every execution can have its own
"mutable" data area, without needing to change anything else.

Or another option would be: instead of having one giant allocation in which
we have to place data of every different type, have one allocation per kind
of thing. Figure out how many FunctionCallInfo objects we need and make an
array of them. Figure out how many NullableDatum objects we need and make a
separate array of those. And so on. Then just use pointers.

Without the relative pointer thing you'd still have pointers into those arrays
of objects. Which then would make the thing non-shareable.

Greetings,

Andres Freund

#21Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#20)
Re: WIP: expression evaluation improvements

On Fri, Nov 5, 2021 at 12:48 PM Andres Freund <andres@anarazel.de> wrote:

I don't see how that works - the same expression can be evaluated multiple
times at once, recursively. So you can't have things like FunctionCallInfoData
shared. One key point of separating out the mutable data into something that
can be relocated is precisely so that every execution can have its own
"mutable" data area, without needing to change anything else.

Oh. That makes it harder.

Or another option would be: instead of having one giant allocation in which
we have to place data of every different type, have one allocation per kind
of thing. Figure out how many FunctionCallInfo objects we need and make an
array of them. Figure out how many NullableDatum objects we need and make a
separate array of those. And so on. Then just use pointers.

Without the relative pointer thing you'd still have pointers into those arrays
of objects. Which then would make the thing non-shareable.

Well, I guess you could store indexes into the individual arrays, but
then I guess you're not gaining much of anything.

It's a pretty annoying problem, really. Somehow it's hard to shake the
feeling that there ought to be a better approach than relative
pointers.

--
Robert Haas
EDB: http://www.enterprisedb.com

#22Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#21)
Re: WIP: expression evaluation improvements

Hi,

On 2021-11-05 13:09:10 -0400, Robert Haas wrote:

On Fri, Nov 5, 2021 at 12:48 PM Andres Freund <andres@anarazel.de> wrote:

I don't see how that works - the same expression can be evaluated multiple
times at once, recursively. So you can't have things like FunctionCallInfoData
shared. One key point of separating out the mutable data into something that
can be relocated is precisely so that every execution can have its own
"mutable" data area, without needing to change anything else.

Oh. That makes it harder.

Yes. Optimally we'd do JIT caching across connections as well. One of the
biggest issues with the costs of JITing is actually parallel query, where
we'll often recreate the same JIT code again and again. For that you really
can't have much in the way of pointers...

Or another option would be: instead of having one giant allocation in which
we have to place data of every different type, have one allocation per kind
of thing. Figure out how many FunctionCallInfo objects we need and make an
array of them. Figure out how many NullableDatum objects we need and make a
separate array of those. And so on. Then just use pointers.

Without the relative pointer thing you'd still have pointers into those arrays
of objects. Which then would make the thing non-shareable.

Well, I guess you could store indexes into the individual arrays, but
then I guess you're not gaining much of anything.

You'd most likely just loose a bit of locality, because the different types of
data are now all on separate cachelines, even if referenced by the one
expression step.

It's a pretty annoying problem, really. Somehow it's hard to shake the
feeling that there ought to be a better approach than relative
pointers.

Yes. I don't like it much either :(. Basically native code has the same issue,
and also largely ended up with making most things relative (see x86-64 which
does most addressing relative to the instruction pointer, and binaries
pre-relocation, where the addresses aren't resolved yed).

Greetings,

Andres Freund

#23Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#20)
Re: WIP: expression evaluation improvements

Hi,

On 2021-11-05 09:48:16 -0700, Andres Freund wrote:

On 2021-11-05 08:34:26 -0400, Robert Haas wrote:

I'm not sure why that requires all of this relative pointer stuff,
honestly. Under that problem statement, we don't need everything to be
one contiguous allocation. We just need it to have the same lifespan
as the JITted code. If you introduced no relative pointers at all,
you could still solve this problem: create a new memory context that
contains all of the EvalExprSteps and all of the allocations upon
which they depend, make sure everything you care about is allocated in
that context, and don't destroy any of it until you destroy it all.

I don't see how that works - the same expression can be evaluated multiple
times at once, recursively. So you can't have things like FunctionCallInfoData
shared. One key point of separating out the mutable data into something that
can be relocated is precisely so that every execution can have its own
"mutable" data area, without needing to change anything else.

Oh, and the other bit is that the absolute addresses make it much harder to
generate efficient code. If I remove the code setting
FunctionCallInfo->{context,flinfo} to the constant pointers (obviously
incorrect, but works for functions not using either), E.g. TPCH-Q1 gets about
20% faster.

Greetings,

Andres Freund

#24Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#22)
Re: WIP: expression evaluation improvements

On Fri, Nov 5, 2021 at 1:20 PM Andres Freund <andres@anarazel.de> wrote:

Yes. Optimally we'd do JIT caching across connections as well. One of the
biggest issues with the costs of JITing is actually parallel query, where
we'll often recreate the same JIT code again and again. For that you really
can't have much in the way of pointers...

Well that much is clear, and parallel query also needs relative
pointers in some places for other reasons, which reminds me to ask you
whether these new relative pointers can't reuse "utils/relptr.h"
instead of inventing another way of do it. And if not maybe we should
try to first change relptr.h and the one existing client
(freepage.c/h) to something better and then use that in both places,
because if we're going to be stuck with relative pointers are all over
the place it would at least be nice not to have too many different
kinds.

It's a pretty annoying problem, really. Somehow it's hard to shake the
feeling that there ought to be a better approach than relative
pointers.

Yes. I don't like it much either :(. Basically native code has the same issue,
and also largely ended up with making most things relative (see x86-64 which
does most addressing relative to the instruction pointer, and binaries
pre-relocation, where the addresses aren't resolved yed).

Yes, but the good thing about those cases is that they're handled by
the toolchain. What's irritating about this case is that we're using a
just-in-time compiler, and yet somehow it feels like the job that
ought to be done by the compiler is having to be done by our code, and
the result is a lot of extra notation. I don't know what the
alternative is -- if you don't tell the compiler which things it's
supposed to assume are constant and which things might vary from
execution to execution, it can't know. But it feels a little weird
that there isn't some better way to give it that information.

--
Robert Haas
EDB: http://www.enterprisedb.com

#25Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#24)
Re: WIP: expression evaluation improvements

Hi,

On 2021-11-05 14:13:38 -0400, Robert Haas wrote:

On Fri, Nov 5, 2021 at 1:20 PM Andres Freund <andres@anarazel.de> wrote:

Yes. Optimally we'd do JIT caching across connections as well. One of the
biggest issues with the costs of JITing is actually parallel query, where
we'll often recreate the same JIT code again and again. For that you really
can't have much in the way of pointers...

Well that much is clear, and parallel query also needs relative
pointers in some places for other reasons, which reminds me to ask you
whether these new relative pointers can't reuse "utils/relptr.h"
instead of inventing another way of do it. And if not maybe we should
try to first change relptr.h and the one existing client
(freepage.c/h) to something better and then use that in both places,
because if we're going to be stuck with relative pointers are all over
the place it would at least be nice not to have too many different
kinds.

Hm. Yea, that's a fair point. Right now the "allocno" bit would be a
problem. Perhaps we can get around that somehow. We could search for
allocations by the offset, I guess.

It's a pretty annoying problem, really. Somehow it's hard to shake the
feeling that there ought to be a better approach than relative
pointers.

Yes. I don't like it much either :(. Basically native code has the same issue,
and also largely ended up with making most things relative (see x86-64 which
does most addressing relative to the instruction pointer, and binaries
pre-relocation, where the addresses aren't resolved yed).

Yes, but the good thing about those cases is that they're handled by
the toolchain. What's irritating about this case is that we're using a
just-in-time compiler, and yet somehow it feels like the job that
ought to be done by the compiler is having to be done by our code, and
the result is a lot of extra notation. I don't know what the
alternative is -- if you don't tell the compiler which things it's
supposed to assume are constant and which things might vary from
execution to execution, it can't know. But it feels a little weird
that there isn't some better way to give it that information.

Yes, I feel like there must be something better too. But in the end, I think
we want something like this for the non-JIT path too, so that we can avoid the
expensive re-creation of expression for every query execution. Which does make
referencing at least the mutable data only by offset fairly attractive, imo.

Greetings,

Andres Freund