Memory leak in WAL sender with pgoutput (v10~)

Started by Michael Paquierabout 1 year ago63 messages
#1Michael Paquier
michael@paquier.xyz
3 attachment(s)

Hi all,

This is a follow-up of ea792bfd93ab and this thread where I've noticed
that some memory was still leaking when running sysbench with a
logical replication setup:
/messages/by-id/Zz7SRcBXxhOYngOr@paquier.xyz

Reproducing the problem is quite simple, and can be done with the
following steps (please adapt, like previous thread):
1) Initialize a primary and set up some empty tables:
NUM_TABLES=500
PRIMARY_PORT=5432
STANDBY_PORT=5433
sysbench oltp_read_only --db-driver=pgsql \
--pgsql-port=$PRIMARY_PORT \
--pgsql-db=postgres \
--pgsql-user=postgres \
--pgsql-host=/tmp \
--tables=$NUM_TABLES --table_size=0 \
--report-interval=1 \
--threads=1 prepare
2) Create a standby, promote it:
STANDBY_DATA=/define/your/standby/path/
pg_basebackup --checkpoint=fast -D $STANDBY_DATA -p $PRIMARY_PORT -h /tmp/ -R
echo "port = $STANDBY_PORT" >> $STANDBY_DATA/postgresql.conf
pg_ctl -D $STANDBY_DATA start
pg_ctl -D $STANDBY_DATA promote
3) Publication and subscription
psql -p $PRIMARY_PORT -c "CREATE PUBLICATION pub1 FOR ALL TABLES;"
psql -p $STANDBY_PORT -c "CREATE SUBSCRIPTION sub1 CONNECTION
'port=$PRIMARY_PORT user=postgres dbname=postgres PUBLICATION pub1 WITH
(enabled, create_slot, slot_name='pub1_to_sub1', copy_data=false);"
4) Run sysbench:
sysbench oltp_write_only --db-driver=pgsql \
--pgsql-port=$PRIMARY_PORT \
--pgsql-db=postgres \
--pgsql-user=postgres \
--pgsql-host=/tmp \
--tables=$NUM_TABLES --table_size=100 \
--report-interval=1 \
--threads=$NUM_THREADS run \
--time=5000

With that, I've mentioned internally that there was an extra leak and
left the problem lying aside for a few days before being able to come
back to it. In-between, Jeff Davis has done a review of the code to
note that we leak memory in the CacheMemoryContext, due to the
list_free_deep() done in get_rel_sync_entry() that misses the fact
that the publication name is pstrdup()'d in GetPublication(), but
list_free_deep() does not know that so the memory of the strdupped
publication names stays around. That's wrong.

Back to today, I've done more benchmarking using the above steps and
logged periodic memory samples of the WAL sender with
pg_log_backend_memory_contexts() (100s, 50 points), and analyzed the
evolution of the whole thing. "Grand Total" used memory keeps
increasing due to one memory context: CacheMemoryContext. So yes,
Jeff has spotted what looks like the only leak we have.

Attached is a graph that shows the evolution of memory between HEAD
and a patched version for the used memory of CacheMemoryContext
(ylabel is in bytes, could not get that right as my gnuplot skills are
bad).

One thing to note in this case is that I've made the leak more
noticeable with this tweak to force the publication to be reset each
time with go through get_rel_sync_entry(), or memory takes much longer
to pile up.  That does not change the problem, speeds up the detection
by a large factor:
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -2061,12 +2061,13 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
         List       *rel_publications = NIL;
         /* Reload publications if needed before use. */
-        if (!publications_valid)
+        elog(WARNING, "forcing reload publications");
         {
             oldctx = MemoryContextSwitchTo(CacheMemoryContext);
             if (data->publications)
             {
                 list_free_deep(data->publications);

This problem exists since the introduction of logical replication,
down to v10. It would be sad to have an extra loop on publications to
free explicitely the publication names, and the issue would still be
hidden to the existing callers of GetPublication(), so I'd suggest to
change the Publication structure to use a NAMEDATALEN string to force
the free to happen, as in the attached.

That means an ABI break. I've looked around and did not notice any
out-of-core code relying on sizeof(Publication). That does not mean
that nothing would break, of course, but the odds should be pretty
good as this is a rather internal structure. One way would be to just
fix that on HEAD and call it a day, but I'm aware of heavy logirep
users whose workloads would be able to see memory bloating because
they maintain WAL senders around.

Thoughts or comments?
--
Michael

Attachments:

all_used_total.csvtext/csv; charset=us-asciiDownload
memused_vs_time.pngimage/pngDownload
�PNG


IHDR����K�	pHYs���+ IDATx���{@S��?�	WC�_����x�:�����:D���V�v��ZK;t�o):�Uf�iK��*����,��oW�"�ET�P�Dn�\B �?N�1 ����<������������y>�b�V�	�S�P�L`x����
6���������{�����*��D.��d���������<S'�a�b�����c�Ht����S��:w�h���okk��40Rnkk

�����h�"���?��?z�����y�fKK�����zi���AAAvvv'NLMMe������n��-  @,���k�
�3�q�L|||UU�H$��U�B�|�r�D�Y����{���__�t���3�����?�H$���s��MOO����D.T�TR�t���{������3g��I��-[6����1
pAA����������~WP(s��
		a�����={�DDD����zi��=[�n��q#!d���7n���{����B�����U�!b�x��(�`�OAwuuEEE������i[����������?��������yz��UKK���OB��_��-%%%2�,,,L���_���nnnV*�����r�Jemm=�O�����
����r_
�";;��'�9r���������.B�����i�&BHWW��-[�/_�{�*�J	!����%������R����-<<\"�tuu��w/---<<|p>$�����.//�������lllt�***$���zzz\\!���#!$))i��I)))J�������K���B� ��O3������������B�p���������[�7m��d����^�j�����!!!r�<!!!66������+!!A"��T��>�����������...g��1t���)�S�Negg�����7���w��e��\������d���e�njj�,ak;�$x_�tI.��:����"11��7���F��j���z{{wtt��������q�!R�Ts����wA``���-,,}U�~�H��}�v���y���������#
{����USS���iii�,�t������	!��������������/�}�R@@@``���g�{�9f�W_}��3��9��G��:��^C��l�#���q@m0B}�A�&�������Y�T(
��;��������)��&L��e�����9rd��=VVV������+W�|��g	!K�.���������*))iii!�twwWTT���B������bcc_{������s��={�����}L.���������y*�H�8++��_�D���&L8x� �<>>^$���1O����������j��9s�~���2f���$kk�)S�|����s�
�8�6���B6�
�~������!�f(j�d�
�8����������o��Fff���7�e�������jF	���
F��+d���`��l|@0`�t��1SG���l�#���
��#v�����5j��E����B���v��9v�X������{���|���=<<�������!�K���.���������`�c�P�����4�R��w��={���k��������}��}��eeeqqq����?f���i��?����!�J����l�r��9��5k���};333''���]����'z;�&G��$�G�}�������o����7�xzz�������Y�y�����{������1c/^�������������6l`^
���7� ,�� ,4=���D


w��mii�7o�f���s�R��������W.�B���	!�����f���_�!�,��^�6���q�ldoo�y<b�BHMM
!d��y��r�JBSh�_���K/������YYYeee1�e.������Z�+B����������3g���\(666�����<�Lff����mll�|�I�U��*�J��


z��!����n���)���=����k��:�V�f�6A6��m������{����o������g���� BHcc���������������\����+++K�=�(��&�)h�������������������_�����7b���'O���^�re�����O/++#��3����������oo������[[[����`�Ds{��l�#����!!!a��M�������U����C	HLL<r�����������%Iqq���S	!'N�7n\XX���������Y���a��B;LAG���=����w����KK��;wjNC���7�x��7z.illd�3����=_z��w9���S]��y���y�C
�Y���Dm6j�d�
��!�q	hC��V�b���A�6�V�l�����;%%E�jb�x��u�v�2N*�C�5j��~���o����9�W��U}}=�~����{���#G����((( �tvvn�������S����j6���
A`�Ds{��l�#���J $&&
��������������_'��3���>{����W���O�6�}�����m��u���.\`6���
A�0\�1����9C7   :::>>���Us[_B�o�����/Z����j���������������.\����o~��E"�{��G	


���1��37�%�/3Km6j�d�
�8�;w������U*��?��k�}{),,����>}z�[�{s���;��#=o���������v��M{�{sC�
S�f���c��Mm6j�d�
�8P(��---��������~�3�s����Z���v8P�O0���	`����<.,,���}������lmm��w����?xJ��	a��,���5��l�#���qPVV�w��|��������/

�v�AB�P(����s�Ncc��Q���Y���p���;w�;vl����}��I?��`y���d2Yppp{{{XX���G������|��3g�<p�@]]���+�N�z����[��Y���g�9q�Dhh�������l��P(F��j�*S&��-h�9G�5u���Fm05�q�l�ruu��{��S�%>*��L�;�
���!���-::��������>*j�V(�����'t^f�Am6j�d�
�8�6�;P�L��Za
��"P����j�Q� W��������`hsj�00�"P����j�Q� W��������`hsj�00�"�Wnkk��a�������{��g�8���)�6���q�lP������K$�\.�����������L�L�$Uq����_t�9?~,��]�6u�T6��0�)kU2{O��}u��GE��h���okk������� �O����+u0�W7�t[[[PP�X,������7o����������dgg7q����TfaMM�����m��b��]�8|J��)�6���q�lP�7[}a��i��Z���	&���h8>>���j�V#�\�~}��I2�L �=�����%ILLLNNNTTTdd��!*�J*�N�>���4+++))����>���m5�#d���O��4e5�������}}}����#��W��������=_�����}����������V�322���4����_�u_�8;��"�����{�F>*�#���������???CW�������


��W�^���d��������������L�Y���/fgg777+����Vf�J������3��F��W�rA��WSg!�S�P(���j���jXX��M�!]]][�lY�|��e�tlG*�B���5K�������R���[xx�D"����w�^ZZZxx8�fR��p@m0�l\!�#|f{��v��H��8���������Q^^��W_���p[-))i��I)))J�������K���B� �899i�0������������B�p������>��[�nn���b� }��vOaaa�W�f'&&jk��^-99���]$�<y��{{��?��SB�B��,),,$�|��7���{����z�m���o��������S����o��j+W�|�����%K�~Q�HDijj5j����I����Y����'GQ����|��%�\����s���Ebb��o��~���Xoo������8�p7n!D*�jN&�{��@ ��,,,4��=�v�Z��.��cM{��<��l&y�3
yx��0�-mV�}���Q����u��^
��Z���A�C\\����L&kiia�ZNN�@ ����t��@ �����U����y�x�����s������G��:�V�f�6���B6�
��lE���
���4����1����D �^���#88x��5��+Vwtt�����������kk���~�y��t��qkk��������3��C`�0�k�����Q������/�������eee{��!��[�n��Y�f������o����VDD�����7{��������g�}���"mV�[���O�M���G+���t��1�QA!j�Q� W������l�ZU~�</A�{+r�B�x���!�0S_��� �Lo��J���@� ��hy��[������1��#��
�0}%������`�6WS�+��>���3G!j�Q� W����!���K��xW^n	M�]��7��/�(��.�-F�rA =�K���Z�`4_����S��G�:H��������!����`�Dy���Gm0�l\!t�zE�/��tf�
0�S}M�����
=`�I�����M�`>*j�V(�|;dQd���AX��;%�f�6A6�������|�*���	�#�{�,�V��`������UO�����f}s�|f`
��0�-mV������)m�����T����#�{B6K4wJ��Fm0�l\!�,sc������{�'�����z�\Xy���4i���~��m�����`�����CEn!�+r�B�4��=/�
M���+?�������1�e���z��z�����*�;f��C���+15��l�����b��k>k����
����Uu�m�}��s��[I�"k[����l;���F_8~Bs���l�#���q�-��Zu�p��E���9�5>B��f���@��3�;�'��������������5�Vu�]�4y�h��1��6#���J�fY��4Oe����5�?�4���/�q�"G���Cz�F��M^�<k[�_���"������0K����'4wJ��Fm0�l\!����*R<��jU��S�39Jd��K�N�	���4g
�\��)��YV�0
C[}a���B��	�.����C5>*��v����m5����<����3L�l�h��P���`��B6�;�����4�8����@��/�;�'�7�l��+X_3��Y=`���!�����_���8\��=`�Ds���l�#��������������0���s���b�,�?�9�B���.E	zp�c 
�Y��SBm6j�d�
���r��K���%��v=P�������[�������+hsj�0���g����!��S�p?`�Bs�^������Md��AX��;%�f�6A6��M���?.���fM�4z���o6<���U8���A�0�e�
S�@�;G��(�����A�8\�CQ����U�A�\�)����
6���������{�����j�Q� WC2��w�Ye��U^n��2��������0�
�D"���2�,;;;999//���~��^��lVO�:����xgm��/l�w��/��?�k�bi��0X�Q�����"����kS�Ne�>z�`������{b���8�������TEI��2>B8>Bh��S�������'��i�����~�!,,������(��7m�����v�����:�[_�]���
�|��0����������bq��������;c�����

��kWWW��m677o�����r����^��PP������SSS��555�����m���v���2!�;%�f�6A6��:[�	g����l��^��5fls����\���`&Ds6>�(������*������[�������o��	�]������@^�~}���3g�z��?�X"�$&&��;7===22R$-\�P�RI�����������r��9�&MZ�l��}HvY=�����z���P�����
�M����|���h__���644����8qB�d��������y��������j[[�����/���n���������V�322���4����_�u_�8;������#�����OAwuuEEE�����������smm�����Y���"
	!�����UAA���������O�&��_�>""���JJJd2YXX�f��/��������T*[[[��*����zp>$;�,�8|aH��8p@�PH$�k�;wn����}�����Oy��W���6m�D�����e����uOK�RB����#������R����-<<\"�tuu��w/---<<|��Dh��P���`��2�l4T_��iC�=������������F�EJ�o��������'�d&%%M�4)%%E�TVTT\�tI�F
!���I��y�,ONN���tuu
�;v�

���BC����vOaaa�W�f'&&���I�T�?���u��M��������"�����}�������B
�fIaa!!��o�14<�;��}���)�S�Negg�����[���.\�c�������Ff���+���	!K�,���@���&��������/���)�K�.��rw��:����"11��7���������Y�������1ccc���;::����^�y��q��T�9������ 00��G����<>z����k���xlv�5�%J� �0���.d�z���h��GC���[���j���


����<==e2YKKK�����;BHFF�f��;A{{�Z��������.]999��J������5O/^<�|�y�9t��QSG���l�S#W�g�p����F'>*������������J��;�����������5g��\�������_�vM"����k�������5k�����F^~�������++������B�SO=�e���+WB������RSS_{��}����;��������>����q)J����
���=v{�������yJ����---/^��}�������GK$����B��}�L��������8a��={����s��u��]c������4����3&""B�T~��o����q����8T_����}���=�n�z��V�V�R���5*�q}a����V��z��z��y��<QCm6j����+#g�6���i]������7j�����GE^������CEn!�+r�B�L��9�B4s��>�6u0����S�������>�����)h��9g�B�f�6A6�����V�:�paQ�����o�#tg�F��an[������[�:�.r��Q���S��!mN�����7��k��i���_���E��m�~�QPc�B�*/��]P�����q�"'{/L�.|T�����j�Q� W��em���+��������Q">�/����`��l|@�aDY�:9Mj�!X��g�,0�a�U+LA1%���m5���\�mM����(c����f����S�f��N	���
F��+����S<���/��;�����`��l|����yI����@{������z�����2�i�O��4u0{���0�����[����CL���%�;%�f�6A6��f�u�!mV����Bl7�����R���
F�������i����D�y��9�B�~�7�:�H��<9Jd�80d�Z�F�@3e�*��xr�hr�3���7\~Bs���l�#�f����S��Gs�y��D IDAT�:�L�~##tg�
0����-�#d���;��j�)h
��*JR����a-/A^_���C�����j�Q� ;Y�j�jU=��LO����Fm0Bw6>���X#�����0����z��8��l|�p|���A`�B�����O'x���0�`
�,��)�6���p�v���CE)��i�J/,*��X�Ww�pC�����m�j!$��84�W[���m��
F���������[�W��N���)�jUkUgk����SY�z��^v�Y�t}����*�9�B`�iJ��8wSg0.E	���pV��f�T_�l�h��P���`d�g������j�������!�"��k� �A�S+��������/��i�[�A�������VQ}�A6K4wJ��Fm02���L���������i��7#�6�;��A�H��W���8S���Z����\���7�Vdi� �=`�����>�E���,��)�6����v��:��vc~��_����~3&j����=`��h���-��S0hsj�0���
��f��?�S����>���~�q�z�����a�777���{��q��0�-mV������)j59dQt��:�����?-�0�
�D"���2�,;;;999//���8��SBm6j���en��vc�s�x������c���u����e���GZ9'm��'j�Q������?~�899y���666>>>���3f�0u(��:�p���-�vEN�[�]�J[.�K���r�*�����a�����������Y��0+s���X�b7�o�+g��]�nv���_:���*L>��g�=������ �X�������v�


9r�������?��fss����---������������M�8155�YXSS����m�����X�k���|"�����3��5�a�~83R��������jtt�| �H�����_}���~���
^�~}��I2�L �}&��,�Hbbbrrr���"##/\�@Q�TR�t������YYYIII�O��G3	�;%�f�61u6ms���2�g4=�=���`��l|0F.((8x�`TTT��vvv~���;v���a��3�z���K��8qB�6������s���#z�={�l��u����'O~���^y����wB�B�����U�!b�x�����������	��D�0��y�R�f����������-111��'N���������#F|��=W���MLL�<-..&�dffj�|���#F�hjj���---��-[�l��UG#��:1�n��vS�0|T�G�P(���[�����s�B^}�����M�6B����l��|��e���x�T*%����k����wwwK�R77���p�D���u��������p��
��}���gy�&��:�0�o.//�������ll�������}�vll,�4))����)))III��v�BAqrr�,a3����+**\]],X�c����P�4wJ��Fm0b�l_����I�3o��5����6���������7m��d�������������O�81q�O����JHH�H$*����>rss34Cww�������3g�����|����(S�}R[�������uuu�S�=��[�����?��r�R���$����iV��_�x�RQQ�Y����B4�d�x�9��W���ZS�03|T��/]�$�����-,,,,,���e2���E��v	!���o�?����p��^/���z{{����8�_t��q���`���wA`` ��`��fn���cx��f����e����P������=K����t���Yqqq���2�Ls�FZZ���uVVV���������K�.	����^+���������h������?�C~^w�=z����6�����v/Mq�����o�o�P���`j���Qx�;;;;;;k�
�B�@����<}��w:;;?~�}���K�Z[[���j��>}zwwwdd���+�}�YB���K###srr���JJJZZZ!����������bcc_{������s��={����8��0���~��~��.r��������Zoj�U�X#����k��,�H�Z�����7XMMM||�H$�t�����������j��9s������Y3))������z��)_~�%����9��(O�H��z�V���k�6���;�B�?����Q�\��V��IwM`(��"�~Z�f@���Uu�m�}�����s���7�?��Z�9v���"G���}>YY7�TQ�3�.���6�����t�1��b��N�9/�����������q��q@m0Bw6>�{0"�����K09J��<P��Za
��m��S����V����`�P����j�Q���V_�~j�t�z��&��t�����
F���`zU^n�![_��� �9�BL�$UQ��X��g� �KE�Q�4�K�����a��6K4wJ��Fm0�'[����Z��}M��'3�oT�6������@��52;�S{=M��6�V����]P6>B8>Bh� �_��NN�>��)u0u0LA�%�;%�f3a��T�!�����Y��gn�����u���|���6e����c)�����V_j���8�6�;0���V�!��lVOQV�Z�:[�U�U��j�����s��~��&�������Z��� �O��@���0�y�/l�!���m�6K4wJ��f�`Y�j.G�BS}��JD�N#�����
F���`�ASy�%����C�JA�K��������
=`0H���F�B�/��zp?`J�(�:,N�C��P����j�����?.�� ���v�d���l�#tg�
0��\�]��b'���M���Z�z�?�|�o
ag��:������
0��^������Mx����'4wJ��6��N?���+���)jwA6���Fm0Bw6>�p���3��
���:�+��j�)h������+���1u0\��%�o| _Ud� `�0m�h��P�Mw���!����Us�tog[���4�l\Q���`��l|@���$U��Y�V�����b�����Y��7���������u�4rHz���
=�!��r����i�O���y	���m�7�:�09J49��-��Y~���z��b{�#2p�Q�a���Wbj!O'x��m���
�7B&G9�&�^�]���y�������)�6LY��X#�!���uq����N���	x�������y�F'd���l�#tg�
0#Y�jNM��C^)�:�|�[���O�7����
���j�)�!��E�S{=g���:����
x(���jG����\L�z���;%fk�u���)_t��A��p�i 7�f�6�;P�a���v���p����Za
zh�����[5����!�S���[5?��_�
�Y��SBU6��9��<������l� �#tg���C���k�a���6�V������+;���2u0{�T[[��
���|||�{�=S��T^n9dQ��V�\����Q}�Z��K$�\.�����������L��#�;%<e��V��W�Y=�T�T�-{I�w��A�l�P���`��l|F=���'''_�v�������������e���������~���U�f���s�w����k+�/%���Q���~���g�>z�4(IUdm��{�����9N�����WT��)������>3������bm+477o�����r���,���-���

����8qbjj*�������u��mb�x��]�>M�Y����5��4p��1s�������������(g�_UU������O�4I&�	l��u����?�H$111999QQQ���.\ ��T*�T:}�������������Os�D�Es�dP����x�C^8��m�y�x[X������]yc��4� 7�f�6�;�Q�

<�m����={��9sf��yt�e��=[�n��q�����z��W^ye�����P����j�*B�X,^�bEFF��<�K�_���\8>B�{�'v����^XT�Y���TS�L�R��=;...11���W����������'N���������#F|��:�R\\L����,���OG����TWW'ZZZ��[�l��u��0F�9�����Wb�
z��sM�B��������<��a��������
�B"�px�����i�&BHWW��-[�/_�l�2o�J����\�������[*�������K$����{��������sH�9dQ4c��S{
�t�_��s�x�x|X�k'_+��"~pyyyll�G}dcc�mIII�o�NIIIJJ���8x����
!���I��y�,ONN���puu]�`��;BCC��29�;%��}9������C8��-����@�W��;���j�Q�������i��%K�<����������� �HT*�G}���f����5�]\\��9�9����j�;y=��wk�%���2��x�����S�������vV�\���NY�d���E"!���I��y�,2��]k�Zq�v�lsc����]���1�v�� 7�f�6�;x,��.]���������2�����������Xooo�P�w�q���w�w������'��4s#����A||��g�XUvf%y������=K�������� �!..���S&�i�C���!�j�:''G dff^�tI ����}K```tt����������!?�;g��=j�Z����w��y
��P�i��l�P���`j���Qx�;;;;;;k�
�B�@����<}��w:;;	!%%%---����������\BHpp0!$22r����>�,!d������999VVV��bee��k���;����/^�����f����]��9�`���k��<`�D"��s������,>>^$���1�UUU999����x�fRR��������)S���Kn���s���k�g���)���"�~Z�f�R�<��R��� S���o��Ks��Xf;�������_o���f��
���
F���`4�ZU����u!��:���Vi[��K������X3U0������/l�K�W��L�q��Zv���\��sM#�����.rt���|���z��2�z�&�GE@��
����-y{��Z����7/z��Vv������������9�8[^�]����:�0@��Oh����*NN��������J�[���b��N�9/�����������I����A6n��j������kA���� �O��8������y���`r�hr��� (�@`�U+LA�Uv�����1��c\�=��|T�
�27V�Vu��C�=`�d�N�����b������]��.���q�lP�������������@{/���9�BXY�:���/�qv�����z�`<�7d������0X�6K|wJ.,*����j��Km��`��B6�
F������)e������Z�Z���V�V��������;�:#�P6���:�p�������#��Fz	Fz[�{	Fz��SSg�z�0PY�j��8����6K:%�Z��iR;��t?��]j�d�
�8�6�;0JRY�j����t3�mh�9���g��B^8�k� �
�#����x�CP}h�l��tJ���#d/��S�qP���6A6���j����xh:����V�JA n@�!��������[�.N��:�:����A���\`
�,���t�^^Q���6A6���j����#����0;f��4�����F�0����|�l��;6�[����.���q�lP����SJY�j�U)kU�U�C�='�-{���9M��=�CE�;�����S��c���R<���L@������i���~z�/��j�Q� W��������L����N��-{����2r8c�L�`�;�������%��`�@6��9!dF�+����)�6���q�lP����8���JR���/�5u0*���2B��rK�^��t?^�
�R������Rh� `����VuvA��Ks���l�#���q@m0Bw6>`l)����SL�Lfx������l�r��Ykk�-[����;:V��|>�|��1�����(�D"��e2Yvvvrrr^^��3��~T�5�/�07�
�������w��mcc���SZZ:c�#gx$�(<P��^���j�Q� W�������0�
p~~������SM���/������t�����������+++�����,�H���������o�|i�����d~~~�������B��3g�o��x��/�����#**�����[XX�#���B�/�B����d����������pa����?8>>���J$�]����K�.�9s�@��#|����$11q���������"�h���*�J*�N�>}������s���4i��e�t|��m/�(�u����w�>���������c���04P4]PPp�����(6+������3#F��{����u���'O���[o���+�w�&��B�U�VB�b��+222�E�s�qb�sL�/�������k�����|Xy��1��\_4wJ��Fm0�l\!�#tg�-���+***&&���O����?���*((`�^�z������������3������d���0��_|1;;���988X�T���2�U*���5��L^���A�����
��?����Xp�/j:���o��q������������/~����j�J��9s��+z����611Q���������
����,B��7�j��%K�l��R�����������#����X����������#w^J9�'�]�����7�t|!����pyyyll�W_}ecc������I�&���(�����K�.���B� �899i�0������������B�p������2�_�S�`��5�k2n���z7�Q���i�2���g��Z0P1�i��%K�<���}_���JHH�H$���tss3t������...g��ill���o��-���5N���k��}�O}���W~�YBQ�f9O���)�6���q�lP�����/��N������o��V�\���NY�d���1GP755i�0��Y�����^O���X�}i|��7W��xo���.��0!�_[�����
��i�!����_�����}�'&&2g�n������+V�x���z�fgg����k�.--
���o�{�9fIJJJTT�B�9r�A�,,,z>=z����k/�T^=6_0C�v�Z����z=��Vs����������D�c;�6VO����7�^�`�6��������u����\��766*�J�����������P8r�����9s�ddd������_�v-$$���{`BHPPPxx��~�<}���ZZZ�����i���]����A>Q�{+/��]P��~G�������kP�������Kt]!dQ�������'�e
���+�8=�i��>6�y������������������7���9o�Q�l�jk�V6U)�������^H�n1���q{�A?�k�z����f����+���;::�juqqqNNNNN�����o��<f^:~�������������Aff&�$}wN��fm�����9����[����>�k}��Q�%gm���H�d�����������
�v��l�����������:y�n��������W�M����#;r�2n/l����	:Vx�����\���O��O������=]V�a���
��S�O�Z��u6���
m����^X1�l��W�l\6��wIV�a�j��Ly���#�6��	�
�_�����Q��(G^y���'�W���������9����w��P�� IDAT��r�2�����wC�|~�oQW������_=Wq+G^Y��h�������W�_�?0���[����`~��i_�Y�
���q �G���3+,����e%�6��GAk�����d2��2'L��g���;w�[����k��}��1-����1c�DDD(��>�����7n\ZZ���>;(y�i\xf��l���8zQ��*eS��9zW^���>��\x+\<iW~!�.����J~�3^�
S��E��)��6^��n�(w+�������������?����c?����.>6^�P,�u�����?|����q��g?�x�u���/�tY��1_��tY�2�`���r����f��
�x�\6�U���B���js���b�d+��
��:s�z��z�������������+�M�{Y����q�xjinD@��g^���B��dw�������f�������B.�����u�����_��������su���{|zU���I1�AX������A���������������;wB�^�����1c�0kn���������7o����~5(���(�\-��{�(5@�^��������X������������2�Y�t)����f;2���C������F�
:��-�?�[�l*m������.��J^�����s���`������:��v�����Xm��+xX��u[t��t��/�o�qB���7���Y.��z�W_��
�����������l6�c����p�8!�l����u�i���

��2n�?xy�<+�zl���r��qi���Y:VX(�t����Wu��8]~s���_p�!����`�H����������w��u~�3��OO�+���u�� ����dc���z����3�
^z��?��'�������QSg�$���u�8{q����B�=�r���[����W���\2��7�^f��7���~3�g-���-���U�3��fk)��tQv���_����dk�g����e�0O�$�h�l��>2t���8���:�Xg���J�����>���~��^V�a����[����t�3z�p����*���d��f���-����P�P=M�L���h��g�ce��I�j�����s���EGz-O(��{3����v�=��=[�[f3��w�#[��n�Po��r/[�:}�����X=���77�`�S������,���8>-��:`&��9��	!�D�����]���-[gw�mE�4�3�z7�
��;v����L�-��%��S��n�q�^Vt���p���Ks���]m�x�����]���v��p�=
��B�[�G�����;vl �GzW[�=.�����
0+�V_F��9���u6Q�d�w���������)z������zy{�A�bS�](�w�8�b3��w�������_�����+��v,���Q�f���:}�G�^���j����Jo����I��X9��������.3��.����U���8����V��V�Z�����j��lx���Z_7�'���$�W�6�w�����
�8�����r+m�W���K�
G+gk;����{D�,4
0+F(���p��O�^�}*�2��?�j?�JwG�5C�,�g�=�-[m[���@����>>�c�+����`3]|8����?3��]3o��*������M��������t�a����N��llz�z7�
���]���������]*���Y��f�����r����,��$��1�fc�����/i���{�\���m�n�� X��W�����^|����E��6y����F�Z!.����-RbR�]�b�I�p���
�����IB'+��~zg���f��xe���*����4�!�����#���1�3@�|$m���2�y3
�f�Y�?3�w�de;I�q�a9����2���+���oh
f9�e:f?_��/@AC������#{?�cA
�~��jx�;��'�����e�h��s��]��s�>R[6��2f�����X�k;vl��/�60���o������y�~X!��tr%�/�����2	��4�p_��?H������	v'���6,7�d3tl@���Q/�����{j�,t�47"�������������s�zf��{���qu�x��_�w_����g=.�>�����E_v��0!$2��#���fs���)h�#`b�����_B����XG��:����u�cE��}����-
JU'���������,�B��8!�E���dlpSg����<���m�LA�Y������O{���C�0h�}��q���u�1Y�qKQ�r�l
��GT�B�����&��eHi7u��V��q�fe�X[�b��	B�7���*n��dS�?�1�U��l��v�Z;K+?G��v�l���z,�����f��g���7��lL��r�c&6���2�U���8��l9�J����m��b�q&��1��M���;�b?�%�Ly�4p|GQ��(���b�=e_��Ks���BFY�Nu����~���3�����-����[���(I����1�3 �������y�����5L�08wR2�p���ZNE����	���jUu(:�
c���;!��������J`� X�$�9�����[����Y���F�l>��e��0��C���e��Cf��^�';K+����7#�!�_�Y��.���[�^9�� ����K�5�	!��.U�
a������=�``]�����(�.f;�g4,[���.6�6>C����_�~�=�RU�5�9���60���{3�4���������J]W���o;�o(!d��Hhm���#��?Z��u4��'����S#��v�a�n�Hm���k�P��]��Cg�5���������4�=�o�y������z������aY�{f��{�='#��b����X������/!����O�3:�#;y?������5=��9H=�������lk,3���A���	�k��o�=�S/�e���x�8a�'��%�<t_��C�Db��b�>aW�����^�:�*Z�W��chX�E(����=�a��b����c�2��t^��
|[Q�de�;R��&z�q t��8`�o�����L0C/Hi�!��/�����o(a1�a�����������}�=�h�����S����k����de{[_I0��0�@h�E(�O��B��\_������}���������"�}��{��8'������F�\(���q�#{��?��As�c�p���F����FO����F-�f�_B������#���)hB����`��?d����m�L�9<X[g�Y����Xvj9�h��8a76t��8-m�{�9:Z�GC�����A������Xmkid��S����F���Q2����_bH���8�S���7��@�K� ��p�`���m]�Z4���6{��m[����`n�_F�����m��x_m�g6�R����:4�C	tr}������9T=��2�{��z�sO}����q��1���5���#����;���#�(�(4
�.�`�,O:7�����j�eM[3�������c�s���6���j��Yh�N@�e����5�}�$<��/CG�[���S��G���	��a��`2��� i�c�J]�����xt�qcb��z=R�����sjM#��j��a�@k��{�0pgP�P�������A
b��*��V}O!e�o�`����6����?��^7��o(!d��O������^��Z��0�4���=[�o6CO�����x�l�����R���B�<�����0���z�5�3��&y{W'���!7.h������������`e3����5mx�<�9Z��p�����~_��WyK�������j��U��V/��������rz��H�~o��m4�1���U���&Z���VU��'�x�8CG��>z����_��azpO�j� `-��$������q������~�x�O�2� �,�2�q,4�,�*R��W�o6��_F�6��j�8�x�;�M0�����
�c �������c?��jJ5K����
%Z.���X���ns�l�/�f����^�t`������H�-��V6��B�=�w��e���9���=R8��N�zH�B�_���_�������� 4�B�����>j�lgsw�����X�m-KFO9y?_������a��4������zz�}�����m������
O�����6z'���?a���IB���vYk?�n�p��)h�
`F�#��~�S�m��VE][3��r\�l����*����{������5,��<A6K��'
=w�~l��������`&}����7�Y��Gv�V������_��d���q�C���^7G�t�F�C�ze�|�T�������o�kw�2kJ9���l��,�t_������=�U�m�����p�=���`]~�/��tJm;�����L>F.���X��:�=�/�� i,�W�XS�l"��?������=a���8�6�54�������d�����a�#`��O�@��l�����K�Y���B�z���X��B(lXx�;��:������*�������'#��%-�~��H&Z���fh����,��7W�-����\�GG���7�2�+�F��������{$*!���3�����,7���c�=q�x���6��b����B\�S�9|O{��~�UO�A0�������sY�H[6gt	;�����L~M������EU����a��IyE|"��ra��z!�������Ink��y�W���,�,�j�r�Cn�^VjnK,?3�K�h40���%4 
�8����lg�y8��s����_p�x�=���9sn��biX�g-7��:�i�r�����1���;��7aZ{�;�&�LT�z��s�Tt������y������?��^�����8����PE=8!��C1S9����}���0��m����z��s=]l
����ot��l��ZhA��
Pw��9B����9��
���7t��h<��$~-C�I���_b������B���.��U�t~�Z�oX�S���������������J��Br��a��5��H�����8��q���rc^`�#h�a�������=-�������`���v�6��hh���>a��s�}�x�v�<�X��\wmY�cx�
U�	!b���<���l�oX�S����������{�#)qg~��9�?�����8A���3`B�������`���0w�S��7�������{�g�&�����Z�%�t
��l�����y�5����1~����%�k�����W:[;������-�|�}���`�m`�/��a����P��S�s.�S��/����y6G>l���x���-
�r���k��9~���]�-���
���s4_��RN
��k���:tH���}��-8�;h�?G<<>�� H�3m|����L�8�Lq��T��c�/�6'���)h�?�������v���
R�����7*i������>���Nk�t�z����3��YhG��e���	�/{���~���0�O������a�7����������w�p&`���e�N�)�����"��g,�{����������&�o(!��)�?�u	8&??Q�f������no��,���p^����2�{�l�%�����s�z:x�?�!�?��@�/O����r�0
���#�;�����g�Q��'`�`kg�?-*�����T��+��pp�
���fq���>�� ��P�r����J������@��H$��%�n/�
��������C1��Qm���<����������:,'��v�,�I*�����q������8��mc�m����8m�
,��S�����%`���B���[�}�G��ImA`���;��V<t�1'U5�����L	!��C/��2���/���s���8hqFqL�y���W`9���_�O�{�����p�	��A��i��
`�-W��%3`Ar&�.�BvI�tJRUqg[��O�|���ah��b���l=h�
���?K���0A�V�6v����5v�+|'��4�����1n�
�����B�f�;m������=��q3,���4��I�<���f���U�	���T`k��]�z�9������E�V���g�*|z��G�+�6���s��kP� ����B������
uh'��:,`��`�D�?�-�B�g�3H�9h6'�����B�S�d�lqp��s����2�l�;�L�{�<��?�y��R�A�������BGq4�(�;�aaL�:���hE=�xl��������0�-}����u;�O�I0
�K���d��/������]�tl�@����
��w~�w���7��NR�lt���<n��_[��/���\2����!x���������������X���P������?�%�NA7v���L�r��*%�eS|�v�B�v�N�
��	6&�\<^��>�/#.u	�Hl��]�h�$�FvIT���������3x����F[�9h�~A����?�'���
e1�J:tH������9�����,t������h� �=�|O��l~��M(�4��b
0��U�`{,���u,Q=����w��/o}�@��=ZQ.�O"�L���/�j��4��r3`I��$��K�g��l���f(i�
�?hm`I�a��
e$����[�~���w���H���G��q���^����q�Y|�04t��m�2d���v�D���v���������m�B���DA�u,���cq

`��o�A�
�K���d���d�g4���E��	�
�y�
��N�-=Q���2<���&L�|hO���n�b�x�%G��q����v���[�L�{J�	f��}���A	
0���*�_BH�r���;��AB��;?$���{���`U��[������W]��H�t�BH�q�f��8�$�I�bQB�X�c����?��3������R����N�$��F�P��������1h�$X��������@��	r�����lov����{���3����,n���&���H�VV�����:������������:��a1	��s`/���
,�%E4�t������a���%�,��,�{�7���oy��rlC���{eee������gxx��7���'u(�(���m`&Uw�fp;��3H���� U�a@�{`	{pI���$����T�`�Y�o�o+|�{x�8��&�1T�kkk
E|�`a�9�� ]k� (�B���h[���#��[8�=}�]������a��"H��������c���^^^111;w�4������_~Y�Ryzz��3����p����������{�n��v�������;�0Z���^z��i���T��;w:��$Da��m�b8�?������/w�Oog�bP�����B�����GSo�����K�7���B;R��4��[��~������WWWo����W_�����-''���D��TWW/Z�(--����G���/g�����d2��Co���Z���m�V�}��g322��n0���{�����������_��	?�7�tL��p�|kh6?0�D�e��#�?�D�e4���3`4��O����QUUU\\�z��y��=��s+W�<u���n���%%%j��������|�������B��_�|9??����nn�/!??��-����f����/�Z���W_%�(�J__���z��R�V�\YYY)��u*
;%���������p�K�7h#>k�!�X��,d���Y�
{��bP;n������~.���P;h"��������^�������T*-vkll��������+V0�����uuu��/��������n��a��u�Oz��5�N����nY�lYMMMwwwBBB___oo/��`0��ra^*B�V%��s��uG�0	��_w��X�dnnQ�������M���=69�"�����>�(++����zj;44d�%((������c���)))�������y�����k��v�IDAT'�x���	!QQQ����������������T�Zm4�_�~�����Ta^����)a�A�_����=h�#���+���7��l�B����}
��q/�l����9�L����]����WVV������?����FGG���s��-^���r��yBHWW���ii���38������\^^��\����???v�5����,###((H�T���,]�T�W	d���?6B�;�����R�������A��a8��C����[���E[A>g��?������#G�h4���,�G}||6m�TTT�����������JEE!���*44t��]j�Z�����78x������O�<�����������w�eI��N	3v�0��#1t��d����[�7���V�\����M(�4�8�s�^^^��/���)--����x���p��U�7o^�passsNN���[P��o��f��^OIKK��B����-���vs!t?1�C��K�j�����.|x��-���i����Y�������ovttttt8p���%66����7R�h4aaaJ���G�,�����:�����L�#�83��f���kb��"
y��=�e*�e�4�FC����_/ydC6��f��b_{������i���k�@�`MUU!��������#���z����>�<���`���R����Z�V&��={���\&�i�Z��P(JJJ��DGGggg���X�b��%<��:8�UZ��-�?\_uT� B�"�XcCbbbdd�{��WSS�g�oo�-[�0���j�L�|�r����'���_�xq��U!!!mmm&�i`` !!�7��
�[BB�����d����j�Z�V.�o�����y����r�|������yyyL�����|��A�#��s�������_���@�����/d���`&���QD�
������?�h4��ookk�<y�Z���}���o����-[��]������t���b���B�N��.���d����������O?�\,M).....&���ys��)������+**��ukll��c����Al��C����a��8����q�08#���-N����V
���M��0�VC'���g4���
�Kb/������9��AC6~��j����@hs��00�%P�]��j�Q� _�������� �9�B��(�.��N	���
F��/d���`��lb@���v���F	`�Ds���l�#����@m0Bw61�HmN��z��
�K��SBm6j�d��x�6�;�P�$�6�]�=`�Q�%��)�6�����l<P���M(�@��.����0�(���h��P���`��B6�
F��&`	��iz��@`�@vI4wJ��Fm0�l|!�#tg
0����=``�0J��$�;%�f�6A6���j����@hs��00�%P�]��j�Q� _�������� �9�B��(�.��N	���
F��/d���`��lb[�����g�	/((�:�]c��������y���c���������7o�����a��{���?>>>~8��a9���V�P��R��N	���
F��/d���`��lb�������c���^^^111;w�4������������M����>}����ns7���YYY����w��xh���111^^^qqq�������%((���^�6m�J���s��/�'�����CBB�}���/���_.����X����1y����������O���j4�#���DDD�X�B.�����?��~�BQZZz�����b�Lv��i��t��q77�#G��L����������s<����B�� b��s�����}�]v���k�����-,,d���qcll,���~�����L&�BaQ�#""�o�n���-2�L���~~~�����M�6q<
00��"�����omm]�z5����G�T��yhh��[777B���G=<<�����_|��������O��a��u���s��5�N����nY�lYMMMwwwBBB___oo/��`0��r�^�th��P���`��B6�
F��&
�K�����S�NeffTWW[��V�'O�\SS�������J���{��V�X��C�L&��0���+WZ�[�����	!�������jB���M&SZZ����
CCC���+**8b;gp�A6�
fB6���j���^6�j����������>������.d�������b�.--�9s���������������������na�f����edd)������K�
��F���O>���?s���O?}���}��Y����U]]]QQ1{���/n�������J944t��]j��`0���?88x��n~r;00������9�s�^^^��/���)--���0���������h��������/�h4EEE����k�����������>Q@@!������|�l���3�������������6���|����Pll,�%22r``@�����B4MXX���@nn�}o�����Q�R1[d2Ytt4����������x�6A6���j���	N�����'''WVV>��������Li��m}}��9s�-


nnn��.���TVV���������p<��i����O�:��#�0[N�8������3��&��D#b~���7n�XXXu��������LOOOB��;v��5888u�����Z4s��K�.eddxyy
fdd�Y�f������<##C��zxx\�v����244���|��BHBB����F���qctttRR��S�>�����J�^&�_Wm���m��M�'OV(���yyyz��yH�V�d2������[�FEEy{{���j4���>������������?�������L�E�Y���7o2{���FEE�����g����F�����jH�@�
�m64t�_����~����N6�b�t�G�E�L�1��sgLL�������a>���$7�5L�7�l�����?&&���#�.�/�l��F�M�&4��-h�4III���Z3���9���������L�1������-++�x�bqq���[qq�#��M�q�X�T�q��&��1�j���wDD�E�A��&��	`��P�m�������=[�V�|�#�x�y,�������������E6���	��l������w6~fkkk}}}����������M�A<�H��[���'��={��r��77���.�e�<yr~~���9'�E���G�����1��m����A��f�o�������ypN6�f0.\���[RR������l&
M�<��l�����na���lw�������x���'<<���_fZM�$��/�s��6�g��IJJ��9�5�VWW��GeeeUUU���l��q��gOgg�Z�6�H����F(4a������\����!������juTTTEEEnn.!���^��&Il~y���1|����\�r��B���g#����������U����&�Fs��	�>�,�^6"��	�G6��3_��9n���~�`�����]�vi4�=9�9?6��:9*=c��?�1//��w�e*�H�(���a�;�H��������4����?h�$��&v��f�)hK.h��_�r``����#�$����$�Tc�������}��O<����A��fM��{k��3n���wp��������)..�~H�A��fM�_t��
���!���
�,�L8�I�_I�J2�/������+++�/_�n�d�lf���q��g��4���l�{:s�������'N��|�5;;[���7n����G6��%�E�`>�x\96�EGGggg���X�b��%N{v��������g������������#xT�`4���c��ryuu��C���l��[UU!��������#���E\�7�l����;wtfrssCBBt:]OO�gj�8�I��M�#��l�����r���{kkk���d2����b�y��B�HOO?{��V������dEEE��&^���z�S�r�|������S~y��j/��c���###W�Z��9��]�A��&�����������{���f��=���[�lq$��I>n�,>�#��qd�|��0�l(��I��auu��e��r��3���;�l"������Q9�I;����6O8���8�	�$�����)�N�&���,
0�g���E6�M�#���$���$� `	�H@(�@�
0�P�$�0����k���C|||��v���`��{�����E�V����}��Z��!~���3����x���_iiiZ����F�1)))!!��7�#��Bm�y����!��*>>�����_����g�}�d��K�.��5K�H.��F�Gy����>x���q�����OAO�4i��}/�����``���;���W�^=~�x�J�l����+�L�:���3&&�������>�������og�|����/���?~�/~��O?����x�������B1_7��AU***.\�v�Z�^?g����r����#""�Z���5���+SRRjkk�,Y������A���-...**�y�fnn�Z�~��7������			*������755u����.]�r�JrrrJJ�����GSSS���E~�.`T���pss��d�������������Mcf�aaa����6m���[�z}CCCoooII��m��|�������z*##�����P���III���|�MWWWzz�J�R�TZ�����y4))���������h��0���;��"00�����BHgggCCCOO��?���������[�nY���%,,��666666v��5_}�!d��Y��!!!�?�e�$`�1������nn��U����~x�O��YCioo�8Tgg';�%�xzz�;wn��Uo�������L�r��Q�Q�R���_��B��`jjEEE�����Z��T*����������~�m}}�c�=�v�Zf*L~*�L
0�GBB�B�hhh�9www�=CBB�O)�����e���O��_�*��.]��la�dND`�Q*����W�^e�j����o��)//�������uuu��/��a���III���c��r��c�=�o����������B��E��G��;$��@\
0�h�e��N�����jy��]�vefffggGDD,[�,   ??�z����K�.}������>�hYYYYY����N�>}������3��>}:%%��+�p',�i��9K�,��������`
0��������L��
`NAOK�.��m���+9VC�����g�3����$�0�P�$� `	��
��rca!IEND�B`�
logirep_cache_leak.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/include/catalog/pg_publication.h b/src/include/catalog/pg_publication.h
index 9a83a72d6b..809bd1f0a6 100644
--- a/src/include/catalog/pg_publication.h
+++ b/src/include/catalog/pg_publication.h
@@ -103,7 +103,7 @@ typedef struct PublicationDesc
 typedef struct Publication
 {
 	Oid			oid;
-	char	   *name;
+	char		name[NAMEDATALEN];
 	bool		alltables;
 	bool		pubviaroot;
 	bool		pubgencols;
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 9bbb60463f..8f4e3b83f1 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -1061,7 +1061,7 @@ GetPublication(Oid pubid)
 
 	pub = (Publication *) palloc(sizeof(Publication));
 	pub->oid = pubid;
-	pub->name = pstrdup(NameStr(pubform->pubname));
+	strlcpy(pub->name, NameStr(pubform->pubname), NAMEDATALEN);
 	pub->alltables = pubform->puballtables;
 	pub->pubactions.pubinsert = pubform->pubinsert;
 	pub->pubactions.pubupdate = pubform->pubupdate;
#2Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#1)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Fri, Nov 29, 2024 at 11:05:51AM +0900, Michael Paquier wrote:

This is a follow-up of ea792bfd93ab and this thread where I've noticed
that some memory was still leaking when running sysbench with a
logical replication setup:
/messages/by-id/Zz7SRcBXxhOYngOr@paquier.xyz

Adding Amit in CC in case, as we've talked about this topic offlist.

This problem exists since the introduction of logical replication,
down to v10. It would be sad to have an extra loop on publications to
free explicitely the publication names, and the issue would still be
hidden to the existing callers of GetPublication(), so I'd suggest to
change the Publication structure to use a NAMEDATALEN string to force
the free to happen, as in the attached.

That means an ABI break. I've looked around and did not notice any
out-of-core code relying on sizeof(Publication). That does not mean
that nothing would break, of course, but the odds should be pretty
good as this is a rather internal structure. One way would be to just
fix that on HEAD and call it a day, but I'm aware of heavy logirep
users whose workloads would be able to see memory bloating because
they maintain WAL senders around.

After sleeping on that, and because the leak is minimal, I'm thinking
about just applying the fix only on HEAD and call it a day. This
changes the structure of Publication so as we use a char[NAMEDATALEN]
rather than a char*, avoiding the pstrdup(), for the publication name
and free it in the list_free_deep() when reloading the list of
publications.

If there are any objections or comments, feel free. I'll revisit that
again around the middle of next week.
--
Michael

#3Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Michael Paquier (#2)
1 attachment(s)
Re: Memory leak in WAL sender with pgoutput (v10~)

On 2024-Nov-30, Michael Paquier wrote:

After sleeping on that, and because the leak is minimal, I'm thinking
about just applying the fix only on HEAD and call it a day. This
changes the structure of Publication so as we use a char[NAMEDATALEN]
rather than a char*, avoiding the pstrdup(), for the publication name
and free it in the list_free_deep() when reloading the list of
publications.

I'm not sure about your proposed fix. Isn't it simpler to have another
memory context which we can reset instead of doing list_free_deep()? It
doesn't have to be a global memory context -- since this is not
reentrant and not referenced anywhere else, it can be a simple static
variable in that block, as in the attached. I ran the stock tests (no
sysbench) and at least it doesn't crash.

This should be easily backpatchable also, since there's no ABI change.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

Attachments:

0001-untested-try-at-fixing-memleak.patchtext/x-diff; charset=utf-8Download
From 3ed287180a007447bbee152177139cf1d4347625 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@alvh.no-ip.org>
Date: Sat, 30 Nov 2024 09:02:10 +0100
Subject: [PATCH] untested try at fixing memleak

---
 src/backend/replication/pgoutput/pgoutput.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 5e23453f071..010b341e0f5 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -2063,12 +2063,16 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		/* Reload publications if needed before use. */
 		if (!publications_valid)
 		{
-			oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-			if (data->publications)
-			{
-				list_free_deep(data->publications);
-				data->publications = NIL;
-			}
+			static MemoryContext pubmemcxt = NULL;
+
+			if (pubmemcxt == NULL)
+				pubmemcxt = AllocSetContextCreate(CacheMemoryContext,
+												  "publication list context",
+												  ALLOCSET_DEFAULT_SIZES);
+			else
+				MemoryContextReset(pubmemcxt);
+			oldctx = MemoryContextSwitchTo(pubmemcxt);
+
 			data->publications = LoadPublications(data->publication_names);
 			MemoryContextSwitchTo(oldctx);
 			publications_valid = true;
-- 
2.39.5

#4Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#3)
2 attachment(s)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Sat, Nov 30, 2024 at 09:28:31AM +0100, Alvaro Herrera wrote:

I'm not sure about your proposed fix. Isn't it simpler to have another
memory context which we can reset instead of doing list_free_deep()? It
doesn't have to be a global memory context -- since this is not
reentrant and not referenced anywhere else, it can be a simple static
variable in that block, as in the attached. I ran the stock tests (no
sysbench) and at least it doesn't crash.

This should be easily backpatchable also, since there's no ABI change.

Yeah. Using more memory contexts around these API calls done without
the cache manipulation is something I've also mentioned to Amit a few
days ago, and I'm feeling that this concept should be applied to a
broader area than just the cached publication list in light of this
thread and ea792bfd93ab. That's a discussion for later, likely. I'm
not sure how broad this should be and if this should be redesign to
make the whole context tracking easier. What I am sure of at this
stage is that for this workload we don't have any garbage left behind.

+			static MemoryContext pubmemcxt = NULL;
+
+			if (pubmemcxt == NULL)
+				pubmemcxt = AllocSetContextCreate(CacheMemoryContext,
+												  "publication list context",
+												  ALLOCSET_DEFAULT_SIZES);
+			else
+				MemoryContextReset(pubmemcxt);
+			oldctx = MemoryContextSwitchTo(pubmemcxt);
+
data->publications = LoadPublications(data->publication_names);
MemoryContextSwitchTo(oldctx);
publications_valid = true;

Something like that works for me in stable branches, removing the need
for the list free as there is only one caller of LoadPublications()
currently. I've checked with my previous script, with the aggressive
invalidation tweak, in the case I'm missing something, and the memory
numbers are stable.

I am slightly concerned about the current design of GetPublication()
in the long-term, TBH. LoadPublications() has hidden the leak behind
two layers of routines in the WAL sender, and that's easy to miss once
you call anything that loads a Publication depending on how the caller
caches its data. So I would still choose for modifying the structure
on HEAD removing the pstrdup() for the publication name. Anyway, your
suggestion is also OK by me on top of that, that's less conflicts in
all the branches.

May I suggest the attached then? 0001 is your backpatch, 0002 would
be my HEAD-only suggestion, if that's OK for you and others of course.
--
Michael

Attachments:

v2-0001-Fix-memory-leak-in-pgoutput-with-publication-list.patchtext/x-diff; charset=us-asciiDownload
From 22795bd2c7a8c7934d4ac77cfe1cd2ef36ae9fd3 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 2 Dec 2024 09:50:01 +0900
Subject: [PATCH v2 1/2] Fix memory leak in pgoutput with publication list
 cache

Blah.

Analyzed-by: Jeff Davis, Michael Paquier
Author: Alvaro Herrera
Discussion: https://postgr.es/m/Z0khf9EVMVLOc_YY@paquier.xyz
Backpatch-through: 13
---
 src/backend/replication/pgoutput/pgoutput.c | 22 ++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 5e23453f07..41d204ac3b 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -2060,15 +2060,23 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		char		relkind = get_rel_relkind(relid);
 		List	   *rel_publications = NIL;
 
-		/* Reload publications if needed before use. */
+		/*
+		 * Reload publications if needed before use.  A dedicated memory
+		 * context is used to hold the publication information, so as any
+		 * memory allocated within LoadPublications() is freed.
+		 */
 		if (!publications_valid)
 		{
-			oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-			if (data->publications)
-			{
-				list_free_deep(data->publications);
-				data->publications = NIL;
-			}
+			static MemoryContext pubmemcxt = NULL;
+
+			if (pubmemcxt == NULL)
+				pubmemcxt = AllocSetContextCreate(CacheMemoryContext,
+												  "publication list context",
+												  ALLOCSET_DEFAULT_SIZES);
+			else
+				MemoryContextReset(pubmemcxt);
+			oldctx = MemoryContextSwitchTo(pubmemcxt);
+
 			data->publications = LoadPublications(data->publication_names);
 			MemoryContextSwitchTo(oldctx);
 			publications_valid = true;
-- 
2.45.2

v2-0002-Avoid-extra-pstrdup-in-Publication.patchtext/x-diff; charset=us-asciiDownload
From 22cb03ab8e0fe024f8c3bb36b1581a32eff04ebc Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 2 Dec 2024 10:36:52 +0900
Subject: [PATCH v2 2/2] Avoid extra pstrdup() in Publication

The structure is changed so as the publication name is included in a
single memory allocation, knowing that the publication name is limited
by NAMEDATALEN.

No backpatch is required, as this is an ABI breakage.
---
 src/include/catalog/pg_publication.h | 2 +-
 src/backend/catalog/pg_publication.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/include/catalog/pg_publication.h b/src/include/catalog/pg_publication.h
index 9a83a72d6b..809bd1f0a6 100644
--- a/src/include/catalog/pg_publication.h
+++ b/src/include/catalog/pg_publication.h
@@ -103,7 +103,7 @@ typedef struct PublicationDesc
 typedef struct Publication
 {
 	Oid			oid;
-	char	   *name;
+	char		name[NAMEDATALEN];
 	bool		alltables;
 	bool		pubviaroot;
 	bool		pubgencols;
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 9bbb60463f..8f4e3b83f1 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -1061,7 +1061,7 @@ GetPublication(Oid pubid)
 
 	pub = (Publication *) palloc(sizeof(Publication));
 	pub->oid = pubid;
-	pub->name = pstrdup(NameStr(pubform->pubname));
+	strlcpy(pub->name, NameStr(pubform->pubname), NAMEDATALEN);
 	pub->alltables = pubform->puballtables;
 	pub->pubactions.pubinsert = pubform->pubinsert;
 	pub->pubactions.pubupdate = pubform->pubupdate;
-- 
2.45.2

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#4)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Mon, Dec 2, 2024 at 7:19 AM Michael Paquier <michael@paquier.xyz> wrote:

On Sat, Nov 30, 2024 at 09:28:31AM +0100, Alvaro Herrera wrote:

I'm not sure about your proposed fix. Isn't it simpler to have another
memory context which we can reset instead of doing list_free_deep()? It
doesn't have to be a global memory context -- since this is not
reentrant and not referenced anywhere else, it can be a simple static
variable in that block, as in the attached. I ran the stock tests (no
sysbench) and at least it doesn't crash.

This should be easily backpatchable also, since there's no ABI change.

Yeah. Using more memory contexts around these API calls done without
the cache manipulation is something I've also mentioned to Amit a few
days ago, and I'm feeling that this concept should be applied to a
broader area than just the cached publication list in light of this
thread and ea792bfd93ab.

We already have PGOutputData->cachectx which could be used for it. I
think we should be able to reset such a context when we are
revalidating the publications. Even, if we want a new context for some
localized handling, we should add that in PGOutputData rather than a
local context as the proposed patch is doing at the very least for
HEAD.

Can't we consider freeing the publication names individually that can
be backpatchable and have no or minimal risk of breaking anything?

I am slightly concerned about the current design of GetPublication()
in the long-term, TBH. LoadPublications() has hidden the leak behind
two layers of routines in the WAL sender, and that's easy to miss once
you call anything that loads a Publication depending on how the caller
caches its data. So I would still choose for modifying the structure
on HEAD removing the pstrdup() for the publication name.

BTW, the subscription structure also used the name in a similar way.
This will make the publication/subscription names handled differently.

--
With Regards,
Amit Kapila.

#6Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#5)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Mon, Dec 02, 2024 at 11:47:09AM +0530, Amit Kapila wrote:

We already have PGOutputData->cachectx which could be used for it. I
think we should be able to reset such a context when we are
revalidating the publications. Even, if we want a new context for some
localized handling, we should add that in PGOutputData rather than a
local context as the proposed patch is doing at the very least for
HEAD.

cachectx is used for the publications and the hash table holding
all the RelationSyncEntry entries, but we lack control of individual
parts within it. So you cannot reset the whole context when
processing a publication invalication. Perhaps adding that to
PGOutputData would be better, but that would be inconsistent with
RelationSyncCache.

Can't we consider freeing the publication names individually that can
be backpatchable and have no or minimal risk of breaking anything?

Sure. The first thing I did was a loop that goes through the
publication list and does individual pfree() for the publication
names. That works, but IMO that's weird as we rely on the internals
of GetPublication() hidden two levels down in pg_publication.c.

I am slightly concerned about the current design of GetPublication()
in the long-term, TBH. LoadPublications() has hidden the leak behind
two layers of routines in the WAL sender, and that's easy to miss once
you call anything that loads a Publication depending on how the caller
caches its data. So I would still choose for modifying the structure
on HEAD removing the pstrdup() for the publication name.

BTW, the subscription structure also used the name in a similar way.
This will make the publication/subscription names handled differently.

Good point about the inconsistency, so the name could also be switched
to a fixed-NAMEDATALEN there if we were to do that. The subscription
has much more pstrdup() fields, though.. How about having some Free()
routines instead that deal with the whole cleanup of a single list
entry? If that's kept close to the GetPublication() and
GetSubscription() routines, a refresh when changing these structures
would be hard to miss.
--
Michael

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#6)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Mon, Dec 2, 2024 at 12:49 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 02, 2024 at 11:47:09AM +0530, Amit Kapila wrote:

We already have PGOutputData->cachectx which could be used for it. I
think we should be able to reset such a context when we are
revalidating the publications. Even, if we want a new context for some
localized handling, we should add that in PGOutputData rather than a
local context as the proposed patch is doing at the very least for
HEAD.

cachectx is used for the publications and the hash table holding
all the RelationSyncEntry entries, but we lack control of individual
parts within it. So you cannot reset the whole context when
processing a publication invalication. Perhaps adding that to
PGOutputData would be better, but that would be inconsistent with
RelationSyncCache.

AFAICS, RelationSyncCache is not allocated in PGOutputData->cachectx.
It is allocated in CacheMemoryContext, see caller of
init_rel_sync_cache(). I think you are talking about individual hash
entries. Ideally, we can free all entries together and reset cachectx
but right now, we are freeing the allocated memory in those entries,
if required, at the next access. So, resetting the entire
PGOutputData->cachectx won't be possible. But, I don't get why adding
new context in PGOutputData for publications would be inconsistent
with RelationSyncCache? Anyway, I think it would be okay to
retail-free in this case, see the below responses.

Can't we consider freeing the publication names individually that can
be backpatchable and have no or minimal risk of breaking anything?

Sure. The first thing I did was a loop that goes through the
publication list and does individual pfree() for the publication
names. That works, but IMO that's weird as we rely on the internals
of GetPublication() hidden two levels down in pg_publication.c.

We can look at it from a different angle which is that the
FreePublication(s) relies on how the knowledge of Publication
structure is built. So, it doesn't look weird if we think from that
angle.

I am slightly concerned about the current design of GetPublication()
in the long-term, TBH. LoadPublications() has hidden the leak behind
two layers of routines in the WAL sender, and that's easy to miss once
you call anything that loads a Publication depending on how the caller
caches its data. So I would still choose for modifying the structure
on HEAD removing the pstrdup() for the publication name.

BTW, the subscription structure also used the name in a similar way.
This will make the publication/subscription names handled differently.

Good point about the inconsistency, so the name could also be switched
to a fixed-NAMEDATALEN there if we were to do that. The subscription
has much more pstrdup() fields, though.. How about having some Free()
routines instead that deal with the whole cleanup of a single list
entry? If that's kept close to the GetPublication() and
GetSubscription() routines, a refresh when changing these structures
would be hard to miss.

We already have FreeSubscription() which free name and other things
before calling list_free_deep. So, I thought a call on those lines for
publications wouldn't be a bad idea.

--
With Regards,
Amit Kapila.

#8Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Michael Paquier (#4)
Re: Memory leak in WAL sender with pgoutput (v10~)

On 2024-Dec-02, Michael Paquier wrote:

I am slightly concerned about the current design of GetPublication()
in the long-term, TBH. LoadPublications() has hidden the leak behind
two layers of routines in the WAL sender, and that's easy to miss once
you call anything that loads a Publication depending on how the caller
caches its data. So I would still choose for modifying the structure
on HEAD removing the pstrdup() for the publication name. Anyway, your
suggestion is also OK by me on top of that, that's less conflicts in
all the branches.

TBH I'm not sure that wastefully allocating NAMEDATALEN for each
relation is so great. Our strategy for easing memory management is to
use appropriately timed contexts.

I guess if you wanted to make a publication a single palloc block (so
that it's easy to free) and not waste so much memory, you could stash
the name string at the end of the struct. I think that'd be a change
wholly contained in GetPublication.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Are you not unsure you want to delete Firefox?
[Not unsure] [Not not unsure] [Cancel]
http://smylers.hates-software.com/2008/01/03/566e45b2.html

#9Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#5)
Re: Memory leak in WAL sender with pgoutput (v10~)

On 2024-Dec-02, Amit Kapila wrote:

We already have PGOutputData->cachectx which could be used for it.
I think we should be able to reset such a context when we are
revalidating the publications.

I don't see that context being reset anywhere, so I have a hard time
imagining that this would work without subtle risk of breakage
elsewhere.

Even, if we want a new context for some localized handling, we should
add that in PGOutputData rather than a local context as the proposed
patch is doing at the very least for HEAD.

I don't necessarily agree, given that this context is not needed
anywhere else.

Can't we consider freeing the publication names individually that can
be backpatchable and have no or minimal risk of breaking anything?

That was my first thought, but then it occurred to me that such a thing
would be totally pointless.

you call anything that loads a Publication depending on how the caller
caches its data. So I would still choose for modifying the structure
on HEAD removing the pstrdup() for the publication name.

BTW, the subscription structure also used the name in a similar way.
This will make the publication/subscription names handled differently.

True (with conninfo, slotname, synccommit, and origin).

FWIW it seems FreeSubscription is incomplete, and not only because it
fails to free the publication names ...

(Why are we storing a string in Subscription->synccommit?)

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Los trabajadores menos efectivos son sistematicamente llevados al lugar
donde pueden hacer el menor daño posible: gerencia." (El principio Dilbert)

#10Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#9)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 3, 2024 at 2:12 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Dec-02, Amit Kapila wrote:

Even, if we want a new context for some localized handling, we should
add that in PGOutputData rather than a local context as the proposed
patch is doing at the very least for HEAD.

I don't necessarily agree, given that this context is not needed
anywhere else.

But that suits the current design more. We allocate PGOutputData and
other contexts in that structure in a "Logical decoding context". A
few of its members (publications, publication_names) residing in
totally unrelated contexts sounds odd. In the first place, we don't
need to allocate publications under CacheMemoryContext, they should be
allocated in PGOutputData->cachectx. However, because we need to free
those entirely at one-shot during invalidation processing, we could
use a new context as a child context of PGOutputData->cachectx. Unless
I am missing something, the current memory context usage appears more
like a coding convenience than a thoughtful design decision.

--
With Regards,
Amit Kapila.

#11Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#10)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 03, 2024 at 09:50:42AM +0530, Amit Kapila wrote:

But that suits the current design more. We allocate PGOutputData and
other contexts in that structure in a "Logical decoding context". A
few of its members (publications, publication_names) residing in
totally unrelated contexts sounds odd. In the first place, we don't
need to allocate publications under CacheMemoryContext, they should be
allocated in PGOutputData->cachectx. However, because we need to free
those entirely at one-shot during invalidation processing, we could
use a new context as a child context of PGOutputData->cachectx. Unless
I am missing something, the current memory context usage appears more
like a coding convenience than a thoughtful design decision.

PGOutputData->cachectx has been introduced in 2022 in commit 52e4f0cd4,
while the decision to have RelationSyncEntry and the publication list
in CacheMemoryContext gets down to v10 where this logical replication
has been introduced. This was a carefully-thought choice back then
because this is data that belongs to the process cache, so yes, this
choice makes sense to me. Even today this choice makes sense: this is
still data cached in the process, and it's stored in the memory
context for cached data.

The problem is that we have two locations where the cached data is
stored, and I'd agree to make the whole leaner by relying on one or
the other entirely, but not both. If you want to move all that to the
single memory context in PGOutputData, perhaps that makes sense in the
long-run and the code gets simpler. Perhaps also it could be simpler
to get everything under CacheMemoryContext and remove
PGOutputData->cachectx. However, if you do so, I'd suggest to use the
same parent context for RelationSyncData as well rather than have two
concepts, not only the publication list.

That's digressing from the original subject a bit, btw..
--
Michael

#12Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#7)
1 attachment(s)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Mon, Dec 02, 2024 at 03:29:56PM +0530, Amit Kapila wrote:

We can look at it from a different angle which is that the
FreePublication(s) relies on how the knowledge of Publication
structure is built. So, it doesn't look weird if we think from that
angle.

OK, I can live with that on all the stable branches with an extra
list free rather than a deep list free.

I agree that the memory handling of this whole area needs some rework
to make such leaks harder to introduce in the WAL sender. Still,
let's first solve the problem at hand :)

So how about the attached that introduces a FreePublication() matching
with GetPublication(), used to do the cleanup? Feel free to comment.
--
Michael

Attachments:

v3-0001-Fix-memory-leak-in-pgoutput-with-publication-list.patchtext/x-diff; charset=us-asciiDownload
From f65c7bf220ba152a868ef52bb7df245809ce6d73 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 3 Dec 2024 15:45:04 +0900
Subject: [PATCH v3] Fix memory leak in pgoutput with publication list cache

Blah.

Analyzed-by: Jeff Davis, Michael Paquier
Discussion: https://postgr.es/m/Z0khf9EVMVLOc_YY@paquier.xyz
Backpatch-through: 13
---
 src/include/catalog/pg_publication.h        |  1 +
 src/backend/catalog/pg_publication.c        | 10 ++++++++++
 src/backend/replication/pgoutput/pgoutput.c |  8 +++++---
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/src/include/catalog/pg_publication.h b/src/include/catalog/pg_publication.h
index 9a83a72d6b..4e2de947e9 100644
--- a/src/include/catalog/pg_publication.h
+++ b/src/include/catalog/pg_publication.h
@@ -118,6 +118,7 @@ typedef struct PublicationRelInfo
 } PublicationRelInfo;
 
 extern Publication *GetPublication(Oid pubid);
+extern void FreePublication(Publication *pub);
 extern Publication *GetPublicationByName(const char *pubname, bool missing_ok);
 extern List *GetRelationPublications(Oid relid);
 
diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 9bbb60463f..2efe86d953 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -1075,6 +1075,16 @@ GetPublication(Oid pubid)
 	return pub;
 }
 
+/*
+ * Free memory allocated by Publication structure.
+ */
+void
+FreePublication(Publication *pub)
+{
+	pfree(pub->name);
+	pfree(pub);
+}
+
 /*
  * Get Publication using name.
  */
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 5e23453f07..a51b1c15fb 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -2064,11 +2064,13 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		if (!publications_valid)
 		{
 			oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-			if (data->publications)
+			foreach(lc, data->publications)
 			{
-				list_free_deep(data->publications);
-				data->publications = NIL;
+				Publication *pub = lfirst(lc);
+
+				FreePublication(pub);
 			}
+			list_free(data->publications);
 			data->publications = LoadPublications(data->publication_names);
 			MemoryContextSwitchTo(oldctx);
 			publications_valid = true;
-- 
2.45.2

#13Peter Smith
smithpb2250@gmail.com
In reply to: Michael Paquier (#12)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 3, 2024 at 5:47 PM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 02, 2024 at 03:29:56PM +0530, Amit Kapila wrote:

We can look at it from a different angle which is that the
FreePublication(s) relies on how the knowledge of Publication
structure is built. So, it doesn't look weird if we think from that
angle.

OK, I can live with that on all the stable branches with an extra
list free rather than a deep list free.

I agree that the memory handling of this whole area needs some rework
to make such leaks harder to introduce in the WAL sender. Still,
let's first solve the problem at hand :)

So how about the attached that introduces a FreePublication() matching
with GetPublication(), used to do the cleanup? Feel free to comment.
--

Perhaps the patch can use foreach_ptr macro instead of foreach?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#14Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Peter Smith (#13)
Re: Memory leak in WAL sender with pgoutput (v10~)

On 2024-Dec-03, Peter Smith wrote:

Perhaps the patch can use foreach_ptr macro instead of foreach?

That cannot be backpatched, though.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end." (2nd Commandment for C programmers)

#15Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#14)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 03, 2024 at 08:49:41AM +0100, Alvaro Herrera wrote:

That cannot be backpatched, though.

Extra code churn is always annoying across stable branches, so I'd
rather avoid it if we can.
--
Michael

#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#11)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 3, 2024 at 11:57 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Dec 03, 2024 at 09:50:42AM +0530, Amit Kapila wrote:

But that suits the current design more. We allocate PGOutputData and
other contexts in that structure in a "Logical decoding context". A
few of its members (publications, publication_names) residing in
totally unrelated contexts sounds odd. In the first place, we don't
need to allocate publications under CacheMemoryContext, they should be
allocated in PGOutputData->cachectx. However, because we need to free
those entirely at one-shot during invalidation processing, we could
use a new context as a child context of PGOutputData->cachectx. Unless
I am missing something, the current memory context usage appears more
like a coding convenience than a thoughtful design decision.

PGOutputData->cachectx has been introduced in 2022 in commit 52e4f0cd4,
while the decision to have RelationSyncEntry and the publication list
in CacheMemoryContext gets down to v10 where this logical replication
has been introduced. This was a carefully-thought choice back then
because this is data that belongs to the process cache, so yes, this
choice makes sense to me.

The parent structure (PGOutputData) was stored in the "Logical
decoding context" even in v11. So, how does storing its member
'publications' in CacheMemoryContext a good idea? It is possible that
we are leaking memory while doing decoding via SQL APIs where we free
decoding context after getting changes though I haven't tested the
same.

--
With Regards,
Amit Kapila.

#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Michael Paquier (#12)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 3, 2024 at 12:17 PM Michael Paquier <michael@paquier.xyz> wrote:

So how about the attached that introduces a FreePublication() matching
with GetPublication(), used to do the cleanup? Feel free to comment.

As you would have noted I am fine with the fix on these lines but I
suggest holding it till we conclude the memory context point raised by
me today. It is possible that we are still leaking some memory in
other related scenarios.

--
With Regards,
Amit Kapila.

#18Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Amit Kapila (#16)
RE: Memory leak in WAL sender with pgoutput (v10~)

On Tuesday, December 3, 2024 4:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Dec 3, 2024 at 11:57 AM Michael Paquier <michael@paquier.xyz>
wrote:

On Tue, Dec 03, 2024 at 09:50:42AM +0530, Amit Kapila wrote:

But that suits the current design more. We allocate PGOutputData and
other contexts in that structure in a "Logical decoding context". A
few of its members (publications, publication_names) residing in
totally unrelated contexts sounds odd. In the first place, we don't
need to allocate publications under CacheMemoryContext, they should
be allocated in PGOutputData->cachectx. However, because we need to
free those entirely at one-shot during invalidation processing, we
could use a new context as a child context of
PGOutputData->cachectx. Unless I am missing something, the current
memory context usage appears more like a coding convenience than a

thoughtful design decision.

PGOutputData->cachectx has been introduced in 2022 in commit
PGOutputData->52e4f0cd4,
while the decision to have RelationSyncEntry and the publication list
in CacheMemoryContext gets down to v10 where this logical replication
has been introduced. This was a carefully-thought choice back then
because this is data that belongs to the process cache, so yes, this
choice makes sense to me.

The parent structure (PGOutputData) was stored in the "Logical decoding
context" even in v11. So, how does storing its member 'publications' in
CacheMemoryContext a good idea? It is possible that we are leaking memory
while doing decoding via SQL APIs where we free decoding context after
getting changes though I haven't tested the same.

Right. I think I have faced this memory leak recently. It might be true for
walsender that 'publications' is a per-process content. But SQL APIs might use
different publication names each time during execution.

I can reproduce the memory leak due to allocating the publication
names under CacheMemoryContext like the following:

--
CREATE PUBLICATION pub FOR ALL TABLES;
CREATE TABLE stream_test(a int);
SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'pgoutput');
INSERT INTO stream_test SELECT generate_series(1, 2, 1);

- he backend's memory usage increases with each execution of the following function
SELECT count(*) FROM pg_logical_slot_peek_binary_changes('isolation_slot', NULL, NULL, 'proto_version', '4', 'publication_names', 'pub,pub,........ <lots of pub names>');

Best Regards,
Hou zj

#19Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Alvaro Herrera (#9)
RE: Memory leak in WAL sender with pgoutput (v10~)

On Tuesday, December 3, 2024 4:43 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Dec-02, Amit Kapila wrote:

you call anything that loads a Publication depending on how the
caller caches its data. So I would still choose for modifying the
structure on HEAD removing the pstrdup() for the publication name.

BTW, the subscription structure also used the name in a similar way.
This will make the publication/subscription names handled differently.

True (with conninfo, slotname, synccommit, and origin).

...

(Why are we storing a string in Subscription->synccommit?)

I think it's because the primary purpose of sub->synccommit is to serve as a
parameter for SetConfigOption() in the apply worker, which requires a string
value. Additionally, the existing function set_config_option() that validates
this option only accepts a string input. Although we could convert
sub->synccommit to an integer, this would necessitate additional conversion
code before passing it to these functions.

Best Regards,
Hou zj

#20Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#17)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 03, 2024 at 02:01:00PM +0530, Amit Kapila wrote:

As you would have noted I am fine with the fix on these lines but I
suggest holding it till we conclude the memory context point raised by
me today. It is possible that we are still leaking some memory in
other related scenarios.

Sure. I've not seen anything else, but things are complicated enough
in this code that a path could always have been missed.
--
Michael

#21Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Michael Paquier (#12)
Re: Memory leak in WAL sender with pgoutput (v10~)

On 2024-Dec-03, Michael Paquier wrote:

So how about the attached that introduces a FreePublication() matching
with GetPublication(), used to do the cleanup? Feel free to comment.

I think this doubles down on bad design in the logical replication code,
or at least it goes against what we do almost everywhere else in backend
code. We should do less freeing, more context deleting/resetting.
(Storing stuff in CacheMemoryContext was surely a mistake.)

If you don't like the idea of a static memcxt in the one block where
it's needed, I propose to store a new memcxt in PGOutputData, to be used
exclusively for publications, with a well defined lifetime. I'm against
reusing data->cachecxt, because the lifetime of that is 100% unclear.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#21)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 3, 2024 at 4:03 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

If you don't like the idea of a static memcxt in the one block where
it's needed, I propose to store a new memcxt in PGOutputData, to be used
exclusively for publications, with a well defined lifetime.

+1. This sounds like a way to proceed at least for HEAD. For
back-branches, it is less clear whether changing PGOutputData is a
good idea. Can such a change in back branches break any existing
non-core code (extensions)?

--
With Regards,
Amit Kapila.

#23Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#22)
Re: Memory leak in WAL sender with pgoutput (v10~)

On 2024-Dec-03, Amit Kapila wrote:

On Tue, Dec 3, 2024 at 4:03 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

If you don't like the idea of a static memcxt in the one block where
it's needed, I propose to store a new memcxt in PGOutputData, to be used
exclusively for publications, with a well defined lifetime.

+1. This sounds like a way to proceed at least for HEAD. For
back-branches, it is less clear whether changing PGOutputData is a
good idea. Can such a change in back branches break any existing
non-core code (extensions)?

We can put the new member at the end of the struct, it shouldn't damage
anything even if they're using this struct -- which I find pretty
unlikely. The only way that could break anything is if somebody is
allocating/using arrays of it, which sounds even more unlikely.

If we don't want to accept that risk (for which I see no argument, but
happy to be proven wrong), I would suggest to use the foreach-pfree
pattern Michael first proposed for the backbranches, and the new memory
context in master. I think this is conducive to better coding overall
as we clean things up in this area.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#24Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#23)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 03, 2024 at 02:45:22PM +0100, Alvaro Herrera wrote:

We can put the new member at the end of the struct, it shouldn't damage
anything even if they're using this struct -- which I find pretty
unlikely. The only way that could break anything is if somebody is
allocating/using arrays of it, which sounds even more unlikely.

Yes, that sounds unlikely.

If we don't want to accept that risk (for which I see no argument, but
happy to be proven wrong), I would suggest to use the foreach-pfree
pattern Michael first proposed for the backbranches, and the new memory
context in master. I think this is conducive to better coding overall
as we clean things up in this area.

Is it really worth betting on nobody doing something that does a
sizeof(PGOutputData) for the stable branches? People like doing fancy
things, and we would not hear about such problems except if we push
the button making it a possibility because compiled code suddenly
breaks after a minor release update of the core engine.
--
Michael

#25Euler Taveira
euler@eulerto.com
In reply to: Michael Paquier (#24)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 3, 2024, at 7:41 PM, Michael Paquier wrote:

On Tue, Dec 03, 2024 at 02:45:22PM +0100, Alvaro Herrera wrote:

If we don't want to accept that risk (for which I see no argument, but
happy to be proven wrong), I would suggest to use the foreach-pfree
pattern Michael first proposed for the backbranches, and the new memory
context in master. I think this is conducive to better coding overall
as we clean things up in this area.

Is it really worth betting on nobody doing something that does a
sizeof(PGOutputData) for the stable branches? People like doing fancy
things, and we would not hear about such problems except if we push
the button making it a possibility because compiled code suddenly
breaks after a minor release update of the core engine.

Although, Debian code search [1]https://codesearch.debian.net/search?q=PGOutputData&amp;literal=0 says this data structure is not used outside
PostgreSQL, I wouldn't risk breaking third-party extensions during a minor
upgrade (even if it is known that such data structure is from that particular
output plugin -- pgoutput -- and other output plugins generally have its own
data structure). +1 from Alvaro's proposal.

[1]: https://codesearch.debian.net/search?q=PGOutputData&amp;literal=0

--
Euler Taveira
EDB https://www.enterprisedb.com/

#26Michael Paquier
michael@paquier.xyz
In reply to: Euler Taveira (#25)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 03, 2024 at 09:46:06PM -0300, Euler Taveira wrote:

Although, Debian code search [1] says this data structure is not used outside
PostgreSQL, I wouldn't risk breaking third-party extensions during a minor
upgrade (even if it is known that such data structure is from that particular
output plugin -- pgoutput -- and other output plugins generally have its own
data structure). +1 from Alvaro's proposal.

A lookup of the public repos of github did not show fancy with the
manipulation of the structure for peoject related to Postgres, either.

FWIW, I'm OK with the memory context reset solution as much as the
direct free calls as we are sure that they will be safe. And at the
end of the day, the problem would be solved with any of these
solutions. My votes would be +0.6 for the free and +0.5 for the mcxt
manipulation, so let's say that they are tied.

As Alvaro and yourself are in favor of the mcxt approach, then let's
go for it. Amit has concerns with other code paths that could be
similarly leaking. I'm not sure if this is worth waiting too long
based on how local the fix for the existing leak is with any of these
solutions.
--
Michael

#27Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Michael Paquier (#26)
RE: Memory leak in WAL sender with pgoutput (v10~)

On Wednesday, December 4, 2024 8:55 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Dec 03, 2024 at 09:46:06PM -0300, Euler Taveira wrote:

Although, Debian code search [1] says this data structure is not used

outside

PostgreSQL, I wouldn't risk breaking third-party extensions during a minor
upgrade (even if it is known that such data structure is from that particular
output plugin -- pgoutput -- and other output plugins generally have its own
data structure). +1 from Alvaro's proposal.

A lookup of the public repos of github did not show fancy with the
manipulation of the structure for peoject related to Postgres, either.

FWIW, I'm OK with the memory context reset solution as much as the
direct free calls as we are sure that they will be safe. And at the
end of the day, the problem would be solved with any of these
solutions. My votes would be +0.6 for the free and +0.5 for the mcxt
manipulation, so let's say that they are tied.

As Alvaro and yourself are in favor of the mcxt approach, then let's
go for it.

+1

Amit has concerns with other code paths that could be
similarly leaking. I'm not sure if this is worth waiting too long
based on how local the fix for the existing leak is with any of these
solutions.

It appears there is an additional memory leak caused by allocating publication
names within the CacheMemoryContext, as noted in [1]/messages/by-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362@OS0PR01MB5716.jpnprd01.prod.outlook.com. And it can also be fixed by
creating a separate memctx for publication names under the logical decoding
context. I think the approach makes sense since the lifespan of publication
names should ideally align with that of the logical decoding context.

[1]: /messages/by-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362@OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hou zj

#28Amit Kapila
amit.kapila16@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#27)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Wed, Dec 4, 2024 at 7:39 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, December 4, 2024 8:55 AM Michael Paquier <michael@paquier.xyz> wrote:

Amit has concerns with other code paths that could be
similarly leaking. I'm not sure if this is worth waiting too long
based on how local the fix for the existing leak is with any of these
solutions.

It appears there is an additional memory leak caused by allocating publication
names within the CacheMemoryContext, as noted in [1]. And it can also be fixed by
creating a separate memctx for publication names under the logical decoding
context. I think the approach makes sense since the lifespan of publication
names should ideally align with that of the logical decoding context.

Yeah, I don't think we can go with the proposed patch for the local
memory context as it is.

--
With Regards,
Amit Kapila.

#29Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#28)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Wed, Dec 04, 2024 at 11:05:43AM +0530, Amit Kapila wrote:

On Wed, Dec 4, 2024 at 7:39 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:

It appears there is an additional memory leak caused by allocating publication
names within the CacheMemoryContext, as noted in [1]. And it can also be fixed by
creating a separate memctx for publication names under the logical decoding
context. I think the approach makes sense since the lifespan of publication
names should ideally align with that of the logical decoding context.

Yeah, I don't think we can go with the proposed patch for the local
memory context as it is.

Ah, indeed. I was missing your point. Would any of you like to write
a patch to achieve that?
--
Michael

#30Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Michael Paquier (#29)
RE: Memory leak in WAL sender with pgoutput (v10~)

On Wednesday, December 4, 2024 2:22 PM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 04, 2024 at 11:05:43AM +0530, Amit Kapila wrote:

On Wed, Dec 4, 2024 at 7:39 AM Zhijie Hou (Fujitsu)

<houzj.fnst@fujitsu.com> wrote:

It appears there is an additional memory leak caused by allocating
publication names within the CacheMemoryContext, as noted in [1]. And
it can also be fixed by creating a separate memctx for publication
names under the logical decoding context. I think the approach makes
sense since the lifespan of publication names should ideally align with that

of the logical decoding context.

Yeah, I don't think we can go with the proposed patch for the local
memory context as it is.

Ah, indeed. I was missing your point. Would any of you like to write a patch
to achieve that?

I can try to write a patch if no one else is working on this.

Best Regards,
Hou zj

#31Michael Paquier
michael@paquier.xyz
In reply to: Zhijie Hou (Fujitsu) (#30)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Wed, Dec 04, 2024 at 06:42:55AM +0000, Zhijie Hou (Fujitsu) wrote:

I can try to write a patch if no one else is working on this.

If you have some room to write a patch, that would be really nice.
Thanks.
--
Michael

#32Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Michael Paquier (#31)
1 attachment(s)
RE: Memory leak in WAL sender with pgoutput (v10~)

On Wednesday, December 4, 2024 7:39 PM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 04, 2024 at 06:42:55AM +0000, Zhijie Hou (Fujitsu) wrote:

I can try to write a patch if no one else is working on this.

If you have some room to write a patch, that would be really nice.
Thanks.

No problem. Here is the patch for the HEAD. This patch introduces a new memory
context within PGOutputData, specifically for allocating memory for
publication_names. The new memory context is nested under the logical decoding
context, ensuring it is freed at the end of decoding through
FreeDecodingContext.

I realized that this patch cannot be backpatched because it introduces a new
field into the public PGOutputData structure. Therefore, I think we may need to
use Alvaro's version [1]/messages/by-id/202411300828.hwe55pzx5a4x@alvherre.pgsql for the back branches.

[1]: /messages/by-id/202411300828.hwe55pzx5a4x@alvherre.pgsql

Best Regards,
Hou zj

Attachments:

0001-Fix-memory-leak-in-pgoutput-with-publication-list-ca.patchapplication/octet-stream; name=0001-Fix-memory-leak-in-pgoutput-with-publication-list-ca.patchDownload
From d31e8003edaf2f53c437ff66400d3e6c61167daf Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Wed, 4 Dec 2024 15:13:03 +0800
Subject: [PATCH] Fix memory leak in pgoutput with publication list cache

The pgoutput module caches publication names in a list and frees the list upon
invalidation but forgot to free the actual publication names within the list
elements.

Additionally, these cached publication names are allocated under the
CacheMemoryContext, which does not get explicitly freed at the end of decoding.
While this is not an issue for logical decoding in walsender, as the process
exits at the end of decoding, it poses a memory leak risk when decoding via SQL
APIs like pg_logical_slot_peek_changes. Since these APIs creates a separate
decoding context with each execution, failing to free the publication names
results in a memory leak.

To address this, we create a separate memory context within the logical
decoding context and reset it each time the publication names cache is
invalidated. This ensures that the lifespan of the publication names aligns
with that of the logical decoding context.
---
 src/backend/replication/pgoutput/pgoutput.c | 19 ++++++++++---------
 src/include/replication/pgoutput.h          |  2 ++
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 5e23453f07..7a27ee38a0 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -436,6 +436,10 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 										   "logical replication cache context",
 										   ALLOCSET_DEFAULT_SIZES);
 
+	data->pubmemcxt = AllocSetContextCreate(ctx->context,
+											"logical replication publication list context",
+											ALLOCSET_DEFAULT_SIZES);
+
 	ctx->output_plugin_private = data;
 
 	/* This plugin uses binary protocol. */
@@ -1734,9 +1738,9 @@ pgoutput_origin_filter(LogicalDecodingContext *ctx,
 /*
  * Shutdown the output plugin.
  *
- * Note, we don't need to clean the data->context and data->cachectx as
- * they are child contexts of the ctx->context so they will be cleaned up by
- * logical decoding machinery.
+ * Note, we don't need to clean the data->context, data->cachectx, and
+ * data->pubmemcxt as they are child contexts of the ctx->context so they
+ * will be cleaned up by logical decoding machinery.
  */
 static void
 pgoutput_shutdown(LogicalDecodingContext *ctx)
@@ -2063,12 +2067,9 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		/* Reload publications if needed before use. */
 		if (!publications_valid)
 		{
-			oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-			if (data->publications)
-			{
-				list_free_deep(data->publications);
-				data->publications = NIL;
-			}
+			MemoryContextReset(data->pubmemcxt);
+
+			oldctx = MemoryContextSwitchTo(data->pubmemcxt);
 			data->publications = LoadPublications(data->publication_names);
 			MemoryContextSwitchTo(oldctx);
 			publications_valid = true;
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
index 89f94e1147..89346c4067 100644
--- a/src/include/replication/pgoutput.h
+++ b/src/include/replication/pgoutput.h
@@ -20,6 +20,8 @@ typedef struct PGOutputData
 	MemoryContext context;		/* private memory context for transient
 								 * allocations */
 	MemoryContext cachectx;		/* private memory context for cache data */
+	MemoryContext pubmemcxt;	/* private memory context for
+								 * publication_names */
 
 	bool		in_streaming;	/* true if we are streaming a chunk of
 								 * transaction */
-- 
2.30.0.windows.2

#33Michael Paquier
michael@paquier.xyz
In reply to: Zhijie Hou (Fujitsu) (#32)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Thu, Dec 05, 2024 at 04:31:56AM +0000, Zhijie Hou (Fujitsu) wrote:

I realized that this patch cannot be backpatched because it introduces a new
field into the public PGOutputData structure. Therefore, I think we may need to
use Alvaro's version [1] for the back branches.

[1] /messages/by-id/202411300828.hwe55pzx5a4x@alvherre.pgsql

Thanks for the patch.

For HEAD it should be as good as it can be as it avoids the problem of
CacheMemoryContext bloating for your case and my case. Alvaro's patch
would not take care of your case, unfortunately, but I'm less worried
about this case in the back branches and we don't track the parent
context where StartupDecodingContext() has begun its work when
building PGOutputData. Thoughts?
--
Michael

#34Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#32)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Thu, Dec 5, 2024 at 1:32 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, December 4, 2024 7:39 PM Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Dec 04, 2024 at 06:42:55AM +0000, Zhijie Hou (Fujitsu) wrote:

I can try to write a patch if no one else is working on this.

If you have some room to write a patch, that would be really nice.
Thanks.

No problem. Here is the patch for the HEAD. This patch introduces a new memory
context within PGOutputData, specifically for allocating memory for
publication_names. The new memory context is nested under the logical decoding
context, ensuring it is freed at the end of decoding through
FreeDecodingContext.

+1 for using new memory context to fix the issue for HEAD.

I realized that this patch cannot be backpatched because it introduces a new
field into the public PGOutputData structure. Therefore, I think we may need to
use Alvaro's version [1] for the back branches.

FWIW for back branches, I prefer using the foreach-pfree pattern
Michael first proposed, just in case. It's not elegant but it can
solve the problem while there is no risk of breaking non-core
extensions. I think we can live with such (a bit of) ugliness on back
branches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#35Euler Taveira
euler@eulerto.com
In reply to: Zhijie Hou (Fujitsu) (#32)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Thu, Dec 5, 2024, at 1:31 AM, Zhijie Hou (Fujitsu) wrote:

No problem. Here is the patch for the HEAD. This patch introduces a new memory
context within PGOutputData, specifically for allocating memory for
publication_names. The new memory context is nested under the logical decoding
context, ensuring it is freed at the end of decoding through
FreeDecodingContext.

Thanks for taking care of it. I suggest 2 small adjustments: (a) use
ALLOCSET_SMALL_SIZES instead of ALLOCSET_DEFAULT_SIZES and (b) replace
pubmemcxt with pubmemctx (that's the same abbreviation used by
cachectx). I think you could remove 'mem' from this variable. My
suggestions are pubcxt or pubnamescxt. Although, I prefer the former, if
other publication elements are added to this context in the future.

--
Euler Taveira
EDB https://www.enterprisedb.com/

#36Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Euler Taveira (#35)
1 attachment(s)
RE: Memory leak in WAL sender with pgoutput (v10~)

On Thursday, December 5, 2024 11:39 PM Euler Taveira <euler@eulerto.com> wrote:

Thanks for taking care of it. I suggest 2 small adjustments: (a) use
ALLOCSET_SMALL_SIZES instead of ALLOCSET_DEFAULT_SIZES and (b) replace
pubmemcxt with pubmemctx (that's the same abbreviation used by
cachectx). I think you could remove 'mem' from this variable. My
suggestions are pubcxt or pubnamescxt. Although, I prefer the former, if
other publication elements are added to this context in the future.

Thanks for the suggestions. They make sense to me.

Please see the updated version as attached.

Best Regards,
Hou zj

Attachments:

v2-0001-Fix-memory-leak-in-pgoutput-with-publication-list.patchapplication/octet-stream; name=v2-0001-Fix-memory-leak-in-pgoutput-with-publication-list.patchDownload
From 240f738907b28912f177ef95906ec5465589fc8a Mon Sep 17 00:00:00 2001
From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
Date: Wed, 4 Dec 2024 15:13:03 +0800
Subject: [PATCH v2] Fix memory leak in pgoutput with publication list cache

The pgoutput module caches publication names in a list and frees the list upon
invalidation but forgot to free the actual publication names within the list
elements.

Additionally, these cached publication names are allocated under the
CacheMemoryContext, which does not get explicitly freed at the end of decoding.
While this is not an issue for logical decoding in walsender, as the process
exits at the end of decoding, it poses a memory leak risk when decoding via SQL
APIs like pg_logical_slot_peek_changes. Since these APIs creates a separate
decoding context with each execution, failing to free the publication names
results in a memory leak.

To address this, we create a separate memory context within the logical
decoding context and reset it each time the publication names cache is
invalidated. This ensures that the lifespan of the publication names aligns
with that of the logical decoding context.
---
 src/backend/replication/pgoutput/pgoutput.c | 19 ++++++++++---------
 src/include/replication/pgoutput.h          |  2 ++
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 5e23453f07..b50b3d62e3 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -436,6 +436,10 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 										   "logical replication cache context",
 										   ALLOCSET_DEFAULT_SIZES);
 
+	data->pubctx = AllocSetContextCreate(ctx->context,
+										 "logical replication publication list context",
+										 ALLOCSET_SMALL_SIZES);
+
 	ctx->output_plugin_private = data;
 
 	/* This plugin uses binary protocol. */
@@ -1734,9 +1738,9 @@ pgoutput_origin_filter(LogicalDecodingContext *ctx,
 /*
  * Shutdown the output plugin.
  *
- * Note, we don't need to clean the data->context and data->cachectx as
- * they are child contexts of the ctx->context so they will be cleaned up by
- * logical decoding machinery.
+ * Note, we don't need to clean the data->context, data->cachectx, and
+ * data->pubctx as they are child contexts of the ctx->context so they
+ * will be cleaned up by logical decoding machinery.
  */
 static void
 pgoutput_shutdown(LogicalDecodingContext *ctx)
@@ -2063,12 +2067,9 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		/* Reload publications if needed before use. */
 		if (!publications_valid)
 		{
-			oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-			if (data->publications)
-			{
-				list_free_deep(data->publications);
-				data->publications = NIL;
-			}
+			MemoryContextReset(data->pubctx);
+
+			oldctx = MemoryContextSwitchTo(data->pubctx);
 			data->publications = LoadPublications(data->publication_names);
 			MemoryContextSwitchTo(oldctx);
 			publications_valid = true;
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
index 89f94e1147..93e66eaa5d 100644
--- a/src/include/replication/pgoutput.h
+++ b/src/include/replication/pgoutput.h
@@ -20,6 +20,8 @@ typedef struct PGOutputData
 	MemoryContext context;		/* private memory context for transient
 								 * allocations */
 	MemoryContext cachectx;		/* private memory context for cache data */
+	MemoryContext pubctx;		/* private memory context for
+								 * publication_names */
 
 	bool		in_streaming;	/* true if we are streaming a chunk of
 								 * transaction */
-- 
2.31.1

#37Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Michael Paquier (#33)
RE: Memory leak in WAL sender with pgoutput (v10~)

On Thursday, December 5, 2024 12:52 PM Michael Paquier <michael@paquier.xyz> wrote:

Hi,

On Thu, Dec 05, 2024 at 04:31:56AM +0000, Zhijie Hou (Fujitsu) wrote:

I realized that this patch cannot be backpatched because it introduces
a new field into the public PGOutputData structure. Therefore, I think
we may need to use Alvaro's version [1] for the back branches.

Thanks for the patch.

For HEAD it should be as good as it can be as it avoids the problem of
CacheMemoryContext bloating for your case and my case. Alvaro's patch
would not take care of your case, unfortunately, but I'm less worried about this
case in the back branches and we don't track the parent context where
StartupDecodingContext() has begun its work when building PGOutputData.
Thoughts?

I am fine with the plan. Thanks.

Best Regards,
Hou zj

#38Michael Paquier
michael@paquier.xyz
In reply to: Zhijie Hou (Fujitsu) (#36)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Fri, Dec 06, 2024 at 08:23:00AM +0000, Zhijie Hou (Fujitsu) wrote:

Thanks for the suggestions. They make sense to me.

Please see the updated version as attached.

It sounds to me that we are in agreement for HEAD, so I've moved ahead
and fixed this issue on HEAD using your patch that adds a dedicated
memory context in PGOutputData as that's the cleanest way to address
things in a single execution context of pgoutput.

For stable branches, let's see.. I need to reply to the latest
message.
--
Michael

#39Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#34)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I realized that this patch cannot be backpatched because it introduces a new
field into the public PGOutputData structure. Therefore, I think we may need to
use Alvaro's version [1] for the back branches.

FWIW for back branches, I prefer using the foreach-pfree pattern
Michael first proposed, just in case. It's not elegant but it can
solve the problem while there is no risk of breaking non-core
extensions.

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1]/messages/by-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362@OS0PR01MB5716.jpnprd01.prod.outlook.com won't
be solved.

[1]: /messages/by-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362@OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

#40Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#39)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Mon, Dec 9, 2024 at 2:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I realized that this patch cannot be backpatched because it introduces a new
field into the public PGOutputData structure. Therefore, I think we may need to
use Alvaro's version [1] for the back branches.

FWIW for back branches, I prefer using the foreach-pfree pattern
Michael first proposed, just in case. It's not elegant but it can
solve the problem while there is no risk of breaking non-core
extensions.

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1] won't
be solved.

True. There seems another place where we possibly leak memory on
CacheMemoryContext when using pgoutput via SQL APIs:

/* Map must live as long as the session does. */
oldctx = MemoryContextSwitchTo(CacheMemoryContext);

entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);

MemoryContextSwitchTo(oldctx);
RelationClose(ancestor);

entry->attrmap is pfree'd only when validating the RelationSyncEntry
so remains even after logical decoding API calls.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#41Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#39)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1] won't
be solved.

[1] - /messages/by-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362@OS0PR01MB5716.jpnprd01.prod.outlook.com

Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
thanks!).

Alvaro's solution is not perfect either as we would still bloat some
memory in the CacheMemoryContext when a single execution of the
logical decoding context finishes, but the static context approach
proposed at [2]/messages/by-id/202411300828.hwe55pzx5a4x@alvherre.pgsql -- Michael has the merit to limit the damage on repetitive calls
when an invalidation can happen as well as in WAL senders with
periodic cleanups. The free APIs would just lose track of these
pointers: good for WAL senders, not for repetitive pgoutput calls.

Instead of ALLOCSET_DEFAULT_SIZES, it seems to me that we should
switch to ALLOCSET_SMALL_SIZES for consistency with HEAD in Alvaro's
patch. I would also switch "publication list context" to "logical
replication publication list context" to match with HEAD, while on it.

Would all that be OK for everybody in terms of what to do in stable
branches down to 13?

[2]: /messages/by-id/202411300828.hwe55pzx5a4x@alvherre.pgsql -- Michael
--
Michael

#42Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#40)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Mon, Dec 09, 2024 at 12:46:35PM -0800, Masahiko Sawada wrote:

True. There seems another place where we possibly leak memory on
CacheMemoryContext when using pgoutput via SQL APIs:

/* Map must live as long as the session does. */
oldctx = MemoryContextSwitchTo(CacheMemoryContext);

entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);

MemoryContextSwitchTo(oldctx);
RelationClose(ancestor);

entry->attrmap is pfree'd only when validating the RelationSyncEntry
so remains even after logical decoding API calls.

Right. I'm also slightly worried about how we handle streamed_txns in
set_schema_sent_in_streamed_txn() through CacheMemoryContext. It
feels like this could be made more robust without relying on an
explicit list_free() in get_rel_sync_entry(), including the fact that
some cleanup also relies on pgoutput_stream_abort() being taken.

As a whole, thinking long-term, it seems to me that we'd live better
if we remove all direct dependencies to CacheMemoryContext in this
code.
--
Michael

#43Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#40)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 10, 2024 at 2:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Dec 9, 2024 at 2:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I realized that this patch cannot be backpatched because it introduces a new
field into the public PGOutputData structure. Therefore, I think we may need to
use Alvaro's version [1] for the back branches.

FWIW for back branches, I prefer using the foreach-pfree pattern
Michael first proposed, just in case. It's not elegant but it can
solve the problem while there is no risk of breaking non-core
extensions.

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1] won't
be solved.

True. There seems another place where we possibly leak memory on
CacheMemoryContext when using pgoutput via SQL APIs:

/* Map must live as long as the session does. */
oldctx = MemoryContextSwitchTo(CacheMemoryContext);

entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);

MemoryContextSwitchTo(oldctx);
RelationClose(ancestor);

entry->attrmap is pfree'd only when validating the RelationSyncEntry
so remains even after logical decoding API calls.

We have also noticed this but it needs more analysis on the fix which
one of my colleagues is doing. I think we can fix this as a separate
issue unless you think otherwise.

--
With Regards,
Amit Kapila.

#44vignesh C
vignesh21@gmail.com
In reply to: Michael Paquier (#41)
2 attachment(s)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1] won't
be solved.

[1] - /messages/by-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362@OS0PR01MB5716.jpnprd01.prod.outlook.com

Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
thanks!).

Alvaro's solution is not perfect either as we would still bloat some
memory in the CacheMemoryContext when a single execution of the
logical decoding context finishes, but the static context approach
proposed at [2] has the merit to limit the damage on repetitive calls
when an invalidation can happen as well as in WAL senders with
periodic cleanups. The free APIs would just lose track of these
pointers: good for WAL senders, not for repetitive pgoutput calls.

Instead of ALLOCSET_DEFAULT_SIZES, it seems to me that we should
switch to ALLOCSET_SMALL_SIZES for consistency with HEAD in Alvaro's
patch. I would also switch "publication list context" to "logical
replication publication list context" to match with HEAD, while on it.

Yes, that makes sense. How about something like the attached patch.

Regards,
Vignesh

Attachments:

v1-0001-Fix-memory-leak-in-pgoutput-with-static-memory-co_PG14.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Fix-memory-leak-in-pgoutput-with-static-memory-co_PG14.patchDownload
From 3695caaa087faac36347847052cecf6c1d133956 Mon Sep 17 00:00:00 2001
From: Vignesh <vignesh21@gmail.com>
Date: Tue, 10 Dec 2024 07:28:38 +0530
Subject: [PATCH v1] Fix memory leak in pgoutput with static memory context.

The pgoutput module caches publication names in a list and frees it upon
invalidation.  However, the code forgot to free the actual publication
names within the list elements, as publication names are pstrdup()'d in
GetPublication().  This would cause memory to leak in
CacheMemoryContext, bloating it over time as this context is not
cleaned.

This is a problem for WAL senders running for a long time, as an
accumulation of invalidation requests would bloat its cache memory
usage.  A second case, where this leak is easier to see, involves a
backend calling SQL functions like pg_logical_slot_{get,peek}_changes()
which create a new decoding context with each execution.  More
publications create more bloat.

To address this, this commit adds a new static memory context and
resets it each time the publication names cache is invalidated. This
ensures that the lifespan of the publication names aligns with that of
the logical decoding context.
---
 src/backend/replication/pgoutput/pgoutput.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index df2ea94d46..78c81888c4 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1071,9 +1071,16 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 		/* Reload publications if needed before use. */
 		if (!publications_valid)
 		{
-			oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-			if (data->publications)
-				list_free_deep(data->publications);
+			static MemoryContext pubctx = NULL;
+
+			if (pubctx == NULL)
+				pubctx = AllocSetContextCreate(CacheMemoryContext,
+											   "logical replication publication list context",
+											   ALLOCSET_SMALL_SIZES);
+			else
+				MemoryContextReset(pubctx);
+
+			oldctx = MemoryContextSwitchTo(pubctx);
 
 			data->publications = LoadPublications(data->publication_names);
 			MemoryContextSwitchTo(oldctx);
-- 
2.43.0

v1-0001-Fix-memory-leak-in-pgoutput-with-static-memory-co_PG17.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Fix-memory-leak-in-pgoutput-with-static-memory-co_PG17.patchDownload
From 3d2607acdeeb56715ddd29cdc9456470a457350f Mon Sep 17 00:00:00 2001
From: Vignesh <vignesh21@gmail.com>
Date: Tue, 10 Dec 2024 06:21:32 +0530
Subject: [PATCH v1] Fix memory leak in pgoutput with static memory context.

The pgoutput module caches publication names in a list and frees it upon
invalidation.  However, the code forgot to free the actual publication
names within the list elements, as publication names are pstrdup()'d in
GetPublication().  This would cause memory to leak in
CacheMemoryContext, bloating it over time as this context is not
cleaned.

This is a problem for WAL senders running for a long time, as an
accumulation of invalidation requests would bloat its cache memory
usage.  A second case, where this leak is easier to see, involves a
backend calling SQL functions like pg_logical_slot_{get,peek}_changes()
which create a new decoding context with each execution.  More
publications create more bloat.

To address this, this commit adds a new static memory context and
resets it each time the publication names cache is invalidated. This
ensures that the lifespan of the publication names aligns with that of
the logical decoding context.
---
 src/backend/replication/pgoutput/pgoutput.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index c9c952a278..b047d4dcd3 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -2024,12 +2024,17 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		/* Reload publications if needed before use. */
 		if (!publications_valid)
 		{
-			oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-			if (data->publications)
-			{
-				list_free_deep(data->publications);
-				data->publications = NIL;
-			}
+			static MemoryContext pubctx = NULL;
+
+			if (pubctx == NULL)
+				pubctx = AllocSetContextCreate(CacheMemoryContext,
+											   "logical replication publication list context",
+											   ALLOCSET_SMALL_SIZES);
+			else
+				MemoryContextReset(pubctx);
+
+			oldctx = MemoryContextSwitchTo(pubctx);
+
 			data->publications = LoadPublications(data->publication_names);
 			MemoryContextSwitchTo(oldctx);
 			publications_valid = true;
-- 
2.43.0

#45Michael Paquier
michael@paquier.xyz
In reply to: vignesh C (#44)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 10, 2024 at 03:24:19AM +0000, vignesh C wrote:

Yes, that makes sense. How about something like the attached patch.

So you have this bit hidden in 7f481b8d3884, causing a small conflict
when cherry-picking the change from 15 to 14:

-            oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-            if (data->publications)
-                list_free_deep(data->publications);
+            static MemoryContext pubctx = NULL;
[...]
-            if (data->publications)
-            {
-                list_free_deep(data->publications);
-                data->publications = NIL;
-            }
+            static MemoryContext pubctx = NULL;

Your versions look OK at quick glance, for the publication list
caching.
--
Michael

#46Amit Kapila
amit.kapila16@gmail.com
In reply to: vignesh C (#44)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1] won't
be solved.

[1] - /messages/by-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362@OS0PR01MB5716.jpnprd01.prod.outlook.com

Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
thanks!).

Yes, that makes sense. How about something like the attached patch.

- oldctx = MemoryContextSwitchTo(CacheMemoryContext);
- if (data->publications)
- {
- list_free_deep(data->publications);
- data->publications = NIL;
- }
+ static MemoryContext pubctx = NULL;
+
+ if (pubctx == NULL)
+ pubctx = AllocSetContextCreate(CacheMemoryContext,
+    "logical replication publication list context",
+    ALLOCSET_SMALL_SIZES);
+ else
+ MemoryContextReset(pubctx);
+
+ oldctx = MemoryContextSwitchTo(pubctx);

Considering the SQL API case, why is it okay to allocate this context
under CacheMemoryContext?

--
With Regards,
Amit Kapila.

#47Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#46)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1] won't
be solved.

[1] - /messages/by-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362@OS0PR01MB5716.jpnprd01.prod.outlook.com

Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
thanks!).

Yes, that makes sense. How about something like the attached patch.

- oldctx = MemoryContextSwitchTo(CacheMemoryContext);
- if (data->publications)
- {
- list_free_deep(data->publications);
- data->publications = NIL;
- }
+ static MemoryContext pubctx = NULL;
+
+ if (pubctx == NULL)
+ pubctx = AllocSetContextCreate(CacheMemoryContext,
+    "logical replication publication list context",
+    ALLOCSET_SMALL_SIZES);
+ else
+ MemoryContextReset(pubctx);
+
+ oldctx = MemoryContextSwitchTo(pubctx);

Considering the SQL API case, why is it okay to allocate this context
under CacheMemoryContext?

On further thinking, we can't allocate it under
LogicalDecodingContext->context because once that is freed at the end
of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
dangling memory. One idea is that we use
MemoryContextRegisterResetCallback() to invoke a reset callback
function where we can reset pubctx but not sure if we want to go there
in back branches. OTOH, the currently proposed fix won't leak memory
on repeated calls to pg_logical_slot_get_changes(), so that might be
okay as well.

Thoughts?

--
With Regards,
Amit Kapila.

#48Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#43)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Mon, Dec 9, 2024 at 6:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Dec 10, 2024 at 2:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Dec 9, 2024 at 2:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I realized that this patch cannot be backpatched because it introduces a new
field into the public PGOutputData structure. Therefore, I think we may need to
use Alvaro's version [1] for the back branches.

FWIW for back branches, I prefer using the foreach-pfree pattern
Michael first proposed, just in case. It's not elegant but it can
solve the problem while there is no risk of breaking non-core
extensions.

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1] won't
be solved.

True. There seems another place where we possibly leak memory on
CacheMemoryContext when using pgoutput via SQL APIs:

/* Map must live as long as the session does. */
oldctx = MemoryContextSwitchTo(CacheMemoryContext);

entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);

MemoryContextSwitchTo(oldctx);
RelationClose(ancestor);

entry->attrmap is pfree'd only when validating the RelationSyncEntry
so remains even after logical decoding API calls.

We have also noticed this but it needs more analysis on the fix which
one of my colleagues is doing. I think we can fix this as a separate
issue unless you think otherwise.

I agree to fix this as a separate patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#49Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#47)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com> wrote:

On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1] won't
be solved.

[1] - /messages/by-id/OS0PR01MB57166A4DA0ABBB94F2FBB28694362@OS0PR01MB5716.jpnprd01.prod.outlook.com

Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
thanks!).

Yes, that makes sense. How about something like the attached patch.

- oldctx = MemoryContextSwitchTo(CacheMemoryContext);
- if (data->publications)
- {
- list_free_deep(data->publications);
- data->publications = NIL;
- }
+ static MemoryContext pubctx = NULL;
+
+ if (pubctx == NULL)
+ pubctx = AllocSetContextCreate(CacheMemoryContext,
+    "logical replication publication list context",
+    ALLOCSET_SMALL_SIZES);
+ else
+ MemoryContextReset(pubctx);
+
+ oldctx = MemoryContextSwitchTo(pubctx);

Considering the SQL API case, why is it okay to allocate this context
under CacheMemoryContext?

On further thinking, we can't allocate it under
LogicalDecodingContext->context because once that is freed at the end
of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
dangling memory. One idea is that we use
MemoryContextRegisterResetCallback() to invoke a reset callback
function where we can reset pubctx but not sure if we want to go there
in back branches. OTOH, the currently proposed fix won't leak memory
on repeated calls to pg_logical_slot_get_changes(), so that might be
okay as well.

Thoughts?

Alternative idea is to declare pubctx as a file static variable. And
we create the memory context under LogicalDecodingContext->context in
the startup callback and free it in the shutdown callback.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#50Zhijie Hou (Fujitsu)
houzj.fnst@fujitsu.com
In reply to: Masahiko Sawada (#49)
RE: Memory leak in WAL sender with pgoutput (v10~)

On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com>

wrote:

On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz>

wrote:

On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1]

won't

be solved.

[1] -

/messages/by-id/OS0PR01MB57166A4DA0ABBB94F
2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
thanks!).

Yes, that makes sense. How about something like the attached patch.

- oldctx = MemoryContextSwitchTo(CacheMemoryContext);
- if (data->publications)
- {
- list_free_deep(data->publications);
- data->publications = NIL;
- }
+ static MemoryContext pubctx = NULL;
+
+ if (pubctx == NULL)
+ pubctx = AllocSetContextCreate(CacheMemoryContext,
+    "logical replication publication list context",
+    ALLOCSET_SMALL_SIZES);
+ else
+ MemoryContextReset(pubctx);
+
+ oldctx = MemoryContextSwitchTo(pubctx);

Considering the SQL API case, why is it okay to allocate this context
under CacheMemoryContext?

On further thinking, we can't allocate it under
LogicalDecodingContext->context because once that is freed at the end
of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
dangling memory. One idea is that we use
MemoryContextRegisterResetCallback() to invoke a reset callback
function where we can reset pubctx but not sure if we want to go there
in back branches. OTOH, the currently proposed fix won't leak memory
on repeated calls to pg_logical_slot_get_changes(), so that might be
okay as well.

Thoughts?

Alternative idea is to declare pubctx as a file static variable. And
we create the memory context under LogicalDecodingContext->context in
the startup callback and free it in the shutdown callback.

I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
API, the shutdown callback function is not invoked. This would result in the
static variable not being reset, which, I think, is why Amit mentioned the use
of MemoryContextRegisterResetCallback().

Best Regards,
Hou zj

#51vignesh C
vignesh21@gmail.com
In reply to: Masahiko Sawada (#48)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, 10 Dec 2024 at 23:36, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Dec 9, 2024 at 6:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Dec 10, 2024 at 2:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Dec 9, 2024 at 2:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Dec 5, 2024 at 2:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I realized that this patch cannot be backpatched because it introduces a new
field into the public PGOutputData structure. Therefore, I think we may need to
use Alvaro's version [1] for the back branches.

FWIW for back branches, I prefer using the foreach-pfree pattern
Michael first proposed, just in case. It's not elegant but it can
solve the problem while there is no risk of breaking non-core
extensions.

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1] won't
be solved.

True. There seems another place where we possibly leak memory on
CacheMemoryContext when using pgoutput via SQL APIs:

/* Map must live as long as the session does. */
oldctx = MemoryContextSwitchTo(CacheMemoryContext);

entry->attrmap = build_attrmap_by_name_if_req(indesc, outdesc, false);

MemoryContextSwitchTo(oldctx);
RelationClose(ancestor);

entry->attrmap is pfree'd only when validating the RelationSyncEntry
so remains even after logical decoding API calls.

We have also noticed this but it needs more analysis on the fix which
one of my colleagues is doing. I think we can fix this as a separate
issue unless you think otherwise.

I agree to fix this as a separate patch.

Thanks Sawada-san, I have started a new thread with a test case which
can reproduce this issue at [1]/messages/by-id/CALDaNm1hewNAsZ_e6FF52a=9drmkRJxtEPrzCB6-9mkJyeBBqA@mail.gmail.com:
[1]: /messages/by-id/CALDaNm1hewNAsZ_e6FF52a=9drmkRJxtEPrzCB6-9mkJyeBBqA@mail.gmail.com

Regards,
Vignesh

#52Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Zhijie Hou (Fujitsu) (#50)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 10, 2024 at 6:13 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com>

wrote:

On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz>

wrote:

On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1]

won't

be solved.

[1] -

/messages/by-id/OS0PR01MB57166A4DA0ABBB94F
2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
thanks!).

Yes, that makes sense. How about something like the attached patch.

- oldctx = MemoryContextSwitchTo(CacheMemoryContext);
- if (data->publications)
- {
- list_free_deep(data->publications);
- data->publications = NIL;
- }
+ static MemoryContext pubctx = NULL;
+
+ if (pubctx == NULL)
+ pubctx = AllocSetContextCreate(CacheMemoryContext,
+    "logical replication publication list context",
+    ALLOCSET_SMALL_SIZES);
+ else
+ MemoryContextReset(pubctx);
+
+ oldctx = MemoryContextSwitchTo(pubctx);

Considering the SQL API case, why is it okay to allocate this context
under CacheMemoryContext?

On further thinking, we can't allocate it under
LogicalDecodingContext->context because once that is freed at the end
of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
dangling memory. One idea is that we use
MemoryContextRegisterResetCallback() to invoke a reset callback
function where we can reset pubctx but not sure if we want to go there
in back branches. OTOH, the currently proposed fix won't leak memory
on repeated calls to pg_logical_slot_get_changes(), so that might be
okay as well.

Thoughts?

Alternative idea is to declare pubctx as a file static variable. And
we create the memory context under LogicalDecodingContext->context in
the startup callback and free it in the shutdown callback.

I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
API, the shutdown callback function is not invoked. This would result in the
static variable not being reset, which, I think, is why Amit mentioned the use
of MemoryContextRegisterResetCallback().

My idea is that since that new context is cleaned up together with its
parent context (LogicalDecodingContext->context), we unconditionally
set that new context to the static variable at the startup callback.
That being said, Amit's idea would be cleaner.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#53Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#52)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Wed, Dec 11, 2024 at 11:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 10, 2024 at 6:13 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com>

wrote:

On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz>

wrote:

On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1]

won't

be solved.

[1] -

/messages/by-id/OS0PR01MB57166A4DA0ABBB94F
2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
thanks!).

Yes, that makes sense. How about something like the attached patch.

- oldctx = MemoryContextSwitchTo(CacheMemoryContext);
- if (data->publications)
- {
- list_free_deep(data->publications);
- data->publications = NIL;
- }
+ static MemoryContext pubctx = NULL;
+
+ if (pubctx == NULL)
+ pubctx = AllocSetContextCreate(CacheMemoryContext,
+    "logical replication publication list context",
+    ALLOCSET_SMALL_SIZES);
+ else
+ MemoryContextReset(pubctx);
+
+ oldctx = MemoryContextSwitchTo(pubctx);

Considering the SQL API case, why is it okay to allocate this context
under CacheMemoryContext?

On further thinking, we can't allocate it under
LogicalDecodingContext->context because once that is freed at the end
of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
dangling memory. One idea is that we use
MemoryContextRegisterResetCallback() to invoke a reset callback
function where we can reset pubctx but not sure if we want to go there
in back branches. OTOH, the currently proposed fix won't leak memory
on repeated calls to pg_logical_slot_get_changes(), so that might be
okay as well.

Thoughts?

Alternative idea is to declare pubctx as a file static variable. And
we create the memory context under LogicalDecodingContext->context in
the startup callback and free it in the shutdown callback.

I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
API, the shutdown callback function is not invoked. This would result in the
static variable not being reset, which, I think, is why Amit mentioned the use
of MemoryContextRegisterResetCallback().

My idea is that since that new context is cleaned up together with its
parent context (LogicalDecodingContext->context), we unconditionally
set that new context to the static variable at the startup callback.
That being said, Amit's idea would be cleaner.

Your preference is not completely clear. Are you okay with the idea of
Vignesh's currently proposed patch for back-branches, or do you prefer
to use a memory context reset callback, or do you have a different
idea that should be adopted for back-branches?

--
With Regards,
Amit Kapila.

#54Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#53)
3 attachment(s)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Wed, Dec 11, 2024 at 9:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 11, 2024 at 11:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 10, 2024 at 6:13 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com>

wrote:

On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz>

wrote:

On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1]

won't

be solved.

[1] -

/messages/by-id/OS0PR01MB57166A4DA0ABBB94F
2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
thanks!).

Yes, that makes sense. How about something like the attached patch.

- oldctx = MemoryContextSwitchTo(CacheMemoryContext);
- if (data->publications)
- {
- list_free_deep(data->publications);
- data->publications = NIL;
- }
+ static MemoryContext pubctx = NULL;
+
+ if (pubctx == NULL)
+ pubctx = AllocSetContextCreate(CacheMemoryContext,
+    "logical replication publication list context",
+    ALLOCSET_SMALL_SIZES);
+ else
+ MemoryContextReset(pubctx);
+
+ oldctx = MemoryContextSwitchTo(pubctx);

Considering the SQL API case, why is it okay to allocate this context
under CacheMemoryContext?

On further thinking, we can't allocate it under
LogicalDecodingContext->context because once that is freed at the end
of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
dangling memory. One idea is that we use
MemoryContextRegisterResetCallback() to invoke a reset callback
function where we can reset pubctx but not sure if we want to go there
in back branches. OTOH, the currently proposed fix won't leak memory
on repeated calls to pg_logical_slot_get_changes(), so that might be
okay as well.

Thoughts?

Alternative idea is to declare pubctx as a file static variable. And
we create the memory context under LogicalDecodingContext->context in
the startup callback and free it in the shutdown callback.

I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
API, the shutdown callback function is not invoked. This would result in the
static variable not being reset, which, I think, is why Amit mentioned the use
of MemoryContextRegisterResetCallback().

My idea is that since that new context is cleaned up together with its
parent context (LogicalDecodingContext->context), we unconditionally
set that new context to the static variable at the startup callback.
That being said, Amit's idea would be cleaner.

Your preference is not completely clear. Are you okay with the idea of
Vignesh's currently proposed patch for back-branches, or do you prefer
to use a memory context reset callback, or do you have a different
idea that should be adopted for back-branches?

IIUC the current Vignesh's patch[1]/messages/by-id/CALDaNm2C=mYNB9ZUS5t9irB2P0Tjm_r+nRVc717JdO+NtCVunw@mail.gmail.com doesn't solve the memory leak in
case of using logical decoding APIs, as you mentioned. I've tried the
idea of using memory context reset callback to reset pubctx. We need
to register the callback to LogicalContextDecodingContext->context,
meaning that we need to pass it to get_rel_sync_entry() (see
fix_memory_leak_v1.patch). I don't prefer this approach as it could
make backpatching complex in the future. Alternatively, we can declare
pubctx as a file static variable, create the memory context at the
startup callback, reset the pubctx at the shutdown callback, and use
the memory context reset callback to ensure the pubctx is reset (see
fix_memory_leak_v2.patch).Or I think we might not necessarily need to
use the memory context reset callback (see fix_memory_leak_v3.patch).
I prefer the latter two approaches.

Regards,

[1]: /messages/by-id/CALDaNm2C=mYNB9ZUS5t9irB2P0Tjm_r+nRVc717JdO+NtCVunw@mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

fix_memory_leak_v2.patchapplication/octet-stream; name=fix_memory_leak_v2.patchDownload
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index c9c952a278f..58d87a13eb9 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -80,6 +80,7 @@ static void pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx,
 										ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
 
 static bool publications_valid;
+static MemoryContext pubctx = NULL;
 
 static List *LoadPublications(List *pubnames);
 static void publication_invalidation_cb(Datum arg, int cacheid,
@@ -234,6 +235,7 @@ static bool pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 								TupleTableSlot **new_slot_ptr,
 								RelationSyncEntry *entry,
 								ReorderBufferChangeType *action);
+static void pgoutput_pubctx_reset_callback(void *arg);
 
 /* column list routines */
 static void pgoutput_column_list_init(PGOutputData *data,
@@ -420,6 +422,7 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 {
 	PGOutputData *data = palloc0(sizeof(PGOutputData));
 	static bool publication_callback_registered = false;
+	MemoryContextCallback *mcallback;
 
 	/* Create our memory context for private allocations. */
 	data->context = AllocSetContextCreate(ctx->context,
@@ -430,6 +433,15 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 										   "logical replication cache context",
 										   ALLOCSET_DEFAULT_SIZES);
 
+	Assert(pubctx == NULL);
+	pubctx = AllocSetContextCreate(ctx->context,
+								   "logical replication publication list context",
+								   ALLOCSET_SMALL_SIZES);
+
+	mcallback = palloc0(sizeof(MemoryContextCallback));
+	mcallback->func = pgoutput_pubctx_reset_callback;
+	MemoryContextRegisterResetCallback(ctx->context, mcallback);
+
 	ctx->output_plugin_private = data;
 
 	/* This plugin uses binary protocol. */
@@ -1696,9 +1708,9 @@ pgoutput_origin_filter(LogicalDecodingContext *ctx,
 /*
  * Shutdown the output plugin.
  *
- * Note, we don't need to clean the data->context and data->cachectx as
- * they are child contexts of the ctx->context so they will be cleaned up by
- * logical decoding machinery.
+ * Note, we don't need to clean the data->context, data->cachectx and pubctx
+ * as they are child contexts of the ctx->context so they will be cleaned up
+ * by logical decoding machinery.
  */
 static void
 pgoutput_shutdown(LogicalDecodingContext *ctx)
@@ -1708,6 +1720,8 @@ pgoutput_shutdown(LogicalDecodingContext *ctx)
 		hash_destroy(RelationSyncCache);
 		RelationSyncCache = NULL;
 	}
+
+	pubctx = NULL;
 }
 
 /*
@@ -2024,12 +2038,11 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		/* Reload publications if needed before use. */
 		if (!publications_valid)
 		{
-			oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-			if (data->publications)
-			{
-				list_free_deep(data->publications);
-				data->publications = NIL;
-			}
+			Assert(pubctx);
+
+			MemoryContextReset(pubctx);
+			oldctx = MemoryContextSwitchTo(pubctx);
+
 			data->publications = LoadPublications(data->publication_names);
 			MemoryContextSwitchTo(oldctx);
 			publications_valid = true;
@@ -2402,3 +2415,9 @@ send_repl_origin(LogicalDecodingContext *ctx, RepOriginId origin_id,
 		}
 	}
 }
+
+static void
+pgoutput_pubctx_reset_callback(void *arg)
+{
+	pubctx = NULL;
+}
fix_memory_leak_v1.patchapplication/octet-stream; name=fix_memory_leak_v1.patchDownload
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index c9c952a278f..33fe1461e11 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -211,7 +211,7 @@ static HTAB *RelationSyncCache = NULL;
 
 static void init_rel_sync_cache(MemoryContext cachectx);
 static void cleanup_rel_sync_cache(TransactionId xid, bool is_commit);
-static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data,
+static RelationSyncEntry *get_rel_sync_entry(LogicalDecodingContext *ctx,
 											 Relation relation);
 static void rel_sync_cache_relation_cb(Datum arg, Oid relid);
 static void rel_sync_cache_publication_cb(Datum arg, int cacheid,
@@ -234,6 +234,7 @@ static bool pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,
 								TupleTableSlot **new_slot_ptr,
 								RelationSyncEntry *entry,
 								ReorderBufferChangeType *action);
+static void pgoutput_pubctx_reset_callback(void *arg);
 
 /* column list routines */
 static void pgoutput_column_list_init(PGOutputData *data,
@@ -1426,7 +1427,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 	if (data->in_streaming)
 		xid = change->txn->xid;
 
-	relentry = get_rel_sync_entry(data, relation);
+	relentry = get_rel_sync_entry(ctx, relation);
 
 	/* First check the table filter */
 	switch (action)
@@ -1598,7 +1599,7 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
 		if (!is_publishable_relation(relation))
 			continue;
 
-		relentry = get_rel_sync_entry(data, relation);
+		relentry = get_rel_sync_entry(ctx, relation);
 
 		if (!relentry->pubactions.pubtruncate)
 			continue;
@@ -1970,8 +1971,9 @@ set_schema_sent_in_streamed_txn(RelationSyncEntry *entry, TransactionId xid)
  * when publishing.
  */
 static RelationSyncEntry *
-get_rel_sync_entry(PGOutputData *data, Relation relation)
+get_rel_sync_entry(LogicalDecodingContext *ctx, Relation relation)
 {
+	PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;
 	RelationSyncEntry *entry;
 	bool		found;
 	MemoryContext oldctx;
@@ -2024,12 +2026,27 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		/* Reload publications if needed before use. */
 		if (!publications_valid)
 		{
-			oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-			if (data->publications)
+			static MemoryContext pubctx = NULL;
+
+			if (pubctx == NULL)
 			{
-				list_free_deep(data->publications);
-				data->publications = NIL;
+				MemoryContextCallback *mcallback;
+
+				pubctx = AllocSetContextCreate(ctx->context,
+											   "logical replication publication list context",
+											   ALLOCSET_SMALL_SIZES);
+
+				mcallback = MemoryContextAllocZero(ctx->context,
+												   sizeof(MemoryContextCallback));
+				mcallback->func = pgoutput_pubctx_reset_callback;
+				mcallback->arg = &pubctx;
+				MemoryContextRegisterResetCallback(ctx->context, mcallback);
 			}
+			else
+				MemoryContextReset(pubctx);
+
+			oldctx = MemoryContextSwitchTo(pubctx);
+
 			data->publications = LoadPublications(data->publication_names);
 			MemoryContextSwitchTo(oldctx);
 			publications_valid = true;
@@ -2402,3 +2419,11 @@ send_repl_origin(LogicalDecodingContext *ctx, RepOriginId origin_id,
 		}
 	}
 }
+
+static void
+pgoutput_pubctx_reset_callback(void *arg)
+{
+	MemoryContext *pubctx = (MemoryContext *) arg;
+
+	*pubctx = NULL;
+}
fix_memory_leak_v3.patchapplication/octet-stream; name=fix_memory_leak_v3.patchDownload
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index c9c952a278f..c715e8b8d4c 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -80,6 +80,7 @@ static void pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx,
 										ReorderBufferTXN *txn, XLogRecPtr prepare_lsn);
 
 static bool publications_valid;
+static MemoryContext pubctx = NULL;
 
 static List *LoadPublications(List *pubnames);
 static void publication_invalidation_cb(Datum arg, int cacheid,
@@ -430,6 +431,10 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 										   "logical replication cache context",
 										   ALLOCSET_DEFAULT_SIZES);
 
+	pubctx = AllocSetContextCreate(ctx->context,
+								   "logical replication publication list context",
+								   ALLOCSET_SMALL_SIZES);
+
 	ctx->output_plugin_private = data;
 
 	/* This plugin uses binary protocol. */
@@ -1696,9 +1701,9 @@ pgoutput_origin_filter(LogicalDecodingContext *ctx,
 /*
  * Shutdown the output plugin.
  *
- * Note, we don't need to clean the data->context and data->cachectx as
- * they are child contexts of the ctx->context so they will be cleaned up by
- * logical decoding machinery.
+ * Note, we don't need to clean the data->context, data->cachectx and pubctx
+ * as they are child contexts of the ctx->context so they will be cleaned up
+ * by logical decoding machinery.
  */
 static void
 pgoutput_shutdown(LogicalDecodingContext *ctx)
@@ -2024,12 +2029,11 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
 		/* Reload publications if needed before use. */
 		if (!publications_valid)
 		{
-			oldctx = MemoryContextSwitchTo(CacheMemoryContext);
-			if (data->publications)
-			{
-				list_free_deep(data->publications);
-				data->publications = NIL;
-			}
+			Assert(pubctx);
+
+			MemoryContextReset(pubctx);
+			oldctx = MemoryContextSwitchTo(pubctx);
+
 			data->publications = LoadPublications(data->publication_names);
 			MemoryContextSwitchTo(oldctx);
 			publications_valid = true;
#55Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#54)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Thu, Dec 12, 2024 at 11:52:20AM -0800, Masahiko Sawada wrote:

IIUC the current Vignesh's patch[1] doesn't solve the memory leak in
case of using logical decoding APIs, as you mentioned. I've tried the
idea of using memory context reset callback to reset pubctx. We need
to register the callback to LogicalContextDecodingContext->context,
meaning that we need to pass it to get_rel_sync_entry() (see
fix_memory_leak_v1.patch). I don't prefer this approach as it could
make backpatching complex in the future. Alternatively, we can declare
pubctx as a file static variable, create the memory context at the
startup callback, reset the pubctx at the shutdown callback, and use
the memory context reset callback to ensure the pubctx is reset (see
fix_memory_leak_v2.patch).Or I think we might not necessarily need to
use the memory context reset callback (see fix_memory_leak_v3.patch).
I prefer the latter two approaches.

+    pubctx = AllocSetContextCreate(ctx->context,
+                                   "logical replication publication list context",
+                                   ALLOCSET_SMALL_SIZES);
+

Knowing that there can be only one pgoutput context running at a given
time and that pubctx would be reset automatically when exiting
pgoutput with its parent context, I find the simplicity of v3
tempting. Now, keeping in the stack a static pointer that could point
to the void depending on where we are makes me really uneasy because
that could be the source of more bugs (think a-la-CVE if the pointer
points to something that gets reallocated later on still is referenced
in this process because of something), so v2 where the pointer is
reset when leaving the pgoutput context has a much better idea of how
to do the job.
--
Michael

#56Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#54)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Fri, Dec 13, 2024 at 1:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 11, 2024 at 9:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 11, 2024 at 11:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 10, 2024 at 6:13 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com>

wrote:

On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz>

wrote:

On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1]

won't

be solved.

[1] -

/messages/by-id/OS0PR01MB57166A4DA0ABBB94F
2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
thanks!).

Yes, that makes sense. How about something like the attached patch.

- oldctx = MemoryContextSwitchTo(CacheMemoryContext);
- if (data->publications)
- {
- list_free_deep(data->publications);
- data->publications = NIL;
- }
+ static MemoryContext pubctx = NULL;
+
+ if (pubctx == NULL)
+ pubctx = AllocSetContextCreate(CacheMemoryContext,
+    "logical replication publication list context",
+    ALLOCSET_SMALL_SIZES);
+ else
+ MemoryContextReset(pubctx);
+
+ oldctx = MemoryContextSwitchTo(pubctx);

Considering the SQL API case, why is it okay to allocate this context
under CacheMemoryContext?

On further thinking, we can't allocate it under
LogicalDecodingContext->context because once that is freed at the end
of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
dangling memory. One idea is that we use
MemoryContextRegisterResetCallback() to invoke a reset callback
function where we can reset pubctx but not sure if we want to go there
in back branches. OTOH, the currently proposed fix won't leak memory
on repeated calls to pg_logical_slot_get_changes(), so that might be
okay as well.

Thoughts?

Alternative idea is to declare pubctx as a file static variable. And
we create the memory context under LogicalDecodingContext->context in
the startup callback and free it in the shutdown callback.

I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
API, the shutdown callback function is not invoked. This would result in the
static variable not being reset, which, I think, is why Amit mentioned the use
of MemoryContextRegisterResetCallback().

My idea is that since that new context is cleaned up together with its
parent context (LogicalDecodingContext->context), we unconditionally
set that new context to the static variable at the startup callback.
That being said, Amit's idea would be cleaner.

Your preference is not completely clear. Are you okay with the idea of
Vignesh's currently proposed patch for back-branches, or do you prefer
to use a memory context reset callback, or do you have a different
idea that should be adopted for back-branches?

IIUC the current Vignesh's patch[1] doesn't solve the memory leak in
case of using logical decoding APIs, as you mentioned.

Right, but note that it wouldn't leak memory on repeated calls to the
API. Only if the backend ever makes a single call for get_changes will
it leak memory once, which is not ideal. Still, we can live with it if
the other approaches are complex for back branches.

I've tried the
idea of using memory context reset callback to reset pubctx. We need
to register the callback to LogicalContextDecodingContext->context,
meaning that we need to pass it to get_rel_sync_entry() (see
fix_memory_leak_v1.patch). I don't prefer this approach as it could
make backpatching complex in the future. Alternatively, we can declare
pubctx as a file static variable, create the memory context at the
startup callback, reset the pubctx at the shutdown callback, and use
the memory context reset callback to ensure the pubctx is reset (see
fix_memory_leak_v2.patch).Or I think we might not necessarily need to
use the memory context reset callback (see fix_memory_leak_v3.patch).
I prefer the latter two approaches.

Will fix_memory_leak_v3.patch avoid the leak in case of an ERROR in
SQL API? If so, how?

--
With Regards,
Amit Kapila.

#57Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Amit Kapila (#56)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Tue, Dec 17, 2024 at 2:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Dec 13, 2024 at 1:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Dec 11, 2024 at 9:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Dec 11, 2024 at 11:09 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 10, 2024 at 6:13 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:

On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit.kapila16@gmail.com>

wrote:

On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21@gmail.com>

wrote:

On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael@paquier.xyz>

wrote:

On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:

It couldn't solve the problem completely even in back-branches. The
SQL API case I mentioned and tested by Hou-San in the email [1]

won't

be solved.

[1] -

/messages/by-id/OS0PR01MB57166A4DA0ABBB94F
2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
thanks!).

Yes, that makes sense. How about something like the attached patch.

- oldctx = MemoryContextSwitchTo(CacheMemoryContext);
- if (data->publications)
- {
- list_free_deep(data->publications);
- data->publications = NIL;
- }
+ static MemoryContext pubctx = NULL;
+
+ if (pubctx == NULL)
+ pubctx = AllocSetContextCreate(CacheMemoryContext,
+    "logical replication publication list context",
+    ALLOCSET_SMALL_SIZES);
+ else
+ MemoryContextReset(pubctx);
+
+ oldctx = MemoryContextSwitchTo(pubctx);

Considering the SQL API case, why is it okay to allocate this context
under CacheMemoryContext?

On further thinking, we can't allocate it under
LogicalDecodingContext->context because once that is freed at the end
of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
dangling memory. One idea is that we use
MemoryContextRegisterResetCallback() to invoke a reset callback
function where we can reset pubctx but not sure if we want to go there
in back branches. OTOH, the currently proposed fix won't leak memory
on repeated calls to pg_logical_slot_get_changes(), so that might be
okay as well.

Thoughts?

Alternative idea is to declare pubctx as a file static variable. And
we create the memory context under LogicalDecodingContext->context in
the startup callback and free it in the shutdown callback.

I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
API, the shutdown callback function is not invoked. This would result in the
static variable not being reset, which, I think, is why Amit mentioned the use
of MemoryContextRegisterResetCallback().

My idea is that since that new context is cleaned up together with its
parent context (LogicalDecodingContext->context), we unconditionally
set that new context to the static variable at the startup callback.
That being said, Amit's idea would be cleaner.

Your preference is not completely clear. Are you okay with the idea of
Vignesh's currently proposed patch for back-branches, or do you prefer
to use a memory context reset callback, or do you have a different
idea that should be adopted for back-branches?

IIUC the current Vignesh's patch[1] doesn't solve the memory leak in
case of using logical decoding APIs, as you mentioned.

Right, but note that it wouldn't leak memory on repeated calls to the
API. Only if the backend ever makes a single call for get_changes will
it leak memory once, which is not ideal. Still, we can live with it if
the other approaches are complex for back branches.

True. I missed that point.

I've tried the
idea of using memory context reset callback to reset pubctx. We need
to register the callback to LogicalContextDecodingContext->context,
meaning that we need to pass it to get_rel_sync_entry() (see
fix_memory_leak_v1.patch). I don't prefer this approach as it could
make backpatching complex in the future. Alternatively, we can declare
pubctx as a file static variable, create the memory context at the
startup callback, reset the pubctx at the shutdown callback, and use
the memory context reset callback to ensure the pubctx is reset (see
fix_memory_leak_v2.patch).Or I think we might not necessarily need to
use the memory context reset callback (see fix_memory_leak_v3.patch).
I prefer the latter two approaches.

Will fix_memory_leak_v3.patch avoid the leak in case of an ERROR in
SQL API? If so, how?

The pubctx is created as a child of LogicalDecodingContext->context.
On an error, the pubctx is cleaned up altogether when cleaning up
LogicalDecodingContext->context.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#58Amit Kapila
amit.kapila16@gmail.com
In reply to: Masahiko Sawada (#57)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Wed, Dec 18, 2024 at 12:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Dec 17, 2024 at 2:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

Will fix_memory_leak_v3.patch avoid the leak in case of an ERROR in
SQL API? If so, how?

The pubctx is created as a child of LogicalDecodingContext->context.
On an error, the pubctx is cleaned up altogether when cleaning up
LogicalDecodingContext->context.

The difference between fix_memory_leak_v2 and fix_memory_leak_v3 is
that the earlier one resets the pubctx to NULL along with freeing the
context memory. Resetting a file-level global variable is a good idea,
similar to what we do for RelationSyncCache, so I prefer v2 over v3,
but I am fine if you would like to proceed with v3.

--
With Regards,
Amit Kapila.

#59Michael Paquier
michael@paquier.xyz
In reply to: Amit Kapila (#58)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Thu, Dec 19, 2024 at 07:50:57AM +0530, Amit Kapila wrote:

The difference between fix_memory_leak_v2 and fix_memory_leak_v3 is
that the earlier one resets the pubctx to NULL along with freeing the
context memory. Resetting a file-level global variable is a good idea,
similar to what we do for RelationSyncCache, so I prefer v2 over v3,
but I am fine if you would like to proceed with v3.

FWIW, I am not OK with v3. I've raised this exact point a couple of
days ago upthread:
/messages/by-id/Z1t5pXsNEYwS4P5k@paquier.xyz

v2 does not have these weaknesses by design.
--
Michael

#60Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#59)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Wed, Dec 18, 2024 at 6:26 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Dec 19, 2024 at 07:50:57AM +0530, Amit Kapila wrote:

The difference between fix_memory_leak_v2 and fix_memory_leak_v3 is
that the earlier one resets the pubctx to NULL along with freeing the
context memory. Resetting a file-level global variable is a good idea,
similar to what we do for RelationSyncCache, so I prefer v2 over v3,
but I am fine if you would like to proceed with v3.

FWIW, I am not OK with v3. I've raised this exact point a couple of
days ago upthread:
/messages/by-id/Z1t5pXsNEYwS4P5k@paquier.xyz

v2 does not have these weaknesses by design.

I agree that v2 is better than v3 in terms of that.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#61Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#60)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Thu, Dec 19, 2024 at 09:27:04AM -0800, Masahiko Sawada wrote:

On Wed, Dec 18, 2024 at 6:26 PM Michael Paquier <michael@paquier.xyz> wrote:

v2 does not have these weaknesses by design.

I agree that v2 is better than v3 in terms of that.

Okay. In terms of the backbranches, would you prefer that I handle
this patch myself as I have done the HEAD part? This would need a
second, closer, review but I could do that at the beginning of next
week.

Or perhaps you'd prefer doing it yourself? That's up to you, happy to
help as always :D
--
Michael

#62Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Michael Paquier (#61)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Thu, Dec 19, 2024 at 6:31 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Dec 19, 2024 at 09:27:04AM -0800, Masahiko Sawada wrote:

On Wed, Dec 18, 2024 at 6:26 PM Michael Paquier <michael@paquier.xyz> wrote:

v2 does not have these weaknesses by design.

I agree that v2 is better than v3 in terms of that.

Okay. In terms of the backbranches, would you prefer that I handle
this patch myself as I have done the HEAD part? This would need a
second, closer, review but I could do that at the beginning of next
week.

Thanks. Please proceed with this fix as you've already fixed the HEAD part.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#63Michael Paquier
michael@paquier.xyz
In reply to: Masahiko Sawada (#62)
Re: Memory leak in WAL sender with pgoutput (v10~)

On Fri, Dec 20, 2024 at 11:23:30AM -0800, Masahiko Sawada wrote:

Thanks. Please proceed with this fix as you've already fixed the HEAD part.

Thanks. It took me a bit of time to check that across all five stable
branches, including sysbench and the SQL case, and I think that's OK,
so done (added a couple of comments on the way).
--
Michael