Reviewers Guide to Deferred Transactions/Transaction Guarantee

Started by Simon Riggs almost 19 years ago, 19 messages
#1 Simon Riggs
simon@2ndquadrant.com
1 attachment(s)

transaction_guarantee.v11.patch
- keep current, cleanup, more comments and docs

Brief Performance Analysis
--------------------------

I've tested 3 scenarios:
1. normal
2. wal_writer_delay = 100ms
3. wal_writer_delay = 100ms and transaction_guarantee = off

On my laptop, with a scale=1 pgbench database with 1 connection I
consistently get around 85 tps in mode (1), with a slight performance
drop in mode (2). In mode (3) I get anywhere from 200 to 900 tps,
depending upon how well cached everything is, with 700 tps being fairly
typical. fsync = off gives around 900 tps.

Also good speedups with multiple session tests.

make installcheck passes in 120 sec in mode (3), though 155 sec in mode
(1) and 158 sec in mode (2).

Basic Implementation
--------------------

xact.c
xact.h

The basic implementation simply records the LSN of the xlog commit
record in a shared memory area, the deferred fsync cache.

ipci.c

The cache is protected by an LWlock called DeferredFsyncLock.

lwlock.h

A WALWriter process wakes up regularly to perform a background flush of
WAL up to the point of the highest LSN in the deferred fsync cache.

walwriter.c
walwriter.h
postmaster.c

WALWriter can be enabled only at server start.
(All above same as March 11 version)
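
To make the moving parts above concrete, here is a minimal sketch of the
idea, not the code in the patch. The names (Lsn, DeferredFsyncCache,
df_cache_record_commit, df_cache_background_flush) and the fixed capacity
are assumptions made for illustration, and the locking that the patch does
under DeferredFsyncLock is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position (the real XLogRecPtr is a struct). */
typedef uint64_t Lsn;

#define DF_CACHE_SIZE 8			/* assumed tiny capacity, illustration only */

typedef struct DeferredFsyncCache
{
	Lsn			commit_lsn[DF_CACHE_SIZE];	/* LSNs of unflushed commit records */
	size_t		used;			/* number of occupied slots */
	Lsn			flushed_upto;	/* WAL assumed durable up to here */
} DeferredFsyncCache;

/*
 * Backend side: remember a commit whose WAL has not been fsynced yet.
 * Returns 0 on success, -1 if the cache is full and the caller must flush.
 */
int
df_cache_record_commit(DeferredFsyncCache *cache, Lsn lsn)
{
	assert(cache != NULL);
	if (cache->used == DF_CACHE_SIZE)
		return -1;
	cache->commit_lsn[cache->used++] = lsn;
	return 0;
}

/*
 * WAL writer side: find the highest pending LSN, treat WAL as flushed up to
 * that point, and empty the cache. Returns the new flush point.
 */
Lsn
df_cache_background_flush(DeferredFsyncCache *cache)
{
	Lsn			highest = cache->flushed_upto;

	for (size_t i = 0; i < cache->used; i++)
	{
		if (cache->commit_lsn[i] > highest)
			highest = cache->commit_lsn[i];
	}
	/* A real implementation would write and fsync WAL up to 'highest' here. */
	cache->flushed_upto = highest;
	cache->used = 0;
	return highest;
}
```

A single flush to the highest LSN covers every commit recorded before it,
which is why the background process only needs the maximum.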

Correctness
-----------

postgres.c

Only certain code paths can execute transaction_guarantee = off
transactions, though the main code paths for OLTP allow it.

xlog.c

CreateCheckpoint() must protect against starting a checkpoint when
commits are not yet flushed, so an additional flush must occur here.

vacuum.c

VACUUM FULL cannot move tuples until their states are all known, so this
command triggers a background flush also.

clog.c
clog.h
slru.c
slru.h

Changes to Clog and SLRU enforce the basic rule of WAL-before-data.
Without them, the clog record of a commit might reach disk before the
WAL for that commit has been flushed. This is implemented by storing an
LSN for each clog page.

transam.c
transam.h
twophase.c
xact.c

The above files have API changes that allow the LSN at transaction
commit to be passed through to the Clog.

tqual.c
tqual.h
multixact.c
multixact.h

Visibility hint bits must also not be set before the transaction is
flushed, so other changes are required to ensure we store the LSN of
each transaction, not just the maximum LSN. Changes to tqual.c appear
extensive, though this is just refactoring to allow us to make
additional function calls before setting bits - there are no functional
changes to any HeapTupleSatisfies... functions.
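
The reason a per-transaction LSN is needed can be sketched as below. This
is only an illustration: DeferredXact and hint_bit_is_safe are hypothetical
names, and the assumption that a transaction missing from the list has
already been flushed is mine, not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;			/* hypothetical stand-in for a WAL position */
typedef uint32_t Xid;			/* hypothetical transaction id type */

typedef struct DeferredXact
{
	Xid			xid;			/* committed transaction, WAL maybe unflushed */
	Lsn			commit_lsn;		/* LSN of its commit record */
} DeferredXact;

/*
 * Returns true if it is safe to set a "committed" hint bit for 'xid'.
 * Assumption: a transaction absent from the pending list has already had
 * its commit record flushed.
 */
bool
hint_bit_is_safe(const DeferredXact *pending, size_t npending,
				 Xid xid, Lsn wal_flushed_upto)
{
	assert(pending != NULL || npending == 0);
	for (size_t i = 0; i < npending; i++)
	{
		if (pending[i].xid == xid)
			return pending[i].commit_lsn <= wal_flushed_upto;
	}
	return true;
}
```

With only the maximum LSN available, every pending transaction would have
to be treated as unflushed until the newest one reached disk.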

xact.c

Contains the module for the Deferred Transaction functions and in
particular the deferred transaction cache. This could be a separate
module, since there is only a slight link with the other xact.c code.

User Interface
--------------

guc.c
postgresql.conf.sample
guc_table.h

New parameters have been added, with a new parameter grouping of
WAL_COMMITS created to control the various commit parameters.

Performance Tuning
------------------

The WALWriter wakes up every wal_writer_delay milliseconds. There are two
protections against mis-setting this parameter.

pmsignal.h

The WALWriter will also be woken by a signal if the DF cache has nearly
filled and flushing would be desirable.

The WALWriter will also loop without any delay if the number of
transactions committed while it was writing WAL is above a threshold
value.
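
The two protections amount to a small pacing decision, sketched here under
assumptions. walwriter_next_sleep_ms and BUSY_COMMIT_THRESHOLD are
hypothetical names, and the threshold value is made up; the patch's actual
value is not stated in this mail.

```c
#include <assert.h>
#include <stdbool.h>

#define BUSY_COMMIT_THRESHOLD 16	/* hypothetical value, illustration only */

/*
 * Decide how long the WAL writer should sleep before its next flush cycle.
 * Returns 0 when it should loop again immediately.
 */
int
walwriter_next_sleep_ms(int wal_writer_delay_ms,
						int commits_during_last_flush,
						bool cache_nearly_full)
{
	assert(wal_writer_delay_ms > 0);

	/* Protection 1: a nearly full cache wakes the writer via a signal. */
	if (cache_nearly_full)
		return 0;

	/* Protection 2: skip the delay if many commits arrived while flushing. */
	if (commits_during_last_flush > BUSY_COMMIT_THRESHOLD)
		return 0;

	return wal_writer_delay_ms;
}
```

Either way, a wal_writer_delay that is set too high costs latency under
load rather than letting the cache overflow.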

Docs
----
The fsync parameter has been removed from postgresql.conf.sample and the
docs, though it still exists in this patch to allow performance testing
during Beta. It is suggested that fsync = off should mean the same thing as
transaction_guarantee = off, wal_writer_delay = 100ms, if it is
specified in postgresql.conf or on the server command line.

A new section in wal.sgml will describe this in more detail, later.

Open Questions
--------------

1. Should the DFC use a standard hash table? Custom code allows both
additional speed and the ability to signal when it fills.

2. Should tqual.c update the LSN of a heap page with the LSN of the
transaction commit that it can read from the DF cache?

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert() ?

4. The recent changes to remove CheckpointStartLock haven't changed the
code path for deferred transactions, so a similar solution might be
possible there also.

5. Is it correct to do WAL-before-flush for clog only, or should this
be multixact also?

All of the above are fairly minor changes.

Any other thoughts/comments/tests welcome.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Attachments:

transaction_guarantee.v11.patch.gz (application/x-gzip)
#2 Simon Riggs
simon@2ndquadrant.com
In reply to: Simon Riggs (#1)
1 attachment(s)
Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee

On Thu, 2007-04-05 at 22:56 +0100, Simon Riggs wrote:

transaction_guarantee.v11.patch

correct files attached

Open Questions
--------------

1. Should the DFC use a standard hash table? Custom code allows both
additional speed and the ability to signal when it fills.

2. Should tqual.c update the LSN of a heap page with the LSN of the
transaction commit that it can read from the DF cache?

I now think we should update the LSN of the page, but not changed yet.

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert() ?

Not that important

4. The recent changes to remove CheckpointStartLock haven't changed the
code path for deferred transactions, so a similar solution might be
possible there also.

Some further discussion required here, I think. That change may actually
have introduced a slight risk into the patch. Will raise at review.

5. Is it correct to do WAL-before-flush for clog only, or should this
be multixact also?

Not necessary

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Attachments:

tg.tar.gz (application/x-compressed-tar)
#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#2)
Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee

"Simon Riggs" <simon@2ndquadrant.com> writes:

4. The recent changes to remove CheckpointStartLock haven't changed the
code path for deferred transactions, so a similar solution might be
possible there also.

Some further discussion required here, I think. That change may actually
have introduced a slight risk into the patch. Will raise at review.

Given that you're going to be gone for the next two weeks, I'm wondering
when you think that discussion will happen.

regards, tom lane

#4 Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: Reviewers Guide to DeferredTransactions/TransactionGuarantee

On Sun, 2007-04-08 at 11:05 -0400, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

4. The recent changes to remove CheckpointStartLock haven't changed the
code path for deferred transactions, so a similar solution might be
possible there also.

Some further discussion required here, I think. That change may actually
have introduced a slight risk into the patch. Will raise at review.

Given that you're going to be gone for the next two weeks, I'm wondering
when you think that discussion will happen.

Well, now is good... but I would never say "this must happen now".

I'm sorry my schedule is busy at this time; I really thought the change
of dates would mean I'd avoid my normal disappearing trick. Previously
it's been family holidays, now it's just other business I am called to.

My concern was this:

If we flush the currently outstanding deferred transactions then that
doesn't guarantee they have all reached the clog. Previously, a deferred
transaction would not release the CheckpointStartLock until after the
clog had been updated.

If we wait for all currently inCommit transactions to end this will
cover all deferred transactions also. So I think I just need to flush
deferred transactions prior to the wait and this will be valid. Would
you agree?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#5 Simon Riggs
simon@2ndquadrant.com
In reply to: Simon Riggs (#4)
Re: Reviewers Guide to DeferredTransactions/TransactionGuarantee

On Sun, 2007-04-08 at 17:02 +0100, Simon Riggs wrote:

My concern was this:

If we flush the currently outstanding deferred transactions then that
doesn't guarantee they have all reached the clog. Previously, a deferred
transaction would not release the CheckpointStartLock until after the
clog had been updated.

If we wait for all currently inCommit transactions to end this will
cover all deferred transactions also. So I think I just need to flush
deferred transactions prior to the wait and this will be valid. Would
you agree?

I'm good with this now, sorry for the noise.

Given the existing code in CreateCheckpoint, I just need to add a
background flush immediately prior to the newly added waits. That would
replace what I've got in the current patch, where I hold the lock across
the calculation of the WAL insert pointer for the checkpoint, which was too
safe - there is no need for prior WAL to be flushed at that point.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#6 ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Simon Riggs (#2)
1 attachment(s)
Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee

"Simon Riggs" <simon@2ndquadrant.com> wrote:

transaction_guarantee.v11.patch

correct files attached

This is a small fix to transaction_guarantee patch.
WAL writer needs PGSharedMemoryReAttach() on EXEC_BACKEND platforms.
Other changes are only for suppressing warnings.

We might also need to increase NUM_AUXILIARY_PROCS (=3) for WAL writer,
but I didn't change it in the patch. (I don't know why the value is 3
-- bgwriter, autovacuum launcher and ... what?)

BTW, the following TODO item comes to my mind:
| Allow WAL traffic to be streamed to another server for stand-by replication
We have to open sockets to another server when we want to stream WAL.
If there were a WAL writer, we could reduce the number of those sockets.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachments:

transaction_guarantee.v11fix.patch (application/octet-stream)
diff -cpr transaction_guarantee.v11/src/backend/postmaster/postmaster.c transaction_guarantee.v11fix/src/backend/postmaster/postmaster.c
*** transaction_guarantee.v11/src/backend/postmaster/postmaster.c	Tue Apr 10 19:55:56 2007
--- transaction_guarantee.v11fix/src/backend/postmaster/postmaster.c	Tue Apr 10 19:56:08 2007
*************** SubPostmasterMain(int argc, char *argv[]
*** 3400,3405 ****
--- 3400,3406 ----
  	if (strcmp(argv[1], "--forkbackend") == 0 ||
  		strcmp(argv[1], "--forkavlauncher") == 0 ||
  		strcmp(argv[1], "--forkavworker") == 0 ||
+ 		strcmp(argv[1], "--forkwalwriter") == 0 ||
  		strcmp(argv[1], "--forkboot") == 0)
  		PGSharedMemoryReAttach();
  
diff -cpr transaction_guarantee.v11/src/include/access/gist_private.h transaction_guarantee.v11fix/src/include/access/gist_private.h
*** transaction_guarantee.v11/src/include/access/gist_private.h	Mon Jan 22 13:08:11 2007
--- transaction_guarantee.v11fix/src/include/access/gist_private.h	Tue Apr 10 19:56:08 2007
*************** typedef struct GistSplitVector
*** 200,207 ****
  								 * distributed between left and right pages */
  } GistSplitVector;
  
- #define XLogRecPtrIsInvalid( r )	( (r).xlogid == 0 && (r).xrecoff == 0 )
- 
  typedef struct
  {
  	Relation	r;
--- 200,205 ----
diff -cpr transaction_guarantee.v11/src/include/access/xlogdefs.h transaction_guarantee.v11fix/src/include/access/xlogdefs.h
*** transaction_guarantee.v11/src/include/access/xlogdefs.h	Tue Apr 10 19:55:56 2007
--- transaction_guarantee.v11fix/src/include/access/xlogdefs.h	Tue Apr 10 19:56:08 2007
*************** typedef struct XLogRecPtr
*** 33,40 ****
  	uint32		xrecoff;		/* byte offset of location in log file */
  } XLogRecPtr;
  
! #define XLogRecPtrIsInvalid(p) \
! 			(((p).xlogid == 0 && (p).xrecoff == 0) ? true : false)
  
  /*
   * Macros for comparing XLogRecPtrs
--- 33,39 ----
  	uint32		xrecoff;		/* byte offset of location in log file */
  } XLogRecPtr;
  
! #define XLogRecPtrIsInvalid(r)	((r).xlogid == 0 && (r).xrecoff == 0)
  
  /*
   * Macros for comparing XLogRecPtrs
diff -cpr transaction_guarantee.v11/src/include/postmaster/walwriter.h transaction_guarantee.v11fix/src/include/postmaster/walwriter.h
*** transaction_guarantee.v11/src/include/postmaster/walwriter.h	Thu Mar 29 21:30:37 2007
--- transaction_guarantee.v11fix/src/include/postmaster/walwriter.h	Tue Apr 10 19:58:12 2007
*************** extern int WALWriterDelay;
*** 16,18 ****
--- 16,22 ----
  #define WALWriterActive() (WALWriterDelay > 0)
  
  extern int StartWALWriter(void);
+ 
+ #ifdef EXEC_BACKEND
+ extern void WALWriterMain(int argc, char *argv[]);
+ #endif
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#1)
Re: [PATCHES] Reviewers Guide to Deferred Transactions/Transaction Guarantee

"Simon Riggs" <simon@2ndquadrant.com> writes:

transaction_guarantee.v11.patch

I can't help feeling that this is enormously overcomplicated.

The "DFC" in particular seems to not be worth its overhead. Why wouldn't
we simply track the newest commit record at all times, and then whenever
the wal writer wakes up, it would write/fsync that far (or write/fsync
all completed WAL pages, if there's no new commit record to worry
about)?
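A minimal sketch of that suggestion, assuming a flat 64-bit LSN and a hypothetical WalState struct (neither is taken from the patch, and real code would need locking around the shared state). It only shows how the WAL writer might pick its flush target on each wakeup:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified LSN: a flat byte position in WAL (the real XLogRecPtr is a struct). */
typedef uint64_t Lsn;

/* Hypothetical shared state; real code would protect it with a lock. */
typedef struct
{
    Lsn newest_commit;   /* end of the newest commit record inserted so far */
    Lsn completed_pages; /* end of the last completely filled WAL page */
    Lsn flushed;         /* WAL is known to be on disk up to here */
} WalState;

/* A backend calls this after inserting a commit record it will not wait for. */
static void
note_commit(WalState *st, Lsn commit_end)
{
    assert(st != NULL);
    if (commit_end > st->newest_commit)
        st->newest_commit = commit_end;
}

/*
 * One WAL writer wakeup: choose how far to write/fsync.
 * Returns false when there is nothing new to flush.
 */
static bool
walwriter_pick_target(const WalState *st, Lsn *target)
{
    assert(st != NULL && target != NULL);
    if (st->newest_commit > st->flushed)
    {
        *target = st->newest_commit;    /* an unflushed commit: go that far */
        return true;
    }
    if (st->completed_pages > st->flushed)
    {
        *target = st->completed_pages;  /* no new commit: completed pages only */
        return true;
    }
    return false;
}
```

The point of the design is that a single "newest commit" value replaces any per-transaction bookkeeping.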

I see the concern about not letting clog pages go to disk before the
corresponding WAL data is flushed, but that could be handled much more
simply: just force a flush through the newest commit record before any
write of a clog page. Those writes are infrequent enough (every 32K
transactions or one checkpoint) that this seems not a serious problem.
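The 32K figure follows from two status bits per transaction on a default 8192-byte page. A rough sketch of the rule, assuming a flat 64-bit LSN and a hypothetical Sim struct that only counts the I/O that would be issued (not code from the patch):

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ              8192    /* default PostgreSQL block size */
#define CLOG_BITS_PER_XACT  2       /* commit status takes two bits */
#define CLOG_XACTS_PER_BYTE (8 / CLOG_BITS_PER_XACT)
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)  /* 32768 */

/* Simplified LSN: a flat byte position in WAL. */
typedef uint64_t Lsn;

/* Hypothetical simulation state: it only counts the I/O we would issue. */
typedef struct
{
    Lsn newest_commit;  /* end of the newest commit record */
    Lsn flushed;        /* WAL is on disk up to here */
    int wal_flushes;
    int clog_writes;
} Sim;

/* Flush WAL up to 'upto' unless it is already on disk. */
static void
wal_flush(Sim *s, Lsn upto)
{
    if (s->flushed < upto)
    {
        s->flushed = upto;
        s->wal_flushes++;
    }
}

/* WAL-before-data for clog: flush through the newest commit, then write. */
static void
clog_write_page(Sim *s)
{
    wal_flush(s, s->newest_commit);
    assert(s->flushed >= s->newest_commit);
    s->clog_writes++;
}
```

When the WAL is already flushed past the newest commit, the clog write costs no extra WAL I/O.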

The other interesting issue is not letting hint-bit updates get to disk
in advance of the WAL flush, but I don't see a need to track those at
a per-transaction level: just advance page LSN to latest commit record
any time a hint bit is updated. The commit will likely be flushed
before we'd be interested in writing the buffer out anyway. Moreover,
the way you are doing it creates a conflict in that the DFC has to
guarantee to remember every unflushed transaction, whereas it really
needs to be just an approximate cache for its performance to be good.
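A sketch of the hint-bit idea, assuming a flat 64-bit LSN and a hypothetical Page struct (not code from the patch). Setting the hint bit pushes the page LSN forward, and the usual WAL-before-data check at buffer write time does the rest:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified LSN: a flat byte position in WAL. */
typedef uint64_t Lsn;

/* Hypothetical heap page header: only the fields this sketch needs. */
typedef struct
{
    Lsn  page_lsn;        /* WAL must be flushed this far before page write */
    bool hint_committed;  /* stands in for a "committed" hint bit */
} Page;

/* Set the hint bit and advance the page LSN to the latest commit record. */
static void
set_hint_committed(Page *p, Lsn latest_commit)
{
    assert(p != NULL);
    p->hint_committed = true;
    if (p->page_lsn < latest_commit)
        p->page_lsn = latest_commit;   /* never move the LSN backwards */
}

/* True if WAL must be flushed before this page may be written out. */
static bool
page_write_needs_wal_flush(const Page *p, Lsn flushed)
{
    assert(p != NULL);
    return flushed < p->page_lsn;
}
```

The page LSN may end up later than strictly necessary, which is the "advancing the LSN too far" trade-off: it can force a WAL flush slightly earlier, but it needs no per-transaction tracking.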

AFAIK there is no need to associate any forced flush with multixacts;
there is no state saved across crashes for those anyway.

I don't see a point in allowing the WAL writer to be disabled ---
I believe it will be a performance win just like the bgwriter,
independently of whether transaction_guarantee is used or not,
by helping to keep down the number of dirty WAL buffers. That in
turn allows some other simplifications, like not needing an assign hook
for transaction_guarantee.

I disagree with your desire to remove the fsync parameter. It may have
less use than before with this feature, but that doesn't mean it has
none.

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert() ?

That should go away entirely; to me the main point of the separate
wal-writer process is to take over responsibility for not letting too
many dirty wal buffers accumulate.

regards, tom lane

#8Simon Riggs
simon@2ndquadrant.com
In reply to: ITAGAKI Takahiro (#6)
Re: Reviewers Guide to DeferredTransactions/TransactionGuarantee

On Tue, 2007-04-10 at 20:46 +0900, ITAGAKI Takahiro wrote:

"Simon Riggs" <simon@2ndquadrant.com> wrote:

transaction_guarantee.v11.patch

correct files attached

This is a small fix to transaction_guarantee patch.
WAL writer needs PGSharedMemoryReAttach() on EXEC_BACKEND platforms.
Other changes are only for suppressing warnings.

Thanks

BTW, the following TODO item comes to my mind:
| Allow WAL traffic to be streamed to another server for stand-by replication
We have to open sockets to another server when we want to stream WAL.
If there were a WAL writer, we could reduce the number of those sockets.

I'll be looking at designs for that in the next cycle.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#9Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#7)
Re: [PATCHES] Reviewers Guide to Deferred Transactions/Transaction Guarantee

I agree with Tom's reasoning about the suggested simplifications, sorry.

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert() ?

That should go away entirely; to me the main point of the
separate wal-writer process is to take over responsibility
for not letting too many dirty wal buffers accumulate.

That also sounds a lot simpler, but I think Bruce wanted to be able to give
some time guarantee to the transactions that do not wait for fsync.
When a commit only half-filled the page and no more WAL comes in for
a long time, there is only WALWriter to do the IO.
The WALWriter would need to only flush a half-full page after timeout
iff it contains a commit record.
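That rule can be sketched as a small decision function. The WalPageInfo struct and the parameter names are hypothetical and only illustrate the condition described above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the current (last) WAL page. */
typedef struct
{
    bool is_full;     /* page has no free space left */
    bool has_commit;  /* page holds at least one unflushed commit record */
} WalPageInfo;

/*
 * Decide whether the WAL writer should flush this page now.
 * Full pages are always flushed; a partly filled page is flushed only
 * after the timeout has expired and only if it carries a commit record.
 */
static bool
walwriter_should_flush(const WalPageInfo *page, int idle_ms, int timeout_ms)
{
    assert(page != NULL);
    if (page->is_full)
        return true;
    return page->has_commit && idle_ms >= timeout_ms;
}
```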

One more question on autocommit:
Do we wait for a flush for an autocommitted DML ?
Seems we generally should not.

Andreas

#10Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#8)
Re: Reviewers Guide to DeferredTransactions/TransactionGuarantee

Simon Riggs wrote:

On Tue, 2007-04-10 at 20:46 +0900, ITAGAKI Takahiro wrote:

"Simon Riggs" <simon@2ndquadrant.com> wrote:

transaction_guarantee.v11.patch

correct files attached

This is a small fix to transaction_guarantee patch.
WAL writer needs PGSharedMemoryReAttach() on EXEC_BACKEND platforms.
Other changes are only for suppressing warnings.

Thanks

BTW, the following TODO item comes to my mind:
| Allow WAL traffic to be streamed to another server for stand-by replication
We have to open sockets to another server when we want to stream WAL.
If there were a WAL writer, we could reduce the number of those sockets.

I'll be looking at designs for that in the next cycle.

Already a TODO:

* Allow WAL traffic to be streamed to another server for stand-by
replication

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#11Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#7)
Re: [PATCHES] Reviewers Guide to Deferred Transactions/TransactionGuarantee

On Thu, 2007-04-12 at 15:56 -0400, Tom Lane wrote:

"Simon Riggs" <simon@2ndquadrant.com> writes:

transaction_guarantee.v11.patch

Thanks for the review.

I can't help feeling that this is enormously overcomplicated.

I agree with all but one of your comments, see below.

The "DFC" in particular seems to not be worth its overhead. Why wouldn't
we simply track the newest commit record at all times, and then whenever
the wal writer wakes up, it would write/fsync that far (or write/fsync
all completed WAL pages, if there's no new commit record to worry
about)?

The other interesting issue is not letting hint-bit updates get to disk
in advance of the WAL flush, but I don't see a need to track those at
a per-transaction level: just advance page LSN to latest commit record
any time a hint bit is updated. The commit will likely be flushed
before we'd be interested in writing the buffer out anyway. Moreover,
the way you are doing it creates a conflict in that the DFC has to
guarantee to remember every unflushed transaction, whereas it really
needs to be just an approximate cache for its performance to be good.

I've spent a few hours thinking on this and I'm happy with it now. The
lure of removing that much code is too strong to resist; it's certainly
easier to remove code after freeze than it is to add it.

Advancing the LSN too far was a worry of mine, but we have the code now
to cope if that shows to be a problem in testing. So let's strip that
out.

I see the concern about not letting clog pages go to disk before the
corresponding WAL data is flushed, but that could be handled much more
simply: just force a flush through the newest commit record before any
write of a clog page. Those writes are infrequent enough (every 32K
transactions or one checkpoint) that this seems not a serious problem.

This bit I'm not that happy with. You're right it's fairly infrequent,
but the clog pages are typically written when we extend the clog. That
happens while holding XidGenLock and ProcArrayLock, so holding those
across an additional (and real) I/O is going to make that blockage
worse. We've gone to great pains in other places to remove logjams and
we know that the follow-on effects of logjams are not swift to clear
when the system is running at full load on multiple CPU systems.

The code to implement this is pretty clean: a few extra lines in
clog/slru and bubbled-up API changes.

I was actually thinking of adding something to the bgwriter to clean the
LRU block of the clog, if it was dirty, once per cycle, to further
reduce the possibility of I/O at that point.
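A rough sketch of that bgwriter idea, assuming a hypothetical ClogSlot array with a simple last-used counter (the real SLRU structures differ). Once per cycle the least recently used slot is written if it is dirty, so a later clog extension is less likely to find a dirty victim while holding XidGenLock and ProcArrayLock:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clog buffer slot. */
typedef struct
{
    bool     dirty;
    unsigned last_used;  /* larger means more recently used */
} ClogSlot;

/* Return the index of the least recently used slot. */
static int
find_lru_slot(const ClogSlot *slots, int nslots)
{
    int lru = 0;

    assert(slots != NULL && nslots > 0);
    for (int i = 1; i < nslots; i++)
        if (slots[i].last_used < slots[lru].last_used)
            lru = i;
    return lru;
}

/*
 * One bgwriter cycle: clean the LRU slot if it is dirty.
 * Returns true if a write would have been issued.
 */
static bool
bgwriter_clean_clog_lru(ClogSlot *slots, int nslots)
{
    int lru = find_lru_slot(slots, nslots);

    if (!slots[lru].dirty)
        return false;
    /* The actual page write would happen here, in the background. */
    slots[lru].dirty = false;
    return true;
}
```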

AFAIK there is no need to associate any forced flush with multixacts;
there is no state saved across crashes for those anyway.

Agreed.

I don't see a point in allowing the WAL writer to be disabled ---
I believe it will be a performance win just like the bgwriter,
independently of whether transaction_guarantee is used or not,
by helping to keep down the number of dirty WAL buffers. That in
turn allows some other simplifications, like not needing an assign hook
for transaction_guarantee.

That would be pleasant. The other changes make hint bit setting need a
LWlock request, so I wanted to include a way of saying "I never ever
want to use transaction_guarantee = off". I see the beauty of your
suggestion and agree.

So keep the parameter, but let it default to 100ms?
Range 10-1000ms?

I disagree with your desire to remove the fsync parameter. It may have
less use than before with this feature, but that doesn't mean it has
none.

OK

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert() ?

That should go away entirely; to me the main point of the separate
wal-writer process is to take over responsibility for not letting too
many dirty wal buffers accumulate.

Yes

I'll make the agreed changes by next Wed/Thurs.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#12Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#1)
Re: Reviewers Guide to Deferred Transactions/Transaction Guarantee

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Simon Riggs wrote:

transaction_guarantee.v11.patch
- keep current, cleanup, more comments and docs

Brief Performance Analysis
--------------------------

I've tested 3 scenarios:
1. normal
2. wal_writer_delay = 100ms
3. wal_writer_delay = 100ms and transaction_guarantee = off

On my laptop, with a scale=1 pgbench database with 1 connection I
consistently get around 85 tps in mode (1), with a slight performance
drop in mode (2). In mode (3) I get anywhere from 200tps - 900 tps,
depending upon how well cached everything is, with 700 tps being fairly
typical. fsync = off gives around 900tps.

Also good speedups with multiple session tests.

make installcheck passes in 120 sec in mode (3), though 155 sec in mode
(1) and 158 sec in mode (2).

Basic Implementation
--------------------

xact.c
xact.h

The basic implementation simply records the LSN of the xlog commit
record in a shared memory area, the deferred fsync cache.

ipci.c

The cache is protected by an LWlock called DeferredFsyncLock.

lwlock.h

A WALWriter process wakes up regularly to perform a background flush of
WAL up to the point of the highest LSN in the deferred fsync cache.

walwriter.c
walwriter.h
postmaster.c

WALWriter can be enabled only at server start.
(All above same as March 11 version)

Correctness
-----------

postgres.c

Only certain code paths can execute transaction_guarantee = off
transactions, though the main code paths for OLTP allow it.

xlog.c

CreateCheckpoint() must protect against starting a checkpoint when
commits are not yet flushed, so an additional flush must occur here.

vacuum.c

VACUUM FULL cannot move tuples until their states are all known, so this
command triggers a background flush also.

clog.c
clog.h
slru.c
slru.h

Changes to Clog and SLRU enforce the basic rule of WAL-before-data,
which otherwise might allow the record of a commit to reach disk before
the flush of the WAL. This is implemented by storing an LSN for each
clog page.

transam.c
transam.h
twophase.c
xact.c

The above files have API changes that allow the LSN at transaction
commit to be passed through to the Clog.

tqual.c
tqual.h
multixact.c
multixact.h

Visibility hint bits must also not be set before the transaction is
flushed, so other changes are required to ensure we store the LSN of
each transaction, not just the maximum LSN. Changes to tqual.c appear
extensive, though this is just refactoring to allow us to make
additional function calls before setting bits - there are no functional
changes to any HeapTupleSatisfies... functions.

xact.c

Contains the module for the Deferred Transaction functions and in
particular the deferred transaction cache. This could be a separate
module, since there is only a slight link with the other xact.c code.

User Interface
--------------

guc.c
postgresql.conf.sample
guc_table.h

New parameters have been added, with a new parameter grouping of
WAL_COMMITS created to control the various commit parameters.

Performance Tuning
------------------

The WALWriter wakes up every wal_writer_delay milliseconds. There are two
protections against mis-setting this parameter.

pmsignal.h

The WALWriter will also be woken by a signal if the DF cache has nearly
filled and flushing would be desirable.

The WALWriter will also loop without any delay if the number of
transactions committed while it was writing WAL is above a threshold
value.
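The two protections can be sketched as follows. The function names, the busy threshold parameter, and the 3/4 fill level used for "nearly filled" are assumptions made for illustration and are not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * How long the WAL writer should sleep before its next cycle.
 * If more than busy_threshold transactions committed while it was
 * writing WAL, it loops again immediately instead of sleeping.
 */
static int
walwriter_next_sleep_ms(int wal_writer_delay_ms,
                        int commits_during_write,
                        int busy_threshold)
{
    assert(wal_writer_delay_ms > 0);
    return (commits_during_write > busy_threshold) ? 0 : wal_writer_delay_ms;
}

/*
 * Whether a backend should signal the WAL writer because the deferred
 * fsync cache is nearly full. Assumed level: at least 3/4 of the slots used.
 */
static bool
df_cache_should_signal(int used, int capacity)
{
    assert(capacity > 0 && used >= 0 && used <= capacity);
    return used * 4 >= capacity * 3;
}
```

With both checks in place, a wal_writer_delay that is set too high mainly affects idle periods, since a busy system wakes or re-runs the WAL writer on its own.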

Docs
----
The fsync parameter has been removed from postgresql.conf.sample and the
docs, though it still exists in this patch to allow performance testing
during Beta. It is suggested that fsync = off should mean the same thing as
transaction_guarantee = off, wal_writer_delay = 100ms, if it is
specified in postgresql.conf or on the server command line.

A new section in wal.sgml will describe this in more detail, later.

Open Questions
--------------

1. Should the DFC use a standard hash table? Custom code allows both
additional speed and the ability to signal when it fills.

2. Should tqual.c update the LSN of a heap page with the LSN of the
transaction commit that it can read from the DF cache?

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert() ?

4. The recent changes to remove CheckpointStartLock haven't changed the
code path for deferred transactions, so a similar solution might be
possible there also.

5. Is it correct to do WAL-before-flush for clog only, or should this
be multixact also?

All of the above are fairly minor changes.

Any other thoughts/comments/tests welcome.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

[ Attachment, skipping... ]

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#13Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#2)
Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Simon Riggs wrote:

On Thu, 2007-04-05 at 22:56 +0100, Simon Riggs wrote:

transaction_guarantee.v11.patch

correct files attached

Open Questions
--------------

1. Should the DFC use a standard hash table? Custom code allows both
additional speed and the ability to signal when it fills.

2. Should tqual.c update the LSN of a heap page with the LSN of the
transaction commit that it can read from the DF cache?

I now think we should update the LSN of the page, but not changed yet.

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert() ?

Not that important

4. The recent changes to remove CheckpointStartLock haven't changed the
code path for deferred transactions, so a similar solution might be
possible there also.

Some further discussion required here, I think. That change may actually
have introduced a slight risk into the patch. Will raise at review.

5. Is it correct to do WAL-before-flush for clog only, or should this
be multixact also?

Not necessary

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

[ Attachment, skipping... ]

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#14Bruce Momjian
bruce@momjian.us
In reply to: ITAGAKI Takahiro (#6)
Re: Reviewers Guide to Deferred Transactions/TransactionGuarantee

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:

"Simon Riggs" <simon@2ndquadrant.com> wrote:

transaction_guarantee.v11.patch

correct files attached

This is a small fix to transaction_guarantee patch.
WAL writer needs PGSharedMemoryReAttach() on EXEC_BACKEND platforms.
Other changes are only for suppressing warnings.

We might also need to increase NUM_AUXILIARY_PROCS (=3) for WAL writer,
but I didn't change it in the patch. (I don't know why the value is 3
-- bgwriter, autovacuum launcher and ... what?)

BTW, the following TODO item comes to my mind:
| Allow WAL traffic to be streamed to another server for stand-by replication
We have to open sockets to another server when we want to stream WAL.
If there were a WAL writer, we could reduce the number of those sockets.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

[ Attachment, skipping... ]

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#15Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#11)
Re: [PATCHES] Reviewers Guide to Deferred Transactions/TransactionGuarantee

Simon Riggs wrote:

That should go away entirely; to me the main point of the separate
wal-writer process is to take over responsibility for not letting too
many dirty wal buffers accumulate.

Yes

I'll make the agreed changes by next Wed/Thurs.

I have seen no patch yet with the agreed changes.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#16Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#15)
Re: [PATCHES] Reviewers Guide to DeferredTransactions/TransactionGuarantee

On Thu, 2007-04-26 at 21:14 -0400, Bruce Momjian wrote:

Simon Riggs wrote:

That should go away entirely; to me the main point of the separate
wal-writer process is to take over responsibility for not letting too
many dirty wal buffers accumulate.

Yes

I'll make the agreed changes by next Wed/Thurs.

I have seen no patch yet with the agreed changes.

True, will be at least a few more days yet.

I've got a few questions I'll post later today.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#17Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#16)
Re: [PATCHES] Reviewers Guide to DeferredTransactions/TransactionGuarantee

Simon Riggs wrote:

On Thu, 2007-04-26 at 21:14 -0400, Bruce Momjian wrote:

Simon Riggs wrote:

That should go away entirely; to me the main point of the separate
wal-writer process is to take over responsibility for not letting too
many dirty wal buffers accumulate.

Yes

I'll make the agreed changes by next Wed/Thurs.

I have seen no patch yet with the agreed changes.

True, will be at least a few more days yet.

I've got a few questions I'll post later today.

Again, I have seen no follow-up patch for this. I would keep it for 8.4
but I am unsure what the issue even is, so I am deleting this message.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#18Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#11)
Re: [PATCHES] Reviewers Guide to Deferred Transactions/TransactionGuarantee

Simon Riggs wrote:

3. Should the WALWriter also do the wal_buffers half-full write at the
start of XLogInsert() ?

That should go away entirely; to me the main point of the separate
wal-writer process is to take over responsibility for not letting too
many dirty wal buffers accumulate.

Yes

I'll make the agreed changes by next Wed/Thurs.

Ah, here is the item Simon was talking about. Simon, when are we
getting the updated patch? If not soon, the entire patch will be kept
for 8.4.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#19Simon Riggs
simon@2ndquadrant.com
In reply to: Simon Riggs (#11)
Re: [PATCHES] Reviewers Guide to Deferred Transactions/TransactionGuarantee

On Fri, 2007-04-13 at 16:09 +0100, Simon Riggs wrote:

I'll make the agreed changes by next Wed/Thurs.

I am actively working on this now, after some delays because of other
calls on my time. The suggested changes have needed more rework than I
estimated, touching most lines of the patch, but I don't see any
problems in changing it as agreed.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com