Spin Lock sleep resolution
While stracing a process that was contending for a spinlock, I noticed that
the sleep time would often have a longish sequence of 1 and 2 msec sleep
times, rather than the rapidly but randomly increasing sleep time intended.
(At first it looked like it was sleeping on a different attempt each time,
rather than re-sleeping, but some additional logging showed this was not
the case, it was re-sleeping on the same lock attempt.)
The problem is that the state is maintained only to an integer number of
milliseconds starting at 1, so it can take a number of attempts for the
random increment to jump from 1 to 2, and then from 2 to 3.
If the state was instead maintained to an integer number of microseconds
starting at 1000, this granularity of the jump would not be a problem.
And once the state is kept to microseconds, I see no point in not passing
the microsecond precision to pg_usleep.
The attached patch changes the resolution of the state variable to
microseconds, but keeps the starting value at 1msec, i.e. 1000 usec.
Arguably the MIN_DELAY_USEC value should be reduced from 1000 as well given
current hardware, but I'll leave that as a separate issue.
There is the comment:
* The pg_usleep() delays are measured in milliseconds because 1
msec is a
* common resolution limit at the OS level for newer platforms. On
older
* platforms the resolution limit is usually 10 msec, in which case
the
* total delay before timeout will be a bit more.
I think the first sentence is out of date (CentOS 6.3 on "Intel(R) Xeon(R)
CPU E5-2650 0 @ 2.00GHz" not exactly bleeding edge, seems to have a
resolution of 100 usec), and also somewhat bogus (what benefit do we get by
preemptively rounding to some level just because some common platforms will
do so anyway?). I've just removed the whole comment, though maybe some
version of the last sentence needs to be put back.
I have no evidence that the current granularity of the random escalation is
actually causing performance problems, only that is causes confusion
problems for people following along. But I think the change can be
justified based purely on it being cleaner code and more future proof.
This could be made obsolete by changing to futexes, but I doubt that that
is going to happen soon, and certainly not for all platforms.
I will add to commitfest Next
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
The problem is that the state is maintained only to an integer number of
milliseconds starting at 1, so it can take a number of attempts for the
random increment to jump from 1 to 2, and then from 2 to 3.
Hm ... fair point, if you assume that the underlying OS has a sleep
resolution finer than 1ms. Otherwise it would not matter.
The attached patch changes the resolution of the state variable to
microseconds, but keeps the starting value at 1msec, i.e. 1000 usec.
No patch seen here ...
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 2, 2013 at 1:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
The problem is that the state is maintained only to an integer number of
milliseconds starting at 1, so it can take a number of attempts for the
random increment to jump from 1 to 2, and then from 2 to 3.Hm ... fair point, if you assume that the underlying OS has a sleep
resolution finer than 1ms. Otherwise it would not matter.
I would guess it does matter for the cumulative error when re-sleeping
several times.
In any case, the resolution is limited on tick-based kernels, which
are few nowadays. However, I've found evidence[0]http://lists.freebsd.org/pipermail/freebsd-arch/2012-March/012423.html that FreeBSD is
still on that boat.
[0]: http://lists.freebsd.org/pipermail/freebsd-arch/2012-March/012423.html
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Monday, April 1, 2013, Tom Lane wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
The problem is that the state is maintained only to an integer number of
milliseconds starting at 1, so it can take a number of attempts for the
random increment to jump from 1 to 2, and then from 2 to 3.Hm ... fair point, if you assume that the underlying OS has a sleep
resolution finer than 1ms. Otherwise it would not matter.
Let's say you get a long stretch of increments that are all a ratio of <1.5
fold, for simplicity let's say they are all 1.3 fold. When you do
intermediate truncations of the state variable, it never progresses at all.
perl -le '$foo=1; foreach (1..10) {$foo*=1.3; print int $foo}'
perl -le '$foo=1; foreach (1..10) {$foo*=1.3; $foo=int $foo; print int
$foo}'
Obviously the true stochastic case is not quite that stark.
No patch seen here ...
Sorry. I triple checked that the patch was there, but it seems like if you
save a draft with an attachment, when you come back later to finish and
send it, the attachment may not be there anymore. The Gmail Offline teams
still has a ways to go. Hopefully it is actually there this time.
Cheers,
Jeff
Attachments:
spin_delay_microsecond_v1.patchapplication/octet-stream; name=spin_delay_microsecond_v1.patchDownload
diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
new file mode 100644
index ed1f56a..ccdf38f
*** a/src/backend/storage/lmgr/s_lock.c
--- b/src/backend/storage/lmgr/s_lock.c
*************** s_lock(volatile slock_t *lock, const cha
*** 81,101 ****
* so minutes. It seems better to fix the total number of tries (and thus
* the probability of unintended failure) than to fix the total time
* spent.
- *
- * The pg_usleep() delays are measured in milliseconds because 1 msec is a
- * common resolution limit at the OS level for newer platforms. On older
- * platforms the resolution limit is usually 10 msec, in which case the
- * total delay before timeout will be a bit more.
*/
#define MIN_SPINS_PER_DELAY 10
#define MAX_SPINS_PER_DELAY 1000
#define NUM_DELAYS 1000
! #define MIN_DELAY_MSEC 1
! #define MAX_DELAY_MSEC 1000
int spins = 0;
int delays = 0;
! int cur_delay = 0;
while (TAS_SPIN(lock))
{
--- 81,96 ----
* so minutes. It seems better to fix the total number of tries (and thus
* the probability of unintended failure) than to fix the total time
* spent.
*/
#define MIN_SPINS_PER_DELAY 10
#define MAX_SPINS_PER_DELAY 1000
#define NUM_DELAYS 1000
! #define MIN_DELAY_USEC 1000L
! #define MAX_DELAY_USEC 1000000L
int spins = 0;
int delays = 0;
! long cur_delay = 0;
while (TAS_SPIN(lock))
{
*************** s_lock(volatile slock_t *lock, const cha
*** 109,117 ****
s_lock_stuck(lock, file, line);
if (cur_delay == 0) /* first time to delay? */
! cur_delay = MIN_DELAY_MSEC;
! pg_usleep(cur_delay * 1000L);
#if defined(S_LOCK_TEST)
fprintf(stdout, "*");
--- 104,112 ----
s_lock_stuck(lock, file, line);
if (cur_delay == 0) /* first time to delay? */
! cur_delay = MIN_DELAY_USEC;
! pg_usleep(cur_delay);
#if defined(S_LOCK_TEST)
fprintf(stdout, "*");
*************** s_lock(volatile slock_t *lock, const cha
*** 119,129 ****
#endif
/* increase delay by a random fraction between 1X and 2X */
! cur_delay += (int) (cur_delay *
((double) random() / (double) MAX_RANDOM_VALUE) + 0.5);
/* wrap back to minimum delay when max is exceeded */
! if (cur_delay > MAX_DELAY_MSEC)
! cur_delay = MIN_DELAY_MSEC;
spins = 0;
}
--- 114,124 ----
#endif
/* increase delay by a random fraction between 1X and 2X */
! cur_delay += (long) (cur_delay *
((double) random() / (double) MAX_RANDOM_VALUE) + 0.5);
/* wrap back to minimum delay when max is exceeded */
! if (cur_delay > MAX_DELAY_USEC)
! cur_delay = MIN_DELAY_USEC;
spins = 0;
}
On Tue, 2 Apr 2013 09:01:36 -0700
Jeff Janes <jeff.janes@gmail.com> wrote:
Sorry. I triple checked that the patch was there, but it seems like if you
save a draft with an attachment, when you come back later to finish and
send it, the attachment may not be there anymore. The Gmail Offline teams
still has a ways to go. Hopefully it is actually there this time.
I'll give the patch a try, I have a workload that is impacted by spinlocks
fairly heavily sometimes and this might help or at least give me more
information. Thanks!
-dg
--
David Gould daveg@sonic.net
If simplicity worked, the world would be overrun with insects.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi David,
On 02.04.2013 22:58, David Gould wrote:
On Tue, 2 Apr 2013 09:01:36 -0700
Jeff Janes<jeff.janes@gmail.com> wrote:Sorry. I triple checked that the patch was there, but it seems like if you
save a draft with an attachment, when you come back later to finish and
send it, the attachment may not be there anymore. The Gmail Offline teams
still has a ways to go. Hopefully it is actually there this time.I'll give the patch a try, I have a workload that is impacted by spinlocks
fairly heavily sometimes and this might help or at least give me more
information. Thanks!
Did you ever get around to test this?
I repeated these pgbench tests I did earlier:
/messages/by-id/5190E17B.9060804@vmware.com
I concluded in that thread that on this platform, the TAS_SPIN macro
really needs a non-locked test before the locked one. That fixes the big
fall in performance with more than 28 clients. So I repeated that test
with four versions:
master - no patch
spin-delay-ms - Jeff's patch
nonlocked-test - master with the non-locked test added to TAS_SPIN
spin-delay-ms-nonlocked-test - both patches
Jeff's patch seems to somewhat alleviate the huge fall in performance
I'm otherwise seeing without the nonlocked-test patch. With the
nonlocked-test patch, if you squint you can see a miniscule benefit.
I wasn't expecting much of a gain from this, just wanted to verify that
it's not making things worse. So looks good to me.
- Heikki
Attachments:
clients-sets.pngimage/png; name=clients-sets.pngDownload
�PNG
IHDR � � ,� PLTE��� ���� � ��� � ���@ �� Ai��� �@���0`�� @� ������**�� @�� 333MMMfff�������������������22�������U��������������� � d �"�".�W � �p � ���� ��� � �����P����E ��r��z�����k������� �� ���� �����P@Uk/� ��@�@��`��`��� ��@��@��`��p������������������������|�@�� �����K �IDATx�������E����;82���^���v:b%��Q� �M����>��^�/��m A�>D�Y�Ug����O�v�
T�w*��*�� �E�O���|��N�'��I����1^T�5�py���/�o=����Q��9����!��\�����\XT@�t�e��������jp����n6V�����������7��
��>u�X�{8bq��L��+ �pZ �t���p>�E���8������Am\��� �8���}��]0�J��x���� 40<�b?%�!
�Q)"�a%�>�`x�*JX}0V��*�U0���`���U�`(!����P(
w���x����_D��t���y|���t�&�8�/"��<_�z]�d�������_DM�Z0k*����m���;7^�ZC�-Mi��3�-������#�@�"��k]��������>����� h�uV3g�s�r�O!U�
�5"��_DM9G�<R������@@1��slm�9��nP\��
���p�����)fJw�t���8@�WO�y�1Z@19�;���)�~��/"P~��/"P~���S@@���w����E
����E
��������^�R�Z����'���w��+�������?�E�/��<�sW"8��_y�~A���.������x���������9�Z�'����n������z�p�;/��7��:/V=�9~����?3�I�U��0�����4�q^�j�g��W�{T��(VWj
h��6��k�Z��?w��+kG�_A�l��
�� �~xv�j������� �n�p��g{~��/��X��^�|Kw����Z�Sj�ho�v�@���xR�5�!9�b�jnK�v�8F�6�(b����:6��n~�2���?�����ED���qTp�_u��q��Z��X����/[���r�7��s�qVp��}~��/"V���9T���uh-�W��"
L����H�s.ey���^��5����E��P����WH@kP���{��{dw�������?F����K�S��2!����NN�.��#[��+a�u1M�[Bt�[
�%*������������,oR��&k�����|�����o�&;�i�~��_\����;7�����Oj����Va�t��?k��oq?���)�y`G�S��\�98����oNW�vv������mS��g}�����]�����*E�4
1��t~���L�6��}5z�g�#��oe�y����=�
���7iw����[rehO����&�������W�!y�.���|�����O�4�A,?��}�x����ii�t����w��C{�]���<;j�a1+S�����*�]�*��E���o����u�in'e7������,��2���A����Tv�i��H������
��c���~O:u�����&�>7�t�.G��xe+��W+2��I�n.u��Fm~���D9d-Q�����
��x4��Yika�xt�(���u>y���8� ��[�����:�L|�6�Q����G��Y
3O �� �U]HN��F��Z����JBw|��%��8�,���y��I�q1��G��W}��z0����u��%J��LX��fk�yt�\�1 ����]������(����c�
�U��p��h{�m��xe��$s�������bd�:��:���n~�g����j�_�<��h��]�}~���G�*�W����Qg���6�_`rr@3�*&���:���<V����&�U����Yb����4a���;]'�Y���/�K�>/����_%�.k>-�R�
��R�u������v1
�[���O@e�|A�U�����hwz��RK@�1 ����%v/�s�����b����G�����]0���7���AC]�Q�Y+#��������! 1Km����s���u�u����G�l��x��Tf��Z�_�l���c �=7�r� ���4^��*�U��s@��f0�Ae���y�
��q9#�����Et��,�f����E��r���,�����Y��K������%���!
R�2��}����2��Q>�|
-`?�|���_�_u��(�������$���_D��Z?}���Z�&�W����� ��8�����F���\�8�[�a����_DY��/�2����9 �����_D� �������E�����c0~�������U��0�;=6^��y��.��g�v,D�W��"J�~��&�W��"J3N��;� `���]9��0��J���y���~��<8# de�T����N��:C���D��)���@V���wn_��]FN��9��B�H�1t'B�iu;W~
<���R�3�@�s�*y��r���'�����0�A��_�������� i��.��,��rKw���o�3�����
��d�@z����I�0�t�BXh3�W�o��E�����o��0
�]��5���M +s `~y��%��4���_D^�>8����n~y9��aOA~���P_�cH]�m�U��0�[�3���PB�$b,��
���o�^ �W��"���>@��_u����� f�S���7��� ���d�J���y��c���@v���90��N����N����E��� @@"�U7��.����n�9`~�
f��!��]��,�� de��/"�+�B��_u���$f&�W��"2�� j `~��:�� *��I ��R.&�j��]q4m������ ���5�ia~��/��� ���`�2.������P���$���mx=T�- r���zL�B���h�N���t�`���n~���3� �U7��47f` `~��3�� jf�0 �J��C�b$���L����E����)�U7���4�0����>�9`�,e�$G��t<H 5������E��V0����v+�g�@�(|�{�Q8�o��9`c���<��X�B���]5&����
>�[�L��'��Q(o���K��nV�u+�$0
��Z�i�z�`�e�)y'�$��@�0S�q'�$0
i��H�%�jF��I���������U�$��32o�3�1(!�#�.�1�!���@��9�"P��3���IQ���q�(��&�?.�tY�"�tki\�a����!8�L�� $���GDO�������MXDD���QXT�����1��1XT��po �r@\!5 �pt��p)r|e�.8��vh��E�0F��vi�x�0�W���9�yI��#]�.E��w!���G��OO� #@@���#��^��v�t9 ���k�V@����K�+�-Jt����@@�C 04(0��08�Xo�(�h�9`��K !`�R��{=��<��$Ju���c�V�r^y
[ X�?![���r,3� ���b�X�@���p;��f��=�sU@�raR�l����1C�4�%
���a���5h"`Q� `�k9`
�����h��%:D��/��]���� d�M0���K�+`&DWAM��� W�`�GBhn�������hR@� b�a2S�I�W6��z��V�����!�K�+`&4�I��� �����!���8�$F!!�v�Y)Jr
U�E# r�T0��B���������t�!�,`�E�0���j�A� bL�(90r�����/X����JD��'+$�rm �@'�����b~����R4J���;�|���F�3H�w��k]��^D*�CP��G��j�S���0���I1D��)��������� � d�����h��R��-��O�rs\x�C�{���.�?�.o���9
��X�0�����OB�����+`$E�0���z��^>(���P@��C!`
�G��F�{���^��~>'`���#V�P�r�����"�AY/&��UN���hD�:`�>8��l���-������\���s@ 9�%o�� �"W@-;`\�(��ltdI���X�:~ ���R�[0e��\@��5�p���R}��$~3�Z����\�7 ��>>!`���$��>�
h�(��0�����kP@��}�A��$��D<������p�zU�~pi ��O�c�Q��)���/}�D@��5���b�PG�~p]������"h��pMQ�G�u=9`
H|��Q�� m�
���F�C����S=F!'D��4$����Q�+���P�(�@��k�@����"���p\���."f�f5�u�, �����i��
uv(W@���up���+�|s�r�����K{+�IL�u�G�����J�n}��.�?Tt�)���+ MH/����g�%����C����c�����+���������n�6��ker�B�#�����!�����w����7i;���R��<�I|c�-�=��2���G�����/� ���f[�b�$�x�1�������_M�Q�&`��|���~��6[���q$$^����$,�8.�R����L��}�����n�,�����C5�����G��
(��;��������u��}����i�L�����0K�*�1�����~��4g�����h)MZ@"G�t�l�*g?Q��[�hj�����<�NA�5���WG��x���+4���%�����xa����<
b������*�����)N@�
�*�^����[��[�U�jKl(M��?�9��v"�����Q�AH>�<:`�����B�(�"���="cd �
� ��0��(�D�n#E��u����b�Ah1����^1�b�qbp+p1����?
\��|� �P��9`�b�Z`\�h1�5�!F@�(@%���cp��=��Jt$�������%�� ��#B�Q�J�� �Jt$�p%����%W�������y������^��/X%:�A� 7���e��@@�X /�X��
��y��Q�X%:�A? ���,��P�g�����?�V�����|<L6���e��|[@t�Rx���O
��k���� �������ffD���DGb1|s��,&`N���[�Xg7���M:`A���{* ��DIu��OE��{j"�z�5��oH�����?��������Dn�y<
�����%�w��DD��e�"G�#a�H����� �$���q�Jt$��pG�~�QX}��������]�,�$�{7. �� �pGVa�I�K�`V���b>�^*���,��� ��O$����5���k���DGb1�/���e��|K@t�Ry���O*�������Y%:��Rx�d�-K,�C�My���O2��9�*��X�r�'�����%� �y�8"�����o�y�hE�[@����U�#��o���������%� q5���]�4�����c��H,�9 �\�h1��L�1���T@��
L|~�U�#��9���{n���,����#�G`* F�_����3��n��X%:���o\�Q�����_�
"V@�bX��V�����5|k1���������D�3*Ed1����kt�o.�U0�v|����5�= �J^
xX����p��Q��������%j+d��=/h��i8D��c���h��f-��J���.r�-w4v[�E�T�CJ`���2H
Z����"(�"����I1�g2�a%�w���2(
���Y1FM���,�z"���{^�6�a�� ;�h���z����D��T�&��D�'N��Q9���j���4Q4d3Tfi |���d'