Auto Vacuum Daemon (again...)

Started by Matthew T. O'Connor about 23 years ago, 17 messages
#1Matthew T. O'Connor
matthew@zeut.net
1 attachment(s)

Several months ago I tried to implement a special postgres backend as an
Auto Vacuum Daemon (AVD), somewhat like the stats collector. I failed
due to my lack of experience with the postgres source.

On Sep 23, Shridhar Daithankar released an AVD written in C++ that acted
as a client program rather than part of the backend. I rewrote it in C,
and have been playing with it ever since. At this point I need feedback
and direction from the hacker group.

First: Do we want AVD integrated into the main source tree, or should it
remain a "tool" that can be downloaded from gborg? I would think it
should be controlled by the postmaster, and configured from GUC (at
least basic on/off settings).

Second: Assuming we want it integrated into the source tree, can it
remain a client app? Can a non-backend program that connects to the
postmaster using libpq be a child of the postmaster that the postmaster
can control (start and stop)?

Third: If a special backend version is preferred, I don't personally
know how to have a backend monitor and vacuum multiple databases. I
guess it could be similar to the client app and fire up a new backend
every time a database needs to be vacuumed.

Fourth: I think AVD is a feature that is needed in some form or
fashion. I am willing to work on it, but if it needs to be a backend
version I will probably need some help.

Anyway, for your reading pleasure, I have attached a plot of results from
a simple test program I wrote. As you can see from the plot, AVD keeps
the file size under control. Also, the first few Xacts are faster in
the non-AVD case, but after that AVD keeps the average Xact time down.
The periodic spikes in the AVD run correspond to when the AVD has fired
off a vacuum. Also, when the table file gets to approx 450MB, performance
drops off horribly; I assume this is because my system can no longer
cache the whole file (I have 512M in my machine). Also, I had been
developing against 7.2.3 until recently, and I wound up doing some of
these benchmarks against both 7.2.3 and 7.3devel, and 7.3 performs much
better; that is, 7.2 slowed down much sooner under this test.

Thanks,

Matthew

ps, The test program performs the following:

create table pgavdtest_table (id int,num numeric(10,2),txt char(512))

while i<1000
insert into pgavdtest_table (id,num,txt) values (i,i.i,'string i')

while i<1000
update pgavdtest_table set num=num+i, txt='update string %i'
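
The workload above can be sketched as a small Python function that builds the SQL statements. This is only an illustration, not the original test program: the function name, the starting value of i (0 here), and the reading of "i.i" as a literal such as 7.7 are assumptions. Because the UPDATE has no WHERE clause, each iteration rewrites every row, which is what makes the table file grow between vacuums.

```python
def build_workload(rows=1000, updates=1000):
    """Return the test's SQL statements as a list of strings (hypothetical helper)."""
    stmts = [
        "CREATE TABLE pgavdtest_table "
        "(id int, num numeric(10,2), txt char(512))"
    ]
    # Insert phase: one row per iteration; "i.i" is read as e.g. 7.7 for i = 7.
    for i in range(rows):
        stmts.append(
            "INSERT INTO pgavdtest_table (id, num, txt) "
            f"VALUES ({i}, {i}.{i}, 'string {i}')"
        )
    # Update phase: no WHERE clause, so every statement touches all rows.
    for i in range(updates):
        stmts.append(
            "UPDATE pgavdtest_table "
            f"SET num = num + {i}, txt = 'update string {i}'"
        )
    return stmts
```

The statements could be sent through any PostgreSQL client; no particular driver is assumed here.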

pps, I can post the source (both the AVD and the test program) to the
list, or email it to individuals if they would like.

Attachments:

avdtest.png (image/png)
#2Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Matthew T. O'Connor (#1)
Re: Auto Vacuum Daemon (again...)

On 26 Nov 2002 at 21:54, Matthew T. O'Connor wrote:

First: Do we want AVD integrated into the main source tree, or should it
remain a "tool" that can be downloaded from gborg? I would think it
should be controlled by the postmaster, and configured from GUC (at
least basic on/off settings).

Since you have rewritten it in C, I think it can be safely added to contrib, once
the core team agrees. It is a good place for such things.

Second: Assuming we want it integrated into the source tree, can it
remain a client app? Can a non-backend program that connects to the
postmaster using libpq be a child of the postmaster that the postmaster
can control (start and stop)?

I would not like the postmaster forking the pgavd app. As far as possible, we
should not touch the core. This is a client app, so let it stay that way. Once we
integrate it into the backend, we need to test the integration as well. Why bother?

Anyway, for your reading pleasure, I have attached a plot of results from
a simple test program I wrote. As you can see from the plot, AVD keeps
the file size under control. Also, the first few Xacts are faster in
the non-AVD case, but after that AVD keeps the average Xact time down.
The periodic spikes in the AVD run correspond to when the AVD has fired
off a vacuum. Also, when the table file gets to approx 450MB, performance
drops off horribly; I assume this is because my system can no longer
cache the whole file (I have 512M in my machine). Also, I had been
developing against 7.2.3 until recently, and I wound up doing some of
these benchmarks against both 7.2.3 and 7.3devel, and 7.3 performs much
better; that is, 7.2 slowed down much sooner under this test.

Good to know that it works.

I would like to comment w.r.t. my original effort.

1) I intentionally left vacuum full to the admin. Disk space is cheap and we all
know that, but IMO no application should lock a table without the admin knowing it.
This is a kind of Microsoft-ish assumption of user friendliness: making decisions on
behalf of users. Of course, sending the admin a notification is a good idea.

2) In a cluster, if there are many databases and the time taken for a serial vacuum is
more than the gap between two wake-up intervals of AVD, it would get into a
continuous vacuum. At some point, we are going to need one connection
per database in a separate process/thread.
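
The condition in point 2 can be written down as a small check. This is a sketch under stated assumptions: the function names are hypothetical and the per-database vacuum timings would have to be measured, since nothing in the daemon is known to report them.

```python
def serial_pass_seconds(vacuum_seconds_by_db):
    """Total time for one pass that vacuums every database one after another."""
    return sum(vacuum_seconds_by_db.values())


def overruns_interval(vacuum_seconds_by_db, wake_interval_seconds):
    """True when a serial pass takes longer than the sleep between wake-ups,
    so the daemon would end up vacuuming continuously."""
    return serial_pass_seconds(vacuum_seconds_by_db) > wake_interval_seconds
```

With measured timings of 120, 300 and 45 seconds, a 300 second wake-up interval would overrun while a 600 second one would not.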

Thanks for your work..

Bye
Shridhar

--
Distinctive, adj.: A different color or shape than our competitors.

#3Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Shridhar Daithankar (#2)
Re: Auto Vacuum Daemon (again...)

On 27 Nov 2002 at 13:01, Matthew T. O'Connor wrote:

On Wed, 2002-11-27 at 01:59, Shridhar Daithankar wrote:

I would not like the postmaster forking the pgavd app. As far as possible, we
should not touch the core. This is a client app, so let it stay that way. Once we
integrate it into the backend, we need to test the integration as well. Why bother?

I understand and agree that a non-integrated version is simpler, but I
think there is much to gain by integrating it. First, the
non-integrated version has to constantly poll the server for stats
updates; this creates unnecessary overhead. A more integrated version
could be signaled, or gather the stats information in much the same
manner as the stats system does. Also, having the postmaster control
the AVD is logical since it doesn't make sense to have AVD running when
the postmaster is not running. Also, what happens when multiple
postmasters are running on the same machine? I would think each should
have its own AVD. Integrating it would, I think, be much better.

There are differences in approach here. The reason I prefer polling rather than
signalling is that IMO vacuum should always be a low priority activity and as such it
does not deserve a signalling overhead.

A simpler way of integrating would be writing a C trigger on the pg_statistics
table (forgot the exact name). For every insert/update, watch the value and
trigger the vacuum daemon from a separate thread. (Assuming that you can create
a trigger on a view.)

But Tom has earlier pointed out that even a couple of lines of trigger on such
a table/view would be a huge performance hit in general..

I would still prefer polling. It would serve the need for foreseeable future..
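
A minimal sketch of the polling approach, assuming the daemon can read a cumulative count of updated and deleted tuples per table on each wake-up. The function name, the dictionary layout and the threshold of 1000 are all hypothetical and are not taken from pgavd.

```python
def poll_once(baseline, current, threshold=1000):
    """Return the tables whose activity since their last vacuum reached threshold.

    baseline: table name -> counter value recorded at the last vacuum
    current:  table name -> counter value read during this poll
    """
    due = sorted(
        table
        for table, count in current.items()
        if count - baseline.get(table, 0) >= threshold
    )
    # Reset the baseline for the chosen tables; a real daemon would do this
    # only after the VACUUM on that table has finished.
    for table in due:
        baseline[table] = current[table]
    return due
```

Between polls the daemon only sleeps, so the cost is one stats read per wake-up, which is the trade-off being argued for here.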

I agree vacuum full should be left to admin, my version does the same.

Good. I just wanted to confirm that we follow same policy. Thanks..

Well the way I have it running is that the AVD blocks and waits for the
vacuum process to finish. This way you are guaranteed to never be
running more than one vacuum process at a time. I can send you the code
if you would like, I am interested in feedback.

The reason I brought up issue of multiple processes/connection is starvation of
a DB.

Say there are two DBs which are seriously hammered. Now if a DB starts
vacuuming and takes long, another DB just keeps waiting for its turn for
vacuuming and by the time vacuum is triggered, it might already have suffered
some performance hit.

Of course these things are largely context dependent and the admin should be able to
make a better choice, but the app. should be able to handle the worst situation..

The other way round is to make AVD vacuum only one database. DBA can launch
multiple instances of AVD for each database as he sees fit. That would be much
simpler..

Please send me the code offlist. I will go through it and get back to you by
early next week (bit busy right now).

Bye
Shridhar

--
union, n.: A dues-paying club workers wield to strike management.

#4Matthew T. O'Connor
matthew@zeut.net
In reply to: Shridhar Daithankar (#3)
Re: Auto Vacuum Daemon (again...)

On Thu, 2002-11-28 at 01:58, Shridhar Daithankar wrote:

There are differences in approach here. The reason I prefer polling rather than
signalling is that IMO vacuum should always be a low priority activity and as such it
does not deserve a signalling overhead.

A simpler way of integrating would be writing a C trigger on the pg_statistics
table (forgot the exact name). For every insert/update, watch the value and
trigger the vacuum daemon from a separate thread. (Assuming that you can create
a trigger on a view.)

But Tom has earlier pointed out that even a couple of lines of trigger on such
a table/view would be a huge performance hit in general..

I would still prefer polling. It would serve the need for foreseeable future..

Well this is a debate that can probably only be solved after doing some
legwork, but I was envisioning something that just monitored the same
messages that get sent to the stats collector, I would think that would
be pretty lightweight, or even perhaps extending the stats collector to
also fire off the vacuum processes since it already has all the
information we are polling for.

The reason I brought up issue of multiple processes/connection is starvation of
a DB.

Say there are two DBs which are seriously hammered. Now if a DB starts
vacuuming and takes long, another DB just keeps waiting for its turn for
vacuuming and by the time vacuum is triggered, it might already have suffered
some performance hit.

Of course these things are largely context dependent and the admin should be able to
make a better choice, but the app. should be able to handle the worst situation..

agreed

The other way round is to make AVD vacuum only one database. DBA can launch
multiple instances of AVD for each database as he sees fit. That would be much
simpler..

Interesting thought. I think this boils down to how many knobs we
need to put on this system. It might make sense to, say, allow up to X
concurrent vacuums; a 4 processor system might handle 4 concurrent
vacuums very well. I understand what you are saying about starvation; I
was erring on the conservative side by only allowing one vacuum at a
time (also for simplicity of code :-), where the worst case scenario is that
you "suffer some performance hit", but the hit would be finite since
vacuum will get to it fairly soon.
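
The "up to X concurrent vacuums" knob could look roughly like this. It is a sketch, not the AVD code: run_vacuums, fake_vacuum and max_concurrent are hypothetical names, and fake_vacuum merely sleeps where a real daemon would issue VACUUM over a connection. The default of 1 reproduces the conservative one-at-a-time behaviour.

```python
import threading
import time


def fake_vacuum(db):
    """Stand-in for issuing VACUUM on one database (assumption: it just sleeps)."""
    time.sleep(0.01)


def run_vacuums(databases, vacuum_one=fake_vacuum, max_concurrent=1):
    """Vacuum every database, never running more than max_concurrent at once.

    Returns the peak number of vacuums that were in progress simultaneously.
    """
    slots = threading.Semaphore(max_concurrent)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def worker(db):
        with slots:  # blocks while all slots are taken
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            try:
                vacuum_one(db)
            finally:
                with lock:
                    state["running"] -= 1

    threads = [threading.Thread(target=worker, args=(db,)) for db in databases]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]
```

Raising max_concurrent trades starvation risk for extra load on the server, so it is the single knob this sketch exposes.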

Please send me the code offlist. I will go through it and get back to you by
early next week (bit busy right now).

already sent.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthew T. O'Connor (#4)
Re: Auto Vacuum Daemon (again...)

"Matthew T. O'Connor" <matthew@zeut.net> writes:

interesting thought. I think this boils down to how many knobs do we
need to put on this system. It might make sense to say allow up to X
concurrent vacuums, a 4 processor system might handle 4 concurrent
vacuums very well.

This is almost certainly a bad idea. vacuum is not very
processor-intensive, but it is disk-intensive. Multiple vacuums running
at once will suck more disk bandwidth than is appropriate for a
"background" operation, no matter how sexy your CPU is. I can't see
any reason to allow more than one auto-scheduled vacuum at a time.

regards, tom lane

#6Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Tom Lane (#5)
Re: Auto Vacuum Daemon (again...)

On 28 Nov 2002 at 10:45, Tom Lane wrote:

"Matthew T. O'Connor" <matthew@zeut.net> writes:

interesting thought. I think this boils down to how many knobs do we
need to put on this system. It might make sense to say allow up to X
concurrent vacuums, a 4 processor system might handle 4 concurrent
vacuums very well.

This is almost certainly a bad idea. vacuum is not very
processor-intensive, but it is disk-intensive. Multiple vacuums running
at once will suck more disk bandwidth than is appropriate for a
"background" operation, no matter how sexy your CPU is. I can't see
any reason to allow more than one auto-scheduled vacuum at a time.

Hmm.. We would need to take care of that as well..

Bye
Shridhar

--
In most countries selling harmful things like drugs is punishable.Then howcome
people can sell Microsoft software and go unpunished?(By hasku@rost.abo.fi,
Hasse Skrifvars)

#7Matthew T. O'Connor
matthew@zeut.net
In reply to: Shridhar Daithankar (#6)
Re: Auto Vacuum Daemon (again...)

On Thursday 28 November 2002 23:26, Shridhar Daithankar wrote:

On 28 Nov 2002 at 10:45, Tom Lane wrote:

"Matthew T. O'Connor" <matthew@zeut.net> writes:

interesting thought. I think this boils down to how many knobs do we
need to put on this system. It might make sense to say allow up to X
concurrent vacuums, a 4 processor system might handle 4 concurrent
vacuums very well.

This is almost certainly a bad idea. vacuum is not very
processor-intensive, but it is disk-intensive. Multiple vacuums running
at once will suck more disk bandwidth than is appropriate for a
"background" operation, no matter how sexy your CPU is. I can't see
any reason to allow more than one auto-scheduled vacuum at a time.

Hmm.. We would need to take care of that as well..

Not sure what you mean by that, but it sounds like the behaviour of my AVD
(having it block until the vacuum command completes) is fine, and perhaps
preferable.

#8Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Matthew T. O'Connor (#7)
Re: Auto Vacuum Daemon (again...)

On 29 Nov 2002 at 7:59, Matthew T. O'Connor wrote:

On Thursday 28 November 2002 23:26, Shridhar Daithankar wrote:

On 28 Nov 2002 at 10:45, Tom Lane wrote:

This is almost certainly a bad idea. vacuum is not very
processor-intensive, but it is disk-intensive. Multiple vacuums running
at once will suck more disk bandwidth than is appropriate for a
"background" operation, no matter how sexy your CPU is. I can't see
any reason to allow more than one auto-scheduled vacuum at a time.

Hmm.. We would need to take care of that as well..

Not sure what you mean by that, but it sounds like the behaviour of my AVD
(having it block until the vacuum command completes) is fine, and perhaps
preferable.

Right.. But I will still keep the option open for parallel vacuum, which is most
useful for reusing tuples in shared buffers.. And stale updated tuples are what
cause the performance drop in my experience..

You know.. just enough rope to hang themselves..;-)

Bye
Shridhar

--
Auction: A gyp off the old block.

#9Matthew T. O'Connor
matthew@zeut.net
In reply to: Shridhar Daithankar (#3)
Re: Auto Vacuum Daemon (again...)

----- Original Message -----
From: "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>
To: "Matthew T. O'Connor" <matthew@zeut.net>
Sent: Monday, December 02, 2002 11:12 AM
Subject: Re: [HACKERS] Auto Vacuum Daemon (again...)

On 28 Nov 2002 at 3:02, Matthew T. O'Connor wrote:
I went through it today and I have some comments to make.

1. The idea of using a single database is really great. I really liked that
idea, which keeps configuration simple.

I no longer think this is a good idea. Tom Lane responded to our thread
on the hackers list saying that it would never be a good idea to have more
than one vacuum process running at a time, even on different databases, as
vacuum is typically I/O bound. Since we never want to run more than one vacuum
at a time, it is much simpler to have it all managed by one AVD, rather than
one AVD for each database on a server.

2. You are fetching all the statistics in the list. This could get big if
there are thousands of tables, or for hosting companies where there are tons
of databases. That is the reason I put a table in there..

Of course, not that it won't work, but by putting in a table I thought it would
need somewhat less code in the app.

I don't see how putting a table in is any different than checking the view.
First, I don't like the idea of having to have tables in someone's database, I
find that intrusive. I know that some packages such as PGAdmin do this, and
I never liked it as a developer. Second, the only reason that it would be
less work for the server is that you may not have an entry in your table for
all tables in the database. This can be accomplished through some type of
exclusion list that could be part of the configuration system.
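
An exclusion list of this kind might be applied as below. This is only a sketch: the function name, the shell-style patterns and the schema-qualified table names are assumptions, since no configuration format has been settled.

```python
from fnmatch import fnmatchcase


def tables_to_monitor(all_tables, exclude_patterns):
    """Drop every table whose name matches one of the exclusion patterns."""
    return [
        table
        for table in all_tables
        if not any(fnmatchcase(table, pattern) for pattern in exclude_patterns)
    ]
```

Nothing is stored in the monitored database this way; the patterns would live in the daemon's own configuration.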

I will hack in an add-on for parallel vacuums by tomorrow and send it to you. Just
put in a command line switch (never played with getopt). Basically, after the list of
databases is read, fork a child that sleeps and vacuums only one database.

See comments above.

Besides, I have a couple of bug reports which I will check against your
version as well..

Please let me know what you find, I know it's far from a polished piece of
work yet :-)

After a thorough look at the code, I will come up with more of these, but next
time I will send you patches rather than comments..

I look forward to it.

Also, I wanted to let you know that I am working on integrating it into the
main Postgres source tree right now. From what I have heard on the hackers
list it seems that they are hoping to have this be a core feature that they
can depend on so that they can guarantee that databases are vacuumed every
so often as required for 24x7 operation. Basically I will still have it as
a separate executable, but the postmaster will take care of launching it
with proper arguments, restarting it if it dies (much like the stats
collector) and stopping the AVD on shutdown. This should be fairly easy to
do. I still don't know if others think this is a good idea, as I got no
response to that part of my other email, but it is the best idea I have
right now.
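
The launch-and-restart behaviour described above can be illustrated with a tiny supervisor loop. This is not how the postmaster manages its children; it is a sketch in which supervise, max_restarts and the two stand-in child commands are hypothetical, and a clean exit (status 0) is assumed to mean an intentional shutdown.

```python
import subprocess
import sys

# Stand-in children for demonstration; a real setup would launch the AVD binary.
FAILING_CHILD = [sys.executable, "-c", "import sys; sys.exit(1)"]
CLEAN_CHILD = [sys.executable, "-c", "pass"]


def supervise(argv, max_restarts=3):
    """Run argv, relaunching it whenever it exits with a non-zero status.

    Stops after a clean exit or after max_restarts relaunches.
    Returns the total number of launches.
    """
    launches = 0
    while launches <= max_restarts:
        launches += 1
        if subprocess.call(argv) == 0:
            break  # clean exit: treated as a requested shutdown
    return launches
```

A restart cap keeps a child that dies immediately from being relaunched forever.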

Sorry for the late reply. Still fighting with some *very* stupid bugs in my
daytime job (like 'if (k < 60)' evaluating to false for k=0 in the release
version only, etc.)

Good luck with your work, I hope you find all the bugs quickly. It's not the
fun part of coding.

Thanks again for the feedback, I really want this feature in postgres.

Matthew

#10Greg Copeland
greg@CopelandConsulting.Net
In reply to: Matthew T. O'Connor (#7)
Re: Auto Vacuum Daemon (again...)

On Fri, 2002-11-29 at 06:59, Matthew T. O'Connor wrote:

On Thursday 28 November 2002 23:26, Shridhar Daithankar wrote:

On 28 Nov 2002 at 10:45, Tom Lane wrote:

"Matthew T. O'Connor" <matthew@zeut.net> writes:

interesting thought. I think this boils down to how many knobs do we
need to put on this system. It might make sense to say allow up to X
concurrent vacuums, a 4 processor system might handle 4 concurrent
vacuums very well.

This is almost certainly a bad idea. vacuum is not very
processor-intensive, but it is disk-intensive. Multiple vacuums running
at once will suck more disk bandwidth than is appropriate for a
"background" operation, no matter how sexy your CPU is. I can't see
any reason to allow more than one auto-scheduled vacuum at a time.

Hmm.. We would need to take care of that as well..

Not sure what you mean by that, but it sounds like the behaviour of my AVD
(having it block until the vacuum command completes) is fine, and perhaps
preferable.

I can easily imagine larger systems with multiple CPUs and multiple disk
and card bundles to support multiple databases. In this case, I have a
hard time figuring out why you'd not want to allow multiple concurrent
vacuums. I guess I can understand a recommendation of only allowing a
single vacuum, however, should it be mandated that AVD will ONLY be able
to perform a single vacuum at a time?

Greg

#11Greg Copeland
greg@CopelandConsulting.Net
In reply to: Shridhar Daithankar (#8)
Re: Auto Vacuum Daemon (again...)

On Fri, 2002-11-29 at 07:19, Shridhar Daithankar wrote:

On 29 Nov 2002 at 7:59, Matthew T. O'Connor wrote:

On Thursday 28 November 2002 23:26, Shridhar Daithankar wrote:

On 28 Nov 2002 at 10:45, Tom Lane wrote:

This is almost certainly a bad idea. vacuum is not very
processor-intensive, but it is disk-intensive. Multiple vacuums running
at once will suck more disk bandwidth than is appropriate for a
"background" operation, no matter how sexy your CPU is. I can't see
any reason to allow more than one auto-scheduled vacuum at a time.

Hmm.. We would need to take care of that as well..

Not sure what you mean by that, but it sounds like the behaviour of my AVD
(having it block until the vacuum command completes) is fine, and perhaps
preferable.

Right.. But I will still keep the option open for parallel vacuum, which is most
useful for reusing tuples in shared buffers.. And stale updated tuples are what
cause the performance drop in my experience..

You know.. just enough rope to hang themselves..;-)

Right. This is exactly what I was thinking about. If someone shoots
their own foot off, that's their problem. The added flexibility seems
well worth it.

Greg

#12Rod Taylor
rbt@rbt.ca
In reply to: Greg Copeland (#10)
Re: Auto Vacuum Daemon (again...)

Not sure what you mean by that, but it sounds like the behaviour of my AVD
(having it block until the vacuum command completes) is fine, and perhaps
preferable.

I can easily imagine larger systems with multiple CPUs and multiple disk
and card bundles to support multiple databases. In this case, I have a
hard time figuring out why you'd not want to allow multiple concurrent
vacuums. I guess I can understand a recommendation of only allowing a
single vacuum, however, should it be mandated that AVD will ONLY be able
to perform a single vacuum at a time?

Hmm.. CPU time (from what I've seen) isn't an issue. Strictly disk. The
big problem with multiple vacuums is determining which tables are in
common areas.

Perhaps a more appropriate rule would be 1 AVD per tablespace? Since
PostgreSQL only has a single tablespace at the moment....

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

#13Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Rod Taylor (#12)
Re: Auto Vacuum Daemon (again...)

On 10 Dec 2002 at 9:42, Rod Taylor wrote:

Perhaps a more appropriate rule would be 1 AVD per tablespace? Since
PostgreSQL only has a single tablespace at the moment....

Sorry, I am talking without doing much of it (stuck on Windows for my job). But
actually, when I was talking with Matthew offlist, he mentioned that if properly
streamlined, pgavd_c could be in the pg sources. But I have these plans of making
pgavd a central point of management, i.e. where you can vacuum all your
machines and all databases on them from one place, like a network management
console.

I hope to finish things fast but can't commit. Still tied here..

Bye
Shridhar

--
QOTD: "It's a cold bowl of chili, when love don't work out."

#14Greg Copeland
greg@CopelandConsulting.Net
In reply to: Rod Taylor (#12)
Re: Auto Vacuum Daemon (again...)

On Tue, 2002-12-10 at 08:42, Rod Taylor wrote:

Not sure what you mean by that, but it sounds like the behaviour of my AVD
(having it block until the vacuum command completes) is fine, and perhaps
preferable.

I can easily imagine larger systems with multiple CPUs and multiple disk
and card bundles to support multiple databases. In this case, I have a
hard time figuring out why you'd not want to allow multiple concurrent
vacuums. I guess I can understand a recommendation of only allowing a
single vacuum, however, should it be mandated that AVD will ONLY be able
to perform a single vacuum at a time?

Hmm.. CPU time (from what I've seen) isn't an issue. Strictly disk. The
big problem with multiple vacuums is determining which tables are in
common areas.

Perhaps a more appropriate rule would be 1 AVD per tablespace? Since
PostgreSQL only has a single tablespace at the moment....

But tablespace is planned for 7.4 right? Since tablespace is supposed
to go in for 7.4, I think you've hit the nail on the head. One AVD per
tablespace sounds just right to me.

--
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting

#15Rod Taylor
rbt@rbt.ca
In reply to: Greg Copeland (#14)
Re: Auto Vacuum Daemon (again...)

On Tue, 2002-12-10 at 12:00, Greg Copeland wrote:

On Tue, 2002-12-10 at 08:42, Rod Taylor wrote:

Not sure what you mean by that, but it sounds like the behaviour of my AVD
(having it block until the vacuum command completes) is fine, and perhaps
preferable.

I can easily imagine larger systems with multiple CPUs and multiple disk
and card bundles to support multiple databases. In this case, I have a
hard time figuring out why you'd not want to allow multiple concurrent
vacuums. I guess I can understand a recommendation of only allowing a
single vacuum, however, should it be mandated that AVD will ONLY be able
to perform a single vacuum at a time?

Hmm.. CPU time (from what I've seen) isn't an issue. Strictly disk. The
big problem with multiple vacuums is determining which tables are in
common areas.

Perhaps a more appropriate rule would be 1 AVD per tablespace? Since
PostgreSQL only has a single tablespace at the moment....

But tablespace is planned for 7.4 right? Since tablespace is supposed
to go in for 7.4, I think you've hit the nail on the head. One AVD per
tablespace sounds just right to me.

Planned if someone implements it and manages to have it committed prior
to release.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

#16scott.marlowe
scott.marlowe@ihs.com
In reply to: Rod Taylor (#12)
Re: Auto Vacuum Daemon (again...)

On 10 Dec 2002, Rod Taylor wrote:

Not sure what you mean by that, but it sounds like the behaviour of my AVD
(having it block until the vacuum command completes) is fine, and perhaps
preferable.

I can easily imagine larger systems with multiple CPUs and multiple disk
and card bundles to support multiple databases. In this case, I have a
hard time figuring out why you'd not want to allow multiple concurrent
vacuums. I guess I can understand a recommendation of only allowing a
single vacuum, however, should it be mandated that AVD will ONLY be able
to perform a single vacuum at a time?

Hmm.. CPU time (from what I've seen) isn't an issue. Strictly disk. The
big problem with multiple vacuums is determining which tables are in
common areas.

Perhaps a more appropriate rule would be 1 AVD per tablespace? Since
PostgreSQL only has a single tablespace at the moment....

But PostgreSQL can already place different databases on different data
stores. I.e. initlocation and all. If someone was using multiple SCSI
cards with multiple JBOD or RAID boxes hanging off of a box, they would
have the same thing, effectively, that you are talking about.

So, someone out there may well be able to use a multiple process AVD right
now. Imagine m databases on n different drive sets for large production
databases.
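
The "m databases on n drive sets" idea can be sketched as grouping databases by where they are stored, so that vacuums are serialized within one drive set but may run in parallel across drive sets. The function name and the location strings are hypothetical, and how the daemon would learn each database's location is left open.

```python
from collections import defaultdict


def vacuum_queues(db_locations):
    """Group databases by storage location.

    db_locations: database name -> identifier of the drive set holding it.
    Each returned queue is meant to be vacuumed serially; separate queues
    could be handled by separate AVD processes.
    """
    queues = defaultdict(list)
    for db, location in sorted(db_locations.items()):
        queues[location].append(db)
    return dict(queues)
```

With every database on the same drive set this degenerates to a single queue, which is the one-vacuum-at-a-time case.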

#17Greg Copeland
greg@CopelandConsulting.Net
In reply to: scott.marlowe (#16)
Re: Auto Vacuum Daemon (again...)

On Tue, 2002-12-10 at 13:09, scott.marlowe wrote:

On 10 Dec 2002, Rod Taylor wrote:

Perhaps a more appropriate rule would be 1 AVD per tablespace? Since
PostgreSQL only has a single tablespace at the moment....

But PostgreSQL can already place different databases on different data
stores. I.e. initlocation and all. If someone was using multiple SCSI
cards with multiple JBOD or RAID boxes hanging off of a box, they would
have the same thing, effectively, that you are talking about.

So, someone out there may well be able to use a multiple process AVD right
now. Imagine m databases on n different drive sets for large production
databases.

That's right. I always forget about that. So, it seems, regardless of
the tablespace effort, we shouldn't be limiting the number of concurrent
AVDs.

--
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting