Re: review: autovacuum_work_mem

Started by Nigel Heronabout 12 years ago3 messages
#1Nigel Heron
nheron@querymetrics.com

On Sun, Oct 20, 2013 at 7:21 AM, Magnus Hagander <magnus@hagander.net> wrote:

On Sun, Oct 20, 2013 at 2:22 AM, Peter Geoghegan <pg@heroku.com> wrote:

It seemed neater to me to create a new flag, so that in principle any
vacuum() code path can request autovacuum_work_mem, rather than having
lazyvacuum.c code call IsAutoVacuumWorkerProcess() for the same
purpose. To date, that's only been done within vacuumlazy.c for things
like logging.

But I'd suggest just a:
int vac_work_mem = (IsAutoVacuumWorkerProcess() && autovacuum_work_mem
!= -1) ? autovacuum_work_mem : maintenance_work_mem;

and not sending around a boolean flag through a bunch of places when
it really means just the same thing,

I agree with Magnus here, calling IsAutoVacuumWorkerProcess() seems
cleaner than the new flag and the function parameter changes.

Also, isn't this quite confusing:
+ # Note:  autovacuum only prefers autovacuum_work_mem over maintenance_work_mem
+ #autovacuum_work_mem = -1 # min 1MB, or -1 to disable

Where does the "only" come from? And we don't really use the term
"prefers over" anywhere else there. And -1 doesn't actually disable
it. I suggest following the pattern of the other parameters that work
the same way, which would then just be:

#autovacuum_work_mem = -1 # amount of memory for each autovacuum
worker. -1 means use maintenance_work_mem

+1

here's my review of the v1 patch,

patch features tested:
- regular VACUUM * commands ignore autovacuum_work_mem.
- if autovacuum_work_mem = -1 then maintenance_work_mem is used by autovacuum.
- if autovacuum_work_mem is set then it is used instead of
maintenance_work_mem by autovacuum.
- the autovacuum_work_mem guc has a "sighup" context and the global
variable used in the code is correctly refreshed during a reload.
- a 1MB lower limit for autovacuum_work_mem is enforced.

build (platform is fedora 18):
- patch (context format) applies to current HEAD with offsets (please rebase).
- documentation patches included.
- "make" doesn't produce any extra warnings.
- "make check" passes all tests (no extra regression tests).

questions/comments:
- should the category of the guc be "RESOURCES_MEM" (as in the patch)
or "AUTOVACUUM"? seems to fit in both, but it's really autovacuum
specific.
- could you also add documentation to the autovacuum section of
maintenance.sgml? I think people tuning their autovacuum are likely to
look there for guidance.
- could you update the comments at the top of vacuumlazy.c to reflect
this new feature?

-nigel.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Peter Geoghegan
pg@heroku.com
In reply to: Nigel Heron (#1)
1 attachment(s)

Please reply to the original thread in future (even if the Reply-to
Message-ID is the same, I see this as a separate thread).

On Fri, Nov 15, 2013 at 5:37 PM, Nigel Heron <nheron@querymetrics.com> wrote:

On Sun, Oct 20, 2013 at 7:21 AM, Magnus Hagander <magnus@hagander.net> wrote:

On Sun, Oct 20, 2013 at 2:22 AM, Peter Geoghegan <pg@heroku.com> wrote:

It seemed neater to me to create a new flag, so that in principle any
vacuum() code path can request autovacuum_work_mem, rather than having
lazyvacuum.c code call IsAutoVacuumWorkerProcess() for the same
purpose. To date, that's only been done within vacuumlazy.c for things
like logging.

I agree with Magnus here, calling IsAutoVacuumWorkerProcess() seems
cleaner than the new flag and the function parameter changes.

Well, what I did was analogous to the existing coding for
VACOPT_NOWAIT. But it's hardly worth discussing much further. Patch
updated along these lines.

Also, isn't this quite confusing:
+ # Note:  autovacuum only prefers autovacuum_work_mem over maintenance_work_mem
+ #autovacuum_work_mem = -1 # min 1MB, or -1 to disable

Where does the "only" come from? And we don't really use the term
"prefers over" anywhere else there.

"Only" could be replaced by "merely" here. In any case, a more
succinct wording is now used.

here's my review of the v1 patch,

patch features tested:
- regular VACUUM * commands ignore autovacuum_work_mem.
- if autovacuum_work_mem = -1 then maintenance_work_mem is used by autovacuum.
- if autovacuum_work_mem is set then it is used instead of
maintenance_work_mem by autovacuum.
- the autovacuum_work_mem guc has a "sighup" context and the global
variable used in the code is correctly refreshed during a reload.

Right.

- a 1MB lower limit for autovacuum_work_mem is enforced.

Right, just like maintenance_work_mem. The difference being that we
cannot enforce it with the same standard mechanism, because we still
need to make -1 values possible. This happens in our callback, in the
style of wal_buffers.

build (platform is fedora 18):
- patch (context format) applies to current HEAD with offsets (please rebase).

It's been rebased.

questions/comments:
- should the category of the guc be "RESOURCES_MEM" (as in the patch)
or "AUTOVACUUM"? seems to fit in both, but it's really autovacuum
specific.

Well, log_autovacuum_min_duration is also very autovacuum specific,
but is LOGGING_WHAT, so I've left autovacuum_work_mem RESOURCES_MEM.
It's not a fixed allocation of shared memory.

- could you also add documentation to the autovacuum section of
maintenance.sgml? I think people tuning their autovacuum are likely to
look there for guidance.

I don't want to add a reference there as long as there is no
maintenance_work_mem reference. I think that perform.sgml is a more
likely candidate, though I haven't added anything there either, since
it's just talking about increasing maintenance_work_mem in a
controlled context.

- could you update the comments at the top of vacuumlazy.c to reflect
this new feature?

Seems reasonable.

Revision attached.
--
Peter Geoghegan

Attachments:

autovacuum_work_mem.v2.2013_11_18.patchtext/x-patch; charset=US-ASCII; name=autovacuum_work_mem.v2.2013_11_18.patchDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 77a9303..1200ad6
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** include 'filename'
*** 1156,1163 ****
         </para>
         <para>
          Note that when autovacuum runs, up to
!         <xref linkend="guc-autovacuum-max-workers"> times this memory may be
!         allocated, so be careful not to set the default value too high.
         </para>
        </listitem>
       </varlistentry>
--- 1156,1181 ----
         </para>
         <para>
          Note that when autovacuum runs, up to
!         <xref linkend="guc-autovacuum-max-workers"> times this memory
!         may be allocated, so be careful not to set the default value
!         too high.  It may be useful to control for this by separately
!         setting <xref linkend="guc-autovacuum-work-mem">.
!        </para>
!       </listitem>
!      </varlistentry>
! 
!      <varlistentry id="guc-autovacuum-work-mem" xreflabel="autovacuum_work_mem">
!       <term><varname>autovacuum_work_mem</varname> (<type>integer</type>)</term>
!       <indexterm>
!        <primary><varname>autovacuum_work_mem</> configuration parameter</primary>
!       </indexterm>
!       <listitem>
!        <para>
!         Specifies the maximum amount of memory to be used by each
!         autovacuum worker process.  It defaults to -1, indicating that
!         the value of <xref linkend="guc-maintenance-work-mem"> should
!         be used instead.  The setting has no effect on the behavior of
!         <command>VACUUM</command> when run in other contexts.
         </para>
        </listitem>
       </varlistentry>
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
new file mode 100644
index e1d6d1c..62bd2f0
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***************
*** 10,22 ****
   * relations with finite memory space usage.  To do that, we set upper bounds
   * on the number of tuples and pages we will keep track of at once.
   *
!  * We are willing to use at most maintenance_work_mem memory space to keep
!  * track of dead tuples.  We initially allocate an array of TIDs of that size,
!  * with an upper limit that depends on table size (this limit ensures we don't
!  * allocate a huge area uselessly for vacuuming small tables).	If the array
!  * threatens to overflow, we suspend the heap scan phase and perform a pass of
!  * index cleanup and page compaction, then resume the heap scan with an empty
!  * TID array.
   *
   * If we're processing a table with no indexes, we can just vacuum each page
   * as we go; there's no need to save up multiple tuples to minimize the number
--- 10,22 ----
   * relations with finite memory space usage.  To do that, we set upper bounds
   * on the number of tuples and pages we will keep track of at once.
   *
!  * We are willing to use at most maintenance_work_mem (or perhaps
!  * autovacuum_work_mem) memory space to keep track of dead tuples.  We
!  * initially allocate an array of TIDs of that size, with an upper limit that
!  * depends on table size (this limit ensures we don't allocate a huge area
!  * uselessly for vacuuming small tables).  If the array threatens to overflow,
!  * we suspend the heap scan phase and perform a pass of index cleanup and page
!  * compaction, then resume the heap scan with an empty TID array.
   *
   * If we're processing a table with no indexes, we can just vacuum each page
   * as we go; there's no need to save up multiple tuples to minimize the number
*************** static void
*** 1581,1590 ****
  lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
  {
  	long		maxtuples;
  
  	if (vacrelstats->hasindex)
  	{
! 		maxtuples = (maintenance_work_mem * 1024L) / sizeof(ItemPointerData);
  		maxtuples = Min(maxtuples, INT_MAX);
  		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
  
--- 1581,1593 ----
  lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
  {
  	long		maxtuples;
+ 	int			vac_work_mem =  IsAutoVacuumWorkerProcess() &&
+ 									autovacuum_work_mem != -1 ?
+ 								autovacuum_work_mem : maintenance_work_mem;
  
  	if (vacrelstats->hasindex)
  	{
! 		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
  		maxtuples = Min(maxtuples, INT_MAX);
  		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
  
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
new file mode 100644
index 8c14d0f..0b881c0
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 110,115 ****
--- 110,116 ----
   */
  bool		autovacuum_start_daemon = false;
  int			autovacuum_max_workers;
+ int			autovacuum_work_mem = -1;
  int			autovacuum_naptime;
  int			autovacuum_vac_thresh;
  double		autovacuum_vac_scale;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index 54d8078..d33048f
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** static const char *show_tcp_keepalives_c
*** 192,197 ****
--- 192,198 ----
  static bool check_maxconnections(int *newval, void **extra, GucSource source);
  static bool check_max_worker_processes(int *newval, void **extra, GucSource source);
  static bool check_autovacuum_max_workers(int *newval, void **extra, GucSource source);
+ static bool check_autovacuum_work_mem(int *newval, void **extra, GucSource source);
  static bool check_effective_io_concurrency(int *newval, void **extra, GucSource source);
  static void assign_effective_io_concurrency(int newval, void *extra);
  static void assign_pgstat_temp_directory(const char *newval, void *extra);
*************** static struct config_int ConfigureNamesI
*** 2347,2352 ****
--- 2348,2364 ----
  	},
  
  	{
+ 		{"autovacuum_work_mem", PGC_SIGHUP, RESOURCES_MEM,
+ 			gettext_noop("Sets the maximum memory to be used by each autovacuum worker process."),
+ 			NULL,
+ 			GUC_UNIT_KB
+ 		},
+ 		&autovacuum_work_mem,
+ 		-1, -1, MAX_KILOBYTES,
+ 		check_autovacuum_work_mem, NULL, NULL
+ 	},
+ 
+ 	{
  		{"tcp_keepalives_idle", PGC_USERSET, CLIENT_CONN_OTHER,
  			gettext_noop("Time between issuing TCP keepalives."),
  			gettext_noop("A value of 0 uses the system default."),
*************** check_autovacuum_max_workers(int *newval
*** 8751,8756 ****
--- 8763,8791 ----
  	return true;
  }
  
+ static bool
+ check_autovacuum_work_mem(int *newval, void **extra, GucSource source)
+ {
+ 	/*
+ 	 * -1 indicates fallback.
+ 	 *
+ 	 * If we haven't yet changed the boot_val default of -1, just let it be.
+ 	 * Autovacuum will look to maintenance_work_mem instead.
+ 	 */
+ 	if (*newval == -1)
+ 		return true;
+ 
+ 	/*
+ 	 * We clamp manually-set values to at least 1MB.  Since
+ 	 * maintenance_work_mem is always set to at least this value, do the same
+ 	 * here.
+ 	 */
+ 	if (*newval < 1024)
+ 		*newval = 1024;
+ 
+ 	return true;
+ }
+ 
  static bool
  check_max_worker_processes(int *newval, void **extra, GucSource source)
  {
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
new file mode 100644
index 34a2d05..4737ede
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 122,127 ****
--- 122,128 ----
  # actively intend to use prepared transactions.
  #work_mem = 1MB				# min 64kB
  #maintenance_work_mem = 16MB		# min 1MB
+ #autovacuum_work_mem = -1		# min 1MB, or -1 to use maintenance_work_mem
  #max_stack_depth = 2MB			# min 100kB
  #dynamic_shared_memory_type = posix # the default is the first option
  					# supported by the operating system:
diff --git a/src/include/postmaster/autovacuum.h b/src/include/postmaster/autovacuum.h
new file mode 100644
index e96f07a..92560fe
*** a/src/include/postmaster/autovacuum.h
--- b/src/include/postmaster/autovacuum.h
***************
*** 18,23 ****
--- 18,24 ----
  /* GUC variables */
  extern bool autovacuum_start_daemon;
  extern int	autovacuum_max_workers;
+ extern int	autovacuum_work_mem;
  extern int	autovacuum_naptime;
  extern int	autovacuum_vac_thresh;
  extern double autovacuum_vac_scale;
#3Nigel Heron
nheron@querymetrics.com
In reply to: Peter Geoghegan (#2)

On Mon, Nov 18, 2013 at 11:36 PM, Peter Geoghegan <pg@heroku.com> wrote:

Please reply to the original thread in future (even if the Reply-to
Message-ID is the same, I see this as a separate thread).

sorry about that, when i added "review" to the subject gmail removed
the thread info.
for reference the original thread started here:
</messages/by-id/CAM3SWZTwLA8Ef2DTvbwM1b1zEVU_eN3N4rReGNU5_zFyjNGi6w@mail.gmail.com&gt;

Revision attached.
--
Peter Geoghegan

Review for Peter Geoghegan's v2 patch in CF 2013-11:
https://commitfest.postgresql.org/action/patch_view?id=1262

Submission review
-----------------
* Is the patch in a patch format which has context?
Yes

* Does it apply cleanly to the current git master
(04eee1fa9ee80dabf7cf4b8b9106897272e9b291)?
patching file src/backend/commands/vacuumlazy.c
Hunk #2 succeeded at 1582 (offset 1 line).

* Does it include reasonable tests, necessary doc patches, etc?
Documentation patches included.
No additional tests.

Usability review
-----------------
* Does the patch actually implement that?
Yes.

* Do we want that?
Yes. The original thread has references, see
</messages/by-id/CAM3SWZTwLA8Ef2DTvbwM1b1zEVU_eN3N4rReGNU5_zFyjNGi6w@mail.gmail.com&gt;

* Do we already have it?
No.

* Does it follow SQL spec, or the community-agreed behavior?
SQL spec: n/a
community: Yes. The original thread has references, see
</messages/by-id/CAM3SWZTwLA8Ef2DTvbwM1b1zEVU_eN3N4rReGNU5_zFyjNGi6w@mail.gmail.com&gt;

* Does it include pg_dump support (if applicable)?
n/a

Feature test
-----------------
* Does the feature work as advertised?
Yes.

* Are there corner cases the author has failed to consider?
None that i can see.

* Are there any assertion failures or crashes?
No.

Performance review
-----------------
* Does the patch slow down simple tests?
No.

* If it claims to improve performance, does it?
n/a

* Does it slow down other things?
No.

Coding review
-----------------
* Does it follow the project coding guidelines?
Yes.

* Are there portability issues?
None that i can see.

* Will it work on Windows/BSD etc?
None that i can see. (I only tested it on linux though)

* Does it do what it says, correctly?
Yes.

* Does it produce compiler warnings?
No.

* Can you make it crash?
No.

Architecture review
-----------------
* Is everything done in a way that fits together coherently with other
features/modules?
Yes.

* Are there interdependencies that can cause problems?
No.

-nigel.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers