Making autovacuum logs indicate if insert-based threshold was the triggering condition

Started by Peter Geogheganover 3 years ago8 messages
1 attachment(s)

It's quite possible (and probably very common) for certain tables to
have autovacuum scheduling trigger autovacuums based on both the
"regular" bloat-orientated thresholds, and the newer insert-based
thresholds. It may be far from obvious which triggering condition
autovacuum.c must have applied to trigger any given autovacuum, since
that information isn't currently passed down to lazyvacuum.c. This
seems like a problem to me; how are users supposed to tune
autovacuum's thresholds without even basic feedback about how the
thresholds get applied?

Attached patch teaches autovacuum.c to pass the information down to
lazyvacuum.c, which includes the information in the autovacuum log.
The approach I've taken is very similar to the existing approach with
anti-wraparound autovacuum. It's pretty straightforward. Note that a
VACUUM that is an "automatic vacuum for inserted tuples" cannot also
be an antiwraparound autovacuum, nor can it also be a "regular"
autovacuum/VACUUM -- there are now 3 distinct "triggering conditions"
for autovacuum.

Although this patch doesn't touch antiwraparound autovacuums at all, I
will note in passing that I think that anti-wraparound autovacuums
should become just another triggering condition for autovacuum -- IMV
they shouldn't be special in *any* way. We'd still need to keep
antiwraparound's "cannot automatically cancel autovacuum worker"
behavior in some form, but that would become dynamic, a little like
the failsafe is today, and would trigger on its own timeline (probably
*much* later than we first trigger antiwraparound autovacuum). We'd
also need to decouple "aggressiveness" (the behaviors that we
associate with aggressive mode in vacuumlazy.c) from the condition of
the table/system when VACUUM first began -- those could all become
dynamic, too.

--
Peter Geoghegan

Attachments:

v1-0001-Show-when-autovacuum-is-insert-driven-in-log.patchapplication/octet-stream; name=v1-0001-Show-when-autovacuum-is-insert-driven-in-log.patchDownload
From 2d6560e2da9635d06d8fceba6ef8410eb89bdd01 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Sun, 17 Jul 2022 18:50:13 -0700
Subject: [PATCH v1] Show when autovacuum is insert driven in log.

Teach the logging code controlled by log_autovacuum_min_duration to
display more information about why its autovacuum worker was launched in
the first place.  Show the user distinct output when the insert-based
thresholds were the triggering condition.

It's sometimes far from obvious why an autovacuum worker was launched
without this new instrumentation.  It's useful to give feedback on the
triggering condition, to guide tuning efforts by DBAs.

Note that we consider insert-driven autovacuums to be a distinct
category from antiwraparound autovacuums in all cases (though of course
any VACUUM can end up as an aggressive VACUUM).
---
 src/include/commands/vacuum.h        |  3 ++-
 src/backend/access/heap/vacuumlazy.c | 12 +++++++++-
 src/backend/commands/vacuum.c        |  3 ++-
 src/backend/postmaster/autovacuum.c  | 36 +++++++++++++++++++---------
 4 files changed, 40 insertions(+), 14 deletions(-)

diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index f38e1148f..7e2d8e7b5 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -220,7 +220,8 @@ typedef struct VacuumParams
 											 * use default */
 	int			multixact_freeze_table_age; /* multixact age at which to scan
 											 * whole table */
-	bool		is_wraparound;	/* force a for-wraparound vacuum */
+	bool		is_wraparound;	/* anti-wraparound autovacuum? */
+	bool		is_insert;		/* autovacuum for inserted tuples? */
 	int			log_min_duration;	/* minimum execution threshold in ms at
 									 * which autovacuum is logged, -1 to use
 									 * default */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b802ed247..653ea28e0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -659,7 +659,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 				 * Aggressiveness already reported earlier, in dedicated
 				 * VACUUM VERBOSE ereport
 				 */
-				Assert(!params->is_wraparound);
+				Assert(!params->is_wraparound && !params->is_insert);
 				msgfmt = _("finished vacuuming \"%s.%s.%s\": index scans: %d\n");
 			}
 			else if (params->is_wraparound)
@@ -670,13 +670,23 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 				 * implies aggressive.  Produce distinct output for the corner
 				 * case all the same, just in case.
 				 */
+				Assert(!params->is_insert);
 				if (aggressive)
 					msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
 				else
 					msgfmt = _("automatic vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n");
 			}
+			else if (params->is_insert)
+			{
+				/* Inserted tuples crossed autovacuum.c's threshold */
+				if (aggressive)
+					msgfmt = _("automatic aggressive vacuum for inserted tuples of table \"%s.%s.%s\": index scans: %d\n");
+				else
+					msgfmt = _("automatic vacuum for inserted tuples of table \"%s.%s.%s\": index scans: %d\n");
+			}
 			else
 			{
+				/* Dead tuples crossed autovacuum.c's threshold */
 				if (aggressive)
 					msgfmt = _("automatic aggressive vacuum of table \"%s.%s.%s\": index scans: %d\n");
 				else
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 8df25f59d..a623678c0 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -265,8 +265,9 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		params.multixact_freeze_table_age = -1;
 	}
 
-	/* user-invoked vacuum is never "for wraparound" */
+	/* user-invoked vacuum is never "for wraparound" or "for inserts" */
 	params.is_wraparound = false;
+	params.is_insert = false;
 
 	/* user-invoked vacuum uses VACOPT_VERBOSE instead of log_min_duration */
 	params.log_min_duration = -1;
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 70a9176c5..ad90e8e67 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -329,12 +329,14 @@ static autovac_table *table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 static void recheck_relation_needs_vacanalyze(Oid relid, AutoVacOpts *avopts,
 											  Form_pg_class classForm,
 											  int effective_multixact_freeze_max_age,
-											  bool *dovacuum, bool *doanalyze, bool *wraparound);
+											  bool *dovacuum, bool *doanalyze,
+											  bool *wraparound, bool *inserts);
 static void relation_needs_vacanalyze(Oid relid, AutoVacOpts *relopts,
 									  Form_pg_class classForm,
 									  PgStat_StatTabEntry *tabentry,
 									  int effective_multixact_freeze_max_age,
-									  bool *dovacuum, bool *doanalyze, bool *wraparound);
+									  bool *dovacuum, bool *doanalyze,
+									  bool *wraparound, bool *inserts);
 
 static void autovacuum_do_vac_analyze(autovac_table *tab,
 									  BufferAccessStrategy bstrategy);
@@ -2063,6 +2065,7 @@ do_autovacuum(void)
 		bool		dovacuum;
 		bool		doanalyze;
 		bool		wraparound;
+		bool		inserts;
 
 		if (classForm->relkind != RELKIND_RELATION &&
 			classForm->relkind != RELKIND_MATVIEW)
@@ -2103,7 +2106,8 @@ do_autovacuum(void)
 		/* Check if it needs vacuum or analyze */
 		relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
 								  effective_multixact_freeze_max_age,
-								  &dovacuum, &doanalyze, &wraparound);
+								  &dovacuum, &doanalyze,
+								  &wraparound, &inserts);
 
 		/* Relations that need work are added to table_oids */
 		if (dovacuum || doanalyze)
@@ -2156,6 +2160,7 @@ do_autovacuum(void)
 		bool		dovacuum;
 		bool		doanalyze;
 		bool		wraparound;
+		bool		inserts;
 
 		/*
 		 * We cannot safely process other backends' temp tables, so skip 'em.
@@ -2186,7 +2191,8 @@ do_autovacuum(void)
 
 		relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
 								  effective_multixact_freeze_max_age,
-								  &dovacuum, &doanalyze, &wraparound);
+								  &dovacuum, &doanalyze,
+								  &wraparound, &inserts);
 
 		/* ignore analyze for toast tables */
 		if (dovacuum)
@@ -2761,7 +2767,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 	bool		dovacuum;
 	bool		doanalyze;
 	autovac_table *tab = NULL;
-	bool		wraparound;
+	bool		wraparound,
+				inserts;
 	AutoVacOpts *avopts;
 
 	/* fetch the relation's relcache entry */
@@ -2788,7 +2795,8 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 
 	recheck_relation_needs_vacanalyze(relid, avopts, classForm,
 									  effective_multixact_freeze_max_age,
-									  &dovacuum, &doanalyze, &wraparound);
+									  &dovacuum, &doanalyze,
+									  &wraparound, &inserts);
 
 	/* OK, it needs something done */
 	if (doanalyze || dovacuum)
@@ -2869,6 +2877,7 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age;
 		tab->at_params.multixact_freeze_table_age = multixact_freeze_table_age;
 		tab->at_params.is_wraparound = wraparound;
+		tab->at_params.is_insert = inserts;
 		tab->at_params.log_min_duration = log_min_duration;
 		tab->at_vacuum_cost_limit = vac_cost_limit;
 		tab->at_vacuum_cost_delay = vac_cost_delay;
@@ -2904,7 +2913,8 @@ recheck_relation_needs_vacanalyze(Oid relid,
 								  int effective_multixact_freeze_max_age,
 								  bool *dovacuum,
 								  bool *doanalyze,
-								  bool *wraparound)
+								  bool *wraparound,
+								  bool *inserts)
 {
 	PgStat_StatTabEntry *tabentry;
 
@@ -2914,7 +2924,8 @@ recheck_relation_needs_vacanalyze(Oid relid,
 
 	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
 							  effective_multixact_freeze_max_age,
-							  dovacuum, doanalyze, wraparound);
+							  dovacuum, doanalyze,
+							  wraparound, inserts);
 
 	/* ignore ANALYZE for toast tables */
 	if (classForm->relkind == RELKIND_TOASTVALUE)
@@ -2967,7 +2978,8 @@ relation_needs_vacanalyze(Oid relid,
  /* output params below */
 						  bool *dovacuum,
 						  bool *doanalyze,
-						  bool *wraparound)
+						  bool *wraparound,
+						  bool *inserts)
 {
 	bool		force_vacuum;
 	bool		av_enabled;
@@ -3058,6 +3070,7 @@ relation_needs_vacanalyze(Oid relid,
 			MultiXactIdPrecedes(classForm->relminmxid, multiForceLimit);
 	}
 	*wraparound = force_vacuum;
+	*inserts = false;			/* for now */
 
 	/* User disabled it in pg_class.reloptions?  (But ignore if at risk) */
 	if (!av_enabled && !force_vacuum)
@@ -3104,8 +3117,9 @@ relation_needs_vacanalyze(Oid relid,
 				 vactuples, vacthresh, anltuples, anlthresh);
 
 		/* Determine if this table needs vacuum or analyze. */
-		*dovacuum = force_vacuum || (vactuples > vacthresh) ||
-			(vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
+		*inserts = !force_vacuum && vac_ins_base_thresh >= 0 &&
+			instuples > vacinsthresh;
+		*dovacuum = force_vacuum || (vactuples > vacthresh) || *inserts;
 		*doanalyze = (anltuples > anlthresh);
 	}
 	else
-- 
2.32.0

#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Peter Geoghegan (#1)
Re: Making autovacuum logs indicate if insert-based threshold was the triggering condition

On Sat, Aug 06, 2022 at 01:03:57PM -0700, Peter Geoghegan wrote:

thresholds. It may be far from obvious which triggering condition
autovacuum.c must have applied to trigger any given autovacuum, since
that information isn't currently passed down to lazyvacuum.c. This
seems like a problem to me; how are users supposed to tune
autovacuum's thresholds without even basic feedback about how the
thresholds get applied?

+1

This sounded familiar, and it seems like I anticipated that it might be an
issue. Here, I was advocating for the new insert-based GUCs to default to -1,
to have insert-based autovacuum fall back to the thresholds specified by the
pre-existing GUCs (20% + 50), which would (in my proposal) remain be the normal
way to tune any type of vacuum.

/messages/by-id/20200317233218.GD26184@telsasoft.com

I haven't heard of anyone who had trouble setting the necessary GUC, but I'm
not surprised if most postgres installations are running versions before 13.

Note that a VACUUM that is an "automatic vacuum for inserted tuples" cannot
[...] also be a "regular" autovacuum/VACUUM

Why not ?

$ ./tmp_install/usr/local/pgsql/bin/postgres -D src/test/regress/tmp_check/data -c log_min_messages=debug3 -c autovacuum_naptime=9s&
DROP TABLE t; CREATE TABLE t (i int); INSERT INTO t SELECT generate_series(1,99999); DELETE FROM t; INSERT INTO t SELECT generate_series(1,99999);

2022-08-06 16:47:47.674 CDT autovacuum worker[12707] DEBUG: t: vac: 99999 (threshold 50), ins: 99999 (threshold 1000), anl: 199998 (threshold 50)

--
Justin

In reply to: Justin Pryzby (#2)
Re: Making autovacuum logs indicate if insert-based threshold was the triggering condition

On Sat, Aug 6, 2022 at 2:50 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

This sounded familiar, and it seems like I anticipated that it might be an
issue. Here, I was advocating for the new insert-based GUCs to default to -1,
to have insert-based autovacuum fall back to the thresholds specified by the
pre-existing GUCs (20% + 50), which would (in my proposal) remain be the normal
way to tune any type of vacuum.

/messages/by-id/20200317233218.GD26184@telsasoft.com

I haven't heard of anyone who had trouble setting the necessary GUC, but I'm
not surprised if most postgres installations are running versions before 13.

ISTM that having insert-based triggering conditions is definitely a
good idea, but what we have right now still has problems. It currently
won't work very well unless the user goes out of their way to tune
freezing to do the right thing. Typically we miss out on the
opportunity to freeze early, because without sophisticated
intervention from the user there is only a slim chance of *any*
freezing taking place outside of the inevitable antiwraparound
autovacuum.

Note that a VACUUM that is an "automatic vacuum for inserted tuples" cannot
[...] also be a "regular" autovacuum/VACUUM

Why not ?

Well, autovacuum.c should have (and/or kind of already has) 3
different triggering conditions. These are mutually exclusive
conditions -- technically autovacuum.c always launches an autovacuum
against a table because exactly 1 of the 3 thresholds were crossed. My
patch makes sure that it always gives exactly one reason why
autovacuum.c decided to VACUUM, so by definition there is only one
relevant piece of information for vacuumlazy.c to report in the log.
That's fairly simple and high level, and presumably something that
users won't have much trouble understanding.

Right now antiwraparound autovacuum "implies aggressive", in that it
almost always makes vacuumlazy.c use aggressive mode, but this seems
totally arbitrary to me -- they don't have to be virtually synonymous.
I think that antiwraparound autovacuum could even be rebranded as "an
autovacuum that takes place because the table hasn't had one in a long
time". This is much less scary, and makes it clearer that autovacuum.c
shouldn't be expected to really understand what will turn out to be
important "at runtime". That's the time to make important decisions
about what work to do -- when we actually have accurate information.

My antiwraparound example is just that: an example. There is a broader
idea: we shouldn't be too confident that the exact triggering
condition autovacuum.c applied to launch an autovacuum worker turns
out to be the best reason to VACUUM, or even a good reason --
vacuumlazy.c should be able to cope with that. The user is kept in the
loop about both, by reporting the triggering condition and the details
of what really happened at runtime. Maybe lazyvacuum.c can be taught
to speed up and slow down based on the conditions it observes as it
scans the heap -- there are many possibilities.

This broader idea is pretty much what you were getting at with your
example, I think.

--
Peter Geoghegan

#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Peter Geoghegan (#3)
Re: Making autovacuum logs indicate if insert-based threshold was the triggering condition

On Sat, Aug 06, 2022 at 03:41:57PM -0700, Peter Geoghegan wrote:

Note that a VACUUM that is an "automatic vacuum for inserted tuples" cannot
[...] also be a "regular" autovacuum/VACUUM

Why not ?

I think maybe you missed my intent in trimming the "anti-wraparound" part of
your text.

My point was concerning your statement that "autovacuum for inserted tuples ..
cannot also be a regular autovacuum" (meaning triggered by dead tuples).

Well, autovacuum.c should have (and/or kind of already has) 3
different triggering conditions. These are mutually exclusive
conditions -- technically autovacuum.c always launches an autovacuum
against a table because exactly 1 of the 3 thresholds were crossed.

The issue being that both thresholds can be crossed:

2022-08-06 16:47:47.674 CDT autovacuum worker[12707] DEBUG: t: VAC: 99999 (THRESHOLD 50), INS: 99999 (THRESHOLD 1000), anl: 199998 (threshold 50)

--
Justin

In reply to: Justin Pryzby (#4)
Re: Making autovacuum logs indicate if insert-based threshold was the triggering condition

On Sat, Aug 6, 2022 at 3:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

Well, autovacuum.c should have (and/or kind of already has) 3
different triggering conditions. These are mutually exclusive
conditions -- technically autovacuum.c always launches an autovacuum
against a table because exactly 1 of the 3 thresholds were crossed.

The issue being that both thresholds can be crossed:

2022-08-06 16:47:47.674 CDT autovacuum worker[12707] DEBUG: t: VAC: 99999 (THRESHOLD 50), INS: 99999 (THRESHOLD 1000), anl: 199998 (threshold 50)

What are the chances that both thresholds will be crossed at *exactly*
(not approximately) the same time in a real world case, where the
table isn't tiny (tiny relative to the "autovacuum_naptime quantum")?
This is a very narrow case.

Besides, the same can already be said with how autovacuum.c crosses
the XID-based antiwraparound threshold. Yet we still arbitrarily
report that it's antiwraparound in the logs, which (at least right
now) is generally assumed to mostly be about advancing relfrozenxid.
(Or maybe it's the other way around; it doesn't matter.)

It might make sense to *always* show how close we were to hitting each
of the thresholds, including the ones that we didn't end up hitting
(we may come pretty close quite often, which seems like it might
matter). But showing multiple conditions together just because the
planets aligned (we hit multiple thresholds together) emphasizes the
low-level mechanism, which is pretty far removed from anything that
matters. You might as well pick either threshold at random once this
happens -- even an expert won't be able to tell the difference.

--
Peter Geoghegan

#6Justin Pryzby
pryzby@telsasoft.com
In reply to: Peter Geoghegan (#5)
Re: Making autovacuum logs indicate if insert-based threshold was the triggering condition

On Sat, Aug 06, 2022 at 04:09:28PM -0700, Peter Geoghegan wrote:

What are the chances that both thresholds will be crossed at *exactly*
(not approximately) the same time in a real world case, where the
table isn't tiny (tiny relative to the "autovacuum_naptime quantum")?
This is a very narrow case.

The threshold wouldn't need to be crossed within autovacuum_naptime, since all
the workers might be busy. Consider autovacuum delay, analyze on long/wide
tables, multiple extended stats objects, or autovacuum parameters which are
changed off-hours by a cronjob.

It might make sense to *always* show how close we were to hitting each
of the thresholds, including the ones that we didn't end up hitting
(we may come pretty close quite often, which seems like it might
matter). But showing multiple conditions together just because the
planets aligned (we hit multiple thresholds together) emphasizes the
low-level mechanism, which is pretty far removed from anything that
matters. You might as well pick either threshold at random once this
happens -- even an expert won't be able to tell the difference.

I don't have strong feelings about it; I'm just pointing out that the
two of the conditions aren't actually exclusive.

It seems like it could be less confusing to show both. Consider someone who is
trying to *reduce* how often autovacuum runs, or give priority to some tables
by raising the thresholds for other tables.

--
Justin

In reply to: Justin Pryzby (#6)
Re: Making autovacuum logs indicate if insert-based threshold was the triggering condition

On Fri, Sep 9, 2022 at 11:11 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

It might make sense to *always* show how close we were to hitting each
of the thresholds, including the ones that we didn't end up hitting
(we may come pretty close quite often, which seems like it might
matter). But showing multiple conditions together just because the
planets aligned (we hit multiple thresholds together) emphasizes the
low-level mechanism, which is pretty far removed from anything that
matters. You might as well pick either threshold at random once this
happens -- even an expert won't be able to tell the difference.

I don't have strong feelings about it; I'm just pointing out that the
two of the conditions aren't actually exclusive.

Fair enough. I'm just pointing out that the cutoffs are continuous for
all practical purposes, even if there are cases where they seem kind
of discrete, due only to implementation details (e.g.
autovacuum_naptime stuff). Displaying only one reason for triggering
an autovacuum is consistent with the idea that the cutoffs are
continuous. It's not literally true that they're continuous, but it
might as well be.

I think of it as similar to how it's not literally true that a coin
toss is always either heads or tails, though it might as well be true.
Sure, even a standard fair coin toss might result in the coin landing
on its side. That'll probably never happen even once, but if does:
just flip the coin again! The physical coin toss was never the
important part.

It seems like it could be less confusing to show both. Consider someone who is
trying to *reduce* how often autovacuum runs, or give priority to some tables
by raising the thresholds for other tables.

My objection to that sort of approach is that it suggests a difference
in what each VACUUM actually does -- as if autovacuum.c actually
decided on a particular runtime behavior for the VACUUM up front,
based on its own considerations that come from statistics about the
workload. I definitely want to avoid creating that false impression in
the minds of users.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#1)
Re: Making autovacuum logs indicate if insert-based threshold was the triggering condition

On Sat, Aug 6, 2022 at 1:03 PM Peter Geoghegan <pg@bowt.ie> wrote:

Attached patch teaches autovacuum.c to pass the information down to
lazyvacuum.c, which includes the information in the autovacuum log.
The approach I've taken is very similar to the existing approach with
anti-wraparound autovacuum. It's pretty straightforward.

I have formally withdrawn this patch. I still think it's a good idea,
and I'm not abandoning the idea. The patch has just been superseded by
another patch of mine:

/messages/by-id/CAH2-Wz=hj-RCr6fOj_L3_0J1Ws8fOoxTQLmtM57gPc19beZz=Q@mail.gmail.com

This other patch has a much broader goal: it decouples
"antiwraparound-ness vs regular-ness" from the criteria that triggered
autovacuum (which includes a "table age" trigger criteria). Since the
other patch has to invent the whole concept of an autovacuum trigger
criteria (which it reports on via a line in autovacuum server log
reports), it seemed best to do everything together.

Thanks
--
Peter Geoghegan