GIN pending list cleanup during autoanalyze blocks cleanup by VACUUM
We generally only expect amvacuumcleanup() routines to be called
during VACUUM. But some ginvacuumcleanup() calls are an exception to
that general rule -- these are calls made during autoanalyze, where
ginvacuumcleanup() does real pending list cleanup work (not just a
no-op return). That only happens within an autoanalyze, and only when
no VACUUM took place before the ANALYZE. The high-level goal of that
design is to make sure that the autovacuum worker process won't neglect
to call ginvacuumcleanup() for pending list cleanup, even when there was
no VACUUM. This behavior was added when the GIN fastupdate/pending list
stuff was first introduced, in commit ff301d6e69.
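For reference, the special case looks roughly like this (condensed from
ginvacuumcleanup() in ginvacuum.c):

    IndexBulkDeleteResult *
    ginvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
    {
        Relation    index = info->index;
        GinState    ginstate;

        /*
         * In an autovacuum analyze, we want to clean up pending insertions.
         * Otherwise, an ANALYZE-only call is a no-op.
         */
        if (info->analyze_only)
        {
            if (IsAutoVacuumWorkerProcess())
            {
                initGinState(&ginstate, index);
                ginInsertCleanup(&ginstate, false, true, true, stats);
            }
            return stats;
        }

        /* ... regular VACUUM cleanup work follows ... */
    }
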
The design of ANALYZE differs from the design of VACUUM in that only
ANALYZE will allocate an XID (typically in a call to
update_attstats()). ANALYZE can also hold an MVCC snapshot. This is
why ANALYZE holds back cleanup by VACUUM in another process, which
sometimes causes problems (say during pgbench) -- this much is fairly
well known. But there is also a pretty nasty interaction between this
aspect of ANALYZE, and the special GIN pending list cleanup path I
mentioned. This interaction makes the VACUUM-OldestXmin-held-back
situation far worse. The special analyze_only ginvacuumcleanup() calls
happen fairly late in the ANALYZE, during the window in which ANALYZE
holds back the OldestXmin used by VACUUMs in other processes. This
greatly increases the extent of the problem: GIN pending list cleanup
will often take a great deal longer than the typical ANALYZE tasks --
it's a pretty resource-intensive maintenance operation, especially if
there are a couple of GIN indexes on the table.
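To make the timing concrete, here is a condensed sketch of the tail end
of do_analyze_rel() as it stands today -- by the time the cleanup loop
runs, the ANALYZE has typically already allocated an XID and is still
holding its MVCC snapshot:

    update_attstats(...);            /* write pg_statistic rows (allocates XID) */
    pgstat_report_analyze(onerel, ...);

    /* If this isn't part of VACUUM ANALYZE, let index AMs do cleanup */
    if (!(params->options & VACOPT_VACUUM))
    {
        for (ind = 0; ind < nindexes; ind++)
        {
            IndexVacuumInfo ivinfo;

            ivinfo.index = Irel[ind];
            ivinfo.analyze_only = true;
            /* ... */
            stats = index_vacuum_cleanup(&ivinfo, NULL);    /* GIN works here */
        }
    }

    vac_close_indexes(nindexes, Irel, NoLock);
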
This issue was brought to my attention by Nikolay Samokhvalov. He
reached out privately about it. He mentioned one problematic case
involving an ANALYZE lasting 45 minutes, or longer (per
log_autovacuum_min_duration output for the autoanalyze). That was
correlated with VACUUMs on other tables whose OldestXmin values were
all held back to the same old XID. I think that this issue ought to be
treated as a bug.
Jaime Casanova wrote a patch that does pending list cleanup using the
AV worker item infrastructure [1]. It's in the CF queue. Sounds like a
good idea to me. The goal of that patch is to take work out of the
insert path, when our gin_pending_list_limit-based limit is hit, but
offhand I imagine that the same approach could be used as a fix for
this issue, at least on HEAD.
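For reference, the BRIN autosummarization precedent queues a work item
from the insert path. A purely illustrative sketch of a GIN equivalent
(AVW_GINCleanPendingList is made up here; AVW_BRINSummarizeRange is the
only work item type that exists today) might look like:

    /* in the GIN insert path, once gin_pending_list_limit is exceeded: */
    AutoVacuumRequestWork(AVW_GINCleanPendingList,
                          RelationGetRelid(index),
                          InvalidBlockNumber);
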
[1]: /messages/by-id/20210405063117.GA2478@ahch-to
--
Peter Geoghegan
On Thu, Oct 7, 2021 at 10:34 PM Peter Geoghegan <pg@bowt.ie> wrote:
This issue was brought to my attention by Nikolay Samokhvalov. He
reached out privately about it. He mentioned one problematic case
involving an ANALYZE lasting 45 minutes, or longer (per
log_autovacuum_min_duration output for the autoanalyze). That was
correlated with VACUUMs on other tables whose OldestXmin values were
all held back to the same old XID. I think that this issue ought to be
treated as a bug.
It's hard to think of a non-invasive bug fix. The obvious approach is
to move the index_vacuum_cleanup()/ginvacuumcleanup() calls in
analyze.c to some earlier point in ANALYZE, in order to avoid doing
lots of VACUUM-ish work while we hold an MVCC snapshot that blocks
cleanup in other tables. The code in question is more or less supposed
to be run during VACUUM already, and so the idea of moving it back to
when the autoanalyze worker backend state "still looks like the usual
autovacuum case" makes a certain amount of sense. But that doesn't
work, at least not without lots of invasive changes.
While I'm no closer to a backpatchable fix than I was on Thursday, I
do have some more ideas about what to do on HEAD. I now lean towards
completely ripping analyze_only calls out, there -- the whole idea of
calling amvacuumcleanup() routines during autoanalyze (but not plain
ANALYZE) seems bolted on. It's not just the risk of similar problems
cropping up in the future -- it's that the whole approach seems
obsolete. We now generally expect autovacuum to run against
insert-only tables. That might not be a perfect fit for this, but it
still seems far better.
Does anyone have any ideas for a targeted fix?
Here's why the "obvious" fix is impractical, at least for backpatch:
To recap, a backend running VACUUM is generally able to avoid being
counted by GetOldestNonRemovableTransactionId(), which is
practically essential for any busy database -- without that, long
running VACUUM operations would behave like conventional long running
transactions, causing all sorts of chaos. The problem here is that we
can have ginvacuumcleanup() calls that take place without avoiding the
same kind of chaos, just because they happen to take place during
autoanalyze. It seems like the whole GIN autoanalyze mechanism design
was based on the assumption that it didn't make much difference *when*
we reach ginvacuumcleanup(), as long as it happened regularly. But
that's just not true.
We go to a lot of trouble to make VACUUM have this property. This
cannot easily be extended or generalized to cover this special case
during ANALYZE. For one thing, the high level vacuum_rel() entry point
sets things up carefully, using session-level locks for relations. For
another, it relies on special PROC_IN_VACUUM flag logic -- that status
is stored in MyProc->statusFlags.
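For context, vacuum_rel() advertises that status roughly as follows
(simplified excerpt, from memory -- details vary a bit across branches):

    if (!(params->options & VACOPT_FULL))
    {
        LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
        MyProc->statusFlags |= PROC_IN_VACUUM;
        if (params->is_wraparound)
            MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND;
        ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
        LWLockRelease(ProcArrayLock);
    }
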
--
Peter Geoghegan
On Sat, Oct 9, 2021 at 5:51 PM Peter Geoghegan <pg@bowt.ie> wrote:
While I'm no closer to a backpatchable fix than I was on Thursday, I
do have some more ideas about what to do on HEAD. I now lean towards
completely ripping analyze_only calls out, there -- the whole idea of
calling amvacuumcleanup() routines during autoanalyze (but not plain
ANALYZE) seems bolted on. It's not just the risk of similar problems
cropping up in the future -- it's that the whole approach seems
obsolete. We now generally expect autovacuum to run against
insert-only tables.
Attached patch removes calls to each index's amvacuumcleanup() routine
that take place during ANALYZE. These days we can just rely on
autovacuum to run against insert-only tables (assuming the user didn't
go out of their way to disable that behavior).
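For reference, the insert-driven trigger in autovacuum.c's
relation_needs_vacanalyze() amounts to roughly this (paraphrasing):

    vacinsthresh = (float4) vac_ins_base_thresh +
                   vac_ins_scale_factor * reltuples;

    *dovacuum = force_vacuum || (vactuples > vacthresh) ||
                (vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
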
Having thought about it some more, I have arrived at the conclusion
that we should backpatch this to Postgres 13, the first version that
had insert-driven autovacuums (following commit b07642db). This
approach is unorthodox, because it amounts to disabling a
theoretically-working feature in the backbranches. Also, I'd be
drawing the line at Postgres 13, due only to the quite accidental fact
that that's the first major release that clearly doesn't need this
mechanism. (As it happens Nikolay was on 12 anyway, so this won't work
for him, but he already has a workaround IIUC.)
I reached this conclusion because I can't think of a non-invasive fix,
and I really don't want to go there. At the same time, this behavior
is barely documented, and is potentially very harmful indeed. I'm sure
that we should get rid of it on HEAD, but getting rid of it a couple
of years earlier seems prudent.
Does anybody have any opinion on this, either in favor or against my
backpatch-to-13 proposal?
Although this is technically the first problem report about this since
the GIN fastupdate stuff was introduced over a decade ago, I highly
doubt that that tells us much, given the specifics. We only added
instrumentation to autovacuum that showed each VACUUM's OldestXmin in
Postgres 10 -- that's relatively recent. Nikolay is as sophisticated a
Postgres user as anybody, and it was only through sheer luck that we
managed to figure this out -- he had access to that OldestXmin
instrumentation, and also had access to my input on it. While the
issue itself was very hard to spot, the negative ramifications
certainly were not.
Many users bend over backwards to avoid long running transactions, and
the fact that there is this highly obscure path in which autoanalyze
creates very long running transactions carelessly is pretty
distressing to me. I remember hearing complaints about how slow GIN
pending list cleanup by VACUUM was years ago, back in my consulting
days, when the feature was relatively new. I just accepted the general
wisdom at the time, which was that the mechanism itself is slow. But I
now suspect that that issue has far more to do with holding back
VACUUM/other cleanup generally, and not with the efficiency of GIN
itself.
--
Peter Geoghegan
Attachments:
v1-0001-Desupport-analyze_only-calls-to-amvacuumcleanup.patch (application/octet-stream)
From 54447b4d869818cbb906b3de22fe2505f6e0d267 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 13 Oct 2021 15:01:44 -0700
Subject: [PATCH v1] Desupport analyze_only calls to amvacuumcleanup().
This can cause big problems with large GIN indexes whose pending list
cleanup is run during autoanalyze. Other backends that run VACUUM have
their OldestXmin cutoff
held back senselessly.
The behavior in question has been around since commit ff301d6e69, which
introduced GIN's fastupdate mechanism. The idea at the time was to make
sure that autovacuum workers that just ran ANALYZE would bulk insert
pending entries, despite never performing a conventional VACUUM. This
has been totally unnecessary since commit b07642db, which introduced
autovacuum_vacuum_insert_threshold-driven autovacuums.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkjrK556enVtFLmyXEdw91xGuwiyZVep2kp5yQT_-3JDg@mail.gmail.com
---
src/include/access/genam.h | 1 -
src/backend/access/brin/brin.c | 4 ----
src/backend/access/gin/ginfast.c | 4 ++--
src/backend/access/gin/ginvacuum.c | 14 -----------
src/backend/access/gist/gistvacuum.c | 4 ----
src/backend/access/hash/hash.c | 1 -
src/backend/access/heap/vacuumlazy.c | 2 --
src/backend/access/nbtree/nbtree.c | 4 ----
src/backend/access/spgist/spgvacuum.c | 4 ----
src/backend/catalog/index.c | 1 -
src/backend/commands/analyze.c | 34 +++------------------------
contrib/bloom/blvacuum.c | 3 ---
doc/src/sgml/gin.sgml | 2 +-
doc/src/sgml/indexam.sgml | 9 -------
14 files changed, 6 insertions(+), 81 deletions(-)
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 480a4762f..90da48f91 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -44,7 +44,6 @@ typedef struct IndexBuildResult
typedef struct IndexVacuumInfo
{
Relation index; /* the index being vacuumed */
- bool analyze_only; /* ANALYZE (without any actual vacuum) */
bool report_progress; /* emit progress.h status reports */
bool estimated_count; /* num_heap_tuples is an estimate */
int message_level; /* ereport level for progress messages */
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index ccc9fa095..2d336355a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -940,10 +940,6 @@ brinvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
{
Relation heapRel;
- /* No-op in ANALYZE ONLY mode */
- if (info->analyze_only)
- return stats;
-
if (!stats)
stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
stats->num_pages = RelationGetNumberOfBlocks(info->index);
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index e0d994094..a1a045cdd 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -801,8 +801,8 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
if (forceCleanup)
{
/*
- * We are called from [auto]vacuum/analyze or gin_clean_pending_list()
- * and we would like to wait concurrent cleanup to finish.
+ * We are called from [auto]vacuum or gin_clean_pending_list() and we
+ * would like to wait for concurrent cleanup to finish
*/
LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
workMemory =
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index a276eb020..f92e7cd5d 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -695,20 +695,6 @@ ginvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
GinState ginstate;
GinStatsData idxStat;
- /*
- * In an autovacuum analyze, we want to clean up pending insertions.
- * Otherwise, an ANALYZE-only call is a no-op.
- */
- if (info->analyze_only)
- {
- if (IsAutoVacuumWorkerProcess())
- {
- initGinState(&ginstate, index);
- ginInsertCleanup(&ginstate, false, true, true, stats);
- }
- return stats;
- }
-
/*
* Set up all-zero stats and cleanup pending inserts if ginbulkdelete
* wasn't called
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 066319353..96f5ed8ee 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -74,10 +74,6 @@ gistbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
IndexBulkDeleteResult *
gistvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
{
- /* No-op in ANALYZE ONLY mode */
- if (info->analyze_only)
- return stats;
-
/*
* If gistbulkdelete was called, we need not do anything, just return the
* stats from the latest gistbulkdelete call. If it wasn't called, we
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index eb3810494..c296355f0 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -647,7 +647,6 @@ hashvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
BlockNumber num_pages;
/* If hashbulkdelete wasn't called, return NULL signifying no change */
- /* Note: this covers the analyze_only case too */
if (stats == NULL)
return NULL;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 05221cc1d..8d98dd686 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3023,7 +3023,6 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
pg_rusage_init(&ru0);
ivinfo.index = indrel;
- ivinfo.analyze_only = false;
ivinfo.report_progress = false;
ivinfo.estimated_count = true;
ivinfo.message_level = elevel;
@@ -3079,7 +3078,6 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
pg_rusage_init(&ru0);
ivinfo.index = indrel;
- ivinfo.analyze_only = false;
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 40ad0956e..3adf54b4a 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -815,10 +815,6 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
{
BlockNumber num_delpages;
- /* No-op in ANALYZE ONLY mode */
- if (info->analyze_only)
- return stats;
-
/*
* If btbulkdelete was called, we need not do anything (we just maintain
* the information used within _bt_vacuum_needs_cleanup() by calling
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 76fb0374c..5f4c41956 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -938,10 +938,6 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
{
spgBulkDeleteState bds;
- /* No-op in ANALYZE ONLY mode */
- if (info->analyze_only)
- return stats;
-
/*
* We don't need to scan the index if there was a preceding bulkdelete
* pass. Otherwise, make a pass that won't delete any live tuples, but
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 26bfa74ce..864683768 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3299,7 +3299,6 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
* Scan the index and gather up all the TIDs into a tuplesort object.
*/
ivinfo.index = indexRelation;
- ivinfo.analyze_only = false;
ivinfo.report_progress = true;
ivinfo.estimated_count = true;
ivinfo.message_level = DEBUG2;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8bfb2ad95..5d621ff09 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -681,6 +681,9 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
in_outer_xact);
}
+ /* Done with indexes */
+ vac_close_indexes(nindexes, Irel, NoLock);
+
/*
* Now report ANALYZE to the stats collector. For regular tables, we do
* it only if not doing inherited stats. For partitioned tables, we only
@@ -696,37 +699,6 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
else if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
pgstat_report_analyze(onerel, 0, 0, (va_cols == NIL));
- /*
- * If this isn't part of VACUUM ANALYZE, let index AMs do cleanup.
- *
- * Note that most index AMs perform a no-op as a matter of policy for
- * amvacuumcleanup() when called in ANALYZE-only mode. The only exception
- * among core index AMs is GIN/ginvacuumcleanup().
- */
- if (!(params->options & VACOPT_VACUUM))
- {
- for (ind = 0; ind < nindexes; ind++)
- {
- IndexBulkDeleteResult *stats;
- IndexVacuumInfo ivinfo;
-
- ivinfo.index = Irel[ind];
- ivinfo.analyze_only = true;
- ivinfo.estimated_count = true;
- ivinfo.message_level = elevel;
- ivinfo.num_heap_tuples = onerel->rd_rel->reltuples;
- ivinfo.strategy = vac_strategy;
-
- stats = index_vacuum_cleanup(&ivinfo, NULL);
-
- if (stats)
- pfree(stats);
- }
- }
-
- /* Done with indexes */
- vac_close_indexes(nindexes, Irel, NoLock);
-
/* Log the action if appropriate */
if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
{
diff --git a/contrib/bloom/blvacuum.c b/contrib/bloom/blvacuum.c
index 88b0a6d29..97b4a1b79 100644
--- a/contrib/bloom/blvacuum.c
+++ b/contrib/bloom/blvacuum.c
@@ -172,9 +172,6 @@ blvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
BlockNumber npages,
blkno;
- if (info->analyze_only)
- return stats;
-
if (stats == NULL)
stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml
index d68d12d51..09cf95220 100644
--- a/doc/src/sgml/gin.sgml
+++ b/doc/src/sgml/gin.sgml
@@ -510,7 +510,7 @@
from the indexed item).
<acronym>GIN</acronym> is capable of postponing much of this work by inserting
new tuples into a temporary, unsorted list of pending entries.
- When the table is vacuumed or autoanalyzed, or when
+ When the table is vacuumed, or when the
<function>gin_clean_pending_list</function> function is called, or if the
pending list becomes larger than
<xref linkend="guc-gin-pending-list-limit"/>, the entries are moved to the
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index cf359fa9f..0c86dcc27 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -395,15 +395,6 @@ amvacuumcleanup (IndexVacuumInfo *info,
be returned.
</para>
- <para>
- <function>amvacuumcleanup</function> will also be called at completion of an
- <command>ANALYZE</command> operation. In this case <literal>stats</literal> is always
- NULL and any return value will be ignored. This case can be distinguished
- by checking <literal>info->analyze_only</literal>. It is recommended
- that the access method do nothing except post-insert cleanup in such a
- call, and that only in an autovacuum worker process.
- </para>
-
<para>
<programlisting>
bool
--
2.30.2
On Thu, Oct 14, 2021 at 7:58 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Sat, Oct 9, 2021 at 5:51 PM Peter Geoghegan <pg@bowt.ie> wrote:
While I'm no closer to a backpatchable fix than I was on Thursday, I
do have some more ideas about what to do on HEAD. I now lean towards
completely ripping analyze_only calls out, there -- the whole idea of
calling amvacuumcleanup() routines during autoanalyze (but not plain
ANALYZE) seems bolted on. It's not just the risk of similar problems
cropping up in the future -- it's that the whole approach seems
obsolete. We now generally expect autovacuum to run against
insert-only tables.

Attached patch removes calls to each index's amvacuumcleanup() routine
that take place during ANALYZE. These days we can just rely on
autovacuum to run against insert-only tables (assuming the user didn't
go out of their way to disable that behavior).
Looking at the original commit, as you mentioned, ISTM performing
pending list cleanup during (auto)analyze (and analyze_only) was
introduced to perform the pending list cleanup on insert-only tables.
Now that we have autovacuum_vacuum_insert_threshold, we don’t
necessarily need to rely on that.
On the other hand, I still see a little value in performing the
pending list cleanup during autoanalyze. For example, if the user
wants to clean up the pending list frequently in the background (so
that it's not triggered in the INSERT path), it might be better to do
that during autoanalyze than autovacuum. If the table has garbage,
autovacuum has to vacuum all indexes and the table, taking a very long
time. But autoanalyze can be done in a shorter time. If we trigger
autoanalyze frequently and perform pending list cleanup, the pending
list cleanup can also be done in a relatively short time, preventing
MVCC snapshots from being held for a long time.
Therefore, I personally think that it's better to eliminate
analyze_only code after introducing a way that allows us to perform
the pending list cleanup more frequently. I think that the idea of
Jaime Casanova's patch is a good solution.
Having thought about it some more, I have arrived at the conclusion
that we should backpatch this to Postgres 13, the first version that
had insert-driven autovacuums (following commit b07642db). This
approach is unorthodox, because it amounts to disabling a
theoretically-working feature in the backbranches. Also, I'd be
drawing the line at Postgres 13, due only to the quite accidental fact
that that's the first major release that clearly doesn't need this
mechanism. (As it happens Nikolay was on 12 anyway, so this won't work
for him, but he already has a workaround IIUC.)

I reached this conclusion because I can't think of a non-invasive fix,
and I really don't want to go there. At the same time, this behavior
is barely documented, and is potentially very harmful indeed. I'm sure
that we should get rid of it on HEAD, but getting rid of it a couple
of years earlier seems prudent.

Does anybody have any opinion on this, either in favor or against my
backpatch-to-13 proposal?
I'm not very positive about back-patching. The first reason is what I
described above; I still see a little value in performing pending list
cleanup during autoanalyze. Another reason is that if the user relies
on autoanalyze to perform pending list cleanup, they would have to
enable autovacuum_vacuum_insert_threshold instead as part of the minor
upgrade. Since that also means triggering autovacuum in more cases, I
think it will have a profound impact on existing systems.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Oct 14, 2021 at 12:50 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Looking at the original commit, as you mentioned, ISTM performing
pending list cleanup during (auto)analyze (and analyze_only) was
introduced to perform the pending list cleanup on insert-only tables.
Now that we have autovacuum_vacuum_insert_threshold, we don’t
necessarily need to rely on that.
Right.
On the other hand, I still see a little value in performing the
pending list cleanup during autoanalyze. For example, if the user
wants to clean up the pending list frequently in the background (so
that it's not triggered in the INSERT path), it might be better to do
that during autoanalyze than autovacuum. If the table has garbage,
autovacuum has to vacuum all indexes and the table, taking a very long
time. But autoanalyze can be done in a shorter time. If we trigger
autoanalyze frequently and perform pending list cleanup, the pending
list cleanup can also be done in a relatively short time, preventing
MVCC snapshots from being held for a long time.
I agree that that's true -- there is at least a little value. But,
that's just an accident of history.
Today, ginvacuumcleanup() won't need to scan the whole index in the
autoanalyze path that I'm concerned about - it will just do pending
list insertion. This does mean that the autoanalyze path taken within
ginvacuumcleanup() should be a lot faster than a similar cleanup-only
call to ginvacuumcleanup(). But...is there actually a good reason for
that? Why should a cleanup-only VACUUM ANALYZE (i.e. a V-A where the
VACUUM does not see any heap-page LP_DEAD items) be so much slower
than a similar ANALYZE against the same table, under the same
conditions? I see no good reason.
Ideally, ginvacuumcleanup() would behave like btvacuumcleanup() and
hashvacuumcleanup(). That is, it should not unnecessarily scan the
index (even when used by VACUUM). In other words, it should have
something like the "Skip full index scan" mechanism that you added to
nbtree in commit 857f9c36. That way it just wouldn't make sense to
have this autoanalyze path anymore -- it would no longer have this
accidental advantage over a regular ginvacuumcleanup() call made from
VACUUM.
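The nbtree structure I have in mind is roughly this (condensed from
btvacuumcleanup()):

    if (stats == NULL)
    {
        /* Check if VACUUM operation can entirely avoid btvacuumscan() call */
        if (!_bt_vacuum_needs_cleanup(info->index))
            return NULL;

        /* A cleanup-only scan turned out to be needed after all */
        stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
        stats->estimated_count = true;
        btvacuumscan(info, stats, NULL, NULL, 0);
    }
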
More generally, I think it's a big problem that ginvacuumcleanup() has
so many weird special cases. Why does the full_clean argument to
ginInsertCleanup() even exist? It makes the behavior inside
ginInsertCleanup() vary based on whether we're in autovacuum (or
autoanalyze) for no reason at all. I think that the true reason for
this is simple: the code in ginInsertCleanup() is *bad*. full_clean
was just forgotten about by one of its many bug fixes since the code
quality started to go down. Somebody (who shall remain nameless) was
just careless when maintaining that code.
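For reference, the signature is roughly as below, and (if I'm reading
ginfast.c/ginvacuum.c correctly) the callers can't seem to agree on what
full_clean ought to be:

    void
    ginInsertCleanup(GinState *ginstate, bool full_clean,
                     bool fill_fsm, bool forceCleanup,
                     IndexBulkDeleteResult *stats);

    /* VACUUM (ginbulkdelete/ginvacuumcleanup): */
    ginInsertCleanup(&ginstate, !IsAutoVacuumWorkerProcess(), false, true, stats);

    /* the autoanalyze path discussed upthread: */
    ginInsertCleanup(&ginstate, false, true, true, stats);

    /* gin_clean_pending_list(): */
    ginInsertCleanup(&ginstate, true, true, true, stats);
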
VACUUM should be in charge of index AMs -- not the other way around.
It's good that the amvacuumcleanup() interface is so flexible, but I
think that GIN is over relying on this flexibility. Ideally, VACUUM
wouldn't have to think about the specific index AMs involved at all --
why should GIN be so different to GiST, nbtree, or hash? If GIN (or
any other index AM) behaves like a special little snowflake, with its
own unique performance characteristics, then it is harder to improve
certain important things inside VACUUM. For example, the conveyor belt
index vacuuming design from Robert Haas won't work as well as it
could.
Therefore, I personally think that it's better to eliminate
analyze_only code after introducing a way that allows us to perform
the pending list cleanup more frequently. I think that the idea of
Jaime Casanova's patch is a good solution.
I now think that it would be better to fix ginvacuumcleanup() in the
way that I described (I changed my mind). Jaime's patch does seem like
a good idea to me, but not for this. It makes sense to have that, for
the reasons that Jaime said himself.
I'm not very positive about back-patching. The first reason is what I
described above; I still see a little value in performing pending list
cleanup during autoanalyze. Another reason is that if the user relies
on autoanalyze to perform pending list cleanup, they would have to
enable autovacuum_vacuum_insert_threshold instead as part of the minor
upgrade. Since that also means triggering autovacuum in more cases, I
think it will have a profound impact on existing systems.
I have to admit that when I wrote my earlier email, I was still a
little shocked by the problem -- which is not a good state of mind
when making a decision about backpatching. But, I now think that I
underappreciated the risk of making the problem worse in the
backbranches, rather than better. I won't be backpatching anything
here.
The problem report from Nikolay was probably an unusually bad case,
where the pending list cleanup/insertion was particularly expensive.
This *is* really awful behavior, but that alone isn't a good enough
reason to backpatch.
--
Peter Geoghegan