ANALYZE counts LP_DEAD line pointers as n_dead_tup

Started by Masahiko Sawada over 4 years ago, 4 messages
#1 Masahiko Sawada
sawada.mshk@gmail.com
1 attachment(s)

Hi all,

If we create a table with vacuum_index_cleanup = off or execute VACUUM
with INDEX_CLEANUP = off, vacuum updates pg_stat_all_tables.n_dead_tup
to the number of HEAPTUPLE_RECENTLY_DEAD tuples, whereas analyze
updates it to the sum of the number of HEAPTUPLE_DEAD/RECENTLY_DEAD
tuples and LP_DEAD line pointers. So if the table has many LP_DEAD
line pointers because index cleanup was skipped, autovacuum is
triggered every time after analyze/autoanalyze. This issue also seems
to exist on back branches, probably back to 12, where the
INDEX_CLEANUP option was introduced.
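
To illustrate, here is a minimal sketch (the table name and row count
are just for illustration; the stats collector reports asynchronously,
so the counters can take a moment to update):

CREATE TABLE t (id int) WITH (vacuum_index_cleanup = off);
CREATE INDEX ON t (id);
INSERT INTO t SELECT generate_series(1, 100000);
DELETE FROM t;

VACUUM t;   -- index cleanup is skipped, so LP_DEAD line pointers remain

-- vacuum reported only RECENTLY_DEAD tuples, so this is (near) zero
SELECT n_dead_tup FROM pg_stat_all_tables WHERE relname = 't';

ANALYZE t;

-- analyze also counted the LP_DEAD line pointers, so this jumps back
-- up, enough to trigger autovacuum again
SELECT n_dead_tup FROM pg_stat_all_tables WHERE relname = 't';

-- the line pointers themselves can be seen with pageinspect
-- (lp_flags = 3 is LP_DEAD):
-- SELECT lp, lp_flags FROM heap_page_items(get_raw_page('t', 0));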

I think we can have heapam_scan_analyze_next_tuple() not count LP_DEAD
line pointers, matching what lazy_scan_prune() does. I've attached a
patch for that.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachments:

not_count_lp_dead_by_analyze.patch (application/octet-stream)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 7a9a640989..e171b87a2a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1039,17 +1039,12 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
 		itemid = PageGetItemId(targpage, hscan->rs_cindex);
 
 		/*
-		 * We ignore unused and redirect line pointers.  DEAD line pointers
-		 * should be counted as dead, because we need vacuum to run to get rid
-		 * of them.  Note that this rule agrees with the way that
-		 * heap_page_prune() counts things.
+		 * We ignore unused and redirect line pointers.  We don't count LP_DEAD
+		 * line pointers as deadrows.  Note that this rule agrees with the way
+		 * that lazy_scan_prune() counts things.
 		 */
 		if (!ItemIdIsNormal(itemid))
-		{
-			if (ItemIdIsDead(itemid))
-				*deadrows += 1;
 			continue;
-		}
 
 		ItemPointerSet(&targtuple->t_self, hscan->rs_cblock, hscan->rs_cindex);
 
#2 Peter Geoghegan
pg@bowt.ie
In reply to: Masahiko Sawada (#1)
Re: ANALYZE counts LP_DEAD line pointers as n_dead_tup

On Wed, Apr 14, 2021 at 7:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> If we create a table with vacuum_index_cleanup = off or execute VACUUM
> with INDEX_CLEANUP = off, vacuum updates pg_stat_all_tables.n_dead_tup
> to the number of HEAPTUPLE_RECENTLY_DEAD tuples, whereas analyze
> updates it to the sum of the number of HEAPTUPLE_DEAD/RECENTLY_DEAD
> tuples and LP_DEAD line pointers. So if the table has many LP_DEAD
> line pointers because index cleanup was skipped, autovacuum is
> triggered every time after analyze/autoanalyze. This issue also seems
> to exist on back branches, probably back to 12, where the
> INDEX_CLEANUP option was introduced.

Hmm.

> I think we can have heapam_scan_analyze_next_tuple() not count LP_DEAD
> line pointers, matching what lazy_scan_prune() does. I've attached a
> patch for that.

lazy_scan_prune() is concerned with what the state of the table *will
be* when VACUUM finishes, based on its working assumption that index
vacuuming and heap vacuuming always go ahead. This is exactly the same
reason why lazy_scan_prune() will set LVPagePruneState.hastup to
'false' in the presence of an LP_DEAD item -- which is not how
count_nondeletable_pages() decides whether the same page 'hastup' much
later on, right at the end of the VACUUM (it will only consider the
page safe to truncate away if it contains nothing but LP_UNUSED items
-- LP_DEAD items make heap/table truncation unsafe).
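
To make this concrete, a sketch (reusing the hypothetical table from
upthread; exact sizes will vary):

-- with index cleanup enabled, a VACUUM after a full-table DELETE can
-- truncate the emptied pages away
ALTER TABLE t SET (vacuum_index_cleanup = on);
DELETE FROM t;
VACUUM t;
SELECT pg_relation_size('t');   -- shrinks, possibly to zero

-- with vacuum_index_cleanup = off the LP_DEAD items linger, so
-- count_nondeletable_pages() finds no trailing all-LP_UNUSED pages and
-- the truncation step is skipped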

In general, accounting rules like this may need to work slightly
differently across near-identical functions because of "being versus
becoming" issues. It is necessary to distinguish between "being" code
(e.g., this ANALYZE code, count_nondeletable_pages() and its hastup
issue) and "becoming" code (e.g., lazy_scan_prune() and its approach
to counting "remaining" dead tuples as well as hastup-ness). I tend to
doubt that your patch is the right approach because the two code paths
already "agree" once you assume that the LP_DEAD items that
lazy_scan_prune() sees will be gone at the end of the VACUUM. I do
agree that this is a problem, though.

Generally speaking, the "becoming" code from lazy_scan_prune() is not
100% sure that it will be correct in each case, for a large variety of
reasons. But I think that we should expect it to be mostly correct. We
definitely cannot allow it to be quite wrong all the time with some
workloads. And so I agree that this is a problem for the INDEX_CLEANUP
= off case, though it's equally an issue for the recently added
failsafe mechanism. I do not believe that it is a problem for the
bypass-indexes optimization, though, because that is designed to only
be applied when there are practically zero LP_DEAD items. The
optimization can make VACUUM record that there are zero dead tuples
after the VACUUM finishes, even though there were in fact a very small
non-zero number of dead tuples -- but that's not appreciably different
from any of the other ways that the count of dead tuples could be
inaccurate (e.g. concurrent opportunistic pruning). The specific tests
that we apply inside lazy_vacuum() should make sure that autovacuum
scheduling is never affected. The autovacuum scheduling code can
safely "believe" that the indexes were vacuumed, because it really is
the same as if there were precisely zero LP_DEAD items (or the same
for all practical purposes).
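
To spell out the scheduling arithmetic (a simplification that ignores
per-table reloptions): autovacuum considers a table once n_dead_tup
exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor *
reltuples. Something like this approximates the trigger point (the
table name is just an example):

SELECT c.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::bigint
         + current_setting('autovacuum_vacuum_scale_factor')::float8
           * c.reltuples AS approx_autovacuum_trigger
FROM pg_class c
JOIN pg_stat_all_tables s ON s.relid = c.oid
WHERE c.relname = 't';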

I'm not sure what to do, though. Both the INDEX_CLEANUP = off case and
the failsafe case are only intended for emergencies. And it's hard to
know what to do in a code path that is designed to rarely or never be
used.

--
Peter Geoghegan

#3 Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#2)
1 attachment(s)
Re: ANALYZE counts LP_DEAD line pointers as n_dead_tup

On Fri, Apr 16, 2021 at 1:16 PM Peter Geoghegan <pg@bowt.ie> wrote:

> I'm not sure what to do, though. Both the INDEX_CLEANUP = off case and
> the failsafe case are only intended for emergencies. And it's hard to
> know what to do in a code path that is designed to rarely or never be
> used.

How about just documenting it in comments, as in the attached patch? I
tried to address all of the issues with LP_DEAD accounting together:
both the issue raised by Masahiko, and one or two others that were
also discussed recently on other threads. They all seem kind of
related to me.

I didn't address the INDEX_CLEANUP = off case in the comments directly
(I just addressed the failsafe case). There is no good reason to think
that the situation will resolve itself with INDEX_CLEANUP = off, so it
didn't seem wise to mention it there too. But that should still be
okay -- INDEX_CLEANUP = off has practically been superseded by the
failsafe, which is much more flexible. And anybody who uses
INDEX_CLEANUP = off cannot expect to skip index cleanup forever
without seriously bad consequences all over the place.
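
(The failsafe's behavior can also be checked and tuned through its
GUCs, without touching the table -- a quick sketch, with the defaults
as I understand them:)

SHOW vacuum_failsafe_age;             -- defaults to 1600000000
SHOW vacuum_multixact_failsafe_age;   -- defaults to 1600000000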

--
Peter Geoghegan

Attachments:

0001-VACUUM-accounting-Explain-LP_DEAD-accounting.patch (application/x-patch)
From 5d24338e90f0f3eec7134c328d40270f2ddddc50 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Fri, 9 Apr 2021 18:50:25 -0700
Subject: [PATCH] VACUUM accounting: Explain LP_DEAD accounting.

---
 src/backend/access/heap/vacuumlazy.c | 70 +++++++++++++++++++---------
 1 file changed, 47 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9f9ba5d308..ad37f25e2a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -686,7 +686,16 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 						new_min_multi,
 						false);
 
-	/* report results to the stats collector, too */
+	/*
+	 * Report results to the stats collector, too.
+	 *
+	 * We deliberately avoid telling the stats collector about LP_DEAD items
+	 * that remain in the table when index/heap vacuuming has been bypassed by
+	 * the failsafe mechanism.  Avoid behaving too aggressively once the
+	 * danger of wraparound failure subsides.  Autoanalyze should notice any
+	 * remaining LP_DEAD items and schedule an autovacuum if nothing else
+	 * does.
+	 */
 	pgstat_report_vacuum(RelationGetRelid(rel),
 						 rel->rd_rel->relisshared,
 						 Max(new_live_tuples, 0),
@@ -1334,6 +1343,9 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		 */
 		lazy_scan_prune(vacrel, buf, blkno, page, vistest, &prunestate);
 
+		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
+		Assert(!all_visible_according_to_vm || prunestate.all_visible);
+
 		/* Remember the location of the last page with nonremovable tuples */
 		if (prunestate.hastup)
 			vacrel->nonempty_pages = blkno + 1;
@@ -1404,7 +1416,6 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		 * Handle setting visibility map bit based on what the VM said about
 		 * the page before pruning started, and using prunestate
 		 */
-		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
 		if (!all_visible_according_to_vm && prunestate.all_visible)
 		{
 			uint8		flags = VISIBILITYMAP_ALL_VISIBLE;
@@ -1737,19 +1748,22 @@ retry:
 		/*
 		 * LP_DEAD items are processed outside of the loop.
 		 *
-		 * Note that we deliberately don't set hastup=true in the case of an
-		 * LP_DEAD item here, which is not how lazy_check_needs_freeze() or
+		 * Our working assumption is that any LP_DEAD items we encounter here
+		 * will become LP_UNUSED inside lazy_vacuum_heap_page() before VACUUM
+		 * finishes.
+		 *
+		 * This is why the number of LP_DEAD items won't be reported to the
+		 * stats collector -- we simply won't leave any behind.  (Actually,
+		 * cases where we bypass index vacuuming will in fact leave behind
+		 * LP_DEAD items, but that only happens in special circumstances.  We
+		 * avoid trying to compensate for that in our later report to the
+		 * stats collector.)
+		 *
+		 * This is also why we deliberately don't set hastup=true in the case
+		 * of an LP_DEAD item.  This is not how lazy_check_needs_freeze() or
 		 * count_nondeletable_pages() do it -- they only consider pages empty
 		 * when they only have LP_UNUSED items, which is important for
 		 * correctness.
-		 *
-		 * Our assumption is that any LP_DEAD items we encounter here will
-		 * become LP_UNUSED inside lazy_vacuum_heap_page() before we actually
-		 * call count_nondeletable_pages().  In any case our opinion of
-		 * whether or not a page 'hastup' (which is how our caller sets its
-		 * vacrel->nonempty_pages value) is inherently race-prone.  It must be
-		 * treated as advisory/unreliable, so we might as well be slightly
-		 * optimistic.
 		 */
 		if (ItemIdIsDead(itemid))
 		{
@@ -1901,9 +1915,6 @@ retry:
 	 * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
 	 * that remains and needs to be considered for freezing now (LP_UNUSED and
 	 * LP_REDIRECT items also remain, but are of no further interest to us).
-	 *
-	 * Add page level counters to caller's counts, and then actually process
-	 * LP_DEAD and LP_NORMAL items.
 	 */
 	vacrel->offnum = InvalidOffsetNumber;
 
@@ -1988,13 +1999,6 @@ retry:
 	}
 #endif
 
-	/* Add page-local counts to whole-VACUUM counts */
-	vacrel->tuples_deleted += tuples_deleted;
-	vacrel->lpdead_items += lpdead_items;
-	vacrel->new_dead_tuples += new_dead_tuples;
-	vacrel->num_tuples += num_tuples;
-	vacrel->live_tuples += live_tuples;
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in the dead_tuples
 	 * array.  Also record that page has dead items in per-page prunestate.
@@ -2021,6 +2025,13 @@ retry:
 		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
 									 dead_tuples->num_tuples);
 	}
+
+	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->tuples_deleted += tuples_deleted;
+	vacrel->lpdead_items += lpdead_items;
+	vacrel->new_dead_tuples += new_dead_tuples;
+	vacrel->num_tuples += num_tuples;
+	vacrel->live_tuples += live_tuples;
 }
 
 /*
@@ -2095,6 +2106,12 @@ lazy_vacuum(LVRelState *vacrel, bool onecall)
 		 * not exceed 32MB.  This limits the risk that we will bypass index
 		 * vacuuming again and again until eventually there is a VACUUM whose
 		 * dead_tuples space is not CPU cache resident.
+		 *
+		 * We don't take any special steps to remember the LP_DEAD items (such
+		 * as counting them in new_dead_tuples report to the stats collector)
+		 * when the optimization is applied.  The tests that we apply should
+		 * avoid any noticeable disagreements between acquire_sample_rows()
+		 * and the accounting we perform in lazy_scan_prune().
 		 */
 		threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES;
 		do_bypass_optimization =
@@ -2146,7 +2163,8 @@ lazy_vacuum(LVRelState *vacrel, bool onecall)
 	}
 
 	/*
-	 * Forget the now-vacuumed tuples -- just press on
+	 * Forget the LP_DEAD items that we just vacuumed (or just decided to not
+	 * vacuum)
 	 */
 	vacrel->dead_tuples->num_tuples = 0;
 }
@@ -3101,6 +3119,12 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
  *
  * Also don't attempt it if wraparound failsafe is in effect.  It's hard to
  * predict how long lazy_truncate_heap will take.  Don't take any chances.
+ * There is very little chance of truncation working out when the failsafe is
+ * in effect in any case.  It is very likely that pages that lazy_scan_prune
+ * thought were !hastup before the failsafe was in effect won't be considered
+ * !hastup by count_nondeletable_pages now.  Note that lazy_scan_prune makes
+ * the optimistic assumption that any LP_DEAD items it encounters will always
+ * be LP_UNUSED by the time we're called.
  *
  * Also don't attempt it if we are doing early pruning/vacuuming, because a
  * scan which cannot find a truncated heap page cannot determine that the
-- 
2.27.0

#4 Peter Geoghegan
pg@bowt.ie
In reply to: Peter Geoghegan (#3)
Re: ANALYZE counts LP_DEAD line pointers as n_dead_tup

On Fri, Apr 16, 2021 at 6:54 PM Peter Geoghegan <pg@bowt.ie> wrote:

> How about just documenting it in comments, as in the attached patch? I
> tried to address all of the issues with LP_DEAD accounting together:
> both the issue raised by Masahiko, and one or two others that were
> also discussed recently on other threads. They all seem kind of
> related to me.

I pushed a version of this just now.

Thanks
--
Peter Geoghegan