vacuum -vs reltuples on insert only index

Started by Jehan-Guillaume de Rorthaisabout 5 years ago7 messages

Hello,

I've found a behavior change with pg_class.reltuples on btree index. With only
insert activity on a table, when an index is processed, its related reltuples
is set to 0. Here is a demo script:

-- force index cleanup
set vacuum_cleanup_index_scale_factor to 0;

drop table if exists t;
create table t as select i from generate_series(1, 100) i;
create index t_i on t(i);

-- after index creation its reltuples is correct
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 100

-- vacuum set index reltuples to 0
vacuum t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 0

-- analyze set it back to correct value
analyze t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 100

-- insert + vacuum reset it again to 0
insert into t values(101);
vacuum (verbose off, analyze on, index_cleanup on) t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 0

-- delete + vacuum set it back to correct value
delete from t where i=10;
vacuum (verbose off, analyze on, index_cleanup on) t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 100

-- and back to 0 again with insert+vacuum
insert into t values(102);
vacuum (verbose off, analyze on, index_cleanup on) t;
select reltuples from pg_class where relname = 't_i'
-- result: reltuples | 0

Before 0d861bbb70, btvacuumpage was adding to relation stats the number of
leaving lines in the block using:

stats->num_index_tuples += maxoff - minoff + 1;

After 0d861bbb70, it is set using new variable nhtidslive:

stats->num_index_tuples += nhtidslive

However, nhtidslive is only incremented if callback (IndexBulkDeleteCallback)
is set, which seems not to be the case on select-only workload.

A naive fix might be to use "maxoff - minoff + 1" when callback==NULL.

Thoughts?

Regards,

In reply to: Jehan-Guillaume de Rorthais (#1)
Re: vacuum -vs reltuples on insert only index

On Fri, Oct 23, 2020 at 8:51 AM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:

Before 0d861bbb70, btvacuumpage was adding to relation stats the number of
leaving lines in the block using:

stats->num_index_tuples += maxoff - minoff + 1;

After 0d861bbb70, it is set using new variable nhtidslive:

stats->num_index_tuples += nhtidslive

However, nhtidslive is only incremented if callback (IndexBulkDeleteCallback)
is set, which seems not to be the case on select-only workload.

I agree that that's a bug.

A naive fix might be to use "maxoff - minoff + 1" when callback==NULL.

The problem with that is that we really should use nhtidslive (or
something like it), and we're not really willing to do the work to get
that information when callback==NULL. We could use "maxoff - minoff +
1" in the way you suggest, but that will be only ~30% of what
nhtidslive would be in pages where deduplication is maximally
effective (which is not at all uncommon -- you only need about 10 TIDs
per distinct value for the space savings to saturate like this).

GIN does this for cleanup (but not for delete, which has a real count
available):

/*
* XXX we always report the heap tuple count as the number of index
* entries. This is bogus if the index is partial, but it's real hard to
* tell how many distinct heap entries are referenced by a GIN index.
*/
stats->num_index_tuples = Max(info->num_heap_tuples, 0);
stats->estimated_count = info->estimated_count;

I suspect that we need to move in this direction within nbtree. I'm a
bit concerned about the partial index problem, though. I suppose maybe
we could do it the old way (which won't account for posting list
tuples) during cleanup as you suggest, but only use the final figure
when it turns out to have been a partial indexes. For other indexes we
could do what GIN does here.

Anybody else have thoughts on this?

--
Peter Geoghegan

In reply to: Peter Geoghegan (#2)
1 attachment(s)
Re: vacuum -vs reltuples on insert only index

On Fri, Oct 23, 2020 at 11:10 AM Peter Geoghegan <pg@bowt.ie> wrote:

I suspect that we need to move in this direction within nbtree. I'm a
bit concerned about the partial index problem, though. I suppose maybe
we could do it the old way (which won't account for posting list
tuples) during cleanup as you suggest, but only use the final figure
when it turns out to have been a partial indexes. For other indexes we
could do what GIN does here.

Actually, it seems better to always count num_index_tuples the old way
during cleanup-only index VACUUMs, despite the inaccuracy that that
creates with posting list tuples. The inaccuracy is at least a fixed
and relatively small inaccuracy, since nbtree doesn't have posting
list compression or a pending list mechanism (unlike GIN). This
approach avoids calculating a num_index_tuples value that is less than
the number of distinct values in the index, which seems important.
Taking a more sophisticated approach seems unnecessary, especially
given that we need something that can be backpatched to Postgres 13.

Attached is my proposed fix, which takes this approach. I will commit
this on Wednesday or Thursday, barring any objections.

Thanks
--
Peter Geoghegan

Attachments:

v1-0001-Avoid-nbtree-cleanup-only-VACUUM-stats-inaccuraci.patchapplication/octet-stream; name=v1-0001-Avoid-nbtree-cleanup-only-VACUUM-stats-inaccuraci.patchDownload
From fb7f8d958266125bd8df1c7e6b643eee4952d468 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Fri, 23 Oct 2020 16:57:01 -0700
Subject: [PATCH v1] Avoid nbtree cleanup-only VACUUM stats inaccuracies.

---
 src/backend/access/nbtree/nbtree.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index c822b49a71..47017ad4c1 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -933,6 +933,13 @@ btvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
 	 * double-counting some index tuples, so disbelieve any total that exceeds
 	 * the underlying heap's count ... if we know that accurately.  Otherwise
 	 * this might just make matters worse.
+	 *
+	 * Posting list tuples are another source of inaccuracy.  Cleanup-only
+	 * btvacuumscan calls assume that the number of index tuples can be used
+	 * as num_index_tuples, even though num_index_tuples is supposed to
+	 * represent the number of TIDs in the index.  This naive approach can
+	 * underestimate the number of tuples in the index (by about a factor of 3
+	 * in the worst case).
 	 */
 	if (!info->estimated_count)
 	{
@@ -1394,11 +1401,18 @@ backtrack:
 		 * separate live tuples).  We don't delete when backtracking, though,
 		 * since that would require teaching _bt_pagedel() about backtracking
 		 * (doesn't seem worth adding more complexity to deal with that).
+		 *
+		 * We don't count the number of live TIDs during cleanup-only calls to
+		 * btvacuumscan (i.e. when callback is not set).  We count the number
+		 * of index tuples directly instead.  This avoids the expense of
+		 * directly examining all of the tuples on each page.
 		 */
 		if (minoff > maxoff)
 			attempt_pagedel = (blkno == scanblkno);
-		else
+		else if (callback)
 			stats->num_index_tuples += nhtidslive;
+		else
+			stats->num_index_tuples += maxoff - minoff + 1;
 
 		Assert(!attempt_pagedel || nhtidslive == 0);
 	}
-- 
2.25.1

In reply to: Peter Geoghegan (#3)
Re: vacuum -vs reltuples on insert only index

On Mon, Nov 2, 2020 at 10:03 AM Peter Geoghegan <pg@bowt.ie> wrote:

Attached is my proposed fix, which takes this approach. I will commit
this on Wednesday or Thursday, barring any objections.

Just to be clear: I am not proposing that we set
'IndexBulkDeleteResult.estimated_count = false' here, even though
there is a certain sense in which we now accept an unreliable figure
in Postgres 13. This is not what GIN does. That approach doesn't seem
appropriate for nbtree + deduplication, which is much closer to nbtree
in Postgres 12 than to GIN. I believe that the final num_index_tuples
value (generated during cleanup-only nbtree VACUUM) is in general
sufficiently reliable to not be treated as an estimate by vacuumlazy.c
-- the pg_class entry for the index should still be updated in
update_index_statistics().

In other words, I think that the remaining posting-list related
inaccuracies are comparable to the existing inaccuracies caused by
concurrent page splits during nbtree vacuuming (I describe the problem
right next to an old comment about that issue, in fact). What we have
in both cases is an artifact of how the data is physically represented
and the difficulty it causes us during vacuuming, in certain cases.
There are known error bars. That's why we shouldn't treat
num_index_tuples as merely an estimate.

--
Peter Geoghegan

In reply to: Peter Geoghegan (#4)
Re: vacuum -vs reltuples on insert only index

Just one more postscript...

On Mon, Nov 2, 2020 at 12:06 PM Peter Geoghegan <pg@bowt.ie> wrote:

Just to be clear: I am not proposing that we set
'IndexBulkDeleteResult.estimated_count = false' here

I meant 'IndexBulkDeleteResult.estimated_count = true'. So my patch
doesn't touch that field at all.

In other words, I think that the remaining posting-list related
inaccuracies are comparable to the existing inaccuracies caused by
concurrent page splits during nbtree vacuuming (I describe the problem
right next to an old comment about that issue, in fact).

I meant the inaccuracies that remain *once my patch is committed*.
(Clearly the current behavior of setting pg_class.reltuples to zero
during cleanup-only vacuuming is a bug.)

--
Peter Geoghegan

In reply to: Peter Geoghegan (#3)
Re: vacuum -vs reltuples on insert only index

On Mon, Nov 2, 2020 at 10:03 AM Peter Geoghegan <pg@bowt.ie> wrote:

Actually, it seems better to always count num_index_tuples the old way
during cleanup-only index VACUUMs, despite the inaccuracy that that
creates with posting list tuples.

Pushed a fix for this just now.

Thanks for the report!
--
Peter Geoghegan

In reply to: Peter Geoghegan (#6)
Re: vacuum -vs reltuples on insert only index

On Wed, 4 Nov 2020 18:44:03 -0800
Peter Geoghegan <pg@bowt.ie> wrote:

On Mon, Nov 2, 2020 at 10:03 AM Peter Geoghegan <pg@bowt.ie> wrote:

Actually, it seems better to always count num_index_tuples the old way
during cleanup-only index VACUUMs, despite the inaccuracy that that
creates with posting list tuples.

Pushed a fix for this just now.

Thanks for the report!

Sorry I couldn't give some more feedback on your thoughts on time...

Thank you for your investigation and fix!

Regards,