New IndexAM API controlling index vacuum strategies
Hi all,
I've started this separate thread from [1] for discussing the general
API design of index vacuum.
Summary:
* Call ambulkdelete and amvacuumcleanup even when INDEX_CLEANUP is
false, and leave it to the index AM whether or not to skip them.
* Add a new index AM API, amvacuumstrategy(), that asks the index AM
for its strategy before calling ambulkdelete.
* Whether or not to remove garbage tuples from the heap depends on multiple
factors, including the INDEX_CLEANUP option and the answers of
amvacuumstrategy() from each index AM.
The first point is to fix the inappropriate behavior discussed on the thread[1].
The second and third points are to introduce a general framework for
future extensibility. User-visible behavior is not changed by this
change.
The new index AM API, amvacuumstrategy(), is called before
bulkdelete() for each index and asks the index for its bulk-deletion strategy.
Through this API, lazy vacuum asks, "Hey index X, I collected garbage heap
tuples during heap scanning, how urgent is vacuuming for you?", and
the index answers either "it's urgent", when it wants to do
bulk-deletion, or "it's not urgent, I can skip it". The point of this
proposal is to isolate heap vacuum and index vacuum for each index so
that we can employ different strategies for each index. Lazy vacuum
can decide whether or not to do heap clean based on the answers from
the indexes.
By default, if all indexes answer 'yes' (meaning they will do
bulkdelete()), lazy vacuum can do the heap clean. On the other hand, if
even one index answers 'no' (meaning it will not do bulkdelete()),
lazy vacuum doesn't do the heap clean. Lazy vacuum would also be able to
require indexes to do bulkdelete() for some reason, such as the user
specifying the INDEX_CLEANUP option. It's something like saying "Hey
index X, you answered not to do bulkdelete(), but since heap clean is
necessary for me, please don't skip bulkdelete()".
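To make the flow concrete, here is a minimal sketch, assuming the types
and callbacks added by the attached PoC patch (IndexVacuumStrategy,
INDEX_VACUUM_NONE, index_vacuum_strategy(), VACOPT_TERNARY_*); the
function name decide_vacuum_heap() is illustrative, and the PoC's actual
implementation is choose_vacuum_strategy() in vacuumlazy.c:

/*
 * Sketch only: decide whether this vacuum cycle removes the collected
 * dead tuples from the heap.  An explicit INDEX_CLEANUP { on | off }
 * overrides the indexes' answers; in 'smart' mode a single "not urgent"
 * answer makes us skip the heap vacuum.
 */
static bool
decide_vacuum_heap(VacuumParams *params, Relation *Irel, int nindexes)
{
	if (params->index_cleanup == VACOPT_TERNARY_ENABLED)
		return true;
	if (params->index_cleanup == VACOPT_TERNARY_DISABLED)
		return false;

	for (int i = 0; i < nindexes; i++)
	{
		IndexVacuumInfo ivinfo = {0};

		ivinfo.index = Irel[i];
		/* XXX: other fields left unset for brevity, as in the PoC */

		if (index_vacuum_strategy(&ivinfo) == INDEX_VACUUM_NONE)
			return false;		/* this index says "not urgent" */
	}

	return true;				/* every index wants bulk-deletion */
}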
Currently, if the INDEX_CLEANUP option is not set (i.e.,
VACOPT_TERNARY_DEFAULT in the code), it's treated as true and we will do
the heap clean. But with this patch we use the default as a neutral state
('smart' mode). This neutral state could become "on" or "off" depending
on several factors, including the answers of amvacuumstrategy(), the
table status, and the user's request. In this context, specifying
INDEX_CLEANUP would mean forcing the neutral state "on" or "off" at the
user's request. The table status that could influence the decision
could concretely be, for instance:
* Removing LP_DEAD items that have accumulated due to bulkdelete() being skipped for a long time.
* Making pages all-visible for index-only scan.
Also there are potential enhancements using this API:
* If the bottom-up index deletion feature[2] is introduced, individual
indexes could be in different situations in terms of dead tuple
accumulation; some indexes on the table can delete their garbage index
tuples without bulkdelete(). A problem would then appear: doing
bulkdelete() for such indexes would not be efficient. This problem is
solved by this proposal because we can do bulkdelete() for only a subset of
indexes on the table.
* If the retail index deletion feature[3] is introduced, we can make the
return value of amvacuumstrategy() a ternary value: "do_bulkdelete",
"do_indexscandelete", and "no".
* We could probably introduce a threshold on the number of dead tuples
to control whether or not to do index tuple bulk-deletion (like a
bulkdelete() version of vacuum_cleanup_index_scale_factor). In the
case where the amount of dead tuples is slightly larger than
maintenance_work_mem, the second call to bulkdelete() will be made
with a small number of dead tuples, which is inefficient. This
problem is also solved by this proposal by allowing a subset of
indexes to skip bulkdelete() if the number of dead tuples doesn't
exceed the threshold. (A sketch of the last two ideas follows this list.)
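As a sketch of those last two items (not part of the attached patch), the
strategy enum could grow a third value and an index AM could answer based
on a dead-tuple threshold; INDEX_VACUUM_INDEXSCANDELETE, the
num_dead_tuples field, and both cutoffs below are hypothetical:

/* Hypothetical extension of the PoC's IndexVacuumStrategy */
typedef enum IndexVacuumStrategy
{
	INDEX_VACUUM_NONE,				/* skip index vacuuming this cycle */
	INDEX_VACUUM_INDEXSCANDELETE,	/* retail deletion via index lookups */
	INDEX_VACUUM_BULKDELETE			/* full ambulkdelete() scan */
} IndexVacuumStrategy;

/* Sketch of a threshold-aware callback; the PoC's callbacks always
 * return INDEX_VACUUM_BULKDELETE. */
IndexVacuumStrategy
example_amvacuumstrategy(IndexVacuumInfo *info)
{
	double		skip_threshold = info->num_heap_tuples * 0.01;		/* illustrative */
	double		retail_threshold = info->num_heap_tuples * 0.05;	/* illustrative */

	/* num_dead_tuples is a hypothetical field, not in IndexVacuumInfo today */
	if (info->num_dead_tuples < skip_threshold)
		return INDEX_VACUUM_NONE;				/* too few dead tuples to bother */
	if (info->num_dead_tuples < retail_threshold)
		return INDEX_VACUUM_INDEXSCANDELETE;	/* cheaper than a full index scan */
	return INDEX_VACUUM_BULKDELETE;
}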
I’ve attached a PoC patch for the above idea. By default, lazy
vacuum chooses the bulk-deletion strategy based on the answers of
amvacuumstrategy(), so it can be either true or false (although it’s
always true in the current patch). But for amvacuumcleanup() there is
no neutral state; lazy vacuum treats the default as true.
Comments and feedback are very welcome.
Regards,
[1]: /messages/by-id/20200415233848.saqp72pcjv2y6ryi@alap3.anarazel.de
[2]: /messages/by-id/CAH2-Wzm+maE3apHB8NOtmM=p-DO65j2V5GzAWCOEEuy3JZgb2g@mail.gmail.com
[3]: /messages/by-id/425db134-8bba-005c-b59d-56e50de3b41e@postgrespro.ru
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
Attachments:
poc_vacuumstrategy.patch
diff --git a/contrib/bloom/bloom.h b/contrib/bloom/bloom.h
index 23aa7ac441..e07b71a336 100644
--- a/contrib/bloom/bloom.h
+++ b/contrib/bloom/bloom.h
@@ -201,6 +201,7 @@ extern void blendscan(IndexScanDesc scan);
extern IndexBuildResult *blbuild(Relation heap, Relation index,
struct IndexInfo *indexInfo);
extern void blbuildempty(Relation index);
+extern IndexVacuumStrategy blvacuumstrategy(IndexVacuumInfo *info);
extern IndexBulkDeleteResult *blbulkdelete(IndexVacuumInfo *info,
IndexBulkDeleteResult *stats, IndexBulkDeleteCallback callback,
void *callback_state);
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 26b9927c3a..4ea0cfc1d8 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -131,6 +131,7 @@ blhandler(PG_FUNCTION_ARGS)
amroutine->ambuild = blbuild;
amroutine->ambuildempty = blbuildempty;
amroutine->aminsert = blinsert;
+ amroutine->amvacuumstrategy = blvacuumstrategy;
amroutine->ambulkdelete = blbulkdelete;
amroutine->amvacuumcleanup = blvacuumcleanup;
amroutine->amcanreturn = NULL;
diff --git a/contrib/bloom/blvacuum.c b/contrib/bloom/blvacuum.c
index 3282adde03..32150493ee 100644
--- a/contrib/bloom/blvacuum.c
+++ b/contrib/bloom/blvacuum.c
@@ -23,6 +23,15 @@
#include "storage/lmgr.h"
+/*
+ * Choose the vacuum strategy. Currently always do ambulkdelete.
+ */
+IndexVacuumStrategy
+blvacuumstrategy(IndexVacuumInfo *info)
+{
+ return INDEX_VACUUM_BULKDELETE;
+}
+
/*
* Bulk deletion of all index entries pointing to a set of heap tuples.
* The set of target tuples is specified via a callback routine that tells
@@ -45,6 +54,13 @@ blbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
BloomMetaPageData *metaData;
GenericXLogState *gxlogState;
+ /*
+ * Skip deleting index entries if the corresponding heap tuples will
+ * not be deleted.
+ */
+ if (info->bulkdelete_skippable)
+ return NULL;
+
if (stats == NULL)
stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
@@ -172,7 +188,7 @@ blvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
BlockNumber npages,
blkno;
- if (info->analyze_only)
+ if (info->analyze_only || !info->vacuumcleanup_requested)
return stats;
if (stats == NULL)
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 1f72562c60..707c096e81 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -112,6 +112,7 @@ brinhandler(PG_FUNCTION_ARGS)
amroutine->ambuild = brinbuild;
amroutine->ambuildempty = brinbuildempty;
amroutine->aminsert = brininsert;
+ amroutine->amvacuumstrategy = brinvacuumstrategy;
amroutine->ambulkdelete = brinbulkdelete;
amroutine->amvacuumcleanup = brinvacuumcleanup;
amroutine->amcanreturn = NULL;
@@ -770,10 +771,20 @@ brinbuildempty(Relation index)
UnlockReleaseBuffer(metabuf);
}
+/*
+ * Choose the vacuum strategy. Currently always do ambulkdelete.
+ */
+IndexVacuumStrategy
+brinvacuumstrategy(IndexVacuumInfo *info)
+{
+ return INDEX_VACUUM_BULKDELETE;
+}
+
/*
* brinbulkdelete
* Since there are no per-heap-tuple index tuples in BRIN indexes,
- * there's not a lot we can do here.
+ * there's not a lot we can do here regardless of
+ * info->bulkdelete_skippable.
*
* XXX we could mark item tuples as "dirty" (when a minimum or maximum heap
* tuple is deleted), meaning the need to re-run summarization on the affected
@@ -799,8 +810,11 @@ brinvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
{
Relation heapRel;
- /* No-op in ANALYZE ONLY mode */
- if (info->analyze_only)
+ /*
+ * No-op in ANALYZE ONLY mode or when user requests to disable index
+ * cleanup.
+ */
+ if (info->analyze_only || !info->vacuumcleanup_requested)
return stats;
if (!stats)
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index ef9b56fd36..09d1cf5694 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -63,6 +63,7 @@ ginhandler(PG_FUNCTION_ARGS)
amroutine->ambuild = ginbuild;
amroutine->ambuildempty = ginbuildempty;
amroutine->aminsert = gininsert;
+ amroutine->amvacuumstrategy = ginvacuumstrategy;
amroutine->ambulkdelete = ginbulkdelete;
amroutine->amvacuumcleanup = ginvacuumcleanup;
amroutine->amcanreturn = NULL;
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index 0935a6d9e5..bcb804f3ce 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -560,6 +560,15 @@ ginVacuumEntryPage(GinVacuumState *gvs, Buffer buffer, BlockNumber *roots, uint3
return (tmppage == origpage) ? NULL : tmppage;
}
+/*
+ * Choose the vacuum strategy. Currently always do ambulkdelete.
+ */
+IndexVacuumStrategy
+ginvacuumstrategy(IndexVacuumInfo *info)
+{
+ return INDEX_VACUUM_BULKDELETE;
+}
+
IndexBulkDeleteResult *
ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
IndexBulkDeleteCallback callback, void *callback_state)
@@ -571,6 +580,13 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
BlockNumber rootOfPostingTree[BLCKSZ / (sizeof(IndexTupleData) + sizeof(ItemId))];
uint32 nRoot;
+ /*
+ * Skip deleting index entries if the corresponding heap tuples will
+ * not be deleted.
+ */
+ if (info->bulkdelete_skippable)
+ return NULL;
+
gvs.tmpCxt = AllocSetContextCreate(CurrentMemoryContext,
"Gin vacuum temporary context",
ALLOCSET_DEFAULT_SIZES);
@@ -708,6 +724,10 @@ ginvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
return stats;
}
+ /* Skip index cleanup if user requests to disable */
+ if (!info->vacuumcleanup_requested)
+ return stats;
+
/*
* Set up all-zero stats and cleanup pending inserts if ginbulkdelete
* wasn't called
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 3f2b416ce1..f7d100255d 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -84,6 +84,7 @@ gisthandler(PG_FUNCTION_ARGS)
amroutine->ambuild = gistbuild;
amroutine->ambuildempty = gistbuildempty;
amroutine->aminsert = gistinsert;
+ amroutine->amvacuumstrategy = gistvacuumstrategy;
amroutine->ambulkdelete = gistbulkdelete;
amroutine->amvacuumcleanup = gistvacuumcleanup;
amroutine->amcanreturn = gistcanreturn;
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index a9c616c772..40ff75b1ad 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -52,6 +52,15 @@ static bool gistdeletepage(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
Buffer buffer, OffsetNumber downlink,
Buffer leafBuffer);
+/*
+ * Choose the vacuum strategy. Currently always do ambulkdelete.
+ */
+IndexVacuumStrategy
+gistvacuumstrategy(IndexVacuumInfo *info)
+{
+ return INDEX_VACUUM_BULKDELETE;
+}
+
/*
* VACUUM bulkdelete stage: remove index entries.
*/
@@ -59,6 +68,13 @@ IndexBulkDeleteResult *
gistbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
IndexBulkDeleteCallback callback, void *callback_state)
{
+ /*
+ * Skip deleting index entries if the corresponding heap tuples will
+ * not be deleted.
+ */
+ if (info->bulkdelete_skippable)
+ return NULL;
+
/* allocate stats if first time through, else re-use existing struct */
if (stats == NULL)
stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
@@ -74,8 +90,11 @@ gistbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
IndexBulkDeleteResult *
gistvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
{
- /* No-op in ANALYZE ONLY mode */
- if (info->analyze_only)
+ /*
+ * No-op in ANALYZE ONLY mode or when user requests to disable index
+ * cleanup.
+ */
+ if (info->analyze_only || !info->vacuumcleanup_requested)
return stats;
/*
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 7c9ccf446c..0ed2bd6717 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -81,6 +81,7 @@ hashhandler(PG_FUNCTION_ARGS)
amroutine->ambuild = hashbuild;
amroutine->ambuildempty = hashbuildempty;
amroutine->aminsert = hashinsert;
+ amroutine->amvacuumstrategy = hashvacuumstrategy;
amroutine->ambulkdelete = hashbulkdelete;
amroutine->amvacuumcleanup = hashvacuumcleanup;
amroutine->amcanreturn = NULL;
@@ -443,6 +444,15 @@ hashendscan(IndexScanDesc scan)
scan->opaque = NULL;
}
+/*
+ * Choose the vacuum strategy. Currently always do ambulkdelete.
+ */
+IndexVacuumStrategy
+hashvacuumstrategy(IndexVacuumInfo *info)
+{
+ return INDEX_VACUUM_BULKDELETE;
+}
+
/*
* Bulk deletion of all index entries pointing to a set of heap tuples.
* The set of target tuples is specified via a callback routine that tells
@@ -468,6 +478,13 @@ hashbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
HashMetaPage metap;
HashMetaPage cachedmetap;
+ /*
+ * Skip deleting index entries if the corresponding heap tuples will
+ * not be deleted.
+ */
+ if (info->bulkdelete_skippable)
+ return NULL;
+
tuples_removed = 0;
num_index_tuples = 0;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 25f2d5df1b..93c4488e39 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -214,6 +214,18 @@ typedef struct LVShared
double reltuples;
bool estimated_count;
+ /*
+ * Copied from LVRelStats. It tells index AM that lazy vacuum will remove
+ * dead tuples from the heap after index vacuum.
+ */
+ bool vacuum_heap;
+
+ /*
+ * Copied from LVRelStats. It tells index AM whether amvacuumcleanup is
+ * requested or not.
+ */
+ bool vacuumcleanup_requested;
+
/*
* In single process lazy vacuum we could consume more memory during index
* vacuuming or cleanup apart from the memory for heap scanning. In
@@ -293,8 +305,8 @@ typedef struct LVRelStats
{
char *relnamespace;
char *relname;
- /* useindex = true means two-pass strategy; false means one-pass */
- bool useindex;
+ /* hasindex = true means two-pass strategy; false means one-pass */
+ bool hasindex;
/* Overall statistics about rel */
BlockNumber old_rel_pages; /* previous value of pg_class.relpages */
BlockNumber rel_pages; /* total number of pages */
@@ -310,9 +322,11 @@ typedef struct LVRelStats
double tuples_deleted;
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
LVDeadTuples *dead_tuples;
+ bool vacuum_heap; /* do we remove dead tuples from the heap? */
int num_index_scans;
TransactionId latestRemovedXid;
bool lock_waiter_detected;
+ bool vacuumcleanup_requested; /* INDEX_CLEANUP is not set to false */
/* Used for error callback */
char *indname;
@@ -343,6 +357,12 @@ static BufferAccessStrategy vac_strategy;
static void lazy_scan_heap(Relation onerel, VacuumParams *params,
LVRelStats *vacrelstats, Relation *Irel, int nindexes,
bool aggressive);
+static void choose_vacuum_strategy(LVRelStats *vacrelstats, VacuumParams *params,
+ Relation *Irel, int nindexes);
+static void lazy_vacuum_table_and_indexes(Relation onerel, VacuumParams *params,
+ LVRelStats *vacrelstats, Relation *Irel,
+ int nindexes, IndexBulkDeleteResult **stats,
+ LVParallelState *lps);
static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelStats *vacrelstats);
@@ -442,7 +462,6 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
ErrorContextCallback errcallback;
Assert(params != NULL);
- Assert(params->index_cleanup != VACOPT_TERNARY_DEFAULT);
Assert(params->truncate != VACOPT_TERNARY_DEFAULT);
/* not every AM requires these to be valid, but heap does */
@@ -501,8 +520,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
/* Open all indexes of the relation */
vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
- vacrelstats->useindex = (nindexes > 0 &&
- params->index_cleanup == VACOPT_TERNARY_ENABLED);
+ vacrelstats->hasindex = (nindexes > 0);
/*
* Setup error traceback support for ereport(). The idea is to set up an
@@ -811,14 +829,23 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
vacrelstats->nonempty_pages = 0;
vacrelstats->latestRemovedXid = InvalidTransactionId;
+ /* Index vacuum cleanup is enabled if index cleanup is not
+ * disabled, i.e., either default or enabled.
+ */
+ vacrelstats->vacuumcleanup_requested =
+ (params->index_cleanup != VACOPT_TERNARY_DISABLED);
+
vistest = GlobalVisTestFor(onerel);
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
- * least two indexes on a table.
+ * least two indexes on a table. When the index cleanup is disabled,
+ * since index bulk-deletions are likely to be no-op we disable a parallel
+ * vacuum.
*/
- if (params->nworkers >= 0 && vacrelstats->useindex && nindexes > 1)
+ if (params->nworkers >= 0 && nindexes > 1 &&
+ params->index_cleanup != VACOPT_TERNARY_DISABLED)
{
/*
* Since parallel workers cannot access data in temporary tables, we
@@ -1050,19 +1077,10 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
vmbuffer = InvalidBuffer;
}
- /* Work on all the indexes, then the heap */
- lazy_vacuum_all_indexes(onerel, Irel, indstats,
- vacrelstats, lps, nindexes);
-
- /* Remove tuples from heap */
- lazy_vacuum_heap(onerel, vacrelstats);
-
- /*
- * Forget the now-vacuumed tuples, and press on, but be careful
- * not to reset latestRemovedXid since we want that value to be
- * valid.
- */
- dead_tuples->num_tuples = 0;
+ /* Vacuum the table and its indexes */
+ lazy_vacuum_table_and_indexes(onerel, params, vacrelstats,
+ Irel, nindexes, indstats,
+ lps);
/*
* Vacuum the Free Space Map to make newly-freed space visible on
@@ -1515,29 +1533,14 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
* doing a second scan. Also we don't do that but forget dead tuples
* when index cleanup is disabled.
*/
- if (!vacrelstats->useindex && dead_tuples->num_tuples > 0)
+ if (!vacrelstats->hasindex && dead_tuples->num_tuples > 0)
{
- if (nindexes == 0)
- {
- /* Remove tuples from heap if the table has no index */
- lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
- vacuumed_pages++;
- has_dead_tuples = false;
- }
- else
- {
- /*
- * Here, we have indexes but index cleanup is disabled.
- * Instead of vacuuming the dead tuples on the heap, we just
- * forget them.
- *
- * Note that vacrelstats->dead_tuples could have tuples which
- * became dead after HOT-pruning but are not marked dead yet.
- * We do not process them because it's a very rare condition,
- * and the next vacuum will process them anyway.
- */
- Assert(params->index_cleanup == VACOPT_TERNARY_DISABLED);
- }
+ Assert(nindexes == 0);
+
+ /* Remove tuples from heap if the table has no index */
+ lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
+ vacuumed_pages++;
+ has_dead_tuples = false;
/*
* Forget the now-vacuumed tuples, and press on, but be careful
@@ -1702,14 +1705,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
/* If any tuples need to be deleted, perform final vacuum cycle */
/* XXX put a threshold on min number of tuples here? */
if (dead_tuples->num_tuples > 0)
- {
- /* Work on all the indexes, and then the heap */
- lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats,
- lps, nindexes);
-
- /* Remove tuples from heap */
- lazy_vacuum_heap(onerel, vacrelstats);
- }
+ lazy_vacuum_table_and_indexes(onerel, params, vacrelstats,
+ Irel, nindexes, indstats,
+ lps);
/*
* Vacuum the remainder of the Free Space Map. We must do this whether or
@@ -1722,7 +1720,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
/* Do post-vacuum cleanup */
- if (vacrelstats->useindex)
+ if (vacrelstats->hasindex)
lazy_cleanup_all_indexes(Irel, indstats, vacrelstats, lps, nindexes);
/*
@@ -1775,6 +1773,103 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
pfree(buf.data);
}
+/*
+ * Remove the collected garbage tuples from the table and its indexes.
+ */
+static void
+lazy_vacuum_table_and_indexes(Relation onerel, VacuumParams *params,
+ LVRelStats *vacrelstats, Relation *Irel,
+ int nindexes, IndexBulkDeleteResult **indstats,
+ LVParallelState *lps)
+{
+ /*
+ * Choose the vacuum strategy for this vacuum cycle.
+ * choose_vacuum_strategy will set the decision to
+ * vacrelstats->vacuum_heap.
+ */
+ choose_vacuum_strategy(vacrelstats, params, Irel, nindexes);
+
+ /* Work on all the indexes, then the heap */
+ lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats, lps,
+ nindexes);
+
+ if (vacrelstats->vacuum_heap)
+ {
+ /* Remove tuples from heap */
+ lazy_vacuum_heap(onerel, vacrelstats);
+ }
+ else
+ {
+ /*
+ * Here, we don't do heap vacuum in this cycle.
+ *
+ * Note that vacrelstats->dead_tuples could have tuples which
+ * became dead after HOT-pruning but are not marked dead yet.
+ * We do not process them because it's a very rare condition,
+ * and the next vacuum will process them anyway.
+ */
+ Assert(params->index_cleanup != VACOPT_TERNARY_ENABLED);
+ }
+
+ /*
+ * Forget the now-vacuumed tuples, and press on, but be careful
+ * not to reset latestRemovedXid since we want that value to be
+ * valid.
+ */
+ vacrelstats->dead_tuples->num_tuples = 0;
+}
+
+/*
+ * Decide whether or not we remove the collected garbage tuples from the
+ * heap.
+ */
+static void
+choose_vacuum_strategy(LVRelStats *vacrelstats, VacuumParams *params,
+ Relation *Irel, int nindexes)
+{
+ bool vacuum_heap = true;
+
+ /*
+ * If index cleanup option is specified, we use it.
+ *
+ * XXX: should we call amvacuumstrategy even if INDEX_CLEANUP
+ * is specified?
+ */
+ if (params->index_cleanup == VACOPT_TERNARY_ENABLED)
+ vacuum_heap = true;
+ else if (params->index_cleanup == VACOPT_TERNARY_DISABLED)
+ vacuum_heap = false;
+ else
+ {
+ int i;
+
+ /*
+ * If index cleanup option is not specified, we decide the vacuum
+ * strategy based on the returned values from amvacuumstrategy.
+ * If even one index returns 'none', we skip heap vacuum in this
+ * vacuum cycle.
+ */
+ for (i = 0; i < nindexes; i++)
+ {
+ IndexVacuumStrategy ivacstrat;
+ IndexVacuumInfo ivinfo;
+
+ ivinfo.index = Irel[i];
+ /* XXX: fill other fields */
+
+ ivacstrat = index_vacuum_strategy(&ivinfo);
+
+ if (ivacstrat == INDEX_VACUUM_NONE)
+ {
+ vacuum_heap = false;
+ break;
+ }
+ }
+ }
+
+ vacrelstats->vacuum_heap = vacuum_heap;
+}
+
/*
* lazy_vacuum_all_indexes() -- vacuum all indexes of relation.
*
@@ -2120,6 +2215,10 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
*/
nworkers = Min(nworkers, lps->pcxt->nworkers);
+ /* Copy the information to the shared state */
+ lps->lvshared->vacuum_heap = vacrelstats->vacuum_heap;
+ lps->lvshared->vacuumcleanup_requested = vacrelstats->vacuumcleanup_requested;
+
/* Setup the shared cost-based vacuum delay and launch workers */
if (nworkers > 0)
{
@@ -2444,6 +2543,13 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
ivinfo.message_level = elevel;
ivinfo.num_heap_tuples = reltuples;
ivinfo.strategy = vac_strategy;
+ ivinfo.vacuumcleanup_requested = vacrelstats->vacuumcleanup_requested;
+
+ /*
+ * index bulk-deletion can be skipped safely if we won't delete
+ * garbage tuples from the heap.
+ */
+ ivinfo.bulkdelete_skippable = !(vacrelstats->vacuum_heap);
/*
* Update error traceback information.
@@ -2461,11 +2567,16 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
*stats = index_bulk_delete(&ivinfo, *stats,
lazy_tid_reaped, (void *) dead_tuples);
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrelstats->indname,
- dead_tuples->num_tuples),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ /*
+ * XXX: we don't want to report if ambulkdelete was no-op because of
+ * bulkdelete_skippable. But we cannot know whether it was or not.
+ */
+ if (*stats)
+ ereport(elevel,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ vacrelstats->indname,
+ dead_tuples->num_tuples),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrelstats, &saved_err_info);
@@ -2495,9 +2606,10 @@ lazy_cleanup_index(Relation indrel,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
ivinfo.strategy = vac_strategy;
+ ivinfo.bulkdelete_skippable = false;
+ ivinfo.vacuumcleanup_requested = vacrelstats->vacuumcleanup_requested;
/*
* Update error traceback information.
@@ -2844,14 +2956,14 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
* Return the maximum number of dead tuples we can record.
*/
static long
-compute_max_dead_tuples(BlockNumber relblocks, bool useindex)
+compute_max_dead_tuples(BlockNumber relblocks, bool hasindex)
{
long maxtuples;
int vac_work_mem = IsAutoVacuumWorkerProcess() &&
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- if (useindex)
+ if (hasindex)
{
maxtuples = MAXDEADTUPLES(vac_work_mem * 1024L);
maxtuples = Min(maxtuples, INT_MAX);
@@ -2881,7 +2993,7 @@ lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
LVDeadTuples *dead_tuples = NULL;
long maxtuples;
- maxtuples = compute_max_dead_tuples(relblocks, vacrelstats->useindex);
+ maxtuples = compute_max_dead_tuples(relblocks, vacrelstats->hasindex);
dead_tuples = (LVDeadTuples *) palloc(SizeOfDeadTuples(maxtuples));
dead_tuples->num_tuples = 0;
@@ -3573,6 +3685,9 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
vacrelstats.indname = NULL;
vacrelstats.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
+ vacrelstats.vacuum_heap = lvshared->vacuum_heap;
+ vacrelstats.vacuumcleanup_requested = lvshared->vacuumcleanup_requested;
+
/* Setup error traceback support for ereport() */
errcallback.callback = vacuum_error_callback;
errcallback.arg = &vacrelstats;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 3fb8688f8f..8df683c640 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -676,6 +676,25 @@ index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap)
return ntids;
}
+/* ----------------
+ * index_vacuum_strategy - decide whether or not to bulkdelete
+ *
+ * This callback routine is called just before calling ambulkdelete.
+ * Returns IndexVacuumStrategy to tell the lazy vacuum whether we do
+ * bulkdelete.
+ * ----------------
+ */
+IndexVacuumStrategy
+index_vacuum_strategy(IndexVacuumInfo *info)
+{
+ Relation indexRelation = info->index;
+
+ RELATION_CHECKS;
+ CHECK_REL_PROCEDURE(amvacuumstrategy);
+
+ return indexRelation->rd_indam->amvacuumstrategy(info);
+}
+
/* ----------------
* index_bulk_delete - do mass deletion of index entries
*
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 0abec10798..38d6a60199 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -133,6 +133,7 @@ bthandler(PG_FUNCTION_ARGS)
amroutine->ambuild = btbuild;
amroutine->ambuildempty = btbuildempty;
amroutine->aminsert = btinsert;
+ amroutine->amvacuumstrategy = btvacuumstrategy;
amroutine->ambulkdelete = btbulkdelete;
amroutine->amvacuumcleanup = btvacuumcleanup;
amroutine->amcanreturn = btcanreturn;
@@ -821,6 +822,18 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info)
*/
result = true;
}
+ else if (!info->vacuumcleanup_requested)
+ {
+ /*
+ * Skip cleanup if INDEX_CLEANUP is set to false, even if there might
+ * be a deleted page that can be recycled. If INDEX_CLEANUP continues
+ * to be disabled, recyclable pages could be left unrecycled even after
+ * XID wraparound. But in practice it's not so harmful, since such a
+ * workload doesn't need to delete and recycle pages in any case, and
+ * deletion of btree index pages is relatively rare.
+ */
+ result = false;
+ }
else if (TransactionIdIsValid(metad->btm_oldest_btpo_xact) &&
GlobalVisCheckRemovableXid(NULL, metad->btm_oldest_btpo_xact))
{
@@ -863,6 +876,15 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info)
return result;
}
+/*
+ * Choose the vacuum strategy. Currently always do ambulkdelete.
+ */
+IndexVacuumStrategy
+btvacuumstrategy(IndexVacuumInfo *info)
+{
+ return INDEX_VACUUM_BULKDELETE;
+}
+
/*
* Bulk deletion of all index entries pointing to a set of heap tuples.
* The set of target tuples is specified via a callback routine that tells
@@ -877,6 +899,13 @@ btbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
Relation rel = info->index;
BTCycleId cycleid;
+ /*
+ * Skip deleting index entries if the corresponding heap tuples will
+ * not be deleted.
+ */
+ if (info->bulkdelete_skippable)
+ return NULL;
+
/* allocate stats if first time through, else re-use existing struct */
if (stats == NULL)
stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 64d3ba8288..b18858a50e 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -66,6 +66,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->ambuild = spgbuild;
amroutine->ambuildempty = spgbuildempty;
amroutine->aminsert = spginsert;
+ amroutine->amvacuumstrategy = spgvacuumstrategy;
amroutine->ambulkdelete = spgbulkdelete;
amroutine->amvacuumcleanup = spgvacuumcleanup;
amroutine->amcanreturn = spgcanreturn;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index e1c58933f9..9aafcf9347 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -894,6 +894,15 @@ spgvacuumscan(spgBulkDeleteState *bds)
bds->stats->pages_free = bds->stats->pages_deleted;
}
+/*
+ * Choose the vacuum strategy. Currently always do ambulkdelete.
+ */
+IndexVacuumStrategy
+spgvacuumstrategy(IndexVacuumInfo *info)
+{
+ return INDEX_VACUUM_BULKDELETE;
+}
+
/*
* Bulk deletion of all index entries pointing to a set of heap tuples.
* The set of target tuples is specified via a callback routine that tells
@@ -907,6 +916,13 @@ spgbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
{
spgBulkDeleteState bds;
+ /*
+ * Skip deleting index entries if the corresponding heap tuples will
+ * not be deleted.
+ */
+ if (info->bulkdelete_skippable)
+ return NULL;
+
/* allocate stats if first time through, else re-use existing struct */
if (stats == NULL)
stats = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
@@ -937,8 +953,11 @@ spgvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
{
spgBulkDeleteState bds;
- /* No-op in ANALYZE ONLY mode */
- if (info->analyze_only)
+ /*
+ * No-op in ANALYZE ONLY mode or when user requests to disable index
+ * cleanup.
+ */
+ if (info->analyze_only || !info->vacuumcleanup_requested)
return stats;
/*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 731610c701..abd8d1844e 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3401,6 +3401,8 @@ validate_index(Oid heapId, Oid indexId, Snapshot snapshot)
ivinfo.message_level = DEBUG2;
ivinfo.num_heap_tuples = heapRelation->rd_rel->reltuples;
ivinfo.strategy = NULL;
+ ivinfo.bulkdelete_skippable = false;
+ ivinfo.vacuumcleanup_requested = true;
/*
* Encode TIDs as int8 values for the sort, rather than directly sorting
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8af12b5c6b..4e46e920cf 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -668,6 +668,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
ivinfo.message_level = elevel;
ivinfo.num_heap_tuples = onerel->rd_rel->reltuples;
ivinfo.strategy = vac_strategy;
+ ivinfo.vacuumcleanup_requested = true;
stats = index_vacuum_cleanup(&ivinfo, NULL);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 98270a1049..6a182ba9cd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1870,14 +1870,18 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
onerelid = onerel->rd_lockInfo.lockRelId;
LockRelationIdForSession(&onerelid, lmode);
- /* Set index cleanup option based on reloptions if not yet */
+ /* Set the index cleanup option from the vacuum_index_cleanup reloption,
+ * if set; otherwise leave it in the neutral default state.
+ */
if (params->index_cleanup == VACOPT_TERNARY_DEFAULT)
{
- if (onerel->rd_options == NULL ||
- ((StdRdOptions *) onerel->rd_options)->vacuum_index_cleanup)
- params->index_cleanup = VACOPT_TERNARY_ENABLED;
- else
- params->index_cleanup = VACOPT_TERNARY_DISABLED;
+ if (onerel->rd_options != NULL)
+ {
+ if (((StdRdOptions *) onerel->rd_options)->vacuum_index_cleanup)
+ params->index_cleanup = VACOPT_TERNARY_ENABLED;
+ else
+ params->index_cleanup = VACOPT_TERNARY_DISABLED;
+ }
}
/* Set truncate option based on reloptions if not yet */
diff --git a/src/include/access/amapi.h b/src/include/access/amapi.h
index 85b4766016..f885c6ac67 100644
--- a/src/include/access/amapi.h
+++ b/src/include/access/amapi.h
@@ -111,6 +111,8 @@ typedef bool (*aminsert_function) (Relation indexRelation,
Relation heapRelation,
IndexUniqueCheck checkUnique,
struct IndexInfo *indexInfo);
+/* vacuum strategy */
+typedef IndexVacuumStrategy (*amvacuumstrategy_function) (IndexVacuumInfo *info);
/* bulk delete */
typedef IndexBulkDeleteResult *(*ambulkdelete_function) (IndexVacuumInfo *info,
@@ -258,6 +260,7 @@ typedef struct IndexAmRoutine
ambuild_function ambuild;
ambuildempty_function ambuildempty;
aminsert_function aminsert;
+ amvacuumstrategy_function amvacuumstrategy;
ambulkdelete_function ambulkdelete;
amvacuumcleanup_function amvacuumcleanup;
amcanreturn_function amcanreturn; /* can be NULL */
diff --git a/src/include/access/brin_internal.h b/src/include/access/brin_internal.h
index 9ffc9100c0..cdf98489cf 100644
--- a/src/include/access/brin_internal.h
+++ b/src/include/access/brin_internal.h
@@ -97,6 +97,7 @@ extern int64 bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm);
extern void brinrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
ScanKey orderbys, int norderbys);
extern void brinendscan(IndexScanDesc scan);
+extern IndexVacuumStrategy brinvacuumstrategy(IndexVacuumInfo *info);
extern IndexBulkDeleteResult *brinbulkdelete(IndexVacuumInfo *info,
IndexBulkDeleteResult *stats,
IndexBulkDeleteCallback callback,
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 68d90f5141..eea3a28411 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -34,7 +34,8 @@ typedef struct IndexBuildResult
} IndexBuildResult;
/*
- * Struct for input arguments passed to ambulkdelete and amvacuumcleanup
+ * Struct for input arguments passed to amvacuumstrategy, ambulkdelete
+ * and amvacuumcleanup
*
* num_heap_tuples is accurate only when estimated_count is false;
* otherwise it's just an estimate (currently, the estimate is the
@@ -47,6 +48,22 @@ typedef struct IndexVacuumInfo
bool analyze_only; /* ANALYZE (without any actual vacuum) */
bool report_progress; /* emit progress.h status reports */
bool estimated_count; /* num_heap_tuples is an estimate */
+
+ /*
+ * Is this ambulkdelete call skippable? If true, lazy vacuum won't
+ * delete the garbage tuples from the heap, so the index AM can
+ * safely skip index bulk-deletion. This field is used only by
+ * ambulkdelete.
+ */
+ bool bulkdelete_skippable;
+
+ /*
+ * Is amvacuumcleanup requested by lazy vacuum? If false, the index AM
+ * can skip index cleanup. This can be false if the INDEX_CLEANUP vacuum
+ * option is set to false. This field is used only by amvacuumcleanup.
+ */
+ bool vacuumcleanup_requested;
+
int message_level; /* ereport level for progress messages */
double num_heap_tuples; /* tuples remaining in heap */
BufferAccessStrategy strategy; /* access strategy for reads */
@@ -125,6 +142,13 @@ typedef struct IndexOrderByDistance
bool isnull;
} IndexOrderByDistance;
+/* Result value for amvacuumstrategy */
+typedef enum IndexVacuumStrategy
+{
+ INDEX_VACUUM_NONE, /* No-op, skip bulk-deletion in this vacuum cycle */
+ INDEX_VACUUM_BULKDELETE /* Do ambulkdelete */
+} IndexVacuumStrategy;
+
/*
* generalized index_ interface routines (in indexam.c)
*/
@@ -173,6 +197,7 @@ extern bool index_getnext_slot(IndexScanDesc scan, ScanDirection direction,
struct TupleTableSlot *slot);
extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap);
+extern IndexVacuumStrategy index_vacuum_strategy(IndexVacuumInfo *info);
extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
IndexBulkDeleteResult *stats,
IndexBulkDeleteCallback callback,
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 5cb2f72e4c..21e7282e36 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -396,6 +396,7 @@ extern int64 gingetbitmap(IndexScanDesc scan, TIDBitmap *tbm);
extern void ginInitConsistentFunction(GinState *ginstate, GinScanKey key);
/* ginvacuum.c */
+extern IndexVacuumStrategy ginvacuumstrategy(IndexVacuumInfo *info);
extern IndexBulkDeleteResult *ginbulkdelete(IndexVacuumInfo *info,
IndexBulkDeleteResult *stats,
IndexBulkDeleteCallback callback,
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index b68c01a5f2..3d191f241d 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -532,6 +532,7 @@ extern void gistMakeUnionKey(GISTSTATE *giststate, int attno,
extern XLogRecPtr gistGetFakeLSN(Relation rel);
/* gistvacuum.c */
+extern IndexVacuumStrategy gistvacuumstrategy(IndexVacuumInfo *info);
extern IndexBulkDeleteResult *gistbulkdelete(IndexVacuumInfo *info,
IndexBulkDeleteResult *stats,
IndexBulkDeleteCallback callback,
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index bab4d9f1b0..a9b99a6fa3 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -371,6 +371,7 @@ extern IndexScanDesc hashbeginscan(Relation rel, int nkeys, int norderbys);
extern void hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
ScanKey orderbys, int norderbys);
extern void hashendscan(IndexScanDesc scan);
+extern IndexVacuumStrategy hashvacuumstrategy(IndexVacuumInfo *info);
extern IndexBulkDeleteResult *hashbulkdelete(IndexVacuumInfo *info,
IndexBulkDeleteResult *stats,
IndexBulkDeleteCallback callback,
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index e8fecc6026..7f74066b44 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -1008,6 +1008,7 @@ extern void btparallelrescan(IndexScanDesc scan);
extern void btendscan(IndexScanDesc scan);
extern void btmarkpos(IndexScanDesc scan);
extern void btrestrpos(IndexScanDesc scan);
+extern IndexVacuumStrategy btvacuumstrategy(IndexVacuumInfo *info);
extern IndexBulkDeleteResult *btbulkdelete(IndexVacuumInfo *info,
IndexBulkDeleteResult *stats,
IndexBulkDeleteCallback callback,
diff --git a/src/include/access/spgist.h b/src/include/access/spgist.h
index 9f2ccc1730..33cc62f489 100644
--- a/src/include/access/spgist.h
+++ b/src/include/access/spgist.h
@@ -211,6 +211,7 @@ extern bool spggettuple(IndexScanDesc scan, ScanDirection dir);
extern bool spgcanreturn(Relation index, int attno);
/* spgvacuum.c */
+extern IndexVacuumStrategy spgvacuumstrategy(IndexVacuumInfo *info);
extern IndexBulkDeleteResult *spgbulkdelete(IndexVacuumInfo *info,
IndexBulkDeleteResult *stats,
IndexBulkDeleteCallback callback,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index a4cd721400..d96e6b6239 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -218,8 +218,10 @@ typedef struct VacuumParams
int log_min_duration; /* minimum execution threshold in ms at
* which verbose logs are activated, -1
* to use default */
- VacOptTernaryValue index_cleanup; /* Do index vacuum and cleanup,
- * default value depends on reloptions */
+ VacOptTernaryValue index_cleanup; /* Do index vacuum and cleanup. In
+ * default mode, it's decided based on
+ * multiple factors. See
+ * choose_vacuum_strategy. */
VacOptTernaryValue truncate; /* Truncate empty pages at the end,
* default value depends on reloptions */
On Tue, Dec 22, 2020 at 2:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've started this separate thread from [1] for discussing the general
API design of index vacuum.
This is a very difficult and very important problem. Clearly defining
the problem is probably the hardest part. This prototype patch seems
like a good start, though.
Private discussion between Masahiko and myself led to a shared
understanding of what the best *general* direction is for VACUUM now.
It is necessary to deal with several problems all at once here, and to
at least think about several more problems that will need to be solved
later. If anybody reading the thread initially finds it hard to see
the connection between the specific items that Masahiko has
introduced, they should note that that's *expected*.
Summary:
* Call ambulkdelete and amvacuumcleanup even when INDEX_CLEANUP is
false, and leave it to the index AM whether or not skip them.
Makes sense. I like the way you unify INDEX_CLEANUP and the
vacuum_cleanup_index_scale_factor stuff in a way that is now quite
explicit and obvious in the code.
The second and third points are to introduce a general framework for
future extensibility. User-visible behavior is not changed by this
change.
In some ways the ideas in your patch might be considered radical, or
at least novel: they introduce the idea that bloat can be a
qualitative thing. But at the same time the design is quite
conservative: these are fairly isolated changes, at least code-wise. I
am not 100% sure that this approach will be successful in
vacuumlazy.c, in the end (I'm ~95% sure). But I am 100% sure that our
collective understanding of the problems in this area will be
significantly improved by this effort. A fundamental rethink does not
necessarily require a fundamental redesign, and yet it might be just
as effective.
This is certainly what I see when testing my bottom-up index deletion
patch, which adds an incremental index deletion mechanism that merely
intervenes in a precise, isolated way. Despite my patch's simplicity,
it manages to practically eliminate an entire important *class* of
index bloat (at least once you make certain mild assumptions about the
duration of snapshots). Sometimes it is possible to solve a hard
problem by thinking about it only *slightly* differently.
This is a tantalizing possibility for VACUUM, too. I'm willing to risk
sounding grandiose if that's what it takes to get more hackers
interested in these questions. With that in mind, here is a summary of
the high level hypothesis behind this VACUUM patch:
VACUUM can and should be reimagined as a top-down mechanism that
complements various bottom-up mechanisms (including the stuff from my
deletion patch, heap pruning, and possibly an enhanced version of heap
pruning based on similar principles). This will be possible without
changing any of the fundamental invariants of the current vacuumlazy.c
design. VACUUM's problems are largely pathological behaviors of one
kind or another, that can be fixed with specific well-targeted
interventions. Workload characteristics can naturally determine how
much of the cleanup is done by VACUUM itself -- large variations are
possible within a single database, and even across indexes on the same
table.
The new index AM API, amvacuumstrategy(), which is called before
bulkdelete() for each index and asks the index bulk-deletion strategy.
On this API, lazy vacuum asks, "Hey index X, I collected garbage heap
tuples during heap scanning, how urgent is vacuuming for you?", and
the index answers either "it's urgent" when it wants to do
bulk-deletion or "it's not urgent, I can skip it". The point of this
proposal is to isolate heap vacuum and index vacuum for each index so
that we can employ different strategies for each index. Lazy vacuum
can decide whether or not to do heap clean based on the answers from
the indexes.
Right -- workload characteristics (plus appropriate optimizations at
the local level) make it possible that amvacuumstrategy() will give
*very* different answers from different indexes *on the same table*.
The idea that all indexes on the table are more or less equally
bloated at any given point in time is mostly wrong. Actually,
*sometimes* it really is correct! But other times it is *dramatically
wrong* -- it all depends on workload characteristics. What is likely
to be true *on average* across all tables/indexes is *irrelevant* (the
mean/average is simply not a useful concept, in fact).
The basic lazy vacuum design needs to recognize this important
difference, and other similar issues. That's the point of
amvacuumstrategy().
Currently, if INDEX_CLEANUP option is not set (i.g.
VACOPT_TERNARY_DEFAULT in the code), it's treated as true and will do
heap clean. But with this patch we use the default as a neutral state
('smart' mode). This neutral state could be "on" and "off" depending
on several factors including the answers of amvacuumstrategy(), the
table status, and user's request. In this context, specifying
INDEX_CLEANUP would mean making the neutral state "on" or "off" by
user's request. The table status that could influence the decision
could concretely be, for instance:
* Removing LP_DEAD accumulation due to skipping bulkdelete() for a long time.
* Making pages all-visible for index-only scan.
So you have several different kinds of back pressure - 'smart' mode
really is smart.
Also there are potential enhancements using this API:
* If retail index deletion feature[3] is introduced, we can make the
return value of bulkvacuumstrategy() a ternary value: "do_bulkdelete",
"do_indexscandelete", and "no".
Makes sense.
* We probably can introduce a threshold of the number of dead tuples
to control whether or not to do index tuple bulk-deletion (like
bulkdelete() version of vacuum_cleanup_index_scale_factor). In the
case where the amount of dead tuples is slightly larger than
maitenance_work_mem the second time calling to bulkdelete will be
called with a small number of dead tuples, which is inefficient. This
problem is also solved by this proposal by allowing a subset of
indexes to skip bulkdelete() if the number of dead tuple doesn't
exceed the threshold.
Good idea. I bet other people can come up with other ideas a little
like this just by thinking about it. The "untangling" performed by
your patch creates many possibilities.
I’ve attached the PoC patch for the above idea. By default, since lazy
vacuum choose the vacuum bulkdelete strategy based on answers of
amvacuumstrategy() so it can be either true or false ( although it’s
always true in the currene patch). But for amvacuumcleanup() there is
no the neutral state, lazy vacuum treats the default as true.
As you said, the next question must be: How do we teach lazy vacuum to
not do what gets requested by amvacuumcleanup() when it cannot respect
the wishes of one individual index, for example when the
accumulation of LP_DEAD items in the heap becomes a big problem in
itself? That really could be the thing that forces full heap
vacuuming, even with several indexes.
I will need to experiment in order to improve my understanding of how
to make this cooperate with bottom-up index deletion. But that's
mostly just a question for my patch (and a relatively easy one).
--
Peter Geoghegan
On Thu, Dec 24, 2020 at 12:59 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Tue, Dec 22, 2020 at 2:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've started this separate thread from [1] for discussing the general
API design of index vacuum.
This is a very difficult and very important problem. Clearly defining
the problem is probably the hardest part. This prototype patch seems
like a good start, though.
Private discussion between Masahiko and myself led to a shared
understanding of what the best *general* direction is for VACUUM now.
It is necessary to deal with several problems all at once here, and to
at least think about several more problems that will need to be solved
later. If anybody reading the thread initially finds it hard to see
the connection between the specific items that Masahiko has
introduced, they should note that that's *expected*.
Summary:
* Call ambulkdelete and amvacuumcleanup even when INDEX_CLEANUP is
false, and leave it to the index AM whether or not skip them.
Makes sense. I like the way you unify INDEX_CLEANUP and the
vacuum_cleanup_index_scale_factor stuff in a way that is now quite
explicit and obvious in the code.The second and third points are to introduce a general framework for
future extensibility. User-visible behavior is not changed by this
change.
In some ways the ideas in your patch might be considered radical, or
at least novel: they introduce the idea that bloat can be a
qualitative thing. But at the same time the design is quite
conservative: these are fairly isolated changes, at least code-wise. I
am not 100% sure that this approach will be successful in
vacuumlazy.c, in the end (I'm ~95% sure). But I am 100% sure that our
collective understanding of the problems in this area will be
significantly improved by this effort. A fundamental rethink does not
necessarily require a fundamental redesign, and yet it might be just
as effective.
This is certainly what I see when testing my bottom-up index deletion
patch, which adds an incremental index deletion mechanism that merely
intervenes in a precise, isolated way. Despite my patch's simplicity,
it manages to practically eliminate an entire important *class* of
index bloat (at least once you make certain mild assumptions about the
duration of snapshots). Sometimes it is possible to solve a hard
problem by thinking about it only *slightly* differently.
This is a tantalizing possibility for VACUUM, too. I'm willing to risk
sounding grandiose if that's what it takes to get more hackers
interested in these questions. With that in mind, here is a summary of
the high level hypothesis behind this VACUUM patch:
VACUUM can and should be reimagined as a top-down mechanism that
complements various bottom-up mechanisms (including the stuff from my
deletion patch, heap pruning, and possibly an enhanced version of heap
pruning based on similar principles). This will be possible without
changing any of the fundamental invariants of the current vacuumlazy.c
design. VACUUM's problems are largely pathological behaviors of one
kind or another, that can be fixed with specific well-targeted
interventions. Workload characteristics can naturally determine how
much of the cleanup is done by VACUUM itself -- large variations are
possible within a single database, and even across indexes on the same
table.
Agreed.
Ideally, the bottom-up mechanism works well and reclaims almost all
garbage. VACUUM should be a feature that complements those mechanisms if
they cannot work well for some reason, and that is also
used to make sure that all collected garbage has been vacuumed. For
heaps, we already have such a mechanism: opportunistic HOT pruning
and lazy vacuum. For indexes, especially btree indexes, the bottom-up
index deletion and ambulkdelete() would have a similar relationship.
The new index AM API, amvacuumstrategy(), which is called before
bulkdelete() for each index and asks the index bulk-deletion strategy.
On this API, lazy vacuum asks, "Hey index X, I collected garbage heap
tuples during heap scanning, how urgent is vacuuming for you?", and
the index answers either "it's urgent" when it wants to do
bulk-deletion or "it's not urgent, I can skip it". The point of this
proposal is to isolate heap vacuum and index vacuum for each index so
that we can employ different strategies for each index. Lazy vacuum
can decide whether or not to do heap clean based on the answers from
the indexes.
Right -- workload characteristics (plus appropriate optimizations at
the local level) make it possible that amvacuumstrategy() will give
*very* different answers from different indexes *on the same table*.
The idea that all indexes on the table are more or less equally
bloated at any given point in time is mostly wrong. Actually,
*sometimes* it really is correct! But other times it is *dramatically
wrong* -- it all depends on workload characteristics. What is likely
to be true *on average* across all tables/indexes is *irrelevant* (the
mean/average is simply not a useful concept, in fact).
The basic lazy vacuum design needs to recognize this important
difference, and other similar issues. That's the point of
amvacuumstrategy().
Agreed.
In terms of bloat, the characteristics of the index AM also bring such
differences (e.g., btree vs. brin). With the bottom-up index deletion
feature, even btree indexes on the same table will also differ from
each other.
Currently, if INDEX_CLEANUP option is not set (i.g.
VACOPT_TERNARY_DEFAULT in the code), it's treated as true and will do
heap clean. But with this patch we use the default as a neutral state
('smart' mode). This neutral state could be "on" and "off" depending
on several factors including the answers of amvacuumstrategy(), the
table status, and user's request. In this context, specifying
INDEX_CLEANUP would mean making the neutral state "on" or "off" by
user's request. The table status that could influence the decision
could concretely be, for instance:
* Removing LP_DEAD accumulation due to skipping bulkdelete() for a long time.
* Making pages all-visible for index-only scan.
So you have several different kinds of back pressure - 'smart' mode
really is smart.
Also there are potential enhancements using this API:
* If retail index deletion feature[3] is introduced, we can make the
return value of bulkvacuumstrategy() a ternary value: "do_bulkdelete",
"do_indexscandelete", and "no".Makes sense.
* We probably can introduce a threshold of the number of dead tuples
to control whether or not to do index tuple bulk-deletion (like
bulkdelete() version of vacuum_cleanup_index_scale_factor). In the
case where the amount of dead tuples is slightly larger than
maitenance_work_mem the second time calling to bulkdelete will be
called with a small number of dead tuples, which is inefficient. This
problem is also solved by this proposal by allowing a subset of
indexes to skip bulkdelete() if the number of dead tuple doesn't
exceed the threshold.
Good idea. I bet other people can come up with other ideas a little
like this just by thinking about it. The "untangling" performed by
your patch creates many possibilities
I’ve attached the PoC patch for the above idea. By default, since lazy
vacuum choose the vacuum bulkdelete strategy based on answers of
amvacuumstrategy() so it can be either true or false ( although it’s
always true in the currene patch). But for amvacuumcleanup() there is
no the neutral state, lazy vacuum treats the default as true.
As you said, the next question must be: How do we teach lazy vacuum to
not do what gets requested by amvacuumcleanup() when it cannot respect
the wishes of one individual indexes, for example when the
accumulation of LP_DEAD items in the heap becomes a big problem in
itself? That really could be the thing that forces full heap
vacuuming, even with several indexes.
You mean requested by amvacuumstrategy(), not by amvacuumcleanup()? I
think amvacuumstrategy() affects only ambulkdelete(). But when all
ambulkdelete() calls were skipped at the request of the index AMs, we might
want to skip amvacuumcleanup() as well.
I will need to experiment in order to improve my understanding of how
to make this cooperate with bottom-up index deletion. But that's
mostly just a question for my patch (and a relatively easy one).
Yeah, I think we might need something like statistics about garbage
per index so that individual indexes can make a different decision based
on their status. For example, a btree index might want to skip
ambulkdelete() if it has only a few dead index tuples in its leaf pages. It
could be kept in the stats collector or on btree's meta page.
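Just to illustrate that direction (nothing like this exists yet; the
struct, field names, and threshold below are made up), the bookkeeping
could be quite small wherever it ends up living:

/* hypothetical per-index garbage bookkeeping, in stats or the meta page */
typedef struct IndexGarbageStats
{
    BlockNumber pages_at_last_vacuum;   /* index size after previous VACUUM */
    double      dead_tuples_estimate;   /* garbage accumulated since then */
} IndexGarbageStats;

/* sketch: skip bulk deletion while the estimated garbage stays negligible */
static IndexVacuumStrategy
bt_strategy_from_stats(Relation rel, IndexGarbageStats *gstats)
{
    BlockNumber npages = RelationGetNumberOfBlocks(rel);

    if (gstats->dead_tuples_estimate < npages * 10.0)  /* arbitrary cutoff */
        return INDEX_VACUUM_NONE;

    return INDEX_VACUUM_BULKDELETE;
}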
Regards,
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
On Sun, Dec 27, 2020 at 10:55 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
As you said, the next question must be: How do we teach lazy vacuum to
not do what gets requested by amvacuumcleanup() when it cannot respect
the wishes of one individual index, for example when the
accumulation of LP_DEAD items in the heap becomes a big problem in
itself? That really could be the thing that forces full heap
vacuuming, even with several indexes.
You mean requested by amvacuumstrategy(), not by amvacuumcleanup()? I
think amvacuumstrategy() affects only ambulkdelete(). But when all
ambulkdelete() calls were skipped at the request of the index AMs, we
might want to skip amvacuumcleanup() as well.
No, I was asking about how we should decide to do a real VACUUM even
(a real ambulkdelete() call) when no index asks for it because
bottom-up deletion works very well in every index. Clearly we will
need to eventually remove remaining LP_DEAD items from the heap at
some point if nothing else happens -- eventually LP_DEAD items in the
heap alone will force a traditional heap vacuum (which will still have
to go through indexes that have not grown, just to be safe/avoid
recycling a TID that's still in the index).
Postgres heap fillfactor is 100 by default, though I believe it's 90
in another well known DB system. If you set Postgres heap fill factor
to 90 you can fit a little over 200 LP_DEAD items in the "extra space"
left behind in each heap page after initial bulk loading/INSERTs take
place that respect our lower fill factor setting. This is about 4x the
number of initial heap tuples in the pgbench_accounts table -- it's
quite a lot!
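For what it's worth, the "little over 200" figure falls straight out of
the page geometry. A rough standalone back-of-the-envelope check
(ignoring the page header; 8192-byte blocks and 4-byte line pointers
assumed):

#include <stdio.h>

int
main(void)
{
    const int       blcksz = 8192;      /* BLCKSZ */
    const int       lp_size = 4;        /* sizeof(ItemIdData) */
    const double    fillfactor = 0.90;  /* heap fillfactor 90 */

    /* space intentionally left free by the initial bulk load */
    int             slack = (int) (blcksz * (1.0 - fillfactor));

    /* once pruned, a dead tuple costs only its LP_DEAD line pointer */
    printf("slack = %d bytes -> ~%d LP_DEAD items per page\n",
           slack, slack / lp_size);
    return 0;
}

That prints 819 bytes of slack, i.e. roughly 204 line pointers -- the
"little over 200" mentioned above.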
If we pessimistically assume that all updates are non-HOT updates,
we'll still usually have enough space for each logical row to get
updated several times before the heap page "overflows". Even when
there is significant skew in the UPDATEs, the skew is not noticeable
at the level of individual heap pages. We have a surprisingly large
general capacity to temporarily "absorb" extra garbage LP_DEAD items
in heap pages this way. Nobody really cared about this extra capacity
very much before now, because it did not help with the big problem of
index bloat that you naturally see with this workload. But that big
problem may go away soon, and so this extra capacity may become
important at the same time.
I think that it could make sense for lazy_scan_heap() to maintain
statistics about the number of LP_DEAD items remaining in each heap
page (just local stack variables). From there, it can pass the
statistics to the choose_vacuum_strategy() function from your patch.
Perhaps choose_vacuum_strategy() will notice that the heap page with
the most LP_DEAD items encountered within lazy_scan_heap() (among
those encountered so far in the event of multiple index passes) has
too many LP_DEAD items -- this indicates that there is a danger that
some heap pages will start to "overflow" soon, which is now a problem
that lazy_scan_heap() must think about. Maybe if the "extra space"
left by applying heap fill factor (with settings below 100) is
insufficient to fit perhaps 2/3 of the LP_DEAD items needed on the
heap page that has the most LP_DEAD items (among all heap pages), we
stop caring about what amvacuumstrategy()/the indexes say. So we do
the right thing for the heap pages, while still mostly avoiding index
vacuuming and the final heap pass.
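A minimal sketch of that check, assuming vacuumlazy.c already tracks the
worst per-page LP_DEAD count during the scan and can see the heap
fillfactor (the helper and its plumbing are invented here; BLCKSZ and
ItemIdData are the usual definitions available in that file):

static bool
heap_overflow_danger(int maxdeadpage, int fillfactor)
{
    /* bytes deliberately left free on each page at load time */
    int         slack = BLCKSZ * (100 - fillfactor) / 100;

    /* LP_DEAD items that fit in that slack, one line pointer apiece */
    int         capacity = slack / sizeof(ItemIdData);

    /*
     * Once the worst page has consumed ~2/3 of that capacity, stop
     * deferring to amvacuumstrategy() and vacuum the heap anyway.
     */
    return maxdeadpage > capacity * 2 / 3;
}

With fillfactor 90 and 8KB blocks this comes out to about 136, in the
same ballpark as the 130 that the supplementary patch posted downthread
hard-codes.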
I experimented with this today, and I think that it is a good way to
do it. I like the idea of choose_vacuum_strategy() understanding that
heap pages that are subject to many non-HOT updates have a "natural
extra capacity for LP_DEAD items" that it must care about directly (at
least with non-default heap fill factor settings). My early testing
shows that it will often take a surprisingly long time for the most
heavily updated heap page to have more than about 100 LP_DEAD items.
I will need to experiment in order to improve my understanding of how
to make this cooperate with bottom-up index deletion. But that's
mostly just a question for my patch (and a relatively easy one).
Yeah, I think we might need something like statistics about garbage
per index so that individual indexes can make a different decision based
on their status. For example, a btree index might want to skip
ambulkdelete() if it has only a few dead index tuples in its leaf pages. It
could be kept in the stats collector or on btree's meta page.
Right. I think that even a very conservative approach could work well.
For example, maybe we teach nbtree's amvacuumstrategy() routine to ask
to do a real ambulkdelete(), except in the extreme case where the
index is *exactly* the same size as it was after the last VACUUM.
This will happen regularly with bottom-up index deletion. Maybe that
approach is a bit too conservative, though.
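A sketch of that conservative rule, assuming the index's size at the end
of the previous VACUUM were remembered somewhere (meta page or stats;
neither exists today, so pages_at_last_vacuum is simply handed in here):

static IndexVacuumStrategy
btvacuumstrategy_sketch(IndexVacuumInfo *info,
                        BlockNumber pages_at_last_vacuum)
{
    BlockNumber current_pages = RelationGetNumberOfBlocks(info->index);

    /*
     * If the index grew at all since the last VACUUM -- even by a single
     * block -- ask for a real ambulkdelete().  Otherwise bottom-up
     * deletion has evidently kept it in check, so skip it.
     */
    if (current_pages > pages_at_last_vacuum)
        return INDEX_VACUUM_BULKDELETE;

    return INDEX_VACUUM_NONE;
}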
--
Peter Geoghegan
On Sun, Dec 27, 2020 at 11:41 PM Peter Geoghegan <pg@bowt.ie> wrote:
I experimented with this today, and I think that it is a good way to
do it. I like the idea of choose_vacuum_strategy() understanding that
heap pages that are subject to many non-HOT updates have a "natural
extra capacity for LP_DEAD items" that it must care about directly (at
least with non-default heap fill factor settings). My early testing
shows that it will often take a surprisingly long time for the most
heavily updated heap page to have more than about 100 LP_DEAD items.
Attached is a rough patch showing what I did here. It was applied on
top of my bottom-up index deletion patch series and your
poc_vacuumstrategy.patch patch. This patch was written as a quick and
dirty way of simulating what I thought would work best for bottom-up
index deletion for one specific benchmark/test, which was
non-hot-update heavy. This consists of a variant pgbench with several
indexes on pgbench_accounts (almost the same as most other bottom-up
deletion benchmarks I've been running). Only one index is "logically
modified" by the updates, but of course we still physically modify all
indexes on every update. I set fill factor to 90 for this benchmark,
which is an important factor for how your VACUUM patch works during
the benchmark.
This rough supplementary patch includes VACUUM logic that assumes (but
doesn't check) that the table has heap fill factor set to 90 -- see my
changes to choose_vacuum_strategy(). This benchmark is really about
stability over time more than performance (though performance is also
improved significantly). I wanted to keep both the table/heap and the
logically unmodified indexes (i.e. 3 out of 4 indexes on
pgbench_accounts) exactly the same size *forever*.
Does this make sense?
Anyway, with a 15k TPS limit on a pgbench scale 3000 DB, I see that
pg_stat_database shows an almost 28% reduction in blks_read after an
overnight run for the patch series (it was 508,820,699 for the
patches, 705,282,975 for the master branch). I think that the VACUUM
component is responsible for some of that reduction. There were 11
VACUUMs for the patch, 7 of which did not call lazy_vacuum_heap()
(these 7 VACUUM operations each did only a btbulkdelete() call for the
one problematic index on the table, named "abalance_ruin", which my
supplementary patch has hard-coded knowledge of).
--
Peter Geoghegan
Attachments:
0007-btvacuumstrategy-bottom-up-index-deletion-changes.patch
From 5ae5dde505ded1f555324382f9db6e7fbd114492 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 23 Dec 2020 20:42:53 -0800
Subject: [PATCH 7/8] btvacuumstrategy() bottom-up index deletion changes
---
src/backend/access/heap/vacuumlazy.c | 69 +++++++++++++++++++++++++---
src/backend/access/nbtree/nbtree.c | 35 ++++++++++++--
src/backend/commands/vacuum.c | 6 +++
3 files changed, 100 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 93c4488e39..c45c49d561 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -358,11 +358,13 @@ static void lazy_scan_heap(Relation onerel, VacuumParams *params,
LVRelStats *vacrelstats, Relation *Irel, int nindexes,
bool aggressive);
static void choose_vacuum_strategy(LVRelStats *vacrelstats, VacuumParams *params,
- Relation *Irel, int nindexes);
+ Relation *Irel, int nindexes, double live_tuples,
+ int maxdeadpage);
static void lazy_vacuum_table_and_indexes(Relation onerel, VacuumParams *params,
LVRelStats *vacrelstats, Relation *Irel,
int nindexes, IndexBulkDeleteResult **stats,
- LVParallelState *lps);
+ LVParallelState *lps, double live_tuples,
+ int maxdeadpage);
static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelStats *vacrelstats);
@@ -781,6 +783,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
BlockNumber empty_pages,
vacuumed_pages,
next_fsm_block_to_vacuum;
+ int maxdeadpage = 0; /* controls if we skip heap vacuum scan */
double num_tuples, /* total number of nonremovable tuples */
live_tuples, /* live tuples (reltuples estimate) */
tups_vacuumed, /* tuples cleaned up by vacuum */
@@ -1080,7 +1083,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
/* Vacuum the table and its indexes */
lazy_vacuum_table_and_indexes(onerel, params, vacrelstats,
Irel, nindexes, indstats,
- lps);
+ lps, live_tuples, maxdeadpage);
/*
* Vacuum the Free Space Map to make newly-freed space visible on
@@ -1666,6 +1669,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
*/
if (dead_tuples->num_tuples == prev_dead_count)
RecordPageWithFreeSpace(onerel, blkno, freespace);
+ else
+ maxdeadpage = Max(maxdeadpage,
+ dead_tuples->num_tuples - prev_dead_count);
}
/* report that everything is scanned and vacuumed */
@@ -1707,7 +1713,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
if (dead_tuples->num_tuples > 0)
lazy_vacuum_table_and_indexes(onerel, params, vacrelstats,
Irel, nindexes, indstats,
- lps);
+ lps, live_tuples, maxdeadpage);
/*
* Vacuum the remainder of the Free Space Map. We must do this whether or
@@ -1780,14 +1786,16 @@ static void
lazy_vacuum_table_and_indexes(Relation onerel, VacuumParams *params,
LVRelStats *vacrelstats, Relation *Irel,
int nindexes, IndexBulkDeleteResult **indstats,
- LVParallelState *lps)
+ LVParallelState *lps, double live_tuples,
+ int maxdeadpage)
{
/*
* Choose the vacuum strategy for this vacuum cycle.
* choose_vacuum_strategy will set the decision to
* vacrelstats->vacuum_heap.
*/
- choose_vacuum_strategy(vacrelstats, params, Irel, nindexes);
+ choose_vacuum_strategy(vacrelstats, params, Irel, nindexes, live_tuples,
+ maxdeadpage);
/* Work on all the indexes, then the heap */
lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats, lps,
@@ -1825,7 +1833,8 @@ lazy_vacuum_table_and_indexes(Relation onerel, VacuumParams *params,
*/
static void
choose_vacuum_strategy(LVRelStats *vacrelstats, VacuumParams *params,
- Relation *Irel, int nindexes)
+ Relation *Irel, int nindexes, double live_tuples,
+ int maxdeadpage)
{
bool vacuum_heap = true;
@@ -1865,6 +1874,52 @@ choose_vacuum_strategy(LVRelStats *vacrelstats, VacuumParams *params,
break;
}
}
+
+ /*
+ * XXX: This 130 test is for the maximum number of LP_DEAD items on
+ * any one heap page encountered during heap scan by caller. The
+ * general idea here is to preserve the original pristine state of the
+ * table when it is subject to constant non-HOT updates when heap fill
+ * factor is reduced from its default.
+ *
+ * If we do this right (and with bottom-up index deletion), the
+ * overall effect for non-HOT-update heavy workloads is that both
+ * table and indexes (or at least a subset of indexes on the table
+ * that are never logically modified by the updates) never grow even
+ * by one block. We can actually make those things perfectly stable
+ * over time in the absence of queries that hold open MVCC snapshots
+ * for a long time. Stability is perhaps the most important thing
+ * here (not performance per se).
+ *
+ * The exact number used here (130) is based on the assumption that
+ * heap fillfactor is set to 90 in this table -- we can fit roughly
+ * 200 "extra" LP_DEAD items on heap pages before they start to
+ * "overflow" with that setting (e.g. before a pgbench_accounts table
+ * that is subject to constant non-HOT updates needs to allocate new
+ * pages just for new versions). We're trying to avoid having VACUUM
+ * call lazy_vacuum_heap() in most cases, but we don't want to be too
+ * aggressive: it would be risky to make the value we test for much
+ * higher/closer to ~200, since it might be too late by the time we
+ * actually call lazy_vacuum_heap(). (Unsure of this, but that's the
+ * idea, at least.)
+ *
+ * Since we're mostly worried about stability over time here, we have
+ * to be worried about "small" effects. If there are just a few heap
+ * page overflows in each VACUUM cycle, that still means that heap
+ * page overflows are _possible_. It is perhaps only a matter of time
+ * until the heap becomes almost as fragmented as it would with a heap
+ * fill factor of 100 -- so "small" effects may be really important.
+ * (Just guessing here, but I can say for sure that the bottom-up
+ * deletion patch works that way, so it is an "educated guess".)
+ */
+ if (!vacuum_heap)
+ {
+ if (maxdeadpage > 130 ||
+ /* Also check if maintenance_work_mem space is running out */
+ vacrelstats->dead_tuples->num_tuples >
+ vacrelstats->dead_tuples->max_tuples / 2)
+ vacuum_heap = true;
+ }
}
vacrelstats->vacuum_heap = vacuum_heap;
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 420457c1a2..ee071cb463 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -878,12 +878,35 @@ _bt_vacuum_needs_cleanup(IndexVacuumInfo *info)
}
/*
- * Choose the vacuum strategy. Currently always do ambulkdelete.
+ * Choose the vacuum strategy
*/
IndexVacuumStrategy
btvacuumstrategy(IndexVacuumInfo *info)
{
- return INDEX_VACUUM_BULKDELETE;
+ Relation rel = info->index;
+
+ /*
+ * This strcmp() is a quick and dirty prototype of logic that decides
+ * whether or not the index needs to get a bulk deletion during this
+ * VACUUM. A real version of this logic could work by remembering the
+ * size of the index during the last VACUUM. It would only return
+ * INDEX_VACUUM_BULKDELETE to choose_vacuum_strategy()/vacuumlazy.c iff it
+ * found that the index is now larger than it was last time around, even
+ * by one single block. (It could get a lot more sophisticated than that,
+ * for example by trying to understand UPDATEs vs DELETEs, but a very
+ * simple approach is probably almost as useful to users.)
+ *
+ * Further details on the strcmp() and my benchmarking:
+ *
+ * The index named abalance_ruin is the only index that receives logical
+ * changes in my pgbench benchmarks. It is one index among several on
+ * pgbench_accounts. It covers the abalance column, which makes almost
+ * 100% of all UPDATEs non-HOT UPDATEs.
+ */
+ if (strcmp(RelationGetRelationName(rel), "abalance_ruin") == 0)
+ return INDEX_VACUUM_BULKDELETE;
+
+ return INDEX_VACUUM_NONE;
}
/*
@@ -903,8 +926,14 @@ btbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
/*
* Skip deleting index entries if the corresponding heap tuples will
* not be deleted.
+ *
+ * XXX: Maybe we need to remember the decision made in btvacuumstrategy()
+ * in an AM-generic way, or using some standard idiom that is owned by the
+ * index AM? The strcmp() here repeats work done in btvacuumstrategy(),
+ * which is not ideal.
*/
- if (info->bulkdelete_skippable)
+ if (info->bulkdelete_skippable &&
+ strcmp(RelationGetRelationName(rel), "abalance_ruin") != 0)
return NULL;
/* allocate stats if first time through, else re-use existing struct */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 6a182ba9cd..223b7cb820 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1875,6 +1875,11 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
*/
if (params->index_cleanup == VACOPT_TERNARY_DEFAULT)
{
+ /*
+ * XXX had to comment this out to get choose_vacuum_strategy() to do
+ * the right thing
+ */
+#if 0
if (onerel->rd_options != NULL)
{
if (((StdRdOptions *) onerel->rd_options)->vacuum_index_cleanup)
@@ -1882,6 +1887,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
else
params->index_cleanup = VACOPT_TERNARY_DISABLED;
}
+#endif
}
/* Set truncate option based on reloptions if not yet */
--
2.27.0
On Mon, Dec 28, 2020 at 4:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Sun, Dec 27, 2020 at 10:55 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
As you said, the next question must be: How do we teach lazy vacuum to
not do what gets requested by amvacuumcleanup() when it cannot respect
the wishes of one individual index, for example when the
accumulation of LP_DEAD items in the heap becomes a big problem in
itself? That really could be the thing that forces full heap
vacuuming, even with several indexes.
You mean requested by amvacuumstrategy(), not by amvacuumcleanup()? I
think amvacuumstrategy() affects only ambulkdelete(). But when all
ambulkdelete() calls were skipped at the request of the index AMs, we
might want to skip amvacuumcleanup() as well.
No, I was asking about how we should decide to do a real VACUUM even
(a real ambulkdelete() call) when no index asks for it because
bottom-up deletion works very well in every index. Clearly we will
need to eventually remove remaining LP_DEAD items from the heap at
some point if nothing else happens -- eventually LP_DEAD items in the
heap alone will force a traditional heap vacuum (which will still have
to go through indexes that have not grown, just to be safe/avoid
recycling a TID that's still in the index).
Postgres heap fillfactor is 100 by default, though I believe it's 90
in another well known DB system. If you set Postgres heap fill factor
to 90 you can fit a little over 200 LP_DEAD items in the "extra space"
left behind in each heap page after initial bulk loading/INSERTs take
place that respect our lower fill factor setting. This is about 4x the
number of initial heap tuples in the pgbench_accounts table -- it's
quite a lot!
If we pessimistically assume that all updates are non-HOT updates,
we'll still usually have enough space for each logical row to get
updated several times before the heap page "overflows". Even when
there is significant skew in the UPDATEs, the skew is not noticeable
at the level of individual heap pages. We have a surprisingly large
general capacity to temporarily "absorb" extra garbage LP_DEAD items
in heap pages this way. Nobody really cared about this extra capacity
very much before now, because it did not help with the big problem of
index bloat that you naturally see with this workload. But that big
problem may go away soon, and so this extra capacity may become
important at the same time.
I think that it could make sense for lazy_scan_heap() to maintain
statistics about the number of LP_DEAD items remaining in each heap
page (just local stack variables). From there, it can pass the
statistics to the choose_vacuum_strategy() function from your patch.
Perhaps choose_vacuum_strategy() will notice that the heap page with
the most LP_DEAD items encountered within lazy_scan_heap() (among
those encountered so far in the event of multiple index passes) has
too many LP_DEAD items -- this indicates that there is a danger that
some heap pages will start to "overflow" soon, which is now a problem
that lazy_scan_heap() must think about. Maybe if the "extra space"
left by applying heap fill factor (with settings below 100) is
insufficient to fit perhaps 2/3 of the LP_DEAD items needed on the
heap page that has the most LP_DEAD items (among all heap pages), we
stop caring about what amvacuumstrategy()/the indexes say. So we do
the right thing for the heap pages, while still mostly avoiding index
vacuuming and the final heap pass.
Agreed. I like the idea that we calculate how many LP_DEAD items we
can absorb based on the extra space left by applying the fill factor.
Since there is a limit on the maximum number of line pointers in a
heap page, we might need to take that limit into account in the calculation.
From another point of view, given that the maximum number of heap tuples
in one 8kB heap page (MaxHeapTuplesPerPage) is 291, I think how bad it
is to store LP_DEAD items in a heap page varies depending on the tuple
size. For example, suppose the tuple size is 200: we can store 40 tuples
in one heap page if there is no LP_DEAD item at all. Even if there are
150 LP_DEAD items on the page, we are still able to store 37 tuples,
because we can still have up to 141 line pointers, which is more than
enough for the number of heap tuples that fit when there are no
LP_DEAD items, and we have (8192 - (4 * 150)) bytes of space to store
tuples (with their line pointers). That is, having 150 LP_DEAD items
ends up causing an overflow of 3 tuples. On the other hand, suppose the
tuple size is 40: we can store 204 tuples in one heap page if there is
no LP_DEAD item at all. If there are 150 LP_DEAD items on the page, we
are able to store only 141 tuples. That is, having 150 LP_DEAD items
ends up causing an overflow of 63 tuples. I think the impact of
absorbing LP_DEAD items on table bloat is larger in the latter case.
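The arithmetic behind those numbers fits in a few lines. A rough
standalone model (it ignores the page header and alignment, and charges
only the 4-byte line pointer for each LP_DEAD item, matching the
approximations above):

#include <stdio.h>

#define BLOCK_SIZE            8192
#define LP_SIZE               4     /* sizeof(ItemIdData) */
#define MAX_TUPLES_PER_PAGE   291   /* MaxHeapTuplesPerPage for 8kB pages */

/* how many tuples of the given size still fit with ndead LP_DEAD items */
static int
tuples_that_fit(int tuple_size, int ndead)
{
    int         by_space = (BLOCK_SIZE - LP_SIZE * ndead) / tuple_size;
    int         by_linepointers = MAX_TUPLES_PER_PAGE - ndead;

    return (by_space < by_linepointers) ? by_space : by_linepointers;
}

int
main(void)
{
    /* reproduces the 40 -> 37 and 204 -> 141 figures above */
    printf("tuple size 200: %d tuples, %d with 150 LP_DEAD items\n",
           tuples_that_fit(200, 0), tuples_that_fit(200, 150));
    printf("tuple size  40: %d tuples, %d with 150 LP_DEAD items\n",
           tuples_that_fit(40, 0), tuples_that_fit(40, 150));
    return 0;
}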
The larger the tuple size, the more LP_DEAD items can be absorbed in a
heap page with less harmful effect. Considering a 32-byte tuple, the
minimum heap tuple size including the tuple header, absorbing
approximately up to 70 LP_DEAD items would not matter much in terms of
bloat. In other words, if a heap page has more than 70 LP_DEAD items,
absorbing LP_DEAD items may become a table bloat problem. This
threshold of 70 LP_DEAD items is a conservative value and would
probably be a lower bound. If the tuple size is larger, we may be able
to absorb more LP_DEAD items.
FYI I've attached a graph showing how the number of LP_DEAD items on
one heap page affects the maximum number of heap tuples on the same
heap page. The X-axis is the number of LP_DEAD items in one heap page
and the Y-axis is the number of heap tuples that can be stored on the
page. The lines in the graph correspond to different heap tuple sizes. For
example, in the pgbench workload, since the tuple size is about 120 bytes,
page bloat accelerates if we leave more than about 230 LP_DEAD
items in a heap page.
I experimented with this today, and I think that it is a good way to
do it. I like the idea of choose_vacuum_strategy() understanding that
heap pages that are subject to many non-HOT updates have a "natural
extra capacity for LP_DEAD items" that it must care about directly (at
least with non-default heap fill factor settings). My early testing
shows that it will often take a surprisingly long time for the most
heavily updated heap page to have more than about 100 LP_DEAD items.
Agreed.
I will need to experiment in order to improve my understanding of how
to make this cooperate with bottom-up index deletion. But that's
mostly just a question for my patch (and a relatively easy one).
Yeah, I think we might need something like statistics about garbage
per index so that individual indexes can make a different decision based
on their status. For example, a btree index might want to skip
ambulkdelete() if it has only a few dead index tuples in its leaf pages. It
could be kept in the stats collector or on btree's meta page.
Right. I think that even a very conservative approach could work well.
For example, maybe we teach nbtree's amvacuumstrategy() routine to ask
to do a real ambulkdelete(), except in the extreme case where the
index is *exactly* the same size as it was after the last VACUUM.
This will happen regularly with bottom-up index deletion. Maybe that
approach is a bit too conservative, though.
Agreed.
Regards,
--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
Attachments:
lp_dead.png