parallel vacuum comments
Hi,
Due to bug #17245 ([1]: /messages/by-id/17245-ddf06aaf85735f36@postgresql.org) I spent a
considerable amount of time looking at vacuum-related code. And I found a few things that
I think could stand improvement:
- There are pretty much no tests. This is way too complicated a feature for
that. If there had been tests for the obvious edge case of some indexes
being too small to be handled in parallel, but others needing parallelism,
the mistake leading to #17245 would have been caught during development.
- There should be an error check verifying that all indexes have actually been
vacuumed. It's way too easy to have bugs leading to index vacuuming being
skipped.
- The state machine is complicated. It's very unobvious that an index needs to
be processed serially by the leader if shared_indstats == NULL.
- I'm very confused by the existence of LVShared->bitmap. Why is it worth
saving 7 bits per index for something like this (compared to a simple
array of bools)? Nor does the naming explain what it's for.
The presence of the bitmap requires stuff like SizeOfLVShared(), which
accounts for some of the bitmap size, but not all?
But even though we have this space optimized bitmap thing, we actually need
more memory allocated for each index, making this whole thing pointless.
- Imo it's pretty confusing to have functions like
lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
vacuum or index cleanup with parallel workers.", based on
lps->lvshared->for_cleanup.
- I don't like some of the new names introduced in 14. E.g.
"do_parallel_processing" is way too generic.
- On a higher level, a lot of this actually doesn't seem to belong into
vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
code is heap specific. And vacuumlazy.c is large enough without the parallel
code.
Greetings,
Andres Freund
On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Due to bug #17245 [1] I spent a considerable amount of time looking at vacuum-related
code. And I found a few things that I think could stand improvement:
- There are pretty much no tests. This is way too complicated a feature for
that. If there had been tests for the obvious edge case of some indexes
being too small to be handled in parallel, but others needing parallelism,
the mistake leading to #17245 would have been caught during development.
Yes. We should have tests at least for such cases.
- There should be an error check verifying that all indexes have actually been
vacuumed. It's way too easy to have bugs leading to index vacuuming being
skipped.
Agreed.
- The state machine is complicated. It's very unobvious that an index needs to
be processed serially by the leader if shared_indstats == NULL.
I think we can consolidate the logic that decides who (a worker or the
leader) processes the index into one function.
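Something like the following might work as a single decision point (the function
name and layout are just a sketch, not from an actual patch):

/*
 * Sketch: return true if only the leader may process the given index,
 * either because it got no stats slot in DSM (it was deemed too small by
 * compute_parallel_vacuum_workers()) or because the index AM doesn't
 * support the current phase (bulkdelete or cleanup) in parallel.
 */
static bool
leader_only_processing(Relation indrel, LVShared *lvshared,
                       LVSharedIndStats *shared_indstats)
{
    /* No DSM stats slot: index was skipped for parallelism up front */
    if (shared_indstats == NULL)
        return true;

    /* The index AM doesn't support this phase in parallel */
    if (!parallel_processing_is_safe(indrel, lvshared))
        return true;

    return false;
}

Both do_parallel_processing() and do_serial_processing_for_unsafe_indexes() could
then call that one function instead of each repeating the checks.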
- I'm very confused by the existence of LVShared->bitmap. Why is it worth
saving 7 bits per index for something like this (compared to a simple
array of bools)? Nor does the naming explain what it's for.
The presence of the bitmap requires stuff like SizeOfLVShared(), which
accounts for some of the bitmap size, but not all?
Yes, it's better to account for the size of all bitmaps.
But even though we have this space optimized bitmap thing, we actually need
more memory allocated for each index, making this whole thing pointless.
Right. But is it better to change it to use booleans?
- Imo it's pretty confusing to have functions like
lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
vacuum or index cleanup with parallel workers.", based on
lps->lvshared->for_cleanup.
Okay. We need to set lps->lvshared->for_cleanup to tell the workers to do
either index vacuum or index cleanup. So it might be better to pass the
for_cleanup flag down to the functions in addition to setting
lps->lvshared->for_cleanup.
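For example, the leader-side caller could both set the shared flag (which the
workers still need to read from DSM) and pass it down explicitly; a minimal
sketch, assuming we add a for_cleanup parameter:

static void
do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
{
    /* Workers read the phase from the shared (DSM) area ... */
    vacrel->lps->lvshared->for_cleanup = false;
    vacrel->lps->lvshared->first_time = false;

    /* ... but the leader-side helper is also told explicitly what to do */
    do_parallel_vacuum_or_cleanup(vacrel,
                                  vacrel->lps->nindexes_parallel_bulkdel,
                                  false /* for_cleanup */);
}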
- I don't like some of the new names introduced in 14. E.g.
"do_parallel_processing" is way too generic.
I listed the function names that probably need to be renamed from
that perspective:
* do_parallel_processing
* do_serial_processing_for_unsafe_indexes
* parallel_process_one_index
Is there any other function that should be renamed?
- On a higher level, a lot of this actually doesn't seem to belong into
vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
code is heap specific. And vacuumlazy.c is large enough without the parallel
code.
I haven't come up with an idea of how to make them more generic. Could you
elaborate on that?
I've started to write a patch for these comments.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Nov 1, 2021 at 10:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Due to bug #17245 [1] I spent a considerable amount of time looking at vacuum-related
code. And I found a few things that I think could stand improvement:
- There are pretty much no tests. This is way too complicated a feature for
that. If there had been tests for the obvious edge case of some indexes
being too small to be handled in parallel, but others needing parallelism,
the mistake leading to #17245 would have been caught during development.
Yes. We should have tests at least for such cases.
For discussion, I've written a patch only for adding some tests to
parallel vacuum. The test includes the reported case where small
indexes are not processed by the leader process as well as cases where
different kinds of indexes (i.e., different amparallelvacuumoptions)
are vacuumed or cleaned up.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
regression_tests_for_parallel_vacuum.patch
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 05221cc1d6..1abfdf06b7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -153,6 +153,7 @@
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDVAC_CHECK 6 /* used only when USE_ASSERT_CHECKING */
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -193,6 +194,20 @@ typedef struct LVDeadTuples
#define MAXDEADTUPLES(max_size) \
(((max_size) - offsetof(LVDeadTuples, itemptrs)) / sizeof(ItemPointerData))
+/*
+ * LVIndVacCheck stores an array of booleans indicating whether the N'th
+ * index has been vacuumed (or cleaned up). This is used to verify that
+ * all indexes are processed by parallel index vacuum. Therefore, this is
+ * used only during parallel vacuum and when USE_ASSERT_CHECKING is defined.
+ */
+typedef struct LVIndVacCheck
+{
+ int nindexes;
+ bool processed[FLEXIBLE_ARRAY_MEMBER];
+} LVIndVacCheck;
+#define SizeOfLVIndVacCheck(nindexes) \
+ add_size(offsetof(LVIndVacCheck, processed), mul_size(sizeof(bool), (nindexes)))
+
/*
* Shared information among parallel workers. So this is allocated in the DSM
* segment.
@@ -369,6 +384,12 @@ typedef struct LVRelState
* table */
int64 num_tuples; /* total number of nonremovable tuples */
int64 live_tuples; /* live tuples (reltuples estimate) */
+
+ /*
+ * Information used to verify the result of parallel vacuum. This is
+ * always NULL if USE_ASSERT_CHECKING is undefined.
+ */
+ LVIndVacCheck *indvac_check;
} LVRelState;
/*
@@ -468,6 +489,11 @@ static void update_vacuum_error_info(LVRelState *vacrel,
static void restore_vacuum_error_info(LVRelState *vacrel,
const LVSavedErrInfo *saved_vacrel);
+#ifdef USE_ASSERT_CHECKING
+static void lazy_clear_indvac_check_info(LVIndVacCheck *indvac_check);
+static void lazy_mark_index_processed(LVIndVacCheck *indvac_check, int idx);
+static void lazy_verify_indvac_result(LVIndVacCheck *indvac_check);
+#endif
/*
* heap_vacuum_rel() -- perform VACUUM for one heap relation
@@ -2764,6 +2790,11 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
lps->pcxt->nworkers_launched, nworkers)));
}
+#ifdef USE_ASSERT_CHECKING
+ /* Clear index vacuum check info before actual vacuum processing */
+ lazy_clear_indvac_check_info(vacrel->indvac_check);
+#endif
+
/* Process the indexes that can be processed by only leader process */
do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
@@ -2786,6 +2817,11 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
+#ifdef USE_ASSERT_CHECKING
+ /* Check if all indexes have been processed */
+ lazy_verify_indvac_result(vacrel->indvac_check);
+#endif
+
/*
* Carry the shared balance value to heap scan and disable shared costing
*/
@@ -2847,6 +2883,10 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
lvshared,
shared_istat,
vacrel);
+#ifdef USE_ASSERT_CHECKING
+ /* Mark this index has been processed */
+ lazy_mark_index_processed(vacrel->indvac_check, idx);
+#endif
}
/*
@@ -2898,6 +2938,11 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
lvshared,
shared_istat,
vacrel);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Mark this index has been processed */
+ lazy_mark_index_processed(vacrel->indvac_check, idx);
+#endif
}
/*
@@ -3837,6 +3882,7 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
ParallelContext *pcxt;
LVShared *shared;
LVDeadTuples *dead_tuples;
+ LVIndVacCheck *indvac_check = NULL;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
bool *can_parallel_vacuum;
@@ -3936,6 +3982,15 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
mul_size(sizeof(WalUsage), pcxt->nworkers));
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /*
+ * Estimate size for index check information --
+ * PARALLEL_VACUUM_KEY_INDVAC_CHECK.
+ */
+#ifdef USE_ASSERT_CHECKING
+ shm_toc_estimate_chunk(&pcxt->estimator, SizeOfLVIndVacCheck(nindexes));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+#endif
+
/* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
if (debug_query_string)
{
@@ -4013,6 +4068,16 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
+#ifdef USE_ASSERT_CHECKING
+ /* Allocate and initialize space for index vacuum check information */
+ indvac_check = shm_toc_allocate(pcxt->toc, SizeOfLVIndVacCheck(nindexes));
+ MemSet(indvac_check, 0, SizeOfLVIndVacCheck(nindexes));
+ indvac_check->nindexes = nindexes;
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDVAC_CHECK, indvac_check);
+#endif
+
+ vacrel->indvac_check = indvac_check;
+
pfree(can_parallel_vacuum);
return lps;
}
@@ -4140,6 +4205,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVShared *lvshared;
LVDeadTuples *dead_tuples;
+ LVIndVacCheck *indvac_check = NULL;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
int nindexes;
@@ -4201,6 +4267,10 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
if (lvshared->maintenance_work_mem_worker > 0)
maintenance_work_mem = lvshared->maintenance_work_mem_worker;
+#ifdef USE_ASSERT_CHECKING
+ indvac_check = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_INDVAC_CHECK, false);
+#endif
+
/*
* Initialize vacrel for use as error callback arg by parallel worker.
*/
@@ -4209,6 +4279,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
vacrel.indname = NULL;
vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
vacrel.dead_tuples = dead_tuples;
+ vacrel.indvac_check = indvac_check;
/* Setup error traceback support for ereport() */
errcallback.callback = vacuum_error_callback;
@@ -4331,3 +4402,32 @@ restore_vacuum_error_info(LVRelState *vacrel,
vacrel->offnum = saved_vacrel->offnum;
vacrel->phase = saved_vacrel->phase;
}
+
+#ifdef USE_ASSERT_CHECKING
+
+/* Clear information used by lazy_verify_indvac_result() later */
+static void
+lazy_clear_indvac_check_info(LVIndVacCheck *indvac_check)
+{
+ MemSet(indvac_check->processed, false, sizeof(bool) * indvac_check->nindexes);
+}
+
+/* Mark the idx'th index has been processed */
+static void
+lazy_mark_index_processed(LVIndVacCheck *indvac_check, int idx)
+{
+ /* The index must be processed by only one process */
+ Assert(!indvac_check->processed[idx]);
+ indvac_check->processed[idx] = true;
+}
+
+/* Check if all indexes have been processed */
+static void
+lazy_verify_indvac_result(LVIndVacCheck *indvac_check)
+{
+ Assert(indvac_check->nindexes > 0);
+ for (int i = 0; i < indvac_check->nindexes; i++)
+ Assert(indvac_check->processed[i]);
+}
+
+#endif /* USE_ASSERT_CHECKING */
diff --git a/src/test/regress/expected/vacuum_parallel.out b/src/test/regress/expected/vacuum_parallel.out
new file mode 100644
index 0000000000..b2946a22eb
--- /dev/null
+++ b/src/test/regress/expected/vacuum_parallel.out
@@ -0,0 +1,38 @@
+--
+-- VACUUM_PARALLEL
+-- All parallel vacuum tests in this file check if any index is not
+-- vacuumed during parallel vacuum which causes an assertion failure.
+--
+SET max_parallel_maintenance_workers TO 4;
+SET min_parallel_index_scan_size TO '64kB';
+CREATE TABLE pvac_test1 (a int) WITH (autovacuum_enabled = off);
+INSERT INTO pvac_test1 SELECT generate_series(1, 100000);
+CREATE INDEX pvac_large_index_1 ON pvac_test1 (a);
+CREATE INDEX pvac_large_index_2 ON pvac_test1 (a);
+CREATE INDEX pvac_small_index_1 ON pvac_test1 (a) WHERE a < 10;
+CREATE INDEX pvac_small_index_2 ON pvac_test1 (a) WHERE a < 20;
+SELECT relname, pg_relation_size(oid) < pg_size_bytes(current_setting('min_parallel_index_scan_size')) as is_small FROM pg_class WHERE relname ~ 'pvac_' AND relkind = 'i' ORDER BY 1;
+ relname | is_small
+--------------------+----------
+ pvac_large_index_1 | f
+ pvac_large_index_2 | f
+ pvac_small_index_1 | t
+ pvac_small_index_2 | t
+(4 rows)
+
+DELETE FROM pvac_test1;
+-- Do parallel index vacuum.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) pvac_test1;
+-- Do parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) pvac_test1;
+CREATE TABLE pvac_test2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO pvac_test2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 100000) g;
+CREATE INDEX pvac_btree_idx on pvac_test2 USING btree (a);
+CREATE INDEX pvac_gin_idx on pvac_test2 USING gin (b);
+CREATE INDEX pvac_brin_idx on pvac_test2 USING brin (a);
+CREATE INDEX pvac_hash_idx on pvac_test2 USING hash (a);
+DELETE FROM pvac_test2;
+-- Do parallel index vacuum for different kinds of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) pvac_test2;
+-- Do parallel index cleanup for different kinds of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) pvac_test2;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 7be89178f0..017e962fed 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -96,6 +96,7 @@ test: rules psql psql_crosstab amutils stats_ext collate.linux.utf8
# run by itself so it can run parallel workers
test: select_parallel
test: write_parallel
+test: vacuum_parallel
# no relation related tests can be put in this group
test: publication subscription
diff --git a/src/test/regress/sql/vacuum_parallel.sql b/src/test/regress/sql/vacuum_parallel.sql
new file mode 100644
index 0000000000..78a8e3681f
--- /dev/null
+++ b/src/test/regress/sql/vacuum_parallel.sql
@@ -0,0 +1,42 @@
+--
+-- VACUUM_PARALLEL
+-- All parallel vacuum tests in this file check if any index is not
+-- vacuumed during parallel vacuum which causes an assertion failure.
+--
+
+SET max_parallel_maintenance_workers TO 4;
+SET min_parallel_index_scan_size TO '64kB';
+
+CREATE TABLE pvac_test1 (a int) WITH (autovacuum_enabled = off);
+INSERT INTO pvac_test1 SELECT generate_series(1, 100000);
+
+CREATE INDEX pvac_large_index_1 ON pvac_test1 (a);
+CREATE INDEX pvac_large_index_2 ON pvac_test1 (a);
+CREATE INDEX pvac_small_index_1 ON pvac_test1 (a) WHERE a < 10;
+CREATE INDEX pvac_small_index_2 ON pvac_test1 (a) WHERE a < 20;
+
+SELECT relname, pg_relation_size(oid) < pg_size_bytes(current_setting('min_parallel_index_scan_size')) as is_small FROM pg_class WHERE relname ~ 'pvac_' AND relkind = 'i' ORDER BY 1;
+
+DELETE FROM pvac_test1;
+
+-- Do parallel index vacuum.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) pvac_test1;
+
+-- Do parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) pvac_test1;
+
+CREATE TABLE pvac_test2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO pvac_test2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 100000) g;
+
+CREATE INDEX pvac_btree_idx on pvac_test2 USING btree (a);
+CREATE INDEX pvac_gin_idx on pvac_test2 USING gin (b);
+CREATE INDEX pvac_brin_idx on pvac_test2 USING brin (a);
+CREATE INDEX pvac_hash_idx on pvac_test2 USING hash (a);
+
+DELETE FROM pvac_test2;
+
+-- Do parallel index vacuum for different kinds of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) pvac_test2;
+
+-- Do parallel index cleanup for different kinds of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) pvac_test2;
On Mon, Nov 1, 2021 at 5:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
For discussion, I've written a patch only for adding some tests to
parallel vacuum. The test includes the reported case where small
indexes are not processed by the leader process as well as cases where
different kinds of indexes (i.e., different amparallelvacuumoptions)
are vacuumed or cleaned up.
I started looking at this because I want to commit something like it
alongside a fix to bug #17245.
While I tend to favor relatively heavy assertions (e.g., the
heap_page_is_all_visible() related asserts I added to
lazy_scan_prune()), the idea of having a whole shared memory area just
for assertions seems a bit too much, even to me. I tried to simplify
it by just adding a new field to LVSharedIndStats, which seemed more
natural. It took me at least 15 minutes before I realized that I was
actually repeating exactly the same mistake that led to bug #17245 in
the first place. I somehow forgot that
parallel_stats_for_idx()/IndStatsIsNull() will return NULL for any
index that has already been deemed too small to be worth processing in
parallel. Even after all that drama!
Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
dedicated shmem area for the array of LVSharedIndStats (no more
storing LVSharedIndStats entries at the end of the LVShared space in
an ad-hoc, type unsafe way). There should be one array element for
each and every index -- even those indexes where parallel index
vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
processing for "not worthwhile" indexes actually makes sense, BTW). We
can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
also add new per-index state fields to LVSharedIndStats itself. We
could directly record the status of each index (e.g., parallel unsafe,
amvacuumcleanup processing done, ambulkdelete processing done)
explicitly. All code could safely subscript the LVSharedIndStats array
directly, using idx style integers. That seems far more robust and
consistent.
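To sketch that layout (the field names below are only placeholders, to be worked
out in the actual patch):

/* One entry per index, stored as a plain C array in its own DSM chunk */
typedef struct LVSharedIndStats
{
    bool        parallel_safe;      /* can a worker process this index? */
    bool        bulkdelete_done;    /* ambulkdelete performed already? */
    bool        cleanup_done;       /* amvacuumcleanup performed already? */
    bool        istat_updated;      /* is istat filled in? */
    IndexBulkDeleteResult istat;
} LVSharedIndStats;

begin_parallel_vacuum() would allocate nindexes of these under the new
PARALLEL_VACUUM_STATS TOC key, and the leader, the workers, and
end_parallel_vacuum() would all subscript the array with the same idx-style
integers used for vacrel->indrels[].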
I think that this PARALLEL_VACUUM_STATS refactoring is actually the
simplest way to comprehensively test parallel VACUUM. I will still
need to add tests for my fix to bug #17245, but they won't be truly
general tests. I'll have to make sure that one of the assertions in
nbtdedup.c fails when the tests are run without the fix in place, or
something like that.
--
Peter Geoghegan
On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Nov 1, 2021 at 5:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
For discussion, I've written a patch only for adding some tests to
parallel vacuum. The test includes the reported case where small
indexes are not processed by the leader process as well as cases where
different kinds of indexes (i.e., different amparallelvacuumoptions)
are vacuumed or cleaned up.
I started looking at this because I want to commit something like it
alongside a fix to bug #17245.
While I tend to favor relatively heavy assertions (e.g., the
heap_page_is_all_visible() related asserts I added to
lazy_scan_prune()), the idea of having a whole shared memory area just
for assertions seems a bit too much, even to me. I tried to simplify
it by just adding a new field to LVSharedIndStats, which seemed more
natural. It took me at least 15 minutes before I realized that I was
actually repeating exactly the same mistake that led to bug #17245 in
the first place. I somehow forgot that
parallel_stats_for_idx()/IndStatsIsNull() will return NULL for any
index that has already been deemed too small to be worth processing in
parallel. Even after all that drama!
The idea of that patch was for back branches in order to not affect
non-enable-cassert builds. That said, I agree that it's an overkill
solution.
Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
dedicated shmem area for the array of LVSharedIndStats (no more
storing LVSharedIndStats entries at the end of the LVShared space in
an ad-hoc, type unsafe way). There should be one array element for
each and every index -- even those indexes where parallel index
vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
processing for "not worthwhile" indexes actually makes sense, BTW). We
can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
also add new per-index state fields to LVSharedIndStats itself. We
could directly record the status of each index (e.g., parallel unsafe,
amvacuumcleanup processing done, ambulkdelete processing done)
explicitly. All code could safely subscript the LVSharedIndStats array
directly, using idx style integers. That seems far more robust and
consistent.
Sounds good.
During development, I wrote the patch while trying to use less shared
memory, but it seems that it brought complexity (and therefore the
bug). In practice it would not be harmful even if we allocate index
statistics on DSM for unsafe indexes and "not worthwhile" indexes.
Rather, tracking bulkdelete and vacuumcleanup completion
might enable us to improve the vacuum progress reporting so that the
progress stats view shows how many indexes have been vacuumed (or
cleaned up).
Anyway, I'll write a patch accordingly.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Nov 1, 2021 at 7:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
- Imo it's pretty confusing to have functions like
lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
vacuum or index cleanup with parallel workers.", based on
lps->lvshared->for_cleanup.
Okay. We need to set lps->lvshared->for_cleanup to tell the workers to do
either index vacuum or index cleanup. So it might be better to pass the
for_cleanup flag down to the functions in addition to setting
lps->lvshared->for_cleanup.
But, we need this information in the parallel worker as well to know
whether to perform index vacuum or clean up, so I guess we need this
information in shared memory, no?
- I don't like some of the new names introduced in 14. E.g.
"do_parallel_processing" is way too generic.I listed the function names that probably needs to be renamed from
that perspecti:* do_parallel_processing
* do_serial_processing_for_unsafe_indexes
* parallel_process_one_indexIs there any other function that should be renamed?
- On a higher level, a lot of this actually doesn't seem to belong into
vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
code is heap specific. And vacuumlazy.c is large enough without the parallel
code.
I haven't come up with an idea of how to make them more generic. Could you
elaborate on that?
Can we think of moving parallelism-related code to a different file
(say vacuumparallel.c)? At least that will reduce the footprint of
vacuumlazy.c.
--
With Regards,
Amit Kapila.
On Tue, Nov 2, 2021 at 2:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Anyway, I'll write a patch accordingly.
While writing a patch for these comments, I found another bug in
parallel_processing_is_safe():
/*
* Returns false, if the given index can't participate in parallel index
* vacuum or parallel index cleanup
*/
static bool
parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
{
    uint8       vacoptions = indrel->rd_indam->amparallelvacuumoptions;

    /* first_time must be true only if for_cleanup is true */
    Assert(lvshared->for_cleanup || !lvshared->first_time);

    if (lvshared->for_cleanup)
    {
        /* Skip, if the index does not support parallel cleanup */
        if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
            ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
            return true;
It returns true in the above condition but it should return false
since the index doesn't support parallel index cleanup at all. It
seems that this bug was introduced by commit b4af70cb21 (therefore
exists only in PG14) which flipped the return values of this function
but missed one place. The index AMs that don't support parallel index
cleanup at all are affected by this bug. Among the index AMs supported
in core, hash indexes are affected, but since they just return the
number of blocks during vacuumcleanup it would not have serious
consequences.
I've attached a patch to fix it.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
fix_parallel_processing_is_safe.patch
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index bfb1ea0d25..716af22e5b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -4116,7 +4116,7 @@ parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
/* Skip, if the index does not support parallel cleanup */
if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return true;
+ return false;
/*
* Skip, if the index supports parallel cleanup conditionally, but we
On Tue, Nov 2, 2021 at 7:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
It returns true in the above condition but it should return false
since the index doesn't support parallel index cleanup at all. It
seems that this bug was introduced by commit b4af70cb21 (therefore
exists only in PG14) which flipped the return values of this function
but missed one place. The index AMs that don't support parallel index
cleanup at all are affected by this bug. Among the index AMs supported
in core, hash indexes are affected, but since they just return the
number of blocks during vacuumcleanup it would not have serious
consequences.
I've attached a patch to fix it.
I pushed your fix just now.
Thanks
--
Peter Geoghegan
On Wed, Nov 3, 2021 at 11:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Tue, Nov 2, 2021 at 7:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
It returns true in the above condition but it should return false
since the index doesn't support parallel index cleanup at all. It
seems that this bug was introduced by commit b4af70cb21 (therefore
exists only in PG14) which flipped the return values of this function
but missed one place. The index AMs that don't support parallel index
cleanup at all are affected by this bug. Among the index AMs supported
in core, hash indexes are affected, but since they just return the
number of blocks during vacuumcleanup it would not have serious
consequences.
I've attached a patch to fix it.
I pushed your fix just now.
Thanks!
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Nov 2, 2021 at 11:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
dedicated shmem area for the array of LVSharedIndStats (no more
storing LVSharedIndStats entries at the end of the LVShared space in
an ad-hoc, type unsafe way). There should be one array element for
each and every index -- even those indexes where parallel index
vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
processing for "not worthwhile" indexes actually makes sense, BTW). We
can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
also add new per-index state fields to LVSharedIndStats itself. We
could directly record the status of each index (e.g., parallel unsafe,
amvacuumcleanup processing done, ambulkdelete processing done)
explicitly. All code could safely subscript the LVSharedIndStats array
directly, using idx style integers. That seems far more robust and
consistent.
Sounds good.
During development, I wrote the patch while trying to use less shared
memory, but it seems that it brought complexity (and therefore the
bug). In practice it would not be harmful even if we allocate index
statistics on DSM for unsafe indexes and "not worthwhile" indexes.
If we want to allocate index stats for all indexes in DSM, then why not
consider it along the lines of buf/wal_usage, i.e., track those via
LVParallelState? And probably replace the bitmap with an array of bools
that indicates which indexes can be skipped by the parallel worker.
--
With Regards,
Amit Kapila.
On Wed, Nov 3, 2021 at 1:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 2, 2021 at 11:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
dedicated shmem area for the array of LVSharedIndStats (no more
storing LVSharedIndStats entries at the end of the LVShared space in
an ad-hoc, type unsafe way). There should be one array element for
each and every index -- even those indexes where parallel index
vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
processing for "not worthwhile" indexes actually makes sense, BTW). We
can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
also add new per-index state fields to LVSharedIndStats itself. We
could directly record the status of each index (e.g., parallel unsafe,
amvacuumcleanup processing done, ambulkdelete processing done)
explicitly. All code could safely subscript the LVSharedIndStats array
directly, using idx style integers. That seems far more robust and
consistent.
Sounds good.
During development, I wrote the patch while trying to use less shared
memory, but it seems that it brought complexity (and therefore the
bug). In practice it would not be harmful even if we allocate index
statistics on DSM for unsafe indexes and "not worthwhile" indexes.
If we want to allocate index stats for all indexes in DSM, then why not
consider it along the lines of buf/wal_usage, i.e., track those via
LVParallelState? And probably replace the bitmap with an array of bools
that indicates which indexes can be skipped by the parallel worker.
I've attached a draft patch. The patch incorporates all comments from
Andres except for the last one, which is about moving parallel-related code
to another file. I'd like to discuss how we split vacuumlazy.c.
Regarding tests, I'd like to add tests to check if a vacuum with
multiple index scans (i.e., due to small maintenance_work_mem) works
fine. But a problem is that we need at least about 200,000 garbage
tuples to make vacuum scan the indexes twice even with the minimum
maintenance_work_mem, and loading that many tuples takes time.
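For reference, a rough back-of-the-envelope calculation (assuming dead TIDs are
stored as 6-byte ItemPointerData entries and the 1MB minimum maintenance_work_mem;
this is a standalone sketch, not PostgreSQL code):

#include <stdio.h>

int
main(void)
{
    long    mwm_bytes = 1024L * 1024L;  /* minimum maintenance_work_mem (1MB) */
    long    itemptr_size = 6;           /* sizeof(ItemPointerData) */

    /*
     * Roughly 174,000 TIDs fit per pass, so ~200,000 dead tuples are needed
     * to force a second round of index vacuuming.
     */
    printf("dead TIDs per pass: %ld\n", mwm_bytes / itemptr_size);
    return 0;
}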
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
parallel_vacuum_refactor.patch
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index bfb1ea0d25..16deb328bb 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -153,6 +153,7 @@
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -206,14 +207,6 @@ typedef struct LVShared
Oid relid;
int elevel;
- /*
- * An indication for vacuum workers to perform either index vacuum or
- * index cleanup. first_time is true only if for_cleanup is true and
- * bulk-deletion is not performed yet.
- */
- bool for_cleanup;
- bool first_time;
-
/*
* Fields for both index vacuum and cleanup.
*
@@ -251,23 +244,18 @@ typedef struct LVShared
*/
pg_atomic_uint32 active_nworkers;
- /*
- * Variables to control parallel vacuum. We have a bitmap to indicate
- * which index has stats in shared memory. The set bit in the map
- * indicates that the particular index supports a parallel vacuum.
- */
- pg_atomic_uint32 idx; /* counter for vacuuming and clean up */
- uint32 offset; /* sizeof header incl. bitmap */
- bits8 bitmap[FLEXIBLE_ARRAY_MEMBER]; /* bit map of NULLs */
-
- /* Shared index statistics data follows at end of struct */
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
} LVShared;
-#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
-#define GetSharedIndStats(s) \
- ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
-#define IndStatsIsNull(s, i) \
- (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
+/* Status used during parallel index vacuum or cleanup */
+typedef enum LVIndVacStatus
+{
+ INDVAC_STATUS_INITIAL = 0,
+ INDVAC_STATUS_NEED_BULKDELETE,
+ INDVAC_STATUS_NEED_CLEANUP,
+ INDVAC_STATUS_COMPLETED,
+} LVIndVacStatus;
/*
* Struct for an index bulk-deletion statistic used for parallel vacuum. This
@@ -275,7 +263,15 @@ typedef struct LVShared
*/
typedef struct LVSharedIndStats
{
- bool updated; /* are the stats updated? */
+ LVIndVacStatus status;
+
+ /*
+ * True if both leader and worker can process the index, otherwise only
+ * leader can process it.
+ */
+ bool parallel_safe;
+
+ bool istat_updated; /* are the stats updated? */
IndexBulkDeleteResult istat;
} LVSharedIndStats;
@@ -287,6 +283,9 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /* Shared index statistics among parallel vacuum workers */
+ LVSharedIndStats *lvsharedindstats;
+
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
@@ -416,18 +415,14 @@ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
-static void do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel);
-static void do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers);
-static void do_parallel_processing(LVRelState *vacrel,
- LVShared *lvshared);
-static void do_serial_processing_for_unsafe_indexes(LVRelState *vacrel,
- LVShared *lvshared);
-static IndexBulkDeleteResult *parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_indstats,
- LVRelState *vacrel);
+static void parallel_lazy_vacuum_or_cleanup_all_indexes(LVRelState *vacrel, bool vacuum);
+static void prepare_parallel_index_processing(LVRelState *vacrel, bool vacuum);
+static void lazy_serial_process_indexes(LVRelState *vacrel);
+static void lazy_parallel_process_indexes(LVRelState *vacrel, LVShared *lvshared,
+ LVSharedIndStats *indstats);
+static void lazy_parallel_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *lvshared,
+ LVSharedIndStats *stats);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
@@ -450,16 +445,14 @@ static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int compute_parallel_vacuum_workers(LVRelState *vacrel,
- int nrequested,
- bool *will_parallel_vacuum);
+static int compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested);
static void update_index_statistics(LVRelState *vacrel);
static LVParallelState *begin_parallel_vacuum(LVRelState *vacrel,
BlockNumber nblocks,
int nrequested);
static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared, int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
+static bool parallel_processing_is_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -2251,7 +2244,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_vacuum_all_indexes(vacrel);
+ parallel_lazy_vacuum_or_cleanup_all_indexes(vacrel, true);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2625,78 +2618,32 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform lazy_vacuum_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
-{
- /* Tell parallel workers to do index vacuuming */
- vacrel->lps->lvshared->for_cleanup = false;
- vacrel->lps->lvshared->first_time = false;
-
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- do_parallel_vacuum_or_cleanup(vacrel,
- vacrel->lps->nindexes_parallel_bulkdel);
-}
-
-/*
- * Perform lazy_cleanup_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel)
-{
- int nworkers;
-
- /*
- * If parallel vacuum is active we perform index cleanup with parallel
- * workers.
- *
- * Tell parallel workers to do index cleanup.
- */
- vacrel->lps->lvshared->for_cleanup = true;
- vacrel->lps->lvshared->first_time = (vacrel->num_index_scans == 0);
-
- /*
- * Now we can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- /* Determine the number of parallel workers to launch */
- if (vacrel->lps->lvshared->first_time)
- nworkers = vacrel->lps->nindexes_parallel_cleanup +
- vacrel->lps->nindexes_parallel_condcleanup;
- else
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- do_parallel_vacuum_or_cleanup(vacrel, nworkers);
-}
-
/*
* Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process. The caller must set
- * lps->lvshared->for_cleanup to indicate whether to perform vacuum or
- * cleanup.
+ * must be used by the parallel vacuum leader process.
*/
static void
-do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
+parallel_lazy_vacuum_or_cleanup_all_indexes(LVRelState *vacrel, bool vacuum)
{
LVParallelState *lps = vacrel->lps;
+ int nworkers;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(vacrel));
Assert(vacrel->nindexes > 0);
+ /* Determine the number of parallel workers to launch */
+ if (vacuum)
+ nworkers = vacrel->lps->nindexes_parallel_bulkdel;
+ else
+ {
+ nworkers = vacrel->lps->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if in the first time call */
+ if (vacrel->num_index_scans == 0)
+ nworkers += vacrel->lps->nindexes_parallel_condcleanup;
+ }
+
/* The leader process will participate */
nworkers--;
@@ -2707,17 +2654,18 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
*/
nworkers = Min(nworkers, lps->pcxt->nworkers);
+ /* Set data required for parallel index vacuum or cleanup */
+ prepare_parallel_index_processing(vacrel, vacuum);
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(lps->lvshared->idx), 0);
+
/* Setup the shared cost-based vacuum delay and launch workers */
if (nworkers > 0)
{
+ /* Reinitialize the parallel context to relaunch parallel workers */
if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
ReinitializeParallelDSM(lps->pcxt);
- }
/*
* Set up shared cost balance and the number of active workers for
@@ -2750,28 +2698,28 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
}
- if (lps->lvshared->for_cleanup)
+ if (vacuum)
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
else
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
}
/* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
+ lazy_serial_process_indexes(vacrel);
/*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
*/
- do_parallel_processing(vacrel, lps->lvshared);
+ lazy_parallel_process_indexes(vacrel, lps->lvshared, vacrel->lps->lvsharedindstats);
/*
* Next, accumulate buffer and WAL usage. (This must wait for the workers
@@ -2786,6 +2734,18 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
+ /*
+ * Reset all index status back to invalid (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVSharedIndStats *stats = &(lps->lvsharedindstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_COMPLETED);
+ stats->status = INDVAC_STATUS_INITIAL;
+ }
+
/*
* Carry the shared balance value to heap scan and disable shared costing
*/
@@ -2797,12 +2757,62 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
}
}
+
+/*
+ * This function prepares the shared data for parallel index vacuum or cleanup,
+ * and set index vacuum status accordingly.
+ */
+static void
+prepare_parallel_index_processing(LVRelState *vacrel, bool vacuum)
+{
+ LVIndVacStatus next_status;
+
+ if (vacuum)
+ {
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now. Matches serial VACUUM case.
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
+ vacrel->lps->lvshared->estimated_count = true;
+
+ next_status = INDVAC_STATUS_NEED_BULKDELETE;
+ }
+ else
+ {
+ /*
+ * We can provide a better estimate of total number of surviving
+ * tuples (we assume indexes are more interested in that than in the
+ * number of nominally live tuples).
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
+ vacrel->lps->lvshared->estimated_count =
+ (vacrel->tupcount_pages < vacrel->rel_pages);
+
+ next_status = INDVAC_STATUS_NEED_CLEANUP;
+ }
+
+ /* Set index vacuum status and mark as parallel safe or not */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVSharedIndStats *stats = &(vacrel->lps->lvsharedindstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_INITIAL);
+
+ stats->status = next_status;
+ stats->parallel_safe = parallel_processing_is_safe(vacrel,
+ vacrel->indrels[i],
+ vacuum);
+ }
+}
+
/*
* Index vacuum/cleanup routine used by the leader process and parallel
* vacuum worker processes to process the indexes in parallel.
*/
static void
-do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
+lazy_parallel_process_indexes(LVRelState *vacrel, LVShared *lvshared,
+ LVSharedIndStats *indstats)
{
/*
* Increment the active worker count if we are able to launch any worker.
@@ -2810,13 +2820,10 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
if (VacuumActiveNWorkers)
pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
- /* Loop until all indexes are vacuumed */
for (;;)
{
- int idx;
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
+ int idx;
+ LVSharedIndStats *stats;
/* Get an index number to process */
idx = pg_atomic_fetch_add_u32(&(lvshared->idx), 1);
@@ -2825,28 +2832,17 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
if (idx >= vacrel->nindexes)
break;
- /* Get the index statistics space from DSM, if any */
- shared_istat = parallel_stats_for_idx(lvshared, idx);
-
- /* Skip indexes not participating in parallelism */
- if (shared_istat == NULL)
- continue;
-
- indrel = vacrel->indrels[idx];
+ stats = &(indstats[idx]);
/*
- * Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
+ * Parallel unsafe indexes can be processed only by leader (these are
+ * processed in lazy_serial_process_indexes() by the leader).
*/
- if (!parallel_processing_is_safe(indrel, lvshared))
+ if (!stats->parallel_safe)
continue;
- /* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ lazy_parallel_process_one_index(vacrel, vacrel->indrels[idx],
+ lvshared, stats);
}
/*
@@ -2861,16 +2857,16 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
* Perform parallel processing of indexes in leader process.
*
* Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing for_cleanup processing right now.
+ * parallel safe.
*
* Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by compute_parallel_vacuum_workers(). These indexes never get a
- * slot for statistics in DSM.
+ * enforced by compute_parallel_vacuum_workers().
*/
static void
-do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
+lazy_serial_process_indexes(LVRelState *vacrel)
{
+ LVParallelState *lps = vacrel->lps;
+
Assert(!IsParallelWorker());
/*
@@ -2879,30 +2875,16 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
if (VacuumActiveNWorkers)
pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
- for (int idx = 0; idx < vacrel->nindexes; idx++)
+ for (int i = 0; i < vacrel->nindexes; i++)
{
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- shared_istat = parallel_stats_for_idx(lvshared, idx);
- indrel = vacrel->indrels[idx];
+ LVSharedIndStats *stats = &(lps->lvsharedindstats[i]);
- /*
- * We're only here for the indexes that parallel workers won't
- * process. Note that the shared_istat test ensures that we process
- * indexes that fell under initial size cutoff.
- */
- if (shared_istat != NULL &&
- parallel_processing_is_safe(indrel, lvshared))
+ /* Skip safe indexes, as they are processed by workers */
+ if (stats->parallel_safe)
continue;
- /* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ lazy_parallel_process_one_index(vacrel, vacrel->indrels[i],
+ lps->lvshared, stats);
}
/*
@@ -2919,29 +2901,33 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
* statistics returned from ambulkdelete and amvacuumcleanup to the DSM
* segment.
*/
-static IndexBulkDeleteResult *
-parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_istat,
- LVRelState *vacrel)
+static void
+lazy_parallel_process_one_index(LVRelState *vacrel, Relation indrel, LVShared *lvshared,
+ LVSharedIndStats *stats)
{
+ IndexBulkDeleteResult *istat = NULL;
IndexBulkDeleteResult *istat_res;
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (shared_istat && shared_istat->updated && istat == NULL)
- istat = &shared_istat->istat;
+ /* Get the index statistics space, if already updated */
+ if (stats->istat_updated)
+ istat = &(stats->istat);
- /* Do vacuum or cleanup of the index */
- if (lvshared->for_cleanup)
- istat_res = lazy_cleanup_one_index(indrel, istat, lvshared->reltuples,
- lvshared->estimated_count, vacrel);
- else
- istat_res = lazy_vacuum_one_index(indrel, istat, lvshared->reltuples,
- vacrel);
+ switch (stats->status)
+ {
+ case INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = lazy_vacuum_one_index(indrel, istat,
+ lvshared->reltuples, vacrel);
+ break;
+ case INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = lazy_cleanup_one_index(indrel, istat,
+ lvshared->reltuples,
+ lvshared->estimated_count,
+ vacrel);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d",
+ stats->status);
+ }
/*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2955,19 +2941,18 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!stats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
+ memcpy(&(stats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ stats->istat_updated = true;
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
- return istat_res;
+ /*
+ * Update the status to completed. No need to lock here since each
+ * worker touches different indexes.
+ */
+ stats->status = INDVAC_STATUS_COMPLETED;
}
/*
@@ -3002,7 +2987,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_cleanup_all_indexes(vacrel);
+ parallel_lazy_vacuum_or_cleanup_all_indexes(vacrel, false);
}
}
@@ -3734,13 +3719,10 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*
* nrequested is the number of parallel workers that user requested. If
* nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
+ * the number of indexes that support parallel vacuum.
*/
static int
-compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
+compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested)
{
int nindexes_parallel = 0;
int nindexes_parallel_bulkdel = 0;
@@ -3766,8 +3748,6 @@ compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
continue;
- will_parallel_vacuum[idx] = true;
-
if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
nindexes_parallel_bulkdel++;
if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
@@ -3840,14 +3820,15 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
Relation *indrels = vacrel->indrels;
int nindexes = vacrel->nindexes;
ParallelContext *pcxt;
+ LVSharedIndStats *indstats;
LVShared *shared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
- bool *will_parallel_vacuum;
long maxtuples;
- Size est_shared;
- Size est_deadtuples;
+ Size est_indstats = 0;
+ Size est_shared = 0;
+ Size est_deadtuples = 0;
int nindexes_mwm = 0;
int parallel_workers = 0;
int querylen;
@@ -3862,17 +3843,11 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
/*
* Compute the number of parallel vacuum workers to launch
*/
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = compute_parallel_vacuum_workers(vacrel,
- nrequested,
- will_parallel_vacuum);
+ parallel_workers = compute_parallel_vacuum_workers(vacrel, nrequested);
/* Can't perform vacuum in parallel */
if (parallel_workers <= 0)
- {
- pfree(will_parallel_vacuum);
return lps;
- }
lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
@@ -3882,41 +3857,13 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
Assert(pcxt->nworkers > 0);
lps->pcxt = pcxt;
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- /* Skip indexes that don't participate in parallel vacuum */
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- est_shared = add_size(est_shared, sizeof(LVSharedIndStats));
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_INDEX_STATS */
+ est_indstats = mul_size(sizeof(LVSharedIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared = MAXALIGN(sizeof(LVShared));
shm_toc_estimate_chunk(&pcxt->estimator, est_shared);
shm_toc_estimate_keys(&pcxt->estimator, 1);
@@ -3953,6 +3900,45 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
InitializeParallelDSM(pcxt);
+ /* Prepare index vacuum stats */
+ indstats = (LVSharedIndStats *) shm_toc_allocate(pcxt->toc, est_indstats);
+ MemSet(indstats, 0, est_indstats);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ /* Skip indexes that don't participate in parallel vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ lps->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ lps->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ lps->nindexes_parallel_condcleanup++;
+ }
+
+ shm_toc_insert(pcxt->toc,PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ lps->lvsharedindstats = indstats;
+
/* Prepare shared information */
shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared);
MemSet(shared, 0, est_shared);
@@ -3966,21 +3952,6 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
pg_atomic_init_u32(&(shared->cost_balance), 0);
pg_atomic_init_u32(&(shared->active_nworkers), 0);
pg_atomic_init_u32(&(shared->idx), 0);
- shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
-
- /*
- * Initialize variables for shared index statistics, set NULL bitmap and
- * the size of stats for each index.
- */
- memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
- for (int idx = 0; idx < nindexes; idx++)
- {
- if (!will_parallel_vacuum[idx])
- continue;
-
- /* Set NOT NULL as this index does support parallelism */
- shared->bitmap[idx >> 3] |= 1 << (idx & 0x07);
- }
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
lps->lvshared = shared;
@@ -4018,7 +3989,6 @@ begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
- pfree(will_parallel_vacuum);
return lps;
}
@@ -4043,21 +4013,12 @@ end_parallel_vacuum(LVRelState *vacrel)
/* Copy the updated statistics */
for (int idx = 0; idx < nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
-
- shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
+ LVSharedIndStats *stats = &(lps->lvsharedindstats[idx]);
- /*
- * Skip index -- it must have been processed by the leader, from
- * inside do_serial_processing_for_unsafe_indexes()
- */
- if (shared_istat == NULL)
- continue;
-
- if (shared_istat->updated)
+ if (stats->istat_updated)
{
indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &shared_istat->istat, sizeof(IndexBulkDeleteResult));
+ memcpy(indstats[idx], &stats->istat, sizeof(IndexBulkDeleteResult));
}
else
indstats[idx] = NULL;
@@ -4071,68 +4032,42 @@ end_parallel_vacuum(LVRelState *vacrel)
vacrel->lps = NULL;
}
-/*
- * Return shared memory statistics for index at offset 'getidx', if any
- *
- * Returning NULL indicates that compute_parallel_vacuum_workers() determined
- * that the index is a totally unsuitable target for all parallel processing
- * up front. For example, the index could be < min_parallel_index_scan_size
- * cutoff.
- */
-static LVSharedIndStats *
-parallel_stats_for_idx(LVShared *lvshared, int getidx)
-{
- char *p;
-
- if (IndStatsIsNull(lvshared, getidx))
- return NULL;
-
- p = (char *) GetSharedIndStats(lvshared);
- for (int idx = 0; idx < getidx; idx++)
- {
- if (IndStatsIsNull(lvshared, idx))
- continue;
-
- p += sizeof(LVSharedIndStats);
- }
-
- return (LVSharedIndStats *) p;
-}
-
/*
* Returns false, if the given index can't participate in parallel index
- * vacuum or parallel index cleanup
+ * vacuum or parallel index cleanup.
*/
static bool
-parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
+parallel_processing_is_safe(LVRelState *vacrel, Relation indrel, bool vacuum)
{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Check if the index is a totally unsuitable target for all parallel
+ * processing up front. For example, the index could be
+ * < min_parallel_index_scan_size cutoff.
+ */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ return false;
- /* first_time must be true only if for_cleanup is true */
- Assert(lvshared->for_cleanup || !lvshared->first_time);
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
- if (lvshared->for_cleanup)
- {
- /* Skip, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return true;
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
- /*
- * Skip, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). See the
- * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
- * when indexes support parallel cleanup conditionally.
- */
- if (!lvshared->first_time &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
- }
- else if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) == 0)
- {
- /* Skip if the index does not support parallel bulk deletion */
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). See the
+ * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
+ * when indexes support parallel cleanup conditionally.
+ */
+ if (vacrel->num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
return false;
- }
return true;
}
@@ -4148,6 +4083,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
{
Relation rel;
Relation *indrels;
+ LVSharedIndStats *lvindstats;
LVShared *lvshared;
LVDeadTuples *dead_tuples;
BufferUsage *buffer_usage;
@@ -4161,10 +4097,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
false);
elevel = lvshared->elevel;
- if (lvshared->for_cleanup)
- elog(DEBUG1, "starting parallel vacuum worker for cleanup");
- else
- elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
+ elog(DEBUG1, "starting parallel vacuum worker");
/* Set debug_query_string for individual workers */
sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
@@ -4185,6 +4118,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
Assert(nindexes > 0);
+ /* Set index statistics */
+ lvindstats = (LVSharedIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
/* Set dead tuple space */
dead_tuples = (LVDeadTuples *) shm_toc_lookup(toc,
PARALLEL_VACUUM_KEY_DEAD_TUPLES,
@@ -4230,7 +4168,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
InstrStartParallelQuery();
/* Process indexes to perform vacuum/cleanup */
- do_parallel_processing(&vacrel, lvshared);
+ lazy_parallel_process_indexes(&vacrel, lvshared, lvindstats);
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
diff --git a/src/test/regress/expected/vacuum_parallel.out b/src/test/regress/expected/vacuum_parallel.out
index ddf0ee544b..a07f5b2b73 100644
--- a/src/test/regress/expected/vacuum_parallel.out
+++ b/src/test/regress/expected/vacuum_parallel.out
@@ -45,5 +45,25 @@ VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table;
INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
RESET max_parallel_maintenance_workers;
RESET min_parallel_index_scan_size;
+CREATE TABLE parallel_vacuum_table2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 10000) g;
+CREATE INDEX pv_bt_index ON parallel_vacuum_table2 USING btree (a);
+CREATE INDEX pv_hash_index ON parallel_vacuum_table2 USING hash (a);
+CREATE INDEX pv_gin_index ON parallel_vacuum_table2 USING gin (b);
+CREATE INDEX pv_brin_index ON parallel_vacuum_table2 USING brin (a);
+CREATE INDEX pv_small_index ON parallel_vacuum_table2 USING btree ((1));
+DELETE FROM parallel_vacuum_table2;
+-- Parallel index vacuum for various types of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 200000) g;
+DELETE FROM parallel_vacuum_table2;
+SET maintenance_work_mem TO 1024;
+-- Parallel index vacuum for various types of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+RESET maintenance_work_mem;
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios
diff --git a/src/test/regress/sql/vacuum_parallel.sql b/src/test/regress/sql/vacuum_parallel.sql
index 1d23f33e39..49f4f4ce6d 100644
--- a/src/test/regress/sql/vacuum_parallel.sql
+++ b/src/test/regress/sql/vacuum_parallel.sql
@@ -42,5 +42,40 @@ INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
RESET max_parallel_maintenance_workers;
RESET min_parallel_index_scan_size;
+CREATE TABLE parallel_vacuum_table2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 10000) g;
+
+-- Create different types of indexes, e.g. having different amparallelvacuumoptions.
+-- Also create a small index, same as above.
+CREATE INDEX pv_bt_index ON parallel_vacuum_table2 USING btree (a);
+CREATE INDEX pv_hash_index ON parallel_vacuum_table2 USING hash (a);
+CREATE INDEX pv_gin_index ON parallel_vacuum_table2 USING gin (b);
+CREATE INDEX pv_brin_index ON parallel_vacuum_table2 USING brin (a);
+CREATE INDEX pv_small_index ON parallel_vacuum_table2 USING btree ((1));
+
+
+-- Parallel index vacuum for various types of indexes.
+DELETE FROM parallel_vacuum_table2;
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- XXX: in order to execute index scan twice, we need about 200,000 garbage tuples
+-- with the minimum maintenance_work_mem. However, it takes a long time to load.
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 200000) g;
+
+DELETE FROM parallel_vacuum_table2;
+
+SET maintenance_work_mem TO 1024;
+
+-- Parallel index vacuum for various types of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+RESET maintenance_work_mem;
+
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios
Hi,
On 2021-11-01 10:44:34 +0900, Masahiko Sawada wrote:
On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
But even though we have this space optimized bitmap thing, we actually need
more memory allocated for each index, making this whole thing pointless.
Right. But is better to change to use booleans?
It seems very clearly better to me. We shouldn't use complicated stuff like
#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
#define GetSharedIndStats(s) \
((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
#define IndStatsIsNull(s, i) \
(!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
unless there's a reason / benefit.
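For comparison, a rough sketch of the plain-array alternative, reusing the
identifiers from the draft patch elsewhere in this thread: every index simply
gets a slot in the DSM segment, and any participant subscripts the array
directly.

est_indstats = mul_size(sizeof(LVSharedIndStats), nindexes);
indstats = (LVSharedIndStats *) shm_toc_allocate(pcxt->toc, est_indstats);
MemSet(indstats, 0, est_indstats);

/* leader or worker, for any index number: */
LVSharedIndStats *stats = &indstats[idx];

No offset bookkeeping, no NULL bitmap, no IndStatsIsNull().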
- Imo it's pretty confusing to have functions like
lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
vacuum or index cleanup with parallel workers.", based on
lps->lvshared->for_cleanup.
Okay. We need to set lps->lvshared->for_cleanup to tell the worker to do
either index vacuum or index cleanup. So it might be better to pass
for_cleanup flag down to the functions in addition to setting
lps->lvshared->for_cleanup.
Yup.
- I don't like some of the new names introduced in 14. E.g.
"do_parallel_processing" is way too generic.I listed the function names that probably needs to be renamed from
that perspecti:* do_parallel_processing
* do_serial_processing_for_unsafe_indexes
* parallel_process_one_indexIs there any other function that should be renamed?
parallel_processing_is_safe().
I don't like that there's three different naming patterns for parallel
things. There's do_parallel_*, there's parallel_, and there's
(begin|end|compute)_parallel_*.
- On a higher level, a lot of this actually doesn't seem to belong into
vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
code is heap specific. And vacuumlazy.c is large enough without the parallel
code.
I haven't come up with an idea to make them more generic. Could you
elaborate on that?
To me the job that the parallel vacuum stuff does isn't really specific to
heap. Any table AM supporting indexes is going to need to do something pretty
much like it (it's calling indexam stuff). Most of the stuff in vacuumlazy.c
is very heap specific - you're not going to be able to reuse lazy_scan_heap()
or such. Before the parallel vacuum stuff, the index specific code in
vacuumlazy.c was fairly limited - but now it's a nontrivial amount of code.
Based on a quick look
parallel_vacuum_main(), parallel_processing_is_safe(),
parallel_stats_for_idx(), end_parallel_vacuum(), begin_parallel_vacuum(),
compute_parallel_vacuum_workers(), parallel_process_one_index(),
do_serial_processing_for_unsafe_indexes(), do_parallel_processing(),
do_parallel_vacuum_or_cleanup(), do_parallel_lazy_cleanup_all_indexes(),
do_parallel_lazy_vacuum_all_indexes(),
don't really belong in vacuumlazy.c. but should be in vacuum.c or a new
file. Of course that requires a bit of work, because of the heavy reliance on
LVRelState, but afaict there's not really an intrinsic need to use that.
Greetings,
Andres Freund
On Wed, Nov 3, 2021 at 10:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached a draft patch. The patch incorporated all comments from
Andres except for the last comment that moves parallel related code to
another file. I'd like to discuss how we split vacuumlazy.c.
This looks great!
I wonder if this is okay, though:
/* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
+ lazy_serial_process_indexes(vacrel);

/*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
 */
- do_parallel_processing(vacrel, lps->lvshared);
+ lazy_parallel_process_indexes(vacrel, lps->lvshared, vacrel->lps->lvsharedindstats);

/*
* Next, accumulate buffer and WAL usage. (This must wait for the workers
@@ -2786,6 +2734,18 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
Since "The leader process alone processes all parallel-safe indexes in
the case where no workers are launched" (no change there), I wonder:
how does the leader *know* that it's the leader (and so can always
process any indexes) inside its call to
lazy_parallel_process_indexes()? Or, does the leader actually process
all indexes inside lazy_serial_process_indexes() in this edge case?
(The edge case where a parallel VACUUM has no workers at all, because
they couldn't be launched by the core parallel infrastructure.)
One small thing: the new "LVSharedIndStats.parallel_safe" field seems
to be slightly misnamed. Isn't it more like
"LVSharedIndStats.parallel_workers_can_process"? The index might
actually be parallel safe *in principle*, while nevertheless being
deliberately skipped over by workers due to a deliberate up-front
choice made earlier, in compute_parallel_vacuum_workers(). Most
obvious example of this is the choice to not use parallelism for a
smaller index (say a partial index whose size is <
min_parallel_index_scan_size).
Another small thing, which is closely related to the last one:
typedef struct LVSharedIndStats
{
- bool updated; /* are the stats updated? */
+ LVIndVacStatus status;
+
+ /*
+ * True if both leader and worker can process the index, otherwise only
+ * leader can process it.
+ */
+ bool parallel_safe;
+
+ bool istat_updated; /* are the stats updated? */
 IndexBulkDeleteResult istat;
} LVSharedIndStats;
It would be nice if the new
"LVSharedIndStats.parallel_safe" field (or whatever we end up calling
it) had comments pointing out that it isn't a fixed thing for the
entire VACUUM operation -- it's only fixed for an individual call to
parallel_lazy_vacuum_or_cleanup_all_indexes() (i.e., it's only fixed
for the "ambulkdelete portion" or the "amvacuumcleanup portion" of the
entire VACUUM).
Alternatively, you could just have two booleans, I think. You know,
one for the "ambulkdelete portion", another for the "amvacuumcleanup
portion". As I've said before, it would be nice if we only called
parallel_vacuum_main() once per VACUUM operation (and not once per
"portion"), but that's a harder and more invasive change. But I don't
think you necessarily have to go that far for it to make sense to have
two bools. Having two might allow you to make both of them immutable,
which is useful.
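As a sketch, with purely illustrative field names (not taken from the patch):

typedef struct LVSharedIndStats
{
    LVIndVacStatus status;

    /*
     * One flag per phase; the idea is that both could be set up once,
     * before any index processing starts, and never changed afterwards.
     */
    bool        parallel_bulkdel_safe;
    bool        parallel_cleanup_safe;

    bool        istat_updated;  /* are the stats updated? */
    IndexBulkDeleteResult istat;
} LVSharedIndStats;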
Regarding tests, I’d like to add tests to check if a vacuum with
multiple index scans (e.g., due to small maintenance_work_mem) works
fine. But a problem is that we need at least about 200,000 garbage
tuples to perform index scan twice even with the minimum
maintenance_work_mem, which takes time to load the tuples.
Maybe that's okay. Do you notice that it takes a lot longer now? I did
try to keep the runtime down when I committed the fixup to the
parallel VACUUM related bug.
--
Peter Geoghegan
On Thu, Nov 4, 2021 at 12:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
Since "The leader process alone processes all parallel-safe indexes in
the case where no workers are launched" (no change there), I wonder:
how does the leader *know* that it's the leader (and so can always
process any indexes) inside its call to
lazy_parallel_process_indexes()? Or, does the leader actually process
all indexes inside lazy_serial_process_indexes() in this edge case?
(The edge case where a parallel VACUUM has no workers at all, because
they couldn't be launched by the core parallel infrastructure.)
I think that I might see a related problem. But I'm not sure, so I'll just ask:
+ /* Set data required for parallel index vacuum or cleanup */
+ prepare_parallel_index_processing(vacrel, vacuum);
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(lps->lvshared->idx), 0);
+
 /* Setup the shared cost-based vacuum delay and launch workers */
 if (nworkers > 0)
 {
+ /* Reinitialize the parallel context to relaunch parallel workers */
 if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
 ReinitializeParallelDSM(lps->pcxt);
- }
Is it okay that we don't call ReinitializeParallelDSM() just because
"nworkers == 0" this time around? I notice that there is a wait for
"nworkers_launched" workers to finish parallel processing, at the top
of ReinitializeParallelDSM(). I can see why the
"vacrel->num_index_scans > 0" test is okay, but I can't see why the
"nworkers == 0" test is okay.
I just want to be sure that we're not somehow relying on seeing state
in shared memory (in the LVSharedIndStats array) in all cases, but
finding that it is not actually there in certain rare edge cases.
Maybe this didn't matter before, because the leader didn't expect to
find this information in shared memory in any case. But that is
changed by your patch, of course, so it's something to be concerned
about.
--
Peter Geoghegan
On Thur, Nov 4, 2021 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Nov 3, 2021 at 1:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 2, 2021 at 11:17 AM Masahiko Sawada<sawada.mshk@gmail.com> wrote:
On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
assert-enabled builds), we should invent PARALLEL_VACUUM_STATS --
a dedicated shmem area for the array of LVSharedIndStats (no more
storing LVSharedIndStats entries at the end of the LVShared space
in an ad-hoc, type unsafe way). There should be one array element
for each and every index -- even those indexes where parallel
index vacuuming is unsafe or not worthwhile (unsure if avoiding
parallel processing for "not worthwhile" indexes actually makes
sense, BTW). We can then get rid of the bitmap/IndStatsIsNull()
stuff entirely. We'd also add new per-index state fields to
LVSharedIndStats itself. We could directly record the status of
each index (e.g., parallel unsafe, amvacuumcleanup processing
done, ambulkdelete processing done) explicitly. All code could
safely subscript the LVSharedIndStats array directly, using idx
style integers. That seems far more robust and consistent.
Sounds good.
During the development, I wrote the patch while considering using
less shared memory, but it seems that it brought complexity (and
therefore the bug). It would not be harmful even if we allocate
index statistics on DSM for unsafe indexes and “not worthwhile"
indexes in practice.
If we want to allocate index stats for all indexes in DSM then why not
consider it on the lines of buf/wal_usage means tack those via
LVParallelState? And probably replace bitmap with an array of bools
that indicates which indexes can be skipped by the parallel worker.
I've attached a draft patch. The patch incorporated all comments from Andres
except for the last comment that moves parallel related code to another file.
I'd like to discuss how we split vacuumlazy.c.
Hi,
I was recently reading the parallel vacuum code, and I think the patch
brings a clear improvement.
Here are a few minor comments about it.
1)
+ * Reset all index status back to invalid (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVSharedIndStats *stats = &(lps->lvsharedindstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_COMPLETED);
+ stats->status = INDVAC_STATUS_INITIAL;
+ }
Do you think it might be clearer to report an error here?
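For instance, something along these lines (just a sketch, reusing the names
from the hunk above):

for (int i = 0; i < vacrel->nindexes; i++)
{
    LVSharedIndStats *stats = &(lps->lvsharedindstats[i]);

    if (stats->status != INDVAC_STATUS_COMPLETED)
        elog(ERROR, "parallel vacuum failed to process index \"%s\"",
             RelationGetRelationName(vacrel->indrels[i]));

    stats->status = INDVAC_STATUS_INITIAL;
}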
2)
+prepare_parallel_index_processing(LVRelState *vacrel, bool vacuum)
For the second parameter 'vacuum': would it be clearer if we pass an
LVIndVacStatus type instead of the boolean value?
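That is, roughly (the status value name below is made up, only to illustrate
the call sites):

static void prepare_parallel_index_processing(LVRelState *vacrel,
                                              LVIndVacStatus new_status);

/* e.g. at the bulk-deletion call site: */
prepare_parallel_index_processing(vacrel, INDVAC_STATUS_NEED_BULKDELETE);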
Best regards,
Hou zj
On Fri, Nov 5, 2021 at 4:42 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Nov 3, 2021 at 10:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached a draft patch. The patch incorporated all comments from
Andres except for the last comment that moves parallel related code to
another file. I'd like to discuss how we split vacuumlazy.c.
This looks great!
I wonder if this is okay, though:
/* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
+ lazy_serial_process_indexes(vacrel);

/*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
 */
- do_parallel_processing(vacrel, lps->lvshared);
+ lazy_parallel_process_indexes(vacrel, lps->lvshared, vacrel->lps->lvsharedindstats);

/*
* Next, accumulate buffer and WAL usage. (This must wait for the workers
@@ -2786,6 +2734,18 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}Since "The leader process alone processes all parallel-safe indexes in
the case where no workers are launched" (no change there), I wonder:
how does the leader *know* that it's the leader (and so can always
process any indexes) inside its call to
lazy_parallel_process_indexes()? Or, does the leader actually process
all indexes inside lazy_serial_process_indexes() in this edge case?
(The edge case where a parallel VACUUM has no workers at all, because
they couldn't be launched by the core parallel infrastructure.)
lazy_serial_process_indexes() handles only parallel-unsafe indexes
(i.e., indexes that don't support parallel vacuum or are too small),
whereas lazy_parallel_process_indexes() handles only parallel-safe
indexes. Therefore, in the edge case the leader processes all indexes
by using both functions.
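Put differently, the leader always runs both calls from the quoted hunk back
to back, so even with zero launched workers every index is covered:

/* leader only: indexes that workers must not (or need not) process */
lazy_serial_process_indexes(vacrel);

/* then join the shared scan over the parallel-safe indexes */
lazy_parallel_process_indexes(vacrel, lps->lvshared, vacrel->lps->lvsharedindstats);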
One small thing: the new "LVSharedIndStats.parallel_safe" field seems
to be slightly misnamed. Isn't it more like
"LVSharedIndStats.parallel_workers_can_process"? The index might
actually be parallel safe *in principle*, while nevertheless being
deliberately skipped over by workers due to a deliberate up-front
choice made earlier, in compute_parallel_vacuum_workers(). Most
obvious example of this is the choice to not use parallelism for a
smaller index (say a partial index whose size is <
min_parallel_index_scan_size).
Agreed.
Another small thing, which is closely related to the last one:
typedef struct LVSharedIndStats
{
- bool updated; /* are the stats updated? */
+ LVIndVacStatus status;
+
+ /*
+ * True if both leader and worker can process the index, otherwise only
+ * leader can process it.
+ */
+ bool parallel_safe;
+
+ bool istat_updated; /* are the stats updated? */
 IndexBulkDeleteResult istat;
} LVSharedIndStats;
It would be nice if the new
"LVSharedIndStats.parallel_safe" field (or whatever we end up calling
it) had comments pointing out that it isn't a fixed thing for the
entire VACUUM operation -- it's only fixed for an individual call to
parallel_lazy_vacuum_or_cleanup_all_indexes() (i.e., it's only fixed
for the "ambulkdelete portion" or the "amvacuumcleanup portion" of the
entire VACUUM).
Agreed.
Alternatively, you could just have two booleans, I think. You know,
one for the "ambulkdelete portion", another for the "amvacuumcleanup
portion". As I've said before, it would be nice if we only called
parallel_vacuum_main() once per VACUUM operation (and not once per
"portion"), but that's a harder and more invasive change. But I don't
think you necessarily have to go that far for it to make sense to have
two bools. Having two might allow you to make both of them immutable,
which is useful.
If we want to make booleans immutable, we need three booleans since
parallel index cleanup behaves differently depending on whether
bulk-deletion has been called once. Anyway, if I understand your
suggestion correctly, it probably means to have booleans corresponding
to VACUUM_OPTION_PARALLEL_XXX flags. Does the worker itself need to
decide whether to skip conditionally-parallel-index-cleanup-safe
indexes?
Regarding tests, I’d like to add tests to check if a vacuum with
multiple index scans (e.g., due to small maintenance_work_mem) works
fine. But a problem is that we need at least about 200,000 garbage
tuples to perform index scan twice even with the minimum
maintenance_work_mem, which takes time to load the tuples.
Maybe that's okay. Do you notice that it takes a lot longer now? I did
try to keep the runtime down when I committed the fixup to the
parallel VACUUM related bug.
It took 6s on my laptop (was 400ms).
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Nov 5, 2021 at 6:25 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Nov 4, 2021 at 12:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
Since "The leader process alone processes all parallel-safe indexes in
the case where no workers are launched" (no change there), I wonder:
how does the leader *know* that it's the leader (and so can always
process any indexes) inside its call to
lazy_parallel_process_indexes()? Or, does the leader actually process
all indexes inside lazy_serial_process_indexes() in this edge case?
(The edge case where a parallel VACUUM has no workers at all, because
they couldn't be launched by the core parallel infrastructure.)
I think that I might see a related problem. But I'm not sure, so I'll just ask:
+ /* Set data required for parallel index vacuum or cleanup */
+ prepare_parallel_index_processing(vacrel, vacuum);
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(lps->lvshared->idx), 0);
+
 /* Setup the shared cost-based vacuum delay and launch workers */
 if (nworkers > 0)
 {
+ /* Reinitialize the parallel context to relaunch parallel workers */
 if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
 ReinitializeParallelDSM(lps->pcxt);
- }

Is it okay that we don't call ReinitializeParallelDSM() just because
"nworkers == 0" this time around? I notice that there is a wait for
"nworkers_launched" workers to finish parallel processing, at the top
of ReinitializeParallelDSM(). I can see why the
"vacrel->num_index_scans > 0" test is okay, but I can't see why the
"nworkers == 0" test is okay.I just want to be sure that we're not somehow relying on seeing state
in shared memory (in the LVSharedIndStats array) in all cases, but
finding that it is not actually there in certain rare edge cases.
Maybe this didn't matter before, because the leader didn't expect to
find this information in shared memory in any case. But that is
changed by your patch, of course, so it's something to be concerned
about.
If we launch workers (i.e., nworkers > 0), we wait for these workers
to finish after processing all indexes (see: we call
WaitForParallelWorkersToFinish() after lazy_parallel_process_indexes()).
So it's guaranteed that all workers have finished at the end of
parallel_lazy_vacuum_or_cleanup_all_indexes(). So even in the second
call to this function, we don't need to wait for the "nworkers_launched"
workers that were previously running to finish. Does it make sense?
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Nov 5, 2021 at 4:00 AM Andres Freund <andres@anarazel.de> wrote:
Based on a quick look
parallel_vacuum_main(), parallel_processing_is_safe(),
parallel_stats_for_idx(), end_parallel_vacuum(), begin_parallel_vacuum(),
compute_parallel_vacuum_workers(), parallel_process_one_index(),
do_serial_processing_for_unsafe_indexes(), do_parallel_processing(),
do_parallel_vacuum_or_cleanup(), do_parallel_lazy_cleanup_all_indexes(),
do_parallel_lazy_vacuum_all_indexes(),
don't really belong in vacuumlazy.c. but should be in vacuum.c or a new
file. Of course that requires a bit of work, because of the heavy reliance on
LVRelState, but afaict there's not really an intrinsic need to use that.
Thanks for your explanation. Understood.
I'll update the patch accordingly.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Nov 9, 2021 at 9:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I'll update the patch accordingly.
I've attached a draft patch that refactors parallel vacuum and
separates parallel-vacuum-related code into a new file, vacuumparallel.c.
After discussion, I'll divide the patch into logical chunks.
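As a quick illustration of the boundary, taken from the attached patch:
vacuumlazy.c keeps only a ParallelVacuumContext pointer, and the index
vacuum/cleanup entry points move behind it:

/* in lazy_vacuum_all_indexes(): */
perform_parallel_index_bulkdel(vacrel->pvc, vacrel->old_live_tuples);

/* in lazy_cleanup_all_indexes(): */
perform_parallel_index_cleanup(vacrel->pvc, vacrel->new_rel_tuples,
                               (vacrel->tupcount_pages < vacrel->rel_pages));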
What I'm not yet convinced about in this patch is that vacuum.c,
vacuumlazy.c and vacuumparallel.c depend on the data structure to
store dead tuples (now called VacDeadTuples, was LVDeadTuples). I
thought that it might be better to separate it so that a table AM can
use another type of data structure to store dead tuples. But since I
think it may bring complexity, currently a table AM has to use
VacDeadTuples in order to use the parallel vacuum. Feedback is very
welcome.
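For reference, the structure in question is the renamed LVDeadTuples removed
from vacuumlazy.c below, presumably with the same shape:

typedef struct VacDeadTuples
{
    int         max_tuples;     /* # slots allocated in array */
    int         num_tuples;     /* current # of entries */
    /* List of TIDs of tuples we intend to delete, ordered by TID address */
    ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER];
} VacDeadTuples;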
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
parallel_vacuum_refactor_v2.patch (application/octet-stream)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 558cc88a08..8cde368a10 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -24,18 +24,9 @@
*
* Lazy vacuum supports parallel execution with parallel worker processes. In
* a parallel vacuum, we perform both index vacuum and index cleanup with
- * parallel worker processes. Individual indexes are processed by one vacuum
- * process. At the beginning of a lazy vacuum (at lazy_scan_heap) we prepare
- * the parallel context and initialize the DSM segment that contains shared
- * information as well as the memory space for storing dead tuples. When
- * starting either index vacuum or index cleanup, we launch parallel worker
- * processes. Once all indexes are processed the parallel worker processes
- * exit. After that, the leader process re-initializes the parallel context
- * so that it can use the same DSM for multiple passes of index vacuum and
- * for performing index cleanup. For updating the index statistics, we need
- * to update the system table and since updates are not allowed during
- * parallel mode we update the index statistics after exiting from the
- * parallel mode.
+ * parallel worker processes. For updating the index statistics, we need to
+ * update the system table and since updates are not allowed during parallel
+ * mode we update the index statistics after exiting from the parallel mode.
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -143,22 +134,11 @@
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
-/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvc != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -171,137 +151,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * LVDeadTuples stores the dead tuple TIDs collected during the heap scan.
- * This is allocated in the DSM segment in parallel mode and in local memory
- * in non-parallel mode.
- */
-typedef struct LVDeadTuples
-{
- int max_tuples; /* # slots allocated in array */
- int num_tuples; /* current # of entries */
- /* List of TIDs of tuples we intend to delete */
- /* NB: this list is ordered by TID address */
- ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER]; /* array of
- * ItemPointerData */
-} LVDeadTuples;
-
-/* The dead tuple space consists of LVDeadTuples and dead tuple TIDs */
-#define SizeOfDeadTuples(cnt) \
- add_size(offsetof(LVDeadTuples, itemptrs), \
- mul_size(sizeof(ItemPointerData), cnt))
-#define MAXDEADTUPLES(max_size) \
- (((max_size) - offsetof(LVDeadTuples, itemptrs)) / sizeof(ItemPointerData))
-
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * An indication for vacuum workers to perform either index vacuum or
- * index cleanup. first_time is true only if for_cleanup is true and
- * bulk-deletion is not performed yet.
- */
- bool for_cleanup;
- bool first_time;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /*
- * Variables to control parallel vacuum. We have a bitmap to indicate
- * which index has stats in shared memory. The set bit in the map
- * indicates that the particular index supports a parallel vacuum.
- */
- pg_atomic_uint32 idx; /* counter for vacuuming and clean up */
- uint32 offset; /* sizeof header incl. bitmap */
- bits8 bitmap[FLEXIBLE_ARRAY_MEMBER]; /* bit map of NULLs */
-
- /* Shared index statistics data follows at end of struct */
-} LVShared;
-
-#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
-#define GetSharedIndStats(s) \
- ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
-#define IndStatsIsNull(s, i) \
- (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
-
-/*
- * Struct for an index bulk-deletion statistic used for parallel vacuum. This
- * is allocated in the DSM segment.
- */
-typedef struct LVSharedIndStats
-{
- bool updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVSharedIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -321,7 +170,7 @@ typedef struct LVRelState
/* Buffer access strategy and parallel state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumContext *pvc;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -345,7 +194,7 @@ typedef struct LVRelState
/*
* State managed by lazy_scan_heap() follows
*/
- LVDeadTuples *dead_tuples; /* items to vacuum from indexes */
+ VacDeadTuples *dead_tuples; /* items to vacuum from indexes */
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
@@ -416,18 +265,6 @@ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
-static void do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel);
-static void do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers);
-static void do_parallel_processing(LVRelState *vacrel,
- LVShared *lvshared);
-static void do_serial_processing_for_unsafe_indexes(LVRelState *vacrel,
- LVShared *lvshared);
-static IndexBulkDeleteResult *parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_indstats,
- LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
@@ -446,20 +283,9 @@ static long compute_max_dead_tuples(BlockNumber relblocks, bool hasindex);
static void lazy_space_alloc(LVRelState *vacrel, int nworkers,
BlockNumber relblocks);
static void lazy_space_free(LVRelState *vacrel);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int compute_parallel_vacuum_workers(LVRelState *vacrel,
- int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static LVParallelState *begin_parallel_vacuum(LVRelState *vacrel,
- BlockNumber nblocks,
- int nrequested);
-static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared, int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -905,7 +731,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
- LVDeadTuples *dead_tuples;
+ VacDeadTuples *dead_tuples;
BlockNumber nblocks,
blkno,
next_unskippable_block,
@@ -2039,7 +1865,7 @@ retry:
*/
if (lpdead_items > 0)
{
- LVDeadTuples *dead_tuples = vacrel->dead_tuples;
+ VacDeadTuples *dead_tuples = vacrel->dead_tuples;
ItemPointerData tmp;
Assert(!prunestate->all_visible);
@@ -2251,7 +2077,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_vacuum_all_indexes(vacrel);
+ perform_parallel_index_bulkdel(vacrel->pvc, vacrel->old_live_tuples);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2404,7 +2230,7 @@ static int
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
int tupindex, Buffer *vmbuffer)
{
- LVDeadTuples *dead_tuples = vacrel->dead_tuples;
+ VacDeadTuples *dead_tuples = vacrel->dead_tuples;
Page page = BufferGetPage(buffer);
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
@@ -2625,351 +2451,6 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform lazy_vacuum_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
-{
- /* Tell parallel workers to do index vacuuming */
- vacrel->lps->lvshared->for_cleanup = false;
- vacrel->lps->lvshared->first_time = false;
-
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- do_parallel_vacuum_or_cleanup(vacrel,
- vacrel->lps->nindexes_parallel_bulkdel);
-}
-
-/*
- * Perform lazy_cleanup_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel)
-{
- int nworkers;
-
- /*
- * If parallel vacuum is active we perform index cleanup with parallel
- * workers.
- *
- * Tell parallel workers to do index cleanup.
- */
- vacrel->lps->lvshared->for_cleanup = true;
- vacrel->lps->lvshared->first_time = (vacrel->num_index_scans == 0);
-
- /*
- * Now we can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- /* Determine the number of parallel workers to launch */
- if (vacrel->lps->lvshared->first_time)
- nworkers = vacrel->lps->nindexes_parallel_cleanup +
- vacrel->lps->nindexes_parallel_condcleanup;
- else
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- do_parallel_vacuum_or_cleanup(vacrel, nworkers);
-}
-
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process. The caller must set
- * lps->lvshared->for_cleanup to indicate whether to perform vacuum or
- * cleanup.
- */
-static void
-do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See compute_parallel_vacuum_workers.
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
- ReinitializeParallelDSM(lps->pcxt);
- }
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (lps->lvshared->for_cleanup)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
-
- /*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
- */
- do_parallel_processing(vacrel, lps->lvshared);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(lvshared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- /* Get the index statistics space from DSM, if any */
- shared_istat = parallel_stats_for_idx(lvshared, idx);
-
- /* Skip indexes not participating in parallelism */
- if (shared_istat == NULL)
- continue;
-
- indrel = vacrel->indrels[idx];
-
- /*
- * Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
- */
- if (!parallel_processing_is_safe(indrel, lvshared))
- continue;
-
- /* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing for_cleanup processing right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by compute_parallel_vacuum_workers(). These indexes never get a
- * slot for statistics in DSM.
- */
-static void
-do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
-{
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- shared_istat = parallel_stats_for_idx(lvshared, idx);
- indrel = vacrel->indrels[idx];
-
- /*
- * We're only here for the indexes that parallel workers won't
- * process. Note that the shared_istat test ensures that we process
- * indexes that fell under initial size cutoff.
- */
- if (shared_istat != NULL &&
- parallel_processing_is_safe(indrel, lvshared))
- continue;
-
- /* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static IndexBulkDeleteResult *
-parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_istat,
- LVRelState *vacrel)
-{
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (shared_istat && shared_istat->updated && istat == NULL)
- istat = &shared_istat->istat;
-
- /* Do vacuum or cleanup of the index */
- if (lvshared->for_cleanup)
- istat_res = lazy_cleanup_one_index(indrel, istat, lvshared->reltuples,
- lvshared->estimated_count, vacrel);
- else
- istat_res = lazy_vacuum_one_index(indrel, istat, lvshared->reltuples,
- vacrel);
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
- {
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
- }
-
- return istat_res;
-}
-
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
@@ -3002,7 +2483,8 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_cleanup_all_indexes(vacrel);
+ perform_parallel_index_cleanup(vacrel->pvc, vacrel->new_rel_tuples,
+ (vacrel->tupcount_pages < vacrel->rel_pages));
}
}
@@ -3048,13 +2530,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_tuples);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_tuples->num_tuples),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ istat = vacuum_one_index(&ivinfo, istat, vacrel->dead_tuples);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3088,7 +2564,6 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
ivinfo.strategy = vacrel->bstrategy;
@@ -3104,24 +2579,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
- istat = index_vacuum_cleanup(&ivinfo, istat);
-
- if (istat)
- {
- ereport(elevel,
- (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
- RelationGetRelationName(indrel),
- istat->num_index_tuples,
- istat->num_pages),
- errdetail("%.0f index row versions were removed.\n"
- "%u index pages were newly deleted.\n"
- "%u index pages are currently deleted, of which %u are currently reusable.\n"
- "%s.",
- istat->tuples_removed,
- istat->pages_newly_deleted,
- istat->pages_deleted, istat->pages_free,
- pg_rusage_show(&ru0))));
- }
+ istat = cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3479,9 +2937,11 @@ compute_max_dead_tuples(BlockNumber relblocks, bool hasindex)
static void
lazy_space_alloc(LVRelState *vacrel, int nworkers, BlockNumber nblocks)
{
- LVDeadTuples *dead_tuples;
+ VacDeadTuples *dead_tuples;
long maxtuples;
+ maxtuples = compute_max_dead_tuples(nblocks, vacrel->nindexes > 0);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3505,16 +2965,29 @@ lazy_space_alloc(LVRelState *vacrel, int nworkers, BlockNumber nblocks)
vacrel->relname)));
}
else
- vacrel->lps = begin_parallel_vacuum(vacrel, nblocks, nworkers);
+ {
+ ParallelVacuumCtl ctl;
+
+ /* Create parallel vacuum context */
+ ctl.rel = vacrel->rel;
+ ctl.indrels = vacrel->indrels;
+ ctl.nindexes = vacrel->nindexes;
+ ctl.nrequested_workers = nworkers;
+ ctl.maxtuples = maxtuples;
+ ctl.elevel = elevel;
+ ctl.bstrategy = vacrel->bstrategy;
+ vacrel->pvc = begin_parallel_vacuum(&ctl);
+ }
/* If parallel mode started, we're done */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_tuples = get_vacuum_dead_tuples(vacrel->pvc);
return;
+ }
}
- maxtuples = compute_max_dead_tuples(nblocks, vacrel->nindexes > 0);
-
- dead_tuples = (LVDeadTuples *) palloc(SizeOfDeadTuples(maxtuples));
+ dead_tuples = (VacDeadTuples *) palloc(SizeOfDeadTuples(maxtuples));
dead_tuples->num_tuples = 0;
dead_tuples->max_tuples = (int) maxtuples;
@@ -3534,75 +3007,8 @@ lazy_space_free(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- end_parallel_vacuum(vacrel);
-}
-
-/*
- * lazy_tid_reaped() -- is a particular tid deletable?
- *
- * This has the right signature to be an IndexBulkDeleteCallback.
- *
- * Assumes dead_tuples array is in sorted order.
- */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
-{
- LVDeadTuples *dead_tuples = (LVDeadTuples *) state;
- int64 litem,
- ritem,
- item;
- ItemPointer res;
-
- litem = itemptr_encode(&dead_tuples->itemptrs[0]);
- ritem = itemptr_encode(&dead_tuples->itemptrs[dead_tuples->num_tuples - 1]);
- item = itemptr_encode(itemptr);
-
- /*
- * Doing a simple bound check before bsearch() is useful to avoid the
- * extra cost of bsearch(), especially if dead tuples on the heap are
- * concentrated in a certain range. Since this function is called for
- * every index tuple, it pays to be really fast.
- */
- if (item < litem || item > ritem)
- return false;
-
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) dead_tuples->itemptrs,
- dead_tuples->num_tuples,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
-
- return (res != NULL);
-}
-
-/*
- * Comparator routines for use with qsort() and bsearch().
- */
-static int
-vac_cmp_itemptr(const void *left, const void *right)
-{
- BlockNumber lblk,
- rblk;
- OffsetNumber loff,
- roff;
-
- lblk = ItemPointerGetBlockNumber((ItemPointer) left);
- rblk = ItemPointerGetBlockNumber((ItemPointer) right);
-
- if (lblk < rblk)
- return -1;
- if (lblk > rblk)
- return 1;
-
- loff = ItemPointerGetOffsetNumber((ItemPointer) left);
- roff = ItemPointerGetOffsetNumber((ItemPointer) right);
-
- if (loff < roff)
- return -1;
- if (loff > roff)
- return 1;
-
- return 0;
+ end_parallel_vacuum(vacrel->pvc, vacrel->indstats);
+ vacrel->pvc = NULL;
}
/*
@@ -3725,76 +3131,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
return all_visible;
}
-/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
- */
-static int
-compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- continue;
-
- will_parallel_vacuum[idx] = true;
-
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
@@ -3827,426 +3163,6 @@ update_index_statistics(LVRelState *vacrel)
}
}
-/*
- * This function prepares and returns parallel vacuum state if we can launch
- * even one worker. This function is responsible for entering parallel mode,
- * create a parallel context, and then initialize the DSM segment.
- */
-static LVParallelState *
-begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
- int nrequested)
-{
- LVParallelState *lps = NULL;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- LVDeadTuples *dead_tuples;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- bool *will_parallel_vacuum;
- long maxtuples;
- Size est_shared;
- Size est_deadtuples;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = compute_parallel_vacuum_workers(vacrel,
- nrequested,
- will_parallel_vacuum);
-
- /* Can't perform vacuum in parallel */
- if (parallel_workers <= 0)
- {
- pfree(will_parallel_vacuum);
- return lps;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- /* Skip indexes that don't participate in parallel vacuum */
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- est_shared = add_size(est_shared, sizeof(LVSharedIndStats));
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
- maxtuples = compute_max_dead_tuples(nblocks, true);
- est_deadtuples = MAXALIGN(SizeOfDeadTuples(maxtuples));
- shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared);
- MemSet(shared, 0, est_shared);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
- shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
-
- /*
- * Initialize variables for shared index statistics, set NULL bitmap and
- * the size of stats for each index.
- */
- memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
- for (int idx = 0; idx < nindexes; idx++)
- {
- if (!will_parallel_vacuum[idx])
- continue;
-
- /* Set NOT NULL as this index does support parallelism */
- shared->bitmap[idx >> 3] |= 1 << (idx & 0x07);
- }
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead tuple space */
- dead_tuples = (LVDeadTuples *) shm_toc_allocate(pcxt->toc, est_deadtuples);
- dead_tuples->max_tuples = maxtuples;
- dead_tuples->num_tuples = 0;
- MemSet(dead_tuples->itemptrs, 0, sizeof(ItemPointerData) * maxtuples);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
- vacrel->dead_tuples = dead_tuples;
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- pfree(will_parallel_vacuum);
- return lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-end_parallel_vacuum(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVSharedIndStats *shared_istat;
-
- shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
-
- /*
- * Skip index -- it must have been processed by the leader, from
- * inside do_serial_processing_for_unsafe_indexes()
- */
- if (shared_istat == NULL)
- continue;
-
- if (shared_istat->updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &shared_istat->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Return shared memory statistics for index at offset 'getidx', if any
- *
- * Returning NULL indicates that compute_parallel_vacuum_workers() determined
- * that the index is a totally unsuitable target for all parallel processing
- * up front. For example, the index could be < min_parallel_index_scan_size
- * cutoff.
- */
-static LVSharedIndStats *
-parallel_stats_for_idx(LVShared *lvshared, int getidx)
-{
- char *p;
-
- if (IndStatsIsNull(lvshared, getidx))
- return NULL;
-
- p = (char *) GetSharedIndStats(lvshared);
- for (int idx = 0; idx < getidx; idx++)
- {
- if (IndStatsIsNull(lvshared, idx))
- continue;
-
- p += sizeof(LVSharedIndStats);
- }
-
- return (LVSharedIndStats *) p;
-}
-
-/*
- * Returns false, if the given index can't participate in parallel index
- * vacuum or parallel index cleanup
- */
-static bool
-parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
-{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* first_time must be true only if for_cleanup is true */
- Assert(lvshared->for_cleanup || !lvshared->first_time);
-
- if (lvshared->for_cleanup)
- {
- /* Skip, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Skip, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). See the
- * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
- * when indexes support parallel cleanup conditionally.
- */
- if (!lvshared->first_time &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
- }
- else if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) == 0)
- {
- /* Skip if the index does not support parallel bulk deletion */
- return false;
- }
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVShared *lvshared;
- LVDeadTuples *dead_tuples;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- if (lvshared->for_cleanup)
- elog(DEBUG1, "starting parallel vacuum worker for cleanup");
- else
- elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set dead tuple space */
- dead_tuples = (LVDeadTuples *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_TUPLES,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_tuples = dead_tuples;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- do_parallel_processing(&vacrel, lvshared);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
/*
* Error context callback for errors occurring during vacuum.
*/
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f573..6b427772d5 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..48f7348f91 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15b44..2fcb576540 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -32,6 +32,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
+#include "catalog/index.h"
#include "catalog/pg_database.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_namespace.h"
@@ -51,6 +52,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -90,6 +92,9 @@ static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
+static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_cmp_itemptr(const void *left, const void *right);
+
/*
* Primary entry point for manual VACUUM and ANALYZE commands
*
@@ -2258,3 +2263,140 @@ get_vacoptval_from_boolean(DefElem *def)
{
return defGetBoolean(def) ? VACOPTVALUE_ENABLED : VACOPTVALUE_DISABLED;
}
+
+/*
+ * vacuum_one_index() -- vacuum index relation.
+ *
+ * Delete all the index entries pointing to tuples listed in
+ * dead_tuples, and update running statistics.
+ *
+ * ivinfo->num_heap_tuples is the number of heap tuples to be passed to
+ * the bulkdelete callback. It's always assumed to be estimated.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+vacuum_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ VacDeadTuples *dead_tuples)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ /* Do bulk deletion */
+ istat = index_bulk_delete(ivinfo, istat, vac_tid_reaped,
+ (void *) dead_tuples);
+
+ ereport(ivinfo->message_level,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ RelationGetRelationName(ivinfo->index),
+ dead_tuples->num_tuples),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+ return istat;
+}
+
+/*
+ * cleanup_one_index() -- do post-vacuum cleanup for index relation.
+ *
+ * ivinfo->num_heap_tuples is the number of heap tuples and
+ * ivinfo->estimated_count is true if it is an estimated value.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ istat = index_vacuum_cleanup(ivinfo, istat);
+
+ if (istat)
+ {
+ ereport(ivinfo->message_level,
+ (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
+ RelationGetRelationName(ivinfo->index),
+ istat->num_index_tuples,
+ istat->num_pages),
+ errdetail("%.0f index row versions were removed.\n"
+ "%u index pages were newly deleted.\n"
+ "%u index pages are currently deleted, of which %u are currently reusable.\n"
+ "%s.",
+ istat->tuples_removed,
+ istat->pages_newly_deleted,
+ istat->pages_deleted, istat->pages_free,
+ pg_rusage_show(&ru0))));
+ }
+
+ return istat;
+}
+
+/*
+ * vac_tid_reaped() -- is a particular tid deletable?
+ *
+ * This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ * Assumes dead_tuples array is in sorted order.
+ */
+static bool
+vac_tid_reaped(ItemPointer itemptr, void *state)
+{
+ VacDeadTuples *dead_tuples = (VacDeadTuples *) state;
+ int64 litem,
+ ritem,
+ item;
+ ItemPointer res;
+
+ litem = itemptr_encode(&dead_tuples->itemptrs[0]);
+ ritem = itemptr_encode(&dead_tuples->itemptrs[dead_tuples->num_tuples - 1]);
+ item = itemptr_encode(itemptr);
+
+ /*
+ * Doing a simple bound check before bsearch() is useful to avoid the
+ * extra cost of bsearch(), especially if dead tuples on the heap are
+ * concentrated in a certain range. Since this function is called for
+ * every index tuple, it pays to be really fast.
+ */
+ if (item < litem || item > ritem)
+ return false;
+
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_tuples->itemptrs,
+ dead_tuples->num_tuples,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
+
+ return (res != NULL);
+}
+
+/*
+ * Comparator routines for use with qsort() and bsearch().
+ */
+static int
+vac_cmp_itemptr(const void *left, const void *right)
+{
+ BlockNumber lblk,
+ rblk;
+ OffsetNumber loff,
+ roff;
+
+ lblk = ItemPointerGetBlockNumber((ItemPointer) left);
+ rblk = ItemPointerGetBlockNumber((ItemPointer) right);
+
+ if (lblk < rblk)
+ return -1;
+ if (lblk > rblk)
+ return 1;
+
+ loff = ItemPointerGetOffsetNumber((ItemPointer) left);
+ roff = ItemPointerGetOffsetNumber((ItemPointer) right);
+
+ if (loff < roff)
+ return -1;
+ if (loff > roff)
+ return 1;
+
+ return 0;
+}
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000000..165fa28be4
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,1042 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelVacuumContext.
+ *
+ * In a parallel vacuum, we perform both index bulk-deletion and index cleanup
+ * with parallel worker processes. Individual indexes are processed by one
+ * vacuum process. ParallelVacuumContext contains shared information as well
+ * as the memory space for storing dead tuples allocated in the DSM segment.
+ * When starting either parallel index bulk-deletion or index cleanup, we
+ * launch parallel worker processes. Once all indexes are processed, the
+ * parallel worker processes exit. The parallel context is then re-initialized
+ * so that the same DSM can be used for multiple passes of index bulk-deletion
+ * and index cleanup. At the end of a parallel vacuum, ParallelVacuumContext
+ * is destroyed, returning the index statistics so that we can update them
+ * after exiting from parallel mode.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/genam.h"
+#include "access/parallel.h"
+#include "access/table.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/elog.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel workers. So this is allocated in the DSM
+ * segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the lazy vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * num_table_tuples is the total number of input heap tuples. We set it to
+ * the old live tuples for index vacuuming, or to the new live tuples for
+ * index cleanup.
+ *
+ * estimated_count is true if num_table_tuples is an estimated value. (Note
+ * that it could be -1 in this case, indicating we have no idea.)
+ */
+ double num_table_tuples;
+ bool estimated_count;
+
+ /*
+ * In single process lazy vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for heap scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process lazy vacuum.
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Shared counter used to hand out the next index to vacuum or clean up */
+ pg_atomic_uint32 idx;
+} PVShared;
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ INDVAC_STATUS_INITIAL = 0,
+ INDVAC_STATUS_NEED_BULKDELETE,
+ INDVAC_STATUS_NEED_CLEANUP,
+ INDVAC_STATUS_COMPLETED,
+} PVIndVacStatus;
+
+/*
+ * Struct for an index bulk-deletion statistic used for parallel vacuum. This
+ * is allocated in the DSM segment.
+ */
+typedef struct PVIndStats
+{
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup.
+ *
+ * parallel_workers_can_process is true if both the leader and workers can
+ * process the index; otherwise only the leader can process it. This value
+ * is not fixed for the entire VACUUM operation; it is fixed only for an
+ * individual parallel index vacuum or cleanup pass.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/*
+ * Struct for a parallel index vacuum.
+ */
+typedef struct PVState
+{
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadTuples *dead_tuples;
+ BufferAccessStrategy bstrategy;
+
+ /* Error reporting state */
+ char *relnamespace;
+ char *relname;
+ char *indname;
+ PVIndVacStatus status;
+} PVState;
+
+/* Struct for maintaining a parallel vacuum state. */
+typedef struct ParallelVacuumContext
+{
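+ /* Parallel context used to launch workers and hold the DSM segment */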
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /* Shared index statistics among parallel vacuum workers */
+ PVIndStats *indstats;
+
+ /* Shared dead tuple space among parallel vacuum workers */
+ VacDeadTuples *dead_tuples;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion and
+ * parallel index cleanup respectively.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /* Incremented once per bulk-deletion or cleanup pass over the indexes */
+ int num_index_scans;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+} ParallelVacuumContext;
+
+static int compute_parallel_vacuum_workers(Relation *indrels, int nindexes,
+ int nrequested);
+static void set_parallel_vacuum_index_status(ParallelVacuumContext *pvc,
+ bool bulkdel);
+static void parallel_vacuum_all_indexes(ParallelVacuumContext *pvc, bool bulkdel);
+static void parallel_vacuum_indexes(PVState *pvstate);
+static void serial_vacuum_unsafe_indexes(PVState *pvstate);
+static void parallel_vacuum_one_index(PVState *pvstate, Relation indrel,
+ PVIndStats *stats);
+static bool index_parallel_vacuum_is_safe(Relation indrel, int num_index_scans,
+ bool bulkdel);
+static void parallel_index_vacuum_error_callback(void *arg);
+
+/*
+ * This function prepares and returns a parallel vacuum context if we can
+ * launch even one worker. It is responsible for entering parallel mode,
+ * creating a parallel context, and initializing the DSM segment.
+ */
+ParallelVacuumContext *
+begin_parallel_vacuum(ParallelVacuumCtl *pvctl)
+{
+ ParallelVacuumContext *pvc = NULL;
+ ParallelContext *pcxt;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ Size est_indstats = 0;
+ Size est_shared = 0;
+ Size est_deadtuples = 0;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(pvctl);
+ Assert(pvctl->nrequested_workers >= 0);
+ Assert(pvctl->nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ parallel_workers = compute_parallel_vacuum_workers(pvctl->indrels,
+ pvctl->nindexes,
+ pvctl->nrequested_workers);
+
+ /* Can't perform vacuum in parallel */
+ if (parallel_workers <= 0)
+ return pvc;
+
+ pvc = (ParallelVacuumContext *) palloc0(sizeof(ParallelVacuumContext));
+ pvc->indrels = pvctl->indrels;
+ pvc->nindexes = pvctl->nindexes;
+ pvc->bstrategy = pvctl->bstrategy;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvc->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_INDEX_STATS */
+ est_indstats = mul_size(sizeof(PVIndStats), pvctl->nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared = MAXALIGN(sizeof(PVShared));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
+ est_deadtuples = MAXALIGN(SizeOfDeadTuples(pvctl->maxtuples));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats);
+ MemSet(indstats, 0, est_indstats);
+ for (int i = 0; i < pvctl->nindexes; i++)
+ {
+ Relation indrel = pvctl->indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ /* Skip indexes that don't participate in parallel vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvc->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvc->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvc->nindexes_parallel_condcleanup++;
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvc->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared);
+ MemSet(shared, 0, est_shared);
+ shared->relid = RelationGetRelid(pvctl->rel);
+ shared->elevel = pvctl->elevel;
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvc->shared = shared;
+
+ /* Prepare the dead tuple space */
+ dead_tuples = (VacDeadTuples *) shm_toc_allocate(pcxt->toc, est_deadtuples);
+ dead_tuples->max_tuples = pvctl->maxtuples;
+ dead_tuples->num_tuples = 0;
+ MemSet(dead_tuples->itemptrs, 0,
+ sizeof(ItemPointerData) * pvctl->maxtuples);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
+ pvc->dead_tuples = dead_tuples;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvc->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvc->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ return pvc;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed during parallel mode, copy updated index
+ * statistics from DSM into local memory so that the caller can later use
+ * them to update the index statistics. One might think that we could exit from
+ * parallel mode, update the index statistics and then destroy parallel
+ * context, but that won't be safe (see ExitParallelMode).
+ */
+void
+end_parallel_vacuum(ParallelVacuumContext *pvc, IndexBulkDeleteResult **indstats)
+{
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvc->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvc->indstats[i]);
+
+ if (stats->istat_updated)
+ {
+ indstats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(indstats[i], &stats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ indstats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvc->pcxt);
+ ExitParallelMode();
+
+ pfree(pvc);
+}
+
+/* Returns the dead tuple space */
+VacDeadTuples *
+get_vacuum_dead_tuples(ParallelVacuumContext *pvc)
+{
+ return pvc->dead_tuples;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+perform_parallel_index_bulkdel(ParallelVacuumContext *pvc, long num_table_tuples)
+{
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now. Matches serial VACUUM case.
+ */
+ pvc->shared->num_table_tuples = num_table_tuples;
+ pvc->shared->estimated_count = true;
+
+ parallel_vacuum_all_indexes(pvc, true);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ */
+void
+perform_parallel_index_cleanup(ParallelVacuumContext *pvc, long num_table_tuples,
+ bool estimated_count)
+{
+ /*
+ * We can provide a better estimate of total number of surviving
+ * tuples (we assume indexes are more interested in that than in the
+ * number of nominally live tuples).
+ */
+ pvc->shared->num_table_tuples = num_table_tuples;
+ pvc->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_all_indexes(pvc, false);
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index vacuum or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ PVState pvstate;
+ int nindexes;
+ char *sharedquery;
+ ErrorContextCallback errcallback;
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
+ false);
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the leader process. It's
+ * okay because the lock mode does not conflict among the parallel
+ * workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should match
+ * the leader's ordering.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead tuple space */
+ dead_tuples = (VacDeadTuples *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_TUPLES,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
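+ /* Apply the per-worker maintenance_work_mem limit computed by the leader */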
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Parallel vacuum state and the error callback arg */
+ pvstate.indrels = indrels;
+ pvstate.nindexes = nindexes;
+ pvstate.indstats = indstats;
+ pvstate.shared = shared;
+ pvstate.dead_tuples = dead_tuples;
+ pvstate.bstrategy = GetAccessStrategy(BAS_VACUUM);
+ pvstate.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvstate.relname = pstrdup(RelationGetRelationName(rel));
+ pvstate.indname = NULL; /* filled later during index vacuuming */
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_index_vacuum_error_callback;
+ errcallback.arg = &pvstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Process indexes to perform bulk-deletion/cleanup */
+ parallel_vacuum_indexes(&pvstate);
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvstate.bstrategy);
+}
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * vacuum and index cleanup can be executed with parallel workers. The index
+ * is eligible for parallel vacuum iff its size is greater than
+ * min_parallel_index_scan_size as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that user requested. If
+ * nrequested is 0, we compute the parallel degree based on nindexes, that is
+ * the number of indexes that support parallel vacuum.
+ */
+static int
+compute_parallel_vacuum_workers(Relation *indrels, int nindexes, int nrequested)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow performing parallel operation in standalone backend or
+ * when parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
+
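+/*
+ * Set the vacuum status (bulk-deletion or cleanup) for all indexes and mark
+ * whether each index can be processed by parallel workers in this pass.
+ */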
+static void
+set_parallel_vacuum_index_status(ParallelVacuumContext *pvc, bool bulkdel)
+{
+ PVIndVacStatus new_status = bulkdel
+ ? INDVAC_STATUS_NEED_BULKDELETE
+ : INDVAC_STATUS_NEED_CLEANUP;
+
+ /* Set index vacuum status and mark as parallel safe or not */
+ for (int i = 0; i < pvc->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvc->indstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_INITIAL);
+
+ stats->status = new_status;
+ stats->parallel_workers_can_process =
+ index_parallel_vacuum_is_safe(pvc->indrels[i],
+ pvc->num_index_scans,
+ bulkdel);
+ }
+}
+
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_all_indexes(ParallelVacuumContext *pvc, bool bulkdel)
+{
+ int nworkers;
+ PVState pvstate;
+
+ /* Determine the number of parallel workers to launch */
+ if (bulkdel)
+ nworkers = pvc->nindexes_parallel_bulkdel;
+ else
+ {
+ nworkers = pvc->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if this is the first call */
+ if (pvc->num_index_scans == 0)
+ nworkers += pvc->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * than the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See compute_parallel_vacuum_workers.
+ */
+ nworkers = Min(nworkers, pvc->pcxt->nworkers);
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvc->shared->idx), 0);
+
+ set_parallel_vacuum_index_status(pvc, bulkdel);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize the parallel context to relaunch parallel workers */
+ if (pvc->num_index_scans > 0)
+ ReinitializeParallelDSM(pvc->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvc->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvc->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvc->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvc->pcxt);
+
+ if (pvc->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvc->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvc->shared->active_nworkers);
+ }
+
+ if (bulkdel)
+ ereport(pvc->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvc->pcxt->nworkers_launched),
+ pvc->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvc->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvc->pcxt->nworkers_launched),
+ pvc->pcxt->nworkers_launched, nworkers)));
+ }
+
+ pvstate.indrels = pvc->indrels;
+ pvstate.nindexes = pvc->nindexes;
+ pvstate.indstats = pvc->indstats;
+ pvstate.shared = pvc->shared;
+ pvstate.dead_tuples = pvc->dead_tuples;
+ pvstate.bstrategy = pvc->bstrategy;
+
+ /* Process the indexes that can be processed by only leader process */
+ serial_vacuum_unsafe_indexes(&pvstate);
+
+ /*
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_indexes(&pvstate);
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvc->pcxt);
+
+ for (int i = 0; i < pvc->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvc->buffer_usage[i], &pvc->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index status back to invalid (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < pvc->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvc->indstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_COMPLETED);
+ stats->status = INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+
+ /* Increment the counter of completed index vacuum/cleanup passes */
+ pvc->num_index_scans++;
+}
+
+
+/*
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to process the indexes in parallel.
+ */
+static void
+parallel_vacuum_indexes(PVState *pvstate)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (;;)
+ {
+ int idx;
+ PVIndStats *stats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvstate->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvstate->nindexes)
+ break;
+
+ stats = &(pvstate->indstats[idx]);
+
+ /*
+ * Parallel unsafe indexes can be processed only by the leader (these are
+ * processed in serial_vacuum_unsafe_indexes() by the leader).
+ */
+ if (!stats->parallel_workers_can_process)
+ continue;
+
+ parallel_vacuum_one_index(pvstate, pvstate->indrels[idx], stats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Perform serial processing, in the leader process, of indexes that parallel
+ * workers cannot process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe.
+ *
+ * Also processes smaller indexes that fell under the size cutoff enforced by
+ * index_parallel_vacuum_is_safe().
+ */
+static void
+serial_vacuum_unsafe_indexes(PVState *pvstate)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvstate->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvstate->indstats[i]);
+
+ /* Skip parallel-safe indexes, as they are processed by workers */
+ if (stats->parallel_workers_can_process)
+ continue;
+
+ parallel_vacuum_one_index(pvstate, pvstate->indrels[i], stats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Bulk-delete or clean up an index, either by the leader process or by one of
+ * the worker processes. After processing the index this function copies the index
+ * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_one_index(PVState *pvstate, Relation indrel, PVIndStats *stats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /* Get the index statistics space, if already updated */
+ if (stats->istat_updated)
+ istat = &(stats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvstate->shared->elevel;
+ ivinfo.estimated_count = pvstate->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvstate->shared->num_table_tuples;
+ ivinfo.strategy = pvstate->bstrategy;
+
+ /* Update error traceback information */
+ pvstate->indname = pstrdup(RelationGetRelationName(indrel));
+ pvstate->status = stats->status;
+
+ switch (stats->status)
+ {
+ case INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = vacuum_one_index(&ivinfo, istat, pvstate->dead_tuples);
+ break;
+ case INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel index vacuum status %d",
+ stats->status);
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete and
+ * amvacuumcleanup to the DSM segment if it's the first cycle because they
+ * allocate locally and it's possible that an index will be vacuumed by a
+ * different vacuum process the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result on the DSM segment and
+ * pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result at different
+ * slots we can write them without locking.
+ */
+ if (!stats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(stats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ stats->istat_updated = true;
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each
+ * worker touches different indexes.
+ */
+ stats->status = INDVAC_STATUS_COMPLETED;
+
+ /* Reset error traceback information */
+ pfree(pvstate->indname);
+ pvstate->indname = NULL;
+ pvstate->status = INDVAC_STATUS_COMPLETED;
+}
+
+/*
+ * Returns false, if the given index can't participate in parallel index
+ * vacuum or parallel index cleanup.
+ */
+static bool
+index_parallel_vacuum_is_safe(Relation indrel, int num_index_scans,
+ bool bulkdel)
+{
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Check if the index is a totally unsuitable target for all parallel
+ * processing up front. For example, the index could be
+ * < min_parallel_index_scan_size cutoff.
+ */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ return false;
+
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (bulkdel)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). See the
+ * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
+ * when indexes support parallel cleanup conditionally.
+ */
+ if (num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
+
+/*
+ * Error context callback for errors occurring during parallel index vacuum.
+ */
+static void
+parallel_index_vacuum_error_callback(void *arg)
+{
+ PVState *errinfo = arg;
+
+ switch (errinfo->status)
+ {
+ case INDVAC_STATUS_NEED_BULKDELETE:
+ errcontext("while parallelly vacuuming index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while parallelly cleanup index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case INDVAC_STATUS_INITIAL:
+ case INDVAC_STATUS_COMPLETED:
+ default:
+ break;
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e63b49fc38..4a8022fee7 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4cfd52eaf4..3dc9055715 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -62,6 +64,28 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Actual struct representation is private to vacuumparallel.c */
+typedef struct ParallelVacuumContext ParallelVacuumContext;
+
+/* Parameter data structure for begin_parallel_vacuum() */
+typedef struct ParallelVacuumCtl
+{
+ /* Table and its indexes */
+ Relation rel;
+ Relation *indrels;
+ int nindexes;
+
+ /* The number of workers requested to launch */
+ int nrequested_workers;
+
+ /* The maximum number of dead tuples to store */
+ long maxtuples;
+
+ /* Log level and the buffer access strategy */
+ int elevel;
+ BufferAccessStrategy bstrategy;
+} ParallelVacuumCtl;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -230,6 +254,28 @@ typedef struct VacuumParams
int nworkers;
} VacuumParams;
+/*
+ * VacDeadTuples stores the dead tuple TIDs collected during the heap scan.
+ * This is allocated in the DSM segment in parallel mode and in local memory
+ * in non-parallel mode.
+ */
+typedef struct VacDeadTuples
+{
+ int max_tuples; /* # slots allocated in array */
+ int num_tuples; /* current # of entries */
+ /* List of TIDs of tuples we intend to delete */
+ /* NB: this list is ordered by TID address */
+ ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER]; /* array of
+ * ItemPointerData */
+} VacDeadTuples;
+
+/* The dead tuple space consists of VacDeadTuples and dead tuple TIDs */
+#define SizeOfDeadTuples(cnt) \
+ add_size(offsetof(VacDeadTuples, itemptrs), \
+ mul_size(sizeof(ItemPointerData), cnt))
+#define MAXDEADTUPLES(max_size) \
+ (((max_size) - offsetof(VacDeadTuples, itemptrs)) / sizeof(ItemPointerData))
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for PostGIS */
extern int vacuum_freeze_min_age;
@@ -282,6 +328,23 @@ extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
bits32 options, bool verbose,
LOCKMODE lmode);
+extern IndexBulkDeleteResult *vacuum_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat,
+ VacDeadTuples *dead_tuples);
+extern IndexBulkDeleteResult *cleanup_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat);
+
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumContext *begin_parallel_vacuum(ParallelVacuumCtl *pvctl);
+extern void end_parallel_vacuum(ParallelVacuumContext *pvc,
+ IndexBulkDeleteResult **indstats);
+extern VacDeadTuples *get_vacuum_dead_tuples(ParallelVacuumContext *pvc);
+extern void perform_parallel_index_bulkdel(ParallelVacuumContext *pvc,
+ long num_table_tuples);
+extern void perform_parallel_index_cleanup(ParallelVacuumContext *pvc,
+ long num_table_tuples,
+ bool estimated_count);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
diff --git a/src/test/regress/expected/vacuum_parallel.out b/src/test/regress/expected/vacuum_parallel.out
index ddf0ee544b..a7d8a801e0 100644
--- a/src/test/regress/expected/vacuum_parallel.out
+++ b/src/test/regress/expected/vacuum_parallel.out
@@ -45,5 +45,29 @@ VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table;
INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
RESET max_parallel_maintenance_workers;
RESET min_parallel_index_scan_size;
+CREATE TABLE parallel_vacuum_table2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 10000) g;
+-- Create different types of indexes, i.e. having different parallelvacuumoptions.
+-- Also create a small index, same as above.
+CREATE INDEX pv_bt_index ON parallel_vacuum_table2 USING btree (a);
+CREATE INDEX pv_hash_index ON parallel_vacuum_table2 USING hash (a);
+CREATE INDEX pv_gin_index ON parallel_vacuum_table2 USING gin (b);
+CREATE INDEX pv_brin_index ON parallel_vacuum_table2 USING brin (a);
+CREATE INDEX pv_small_index ON parallel_vacuum_table2 USING btree ((1));
+-- Parallel index vacuum for various types of indexes.
+DELETE FROM parallel_vacuum_table2;
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- XXX: in order to execute index scan twice, we need about 200,000 garbage tuples
+-- with the minimum maintenance_work_mem. However, it takes a long time to load.
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 200000) g;
+DELETE FROM parallel_vacuum_table2;
+SET maintenance_work_mem TO 1024;
+-- Parallel index vacuum for various types of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+RESET maintenance_work_mem;
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios
diff --git a/src/test/regress/sql/vacuum_parallel.sql b/src/test/regress/sql/vacuum_parallel.sql
index 1d23f33e39..49f4f4ce6d 100644
--- a/src/test/regress/sql/vacuum_parallel.sql
+++ b/src/test/regress/sql/vacuum_parallel.sql
@@ -42,5 +42,40 @@ INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
RESET max_parallel_maintenance_workers;
RESET min_parallel_index_scan_size;
+CREATE TABLE parallel_vacuum_table2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 10000) g;
+
+-- Create different types of indexes, i.e. having different parallelvacuumoptions.
+-- Also create a small index, same as above.
+CREATE INDEX pv_bt_index ON parallel_vacuum_table2 USING btree (a);
+CREATE INDEX pv_hash_index ON parallel_vacuum_table2 USING hash (a);
+CREATE INDEX pv_gin_index ON parallel_vacuum_table2 USING gin (b);
+CREATE INDEX pv_brin_index ON parallel_vacuum_table2 USING brin (a);
+CREATE INDEX pv_small_index ON parallel_vacuum_table2 USING btree ((1));
+
+
+-- Parallel index vacuum for various types of indexes.
+DELETE FROM parallel_vacuum_table2;
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- XXX: in order to execute index scan twice, we need about 200,000 garbage tuples
+-- with the minimum maintenance_work_mem. However, it takes a long time to load.
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 200000) g;
+
+DELETE FROM parallel_vacuum_table2;
+
+SET maintenance_work_mem TO 1024;
+
+-- Parallel index vacuum for various types of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+RESET maintenance_work_mem;
+
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios
On Thu, Nov 11, 2021 at 8:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached a draft patch that refactors parallel vacuum and
separates parallel-vacuum-related code to new file vacuumparallel.c.
After discussion, I'll divide the patch into logical chunks.
What I'm not convinced yet in this patch is that vacuum.c,
vacuumlazy.c and vacuumparallel.c depend on the data structure to
store dead tuples (now called VacDeadTuples, was LVDeadTuples). I
thought that it might be better to separate it so that a table AM can
use another type of data structure to store dead tuples. But since I
think it may bring complexity, currently a table AM has to use
VacDeadTuples in order to use the parallel vacuum.
I think it might be better to attempt doing anything to make it
generic for tableAM in a separate patch if that is required.
Few questions/comments:
=====================
1. There are three different structures PVState,
ParallelVacuumContext, and ParallelVacuumCtl to maintain the parallel
vacuum state. Can't we merge PVState and ParallelVacuumCtl? Also, I
see that most of the fields of PVState are there in
ParallelVacuumContext except for error info fields; does it make
sense to directly use PVState instead? Also, it would be better to
write some more comments atop each structure to explain its usage.
2. In vacuum.c, the function names don't match the names in their
corresponding function header comments.
3.
+ INDVAC_STATUS_COMPLETED,
+} PVIndVacStatus;
The comma is not required after the last value of the enum.
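For reference, here is roughly how the enum would read without it (just a sketch; the member names are the statuses used elsewhere in the patch, and I'm assuming that is the complete set):
typedef enum PVIndVacStatus
{
	INDVAC_STATUS_INITIAL,
	INDVAC_STATUS_NEED_BULKDELETE,
	INDVAC_STATUS_NEED_CLEANUP,
	INDVAC_STATUS_COMPLETED
} PVIndVacStatus;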
--
With Regards,
Amit Kapila.
On Thu, Nov 11, 2021 at 6:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Nov 11, 2021 at 8:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached a draft patch that refactors parallel vacuum and
separates parallel-vacuum-related code to new file vacuumparallel.c.
After discussion, I'll divide the patch into logical chunks.What I'm not convinced yet in this patch is that vacuum.c,
vacuumlazy.c and vacuumparallel.c depend on the data structure to
store dead tuples (now called VacDeadTuples, was LVDeadTuples). I
thought that it might be better to separate it so that a table AM can
use another type of data structure to store dead tuples. But since I
think it may bring complexity, currently a table AM has to use
VacDeadTuples in order to use the parallel vacuum.
I think it might be better to attempt doing anything to make it
generic for tableAM in a separate patch if that is required.
You mean to refactor relation_vacuum table AM API too? Currently,
relation_vacuum API is responsible for whole vacuum operation and
there is no room for the core doing anything during vacuum. So
probably it doesn’t make sense to have a table AM API for parallel
vacuum.
Few questions/comments:
=====================
1. There are three different structures PVState,
ParallelVacuumContext, and ParallelVacuumCtl to maintain the parallel
vacuum state. Can't we merge PVState and ParallelVacuumCtl? Also, I
see that most of the fields of PVState are there in
ParallelVacuumContext except for error info fields; does it make
sense to directly use PVState instead?
Agreed.
Also, it would be better to
write some more comments atop each structure to explain its usage.
Agreed.
2. In vacuum.c, the function names don't match the names in their
corresponding function header comments.
Will fix.
3.
+ INDVAC_STATUS_COMPLETED,
+} PVIndVacStatus;
The comma is not required after the last value of the enum.
Will fix.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Nov 15, 2021 at 2:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Nov 11, 2021 at 6:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Nov 11, 2021 at 8:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached a draft patch that refactors parallel vacuum and
separates parallel-vacuum-related code to new file vacuumparallel.c.
After discussion, I'll divide the patch into logical chunks.What I'm not convinced yet in this patch is that vacuum.c,
vacuumlazy.c and vacuumparallel.c depend on the data structure to
store dead tuples (now called VacDeadTuples, was LVDeadTuples). I
thought that it might be better to separate it so that a table AM can
use another type of data structure to store dead tuples. But since I
think it may bring complexity, currently a table AM has to use
VacDeadTuples in order to use the parallel vacuum.
I think it might be better to attempt doing anything to make it
generic for tableAM in a separate patch if that is required.
You mean to refactor relation_vacuum table AM API too? Currently,
relation_vacuum API is responsible for whole vacuum operation and
there is no room for the core doing anything during vacuum. So
probably it doesn’t make sense to have a table AM API for parallel
vacuum.
No, I intend to say that let's not do anything for it as of now. It is
not clear what a generic structure for it should be and whether AM's
need anything like that. As the current structure is specific to heap,
it might make sense to declare it in heapam.h as we are doing for
function heap_vacuum_rel(), and additionally, you might want to
include heap in that structure name to be more explicit.
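To make that concrete, something along these lines (a sketch only; the name HeapVacDeadTuples is just a placeholder, and the fields are the ones from the posted patch):
/* in src/include/access/heapam.h */
typedef struct HeapVacDeadTuples
{
	int		max_tuples;	/* # slots allocated in array */
	int		num_tuples;	/* current # of entries */
	/* List of TIDs of tuples we intend to delete, sorted by TID address */
	ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER];
} HeapVacDeadTuples;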
--
With Regards,
Amit Kapila.
On Mon, Nov 15, 2021 at 8:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Nov 15, 2021 at 2:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Nov 11, 2021 at 6:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Nov 11, 2021 at 8:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached a draft patch that refactors parallel vacuum and
separates parallel-vacuum-related code to new file vacuumparallel.c.
After discussion, I'll divide the patch into logical chunks.What I'm not convinced yet in this patch is that vacuum.c,
vacuumlazy.c and vacuumparallel.c depend on the data structure to
store dead tuples (now called VacDeadTuples, was LVDeadTuples). I
thought that it might be better to separate it so that a table AM can
use another type of data structure to store dead tuples. But since I
think it may bring complexity, currently a table AM has to use
VacDeadTuples in order to use the parallel vacuum.
I think it might be better to attempt doing anything to make it
generic for tableAM in a separate patch if that is required.
You mean to refactor relation_vacuum table AM API too? Currently,
relation_vacuum API is responsible for whole vacuum operation and
there is no room for the core doing anything during vacuum. So
probably it doesn’t make sense to have a table AM API for parallel
vacuum.
No, I intend to say that let's not do anything for it as of now. It is
not clear what a generic structure for it should be and whether AM's
need anything like that. As the current structure is specific to heap,
it might make sense to declare it in heapam.h as we are doing for
function heap_vacuum_rel(), and additionally, you might want to
include heap in that structure name to be more explicit.
Thanks for your clarification. I agree.
I'll post an updated patch tomorrow.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Nov 11, 2021 10:41 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached a draft patch that refactors parallel vacuum and separates
parallel-vacuum-related code to new file vacuumparallel.c.
After discussion, I'll divide the patch into logical chunks.
Hi.
I noticed a few minor issues in the patch.
1)
+ /*
+ * Parallel unsafe indexes can be processed only by leader (these are
+ * processed in lazy_serial_process_indexes() by leader.
+ */
It seems the function name in the comments should be serial_vacuum_unsafe_indexes
2)
+ stats->parallel_workers_can_process =
+ index_parallel_vacuum_is_safe(pvc->indrels[i],
+ pvc->num_index_scans,
+ bulkdel);
The function index_parallel_vacuum_is_safe also returns false for an
index below the min_parallel_index_scan_size cutoff, which is still parallel
safe. So, maybe we can rename the function to xxx_worker_can_process()?
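To illustrate, the call site would then read something like the following (just a sketch; parallel_vacuum_index_worker_can_process is a placeholder name, not one from the patch):
	/* Can a parallel worker process this index in the current phase? */
	stats->parallel_workers_can_process =
		parallel_vacuum_index_worker_can_process(pvc->indrels[i],
												 pvc->num_index_scans,
												 bulkdel);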
Best regards,
Hou zj
On Tue, Nov 16, 2021 at 11:38 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Thu, Nov 11, 2021 10:41 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached a draft patch that refactors parallel vacuum and separates
parallel-vacuum-related code to new file vacuumparallel.c.
After discussion, I'll divide the patch into logical chunks.
Hi.
I noticed a few minor issues in the patch.
1)
+ /*
+ * Parallel unsafe indexes can be processed only by leader (these are
+ * processed in lazy_serial_process_indexes() by leader.
+ */
It seems the function name in the comments should be serial_vacuum_unsafe_indexes
2)
+ stats->parallel_workers_can_process =
+ index_parallel_vacuum_is_safe(pvc->indrels[i],
+ pvc->num_index_scans,
+ bulkdel);
The function index_parallel_vacuum_is_safe also returns false for an
index below the min_parallel_index_scan_size cutoff, which is still parallel
safe. So, maybe we can rename the function to xxx_worker_can_process()?
Thank you for the comments!
I've incorporated these comments and attached an updated patch.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
parallel_vacuum_refactor_v3.patch (application/octet-stream)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4ee1f14854..dcebb639fd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -24,18 +24,9 @@
*
* Lazy vacuum supports parallel execution with parallel worker processes. In
* a parallel vacuum, we perform both index vacuum and index cleanup with
- * parallel worker processes. Individual indexes are processed by one vacuum
- * process. At the beginning of a lazy vacuum (at lazy_scan_heap) we prepare
- * the parallel context and initialize the DSM segment that contains shared
- * information as well as the memory space for storing dead tuples. When
- * starting either index vacuum or index cleanup, we launch parallel worker
- * processes. Once all indexes are processed the parallel worker processes
- * exit. After that, the leader process re-initializes the parallel context
- * so that it can use the same DSM for multiple passes of index vacuum and
- * for performing index cleanup. For updating the index statistics, we need
- * to update the system table and since updates are not allowed during
- * parallel mode we update the index statistics after exiting from the
- * parallel mode.
+ * parallel worker processes. For updating the index statistics, we need to
+ * update the system table and since updates are not allowed during parallel
+ * mode we update the index statistics after exiting from the parallel mode.
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -56,7 +47,6 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
-#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -143,22 +133,11 @@
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
-/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -171,137 +150,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * LVDeadTuples stores the dead tuple TIDs collected during the heap scan.
- * This is allocated in the DSM segment in parallel mode and in local memory
- * in non-parallel mode.
- */
-typedef struct LVDeadTuples
-{
- int max_tuples; /* # slots allocated in array */
- int num_tuples; /* current # of entries */
- /* List of TIDs of tuples we intend to delete */
- /* NB: this list is ordered by TID address */
- ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER]; /* array of
- * ItemPointerData */
-} LVDeadTuples;
-
-/* The dead tuple space consists of LVDeadTuples and dead tuple TIDs */
-#define SizeOfDeadTuples(cnt) \
- add_size(offsetof(LVDeadTuples, itemptrs), \
- mul_size(sizeof(ItemPointerData), cnt))
-#define MAXDEADTUPLES(max_size) \
- (((max_size) - offsetof(LVDeadTuples, itemptrs)) / sizeof(ItemPointerData))
-
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * An indication for vacuum workers to perform either index vacuum or
- * index cleanup. first_time is true only if for_cleanup is true and
- * bulk-deletion is not performed yet.
- */
- bool for_cleanup;
- bool first_time;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /*
- * Variables to control parallel vacuum. We have a bitmap to indicate
- * which index has stats in shared memory. The set bit in the map
- * indicates that the particular index supports a parallel vacuum.
- */
- pg_atomic_uint32 idx; /* counter for vacuuming and clean up */
- uint32 offset; /* sizeof header incl. bitmap */
- bits8 bitmap[FLEXIBLE_ARRAY_MEMBER]; /* bit map of NULLs */
-
- /* Shared index statistics data follows at end of struct */
-} LVShared;
-
-#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
-#define GetSharedIndStats(s) \
- ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
-#define IndStatsIsNull(s, i) \
- (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
-
-/*
- * Struct for an index bulk-deletion statistic used for parallel vacuum. This
- * is allocated in the DSM segment.
- */
-typedef struct LVSharedIndStats
-{
- bool updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVSharedIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -321,7 +169,7 @@ typedef struct LVRelState
/* Buffer access strategy and parallel state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumState *pvs;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -345,7 +193,7 @@ typedef struct LVRelState
/*
* State managed by lazy_scan_heap() follows
*/
- LVDeadTuples *dead_tuples; /* items to vacuum from indexes */
+ VacDeadTuples *dead_tuples; /* items to vacuum from indexes */
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
@@ -416,18 +264,6 @@ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
-static void do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel);
-static void do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers);
-static void do_parallel_processing(LVRelState *vacrel,
- LVShared *lvshared);
-static void do_serial_processing_for_unsafe_indexes(LVRelState *vacrel,
- LVShared *lvshared);
-static IndexBulkDeleteResult *parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_indstats,
- LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
@@ -446,20 +282,9 @@ static long compute_max_dead_tuples(BlockNumber relblocks, bool hasindex);
static void lazy_space_alloc(LVRelState *vacrel, int nworkers,
BlockNumber relblocks);
static void lazy_space_free(LVRelState *vacrel);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int compute_parallel_vacuum_workers(LVRelState *vacrel,
- int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static LVParallelState *begin_parallel_vacuum(LVRelState *vacrel,
- BlockNumber nblocks,
- int nrequested);
-static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared, int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -905,7 +730,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
- LVDeadTuples *dead_tuples;
+ VacDeadTuples *dead_tuples;
BlockNumber nblocks,
blkno,
next_unskippable_block,
@@ -2040,7 +1865,7 @@ retry:
*/
if (lpdead_items > 0)
{
- LVDeadTuples *dead_tuples = vacrel->dead_tuples;
+ VacDeadTuples *dead_tuples = vacrel->dead_tuples;
ItemPointerData tmp;
Assert(!prunestate->all_visible);
@@ -2252,7 +2077,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_vacuum_all_indexes(vacrel);
+ perform_parallel_index_bulkdel(vacrel->pvs, vacrel->old_live_tuples);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2405,7 +2230,7 @@ static int
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
int tupindex, Buffer *vmbuffer)
{
- LVDeadTuples *dead_tuples = vacrel->dead_tuples;
+ VacDeadTuples *dead_tuples = vacrel->dead_tuples;
Page page = BufferGetPage(buffer);
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
@@ -2626,351 +2451,6 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform lazy_vacuum_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
-{
- /* Tell parallel workers to do index vacuuming */
- vacrel->lps->lvshared->for_cleanup = false;
- vacrel->lps->lvshared->first_time = false;
-
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- do_parallel_vacuum_or_cleanup(vacrel,
- vacrel->lps->nindexes_parallel_bulkdel);
-}
-
-/*
- * Perform lazy_cleanup_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel)
-{
- int nworkers;
-
- /*
- * If parallel vacuum is active we perform index cleanup with parallel
- * workers.
- *
- * Tell parallel workers to do index cleanup.
- */
- vacrel->lps->lvshared->for_cleanup = true;
- vacrel->lps->lvshared->first_time = (vacrel->num_index_scans == 0);
-
- /*
- * Now we can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- /* Determine the number of parallel workers to launch */
- if (vacrel->lps->lvshared->first_time)
- nworkers = vacrel->lps->nindexes_parallel_cleanup +
- vacrel->lps->nindexes_parallel_condcleanup;
- else
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- do_parallel_vacuum_or_cleanup(vacrel, nworkers);
-}
-
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process. The caller must set
- * lps->lvshared->for_cleanup to indicate whether to perform vacuum or
- * cleanup.
- */
-static void
-do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See compute_parallel_vacuum_workers.
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
- ReinitializeParallelDSM(lps->pcxt);
- }
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (lps->lvshared->for_cleanup)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
-
- /*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
- */
- do_parallel_processing(vacrel, lps->lvshared);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(lvshared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- /* Get the index statistics space from DSM, if any */
- shared_istat = parallel_stats_for_idx(lvshared, idx);
-
- /* Skip indexes not participating in parallelism */
- if (shared_istat == NULL)
- continue;
-
- indrel = vacrel->indrels[idx];
-
- /*
- * Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
- */
- if (!parallel_processing_is_safe(indrel, lvshared))
- continue;
-
- /* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing for_cleanup processing right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by compute_parallel_vacuum_workers(). These indexes never get a
- * slot for statistics in DSM.
- */
-static void
-do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
-{
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- shared_istat = parallel_stats_for_idx(lvshared, idx);
- indrel = vacrel->indrels[idx];
-
- /*
- * We're only here for the indexes that parallel workers won't
- * process. Note that the shared_istat test ensures that we process
- * indexes that fell under initial size cutoff.
- */
- if (shared_istat != NULL &&
- parallel_processing_is_safe(indrel, lvshared))
- continue;
-
- /* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static IndexBulkDeleteResult *
-parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_istat,
- LVRelState *vacrel)
-{
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (shared_istat && shared_istat->updated && istat == NULL)
- istat = &shared_istat->istat;
-
- /* Do vacuum or cleanup of the index */
- if (lvshared->for_cleanup)
- istat_res = lazy_cleanup_one_index(indrel, istat, lvshared->reltuples,
- lvshared->estimated_count, vacrel);
- else
- istat_res = lazy_vacuum_one_index(indrel, istat, lvshared->reltuples,
- vacrel);
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
- {
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
- }
-
- return istat_res;
-}
-
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
@@ -3003,7 +2483,8 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_cleanup_all_indexes(vacrel);
+ perform_parallel_index_cleanup(vacrel->pvs, vacrel->new_rel_tuples,
+ (vacrel->tupcount_pages < vacrel->rel_pages));
}
}
@@ -3049,13 +2530,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_tuples);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_tuples->num_tuples),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ istat = vacuum_one_index(&ivinfo, istat, vacrel->dead_tuples);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3089,7 +2564,6 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
ivinfo.strategy = vacrel->bstrategy;
@@ -3105,24 +2579,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
- istat = index_vacuum_cleanup(&ivinfo, istat);
-
- if (istat)
- {
- ereport(elevel,
- (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
- RelationGetRelationName(indrel),
- istat->num_index_tuples,
- istat->num_pages),
- errdetail("%.0f index row versions were removed.\n"
- "%u index pages were newly deleted.\n"
- "%u index pages are currently deleted, of which %u are currently reusable.\n"
- "%s.",
- istat->tuples_removed,
- istat->pages_newly_deleted,
- istat->pages_deleted, istat->pages_free,
- pg_rusage_show(&ru0))));
- }
+ istat = cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3480,9 +2937,11 @@ compute_max_dead_tuples(BlockNumber relblocks, bool hasindex)
static void
lazy_space_alloc(LVRelState *vacrel, int nworkers, BlockNumber nblocks)
{
- LVDeadTuples *dead_tuples;
+ VacDeadTuples *dead_tuples;
long maxtuples;
+ maxtuples = compute_max_dead_tuples(nblocks, vacrel->nindexes > 0);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3506,16 +2965,29 @@ lazy_space_alloc(LVRelState *vacrel, int nworkers, BlockNumber nblocks)
vacrel->relname)));
}
else
- vacrel->lps = begin_parallel_vacuum(vacrel, nblocks, nworkers);
+ {
+ ParallelVacuumCtl ctl;
+
+ /* Create parallel vacuum state */
+ ctl.rel = vacrel->rel;
+ ctl.indrels = vacrel->indrels;
+ ctl.nindexes = vacrel->nindexes;
+ ctl.nrequested_workers = nworkers;
+ ctl.maxtuples = maxtuples;
+ ctl.elevel = elevel;
+ ctl.bstrategy = vacrel->bstrategy;
+ vacrel->pvs = begin_parallel_vacuum(&ctl);
+ }
/* If parallel mode started, we're done */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_tuples = get_vacuum_dead_tuples(vacrel->pvs);
return;
+ }
}
- maxtuples = compute_max_dead_tuples(nblocks, vacrel->nindexes > 0);
-
- dead_tuples = (LVDeadTuples *) palloc(SizeOfDeadTuples(maxtuples));
+ dead_tuples = (VacDeadTuples *) palloc(SizeOfDeadTuples(maxtuples));
dead_tuples->num_tuples = 0;
dead_tuples->max_tuples = (int) maxtuples;
@@ -3535,75 +3007,8 @@ lazy_space_free(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- end_parallel_vacuum(vacrel);
-}
-
-/*
- * lazy_tid_reaped() -- is a particular tid deletable?
- *
- * This has the right signature to be an IndexBulkDeleteCallback.
- *
- * Assumes dead_tuples array is in sorted order.
- */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
-{
- LVDeadTuples *dead_tuples = (LVDeadTuples *) state;
- int64 litem,
- ritem,
- item;
- ItemPointer res;
-
- litem = itemptr_encode(&dead_tuples->itemptrs[0]);
- ritem = itemptr_encode(&dead_tuples->itemptrs[dead_tuples->num_tuples - 1]);
- item = itemptr_encode(itemptr);
-
- /*
- * Doing a simple bound check before bsearch() is useful to avoid the
- * extra cost of bsearch(), especially if dead tuples on the heap are
- * concentrated in a certain range. Since this function is called for
- * every index tuple, it pays to be really fast.
- */
- if (item < litem || item > ritem)
- return false;
-
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) dead_tuples->itemptrs,
- dead_tuples->num_tuples,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
-
- return (res != NULL);
-}
-
-/*
- * Comparator routines for use with qsort() and bsearch().
- */
-static int
-vac_cmp_itemptr(const void *left, const void *right)
-{
- BlockNumber lblk,
- rblk;
- OffsetNumber loff,
- roff;
-
- lblk = ItemPointerGetBlockNumber((ItemPointer) left);
- rblk = ItemPointerGetBlockNumber((ItemPointer) right);
-
- if (lblk < rblk)
- return -1;
- if (lblk > rblk)
- return 1;
-
- loff = ItemPointerGetOffsetNumber((ItemPointer) left);
- roff = ItemPointerGetOffsetNumber((ItemPointer) right);
-
- if (loff < roff)
- return -1;
- if (loff > roff)
- return 1;
-
- return 0;
+ end_parallel_vacuum(vacrel->pvs, vacrel->indstats);
+ vacrel->pvs = NULL;
}
/*
@@ -3727,76 +3132,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
return all_visible;
}
-/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
- */
-static int
-compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- continue;
-
- will_parallel_vacuum[idx] = true;
-
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
@@ -3829,426 +3164,6 @@ update_index_statistics(LVRelState *vacrel)
}
}
-/*
- * This function prepares and returns parallel vacuum state if we can launch
- * even one worker. This function is responsible for entering parallel mode,
- * create a parallel context, and then initialize the DSM segment.
- */
-static LVParallelState *
-begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
- int nrequested)
-{
- LVParallelState *lps = NULL;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- LVDeadTuples *dead_tuples;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- bool *will_parallel_vacuum;
- long maxtuples;
- Size est_shared;
- Size est_deadtuples;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = compute_parallel_vacuum_workers(vacrel,
- nrequested,
- will_parallel_vacuum);
-
- /* Can't perform vacuum in parallel */
- if (parallel_workers <= 0)
- {
- pfree(will_parallel_vacuum);
- return lps;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- /* Skip indexes that don't participate in parallel vacuum */
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- est_shared = add_size(est_shared, sizeof(LVSharedIndStats));
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
- maxtuples = compute_max_dead_tuples(nblocks, true);
- est_deadtuples = MAXALIGN(SizeOfDeadTuples(maxtuples));
- shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared);
- MemSet(shared, 0, est_shared);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
- shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
-
- /*
- * Initialize variables for shared index statistics, set NULL bitmap and
- * the size of stats for each index.
- */
- memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
- for (int idx = 0; idx < nindexes; idx++)
- {
- if (!will_parallel_vacuum[idx])
- continue;
-
- /* Set NOT NULL as this index does support parallelism */
- shared->bitmap[idx >> 3] |= 1 << (idx & 0x07);
- }
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead tuple space */
- dead_tuples = (LVDeadTuples *) shm_toc_allocate(pcxt->toc, est_deadtuples);
- dead_tuples->max_tuples = maxtuples;
- dead_tuples->num_tuples = 0;
- MemSet(dead_tuples->itemptrs, 0, sizeof(ItemPointerData) * maxtuples);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
- vacrel->dead_tuples = dead_tuples;
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- pfree(will_parallel_vacuum);
- return lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-end_parallel_vacuum(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVSharedIndStats *shared_istat;
-
- shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
-
- /*
- * Skip index -- it must have been processed by the leader, from
- * inside do_serial_processing_for_unsafe_indexes()
- */
- if (shared_istat == NULL)
- continue;
-
- if (shared_istat->updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &shared_istat->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Return shared memory statistics for index at offset 'getidx', if any
- *
- * Returning NULL indicates that compute_parallel_vacuum_workers() determined
- * that the index is a totally unsuitable target for all parallel processing
- * up front. For example, the index could be < min_parallel_index_scan_size
- * cutoff.
- */
-static LVSharedIndStats *
-parallel_stats_for_idx(LVShared *lvshared, int getidx)
-{
- char *p;
-
- if (IndStatsIsNull(lvshared, getidx))
- return NULL;
-
- p = (char *) GetSharedIndStats(lvshared);
- for (int idx = 0; idx < getidx; idx++)
- {
- if (IndStatsIsNull(lvshared, idx))
- continue;
-
- p += sizeof(LVSharedIndStats);
- }
-
- return (LVSharedIndStats *) p;
-}
-
-/*
- * Returns false, if the given index can't participate in parallel index
- * vacuum or parallel index cleanup
- */
-static bool
-parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
-{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* first_time must be true only if for_cleanup is true */
- Assert(lvshared->for_cleanup || !lvshared->first_time);
-
- if (lvshared->for_cleanup)
- {
- /* Skip, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Skip, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). See the
- * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
- * when indexes support parallel cleanup conditionally.
- */
- if (!lvshared->first_time &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
- }
- else if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) == 0)
- {
- /* Skip if the index does not support parallel bulk deletion */
- return false;
- }
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVShared *lvshared;
- LVDeadTuples *dead_tuples;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- if (lvshared->for_cleanup)
- elog(DEBUG1, "starting parallel vacuum worker for cleanup");
- else
- elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set dead tuple space */
- dead_tuples = (LVDeadTuples *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_TUPLES,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_tuples = dead_tuples;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- do_parallel_processing(&vacrel, lvshared);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
/*
* Error context callback for errors occurring during vacuum.
*/
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f573..6b427772d5 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..48f7348f91 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15b44..e3cf6fb037 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -32,6 +32,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
+#include "catalog/index.h"
#include "catalog/pg_database.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_namespace.h"
@@ -51,6 +52,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -90,6 +92,9 @@ static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
+static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_cmp_itemptr(const void *left, const void *right);
+
/*
* Primary entry point for manual VACUUM and ANALYZE commands
*
@@ -2258,3 +2263,140 @@ get_vacoptval_from_boolean(DefElem *def)
{
return defGetBoolean(def) ? VACOPTVALUE_ENABLED : VACOPTVALUE_DISABLED;
}
+
+/*
+ * lazy_vacuum_one_index() -- vacuum index relation.
+ *
+ * Delete all the index entries pointing to tuples listed in
+ * dead_tuples, and update running statistics.
+ *
+ * reltuples is the number of heap tuples to be passed to the
+ * bulkdelete callback. It's always assumed to be estimated.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+vacuum_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ VacDeadTuples *dead_tuples)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ /* Do bulk deletion */
+ istat = index_bulk_delete(ivinfo, istat, vac_tid_reaped,
+ (void *) dead_tuples);
+
+ ereport(ivinfo->message_level,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ RelationGetRelationName(ivinfo->index),
+ dead_tuples->num_tuples),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+ return istat;
+}
+
+/*
+ * vac_cleanup_one_index() -- do post-vacuum cleanup for index relation.
+ *
+ * reltuples is the number of heap tuples and estimated_count is true
+ * if reltuples is an estimated value.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ istat = index_vacuum_cleanup(ivinfo, istat);
+
+ if (istat)
+ {
+ ereport(ivinfo->message_level,
+ (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
+ RelationGetRelationName(ivinfo->index),
+ istat->num_index_tuples,
+ istat->num_pages),
+ errdetail("%.0f index row versions were removed.\n"
+ "%u index pages were newly deleted.\n"
+ "%u index pages are currently deleted, of which %u are currently reusable.\n"
+ "%s.",
+ istat->tuples_removed,
+ istat->pages_newly_deleted,
+ istat->pages_deleted, istat->pages_free,
+ pg_rusage_show(&ru0))));
+ }
+
+ return istat;
+}
+
+/*
+ * vac_tid_reaped() -- is a particular tid deletable?
+ *
+ * This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ * Assumes dead_tuples array is in sorted order.
+ */
+static bool
+vac_tid_reaped(ItemPointer itemptr, void *state)
+{
+ VacDeadTuples *dead_tuples = (VacDeadTuples *) state;
+ int64 litem,
+ ritem,
+ item;
+ ItemPointer res;
+
+ litem = itemptr_encode(&dead_tuples->itemptrs[0]);
+ ritem = itemptr_encode(&dead_tuples->itemptrs[dead_tuples->num_tuples - 1]);
+ item = itemptr_encode(itemptr);
+
+ /*
+ * Doing a simple bound check before bsearch() is useful to avoid the
+ * extra cost of bsearch(), especially if dead tuples on the heap are
+ * concentrated in a certain range. Since this function is called for
+ * every index tuple, it pays to be really fast.
+ */
+ if (item < litem || item > ritem)
+ return false;
+
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_tuples->itemptrs,
+ dead_tuples->num_tuples,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
+
+ return (res != NULL);
+}
+
+/*
+ * Comparator routines for use with qsort() and bsearch().
+ */
+static int
+vac_cmp_itemptr(const void *left, const void *right)
+{
+ BlockNumber lblk,
+ rblk;
+ OffsetNumber loff,
+ roff;
+
+ lblk = ItemPointerGetBlockNumber((ItemPointer) left);
+ rblk = ItemPointerGetBlockNumber((ItemPointer) right);
+
+ if (lblk < rblk)
+ return -1;
+ if (lblk > rblk)
+ return 1;
+
+ loff = ItemPointerGetOffsetNumber((ItemPointer) left);
+ roff = ItemPointerGetOffsetNumber((ItemPointer) right);
+
+ if (loff < roff)
+ return -1;
+ if (loff > roff)
+ return 1;
+
+ return 0;
+}
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000000..c603e33af2
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,1028 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelVacuumState.
+ *
+ * In a parallel vacuum, we perform both index bulk-deletion and index cleanup
+ * with parallel worker processes. Individual indexes are processed by one
+ * vacuum process. ParallelVacuumState contains shared information as well
+ * as the memory space for storing dead tuples allocated in the DSM segment.
+ * When starting either parallel index bulk-deletion or index cleanup, we
+ * launch parallel worker processes. Once all indexes are processed, the
+ * parallel worker processes exit. The next time, the parallel context
+ * is re-initialized so that the same DSM can be used for multiple passes of
+ * index bulk-deletion and index cleanup. At the end of a parallel vacuum,
+ * ParallelVacuumState is destroyed while returning index statistics so
+ * that we can update them after exiting from the parallel mode.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/genam.h"
+#include "access/parallel.h"
+#include "access/table.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/elog.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel workers. So this is allocated in the DSM
+ * segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the parallel vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * num_table_tuples is the total number of input heap tuples. We set either old
+ * live tuples in the index vacuum case or the new live tuples in the
+ * index cleanup case.
+ *
+ * estimated_count is true if num_table_tuples is an estimated value. (Note that
+ * num_table_tuples could be -1 in this case, indicating we have no idea.)
+ */
+ double num_table_tuples;
+ bool estimated_count;
+
+ /*
+ * In single process lazy vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for heap scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process lazy vacuum.
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
+} PVShared;
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ INDVAC_STATUS_INITIAL = 0,
+ INDVAC_STATUS_NEED_BULKDELETE,
+ INDVAC_STATUS_NEED_CLEANUP,
+ INDVAC_STATUS_COMPLETED
+} PVIndVacStatus;
+
+/*
+ * Struct for an index bulk-deletion statistic used for parallel vacuum. This
+ * is allocated in the DSM segment.
+ */
+typedef struct PVIndStats
+{
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup.
+ *
+ * parallel_workers_can_process is true if both leader and worker can
+ * process the index, otherwise only leader can process it. This value
+ * is not fixed for the entire VACUUM operation. It is only fixed for
+ * an individual parallel index vacuum and cleanup.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/*
+ * Struct for maintaining a parallel vacuum state. This struct is used
+ * by both leader and worker processes. The parallel vacuum leader process
+ * uses it throughout a VACUUM operation. Therefore, the leader should use the
+ * same state to perform index bulk-deletion and index cleanup multiple times.
+ * The workers use some fields of this structure.
+ */
+typedef struct ParallelVacuumState
+{
+ /* Always NULL for worker processes */
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /* Shared index statistics among parallel vacuum workers */
+ PVIndStats *indstats;
+
+ /* Shared dead tuple space among parallel vacuum workers */
+ VacDeadTuples *dead_tuples;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion and
+ * parallel index cleanup respectively. Used by only leader.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /* Incremented by each bulkdel or cleanup. Used by only leader */
+ int num_index_scans;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+
+ /* Error reporting state */
+ char *relnamespace;
+ char *relname;
+ char *indname;
+ PVIndVacStatus status;
+} ParallelVacuumState;
+
+static int compute_parallel_vacuum_workers(Relation *indrels, int nindexes,
+ int nrequested);
+static void set_parallel_vacuum_index_status(ParallelVacuumState *pvs,
+ bool bulkdel);
+static void parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel);
+static void parallel_vacuum_indexes(ParallelVacuumState *pvs);
+static void serial_vacuum_unsafe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *stats);
+static bool index_can_participate_parallel_vacuum(Relation indrel,
+ int num_index_scans,
+ bool bulkdel);
+static void parallel_index_vacuum_error_callback(void *arg);
+
+/*
+ * This function prepares and returns parallel vacuum state if we can launch
+ * even one worker. This function is responsible for entering parallel mode,
+ * creating a parallel context, and then initializing the DSM segment.
+ */
+ParallelVacuumState *
+begin_parallel_vacuum(ParallelVacuumCtl *pvstl)
+{
+ ParallelVacuumState *pvs = NULL;
+ ParallelContext *pcxt;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ Size est_indstats = 0;
+ Size est_shared = 0;
+ Size est_deadtuples = 0;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(pvstl);
+ Assert(pvstl->nrequested_workers >= 0);
+ Assert(pvstl->nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ parallel_workers = compute_parallel_vacuum_workers(pvstl->indrels,
+ pvstl->nindexes,
+ pvstl->nrequested_workers);
+
+ /* Can't perform vacuum in parallel */
+ if (parallel_workers <= 0)
+ return pvs;
+
+ pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
+ pvs->indrels = pvstl->indrels;
+ pvs->nindexes = pvstl->nindexes;
+ pvs->bstrategy = pvstl->bstrategy;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvs->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_INDEX_STATS */
+ est_indstats = mul_size(sizeof(PVIndStats), pvstl->nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared = MAXALIGN(sizeof(PVShared));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
+ est_deadtuples = MAXALIGN(SizeOfDeadTuples(pvstl->maxtuples));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats);
+ MemSet(indstats, 0, est_indstats);
+ for (int i = 0; i < pvstl->nindexes; i++)
+ {
+ Relation indrel = pvstl->indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ /* Skip indexes that don't participate in parallel vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvs->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvs->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvs->nindexes_parallel_condcleanup++;
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvs->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared);
+ MemSet(shared, 0, est_shared);
+ shared->relid = RelationGetRelid(pvstl->rel);
+ shared->elevel = pvstl->elevel;
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvs->shared = shared;
+
+ /* Prepare the dead tuple space */
+ dead_tuples = (VacDeadTuples *) shm_toc_allocate(pcxt->toc, est_deadtuples);
+ dead_tuples->max_tuples = pvstl->maxtuples;
+ dead_tuples->num_tuples = 0;
+ MemSet(dead_tuples->itemptrs, 0,
+ sizeof(ItemPointerData) * pvstl->maxtuples);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
+ pvs->dead_tuples = dead_tuples;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvs->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvs->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ return pvs;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed during parallel mode, copy updated index
+ * statistics from DSM into local memory so that the caller uses that to
+ * update the index statistics. One might think that we can exit from
+ * parallel mode, update the index statistics and then destroy parallel
+ * context, but that won't be safe (see ExitParallelMode).
+ */
+void
+end_parallel_vacuum(ParallelVacuumState *pvs, IndexBulkDeleteResult **indstats)
+{
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvs->indstats[i]);
+
+ if (stats->istat_updated)
+ {
+ indstats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(indstats[i], &stats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ indstats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvs->pcxt);
+ ExitParallelMode();
+
+ pfree(pvs);
+}
+
+/* Returns the dead tuple space */
+VacDeadTuples *
+get_vacuum_dead_tuples(ParallelVacuumState *pvs)
+{
+ return pvs->dead_tuples;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+perform_parallel_index_bulkdel(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now. Matches serial VACUUM case.
+ */
+ pvs->shared->num_table_tuples = num_table_tuples;
+ pvs->shared->estimated_count = true;
+
+ parallel_vacuum_all_indexes(pvs, true);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ */
+void
+perform_parallel_index_cleanup(ParallelVacuumState *pvs, long num_table_tuples,
+ bool estimated_count)
+{
+ /*
+ * We can provide a better estimate of total number of surviving
+ * tuples (we assume indexes are more interested in that than in the
+ * number of nominally live tuples).
+ */
+ pvs->shared->num_table_tuples = num_table_tuples;
+ pvs->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_all_indexes(pvs, false);
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index vacuum or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ ParallelVacuumState pvs;
+ int nindexes;
+ char *sharedquery;
+ ErrorContextCallback errcallback;
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
+ false);
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the leader process. It's
+ * okay because the lock mode does not conflict among the parallel
+ * workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should be
+ * matched to the leader's one.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead tuple space */
+ dead_tuples = (VacDeadTuples *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_TUPLES,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Parallel vacuum state and the error callback arg */
+ pvs.indrels = indrels;
+ pvs.nindexes = nindexes;
+ pvs.indstats = indstats;
+ pvs.shared = shared;
+ pvs.dead_tuples = dead_tuples;
+ pvs.bstrategy = GetAccessStrategy(BAS_VACUUM);
+ pvs.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs.relname = pstrdup(RelationGetRelationName(rel));
+ pvs.indname = NULL; /* filled later during index vacuuming */
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_index_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Process indexes to perform bulk-deletion/cleanup */
+ parallel_vacuum_indexes(&pvs);
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvs.bstrategy);
+}
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * vacuum and index cleanup can be executed with parallel workers. The index
+ * is eligible for parallel vacuum iff its size is greater than
+ * min_parallel_index_scan_size as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that user requested. If
+ * nrequested is 0, we compute the parallel degree based on nindexes, that is
+ * the number of indexes that support parallel vacuum.
+ */
+static int
+compute_parallel_vacuum_workers(Relation *indrels, int nindexes, int nrequested)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow performing parallel operation in standalone backend or
+ * when parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
+
+static void
+set_parallel_vacuum_index_status(ParallelVacuumState *pvs, bool bulkdel)
+{
+ PVIndVacStatus new_status = bulkdel
+ ? INDVAC_STATUS_NEED_BULKDELETE
+ : INDVAC_STATUS_NEED_CLEANUP;
+
+ /* Set index vacuum status and mark as parallel safe or not */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvs->indstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_INITIAL);
+
+ stats->status = new_status;
+ stats->parallel_workers_can_process =
+ index_can_participate_parallel_vacuum(pvs->indrels[i],
+ pvs->num_index_scans,
+ bulkdel);
+ }
+}
+
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel)
+{
+ int nworkers;
+
+ /* Determine the number of parallel workers to launch */
+ if (bulkdel)
+ nworkers = pvs->nindexes_parallel_bulkdel;
+ else
+ {
+ nworkers = pvs->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if in the first time call */
+ if (pvs->num_index_scans == 0)
+ nworkers += pvs->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * than the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See compute_parallel_vacuum_workers.
+ */
+ nworkers = Min(nworkers, pvs->pcxt->nworkers);
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvs->shared->idx), 0);
+
+ set_parallel_vacuum_index_status(pvs, bulkdel);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize the parallel context to relaunch parallel workers */
+ if (pvs->num_index_scans > 0)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ if (bulkdel)
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ }
+
+ /* Process the indexes that can be processed by only leader process */
+ serial_vacuum_unsafe_indexes(pvs);
+
+ /*
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_indexes(pvs);
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index status back to invalid (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvs->indstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_COMPLETED);
+ stats->status = INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+
+ /* Increment the count */
+ pvs->num_index_scans++;
+}
+
+
+/*
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to process the indexes in parallel.
+ */
+static void
+parallel_vacuum_indexes(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (;;)
+ {
+ int idx;
+ PVIndStats *stats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvs->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvs->nindexes)
+ break;
+
+ stats = &(pvs->indstats[idx]);
+
+ /*
+ * Parallel unsafe indexes can be processed only by the leader (these are
+ * processed in serial_vacuum_unsafe_indexes() by the leader).
+ */
+ if (!stats->parallel_workers_can_process)
+ continue;
+
+ parallel_vacuum_one_index(pvs, pvs->indrels[idx], stats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Perform serial processing of indexes in the leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe.
+ *
+ * Also performs processing of smaller indexes that fell under the size cutoff
+ * enforced by compute_parallel_vacuum_workers().
+ */
+static void
+serial_vacuum_unsafe_indexes(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvs->indstats[i]);
+
+ /* Skip parallel-safe indexes, as they are processed by workers */
+ if (stats->parallel_workers_can_process)
+ continue;
+
+ parallel_vacuum_one_index(pvs, pvs->indrels[i], stats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Bulk-delete or clean up an index, either by the leader process or by one of
+ * the worker processes. After processing the index, this function copies the index
+ * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel, PVIndStats *stats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /* Get the index statistics space, if already updated */
+ if (stats->istat_updated)
+ istat = &(stats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvs->shared->elevel;
+ ivinfo.estimated_count = pvs->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvs->shared->num_table_tuples;
+ ivinfo.strategy = pvs->bstrategy;
+
+ /* Update error traceback information */
+ pvs->indname = pstrdup(RelationGetRelationName(indrel));
+ pvs->status = stats->status;
+
+ switch (stats->status)
+ {
+ case INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = vacuum_one_index(&ivinfo, istat, pvs->dead_tuples);
+ break;
+ case INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel index vacuum status %d",
+ stats->status);
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete and
+ * amvacuumcleanup to the DSM segment if it's the first cycle because they
+ * allocate locally and it's possible that an index will be vacuumed by a
+ * different vacuum process the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result on the DSM segment and
+ * pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result at different
+ * slots we can write them without locking.
+ */
+ if (!stats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(stats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ stats->istat_updated = true;
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each
+ * worker touches different indexes.
+ */
+ stats->status = INDVAC_STATUS_COMPLETED;
+
+ /* Reset error traceback information */
+ pfree(pvs->indname);
+ pvs->indname = NULL;
+ pvs->status = INDVAC_STATUS_COMPLETED;
+}
+
+/*
+ * Returns false, if the given index can't participate in parallel index
+ * vacuum or parallel index cleanup.
+ */
+static bool
+index_can_participate_parallel_vacuum(Relation indrel, int num_index_scans,
+ bool bulkdel)
+{
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Check if the index is a totally unsuitable target for all parallel
+ * processing up front. For example, the index could be
+ * < min_parallel_index_scan_size cutoff.
+ */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ return false;
+
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (bulkdel)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). See the
+ * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
+ * when indexes support parallel cleanup conditionally.
+ */
+ if (num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
+
+/*
+ * Error context callback for errors occurring during parallel index vacuum.
+ */
+static void
+parallel_index_vacuum_error_callback(void *arg)
+{
+ ParallelVacuumState *errinfo = arg;
+
+ switch (errinfo->status)
+ {
+ case INDVAC_STATUS_NEED_BULKDELETE:
+ errcontext("while parallelly vacuuming index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while parallelly cleanup index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case INDVAC_STATUS_INITIAL:
+ case INDVAC_STATUS_COMPLETED:
+ default:
+ break;
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 417dd288e5..f3fb1e93a5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4cfd52eaf4..1fa4669a00 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -62,6 +64,30 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Abstract type for parallel vacuum state */
+typedef struct ParallelVacuumState ParallelVacuumState;
+
+/*
+ * Parameter data structure for begin_parallel_vacuum().
+ */
+typedef struct ParallelVacuumCtl
+{
+ /* Table and its indexes */
+ Relation rel;
+ Relation *indrels;
+ int nindexes;
+
+ /* The number of workers requested to launch */
+ int nrequested_workers;
+
+ /* The maximum dead tuples to store */
+ long maxtuples;
+
+ /* Log level and the buffer access strategy */
+ int elevel;
+ BufferAccessStrategy bstrategy;
+} ParallelVacuumCtl;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -230,6 +256,28 @@ typedef struct VacuumParams
int nworkers;
} VacuumParams;
+/*
+ * VacDeadTuples stores the dead tuple TIDs collected during the heap scan.
+ * This is allocated in the DSM segment in parallel mode and in local memory
+ * in non-parallel mode.
+ */
+typedef struct VacDeadTuples
+{
+ int max_tuples; /* # slots allocated in array */
+ int num_tuples; /* current # of entries */
+ /* List of TIDs of tuples we intend to delete */
+ /* NB: this list is ordered by TID address */
+ ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER]; /* array of
+ * ItemPointerData */
+} VacDeadTuples;
+
+/* The dead tuple space consists of VacDeadTuples and dead tuple TIDs */
+#define SizeOfDeadTuples(cnt) \
+ add_size(offsetof(VacDeadTuples, itemptrs), \
+ mul_size(sizeof(ItemPointerData), cnt))
+#define MAXDEADTUPLES(max_size) \
+ (((max_size) - offsetof(VacDeadTuples, itemptrs)) / sizeof(ItemPointerData))
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for PostGIS */
extern int vacuum_freeze_min_age;
@@ -282,6 +330,23 @@ extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
bits32 options, bool verbose,
LOCKMODE lmode);
+extern IndexBulkDeleteResult *vacuum_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat,
+ VacDeadTuples *dead_tuples);
+extern IndexBulkDeleteResult *cleanup_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat);
+
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumState *begin_parallel_vacuum(ParallelVacuumCtl *pvctl);
+extern void end_parallel_vacuum(ParallelVacuumState *pvs,
+ IndexBulkDeleteResult **indstats);
+extern VacDeadTuples *get_vacuum_dead_tuples(ParallelVacuumState *pvs);
+extern void perform_parallel_index_bulkdel(ParallelVacuumState *pvs,
+ long num_table_tuples);
+extern void perform_parallel_index_cleanup(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ bool estimated_count);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
diff --git a/src/test/regress/expected/vacuum_parallel.out b/src/test/regress/expected/vacuum_parallel.out
index ddf0ee544b..a7d8a801e0 100644
--- a/src/test/regress/expected/vacuum_parallel.out
+++ b/src/test/regress/expected/vacuum_parallel.out
@@ -45,5 +45,29 @@ VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table;
INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
RESET max_parallel_maintenance_workers;
RESET min_parallel_index_scan_size;
+CREATE TABLE parallel_vacuum_table2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 10000) g;
+-- Create different types of indexes, i.e. having different parallelvacuumoptions.
+-- Also create a small index same as above.
+CREATE INDEX pv_bt_index ON parallel_vacuum_table2 USING btree (a);
+CREATE INDEX pv_hash_index ON parallel_vacuum_table2 USING hash (a);
+CREATE INDEX pv_gin_index ON parallel_vacuum_table2 USING gin (b);
+CREATE INDEX pv_brin_index ON parallel_vacuum_table2 USING brin (a);
+CREATE INDEX pv_small_index ON parallel_vacuum_table2 USING btree ((1));
+-- Parallel index vacuum for various types of indexes.
+DELETE FROM parallel_vacuum_table2;
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- XXX: in order to execute index scan twice, we need about 200,000 garbage tuples
+-- with the minimum maintenance_work_mem. However, it takes a long time to load.
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 200000) g;
+DELETE FROM parallel_vacuum_table2;
+SET maintenance_work_mem TO 1024;
+-- Parallel index vacuum for various types of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+RESET maintenance_work_mem;
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios
diff --git a/src/test/regress/sql/vacuum_parallel.sql b/src/test/regress/sql/vacuum_parallel.sql
index 1d23f33e39..49f4f4ce6d 100644
--- a/src/test/regress/sql/vacuum_parallel.sql
+++ b/src/test/regress/sql/vacuum_parallel.sql
@@ -42,5 +42,40 @@ INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
RESET max_parallel_maintenance_workers;
RESET min_parallel_index_scan_size;
+CREATE TABLE parallel_vacuum_table2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 10000) g;
+
+-- Create different types of indexes, i.e. having different parallelvacuumoptions.
+-- Also create a small index same as above.
+CREATE INDEX pv_bt_index ON parallel_vacuum_table2 USING btree (a);
+CREATE INDEX pv_hash_index ON parallel_vacuum_table2 USING hash (a);
+CREATE INDEX pv_gin_index ON parallel_vacuum_table2 USING gin (b);
+CREATE INDEX pv_brin_index ON parallel_vacuum_table2 USING brin (a);
+CREATE INDEX pv_small_index ON parallel_vacuum_table2 USING btree ((1));
+
+
+-- Parallel index vacuum for various types of indexes.
+DELETE FROM parallel_vacuum_table2;
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- XXX: in order to execute index scan twice, we need about 200,000 garbage tuples
+-- with the minimum maintenance_work_mem. However, it takes a long time to load.
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 200000) g;
+
+DELETE FROM parallel_vacuum_table2;
+
+SET maintenance_work_mem TO 1024;
+
+-- Parallel index vacuum for various types of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+RESET maintenance_work_mem;
+
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios
On Tues, Nov 16, 2021 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've incorporated these comments and attached an updated patch.
Thanks for updating the patch.
I read the latest patch and have a few comments.
1)
+/*
+ * lazy_vacuum_one_index() -- vacuum index relation.
...
+IndexBulkDeleteResult *
+vacuum_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ * vac_cleanup_one_index() -- do post-vacuum cleanup for index relation.
...
+IndexBulkDeleteResult *
+cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
The above function names seem different from the names mentioned in the function
headers.
2)
static void vacuum_error_callback(void *arg);
I noticed the patch changed the parallel worker's error callback function to
parallel_index_vacuum_error_callback(). The error message in the new callback
function seems a little different from the old one; was it intentional?
3)
+ /*
+ * Reset all index status back to invalid (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvs->indstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_COMPLETED);
+ stats->status = INDVAC_STATUS_INITIAL;
+ }
Would it be safer to report an error if any index's status is not
INDVAC_STATUS_COMPLETED?
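For example, something along these lines (just an untested sketch from me; the
error message wording is only illustrative):

    for (int i = 0; i < pvs->nindexes; i++)
    {
        PVIndStats *stats = &(pvs->indstats[i]);

        /* Complain loudly if an index was somehow never processed */
        if (stats->status != INDVAC_STATUS_COMPLETED)
            elog(ERROR, "parallel vacuum failed to process index \"%s\"",
                 RelationGetRelationName(pvs->indrels[i]));

        /* Reset the status for the next bulk-deletion/cleanup pass */
        stats->status = INDVAC_STATUS_INITIAL;
    }

That way the check would also fire in builds without assertions.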
4)
Just a personal suggestion for the parallel-related function names. Since Andres
wanted a uniform naming pattern, maybe we can rename the following functions:
end|begin_parallel_vacuum => parallel_vacuum_end|begin
perform_parallel_index_bulkdel|cleanup => parallel_vacuum_index_bulkdel|cleanup
So that all the parallel-related functions are named like parallel_vacuum_xxx.
Best regards,
Hou zj
On Fri, Nov 19, 2021 at 7:55 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Tues, Nov 16, 2021 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've incorporated these comments and attached an updated patch.
2)
static void vacuum_error_callback(void *arg);
I noticed the patch changed the parallel worker's error callback function to
parallel_index_vacuum_error_callback(). The error message in the new callback
function seems a little different from the old one; was it intentional?
One more point related to this is that it seems a new callback will be
invoked only by parallel workers, so the context displayed during
parallel vacuum will be different depending on whether the error happens during
processing by the leader or a worker. I think if done correctly this would
be an improvement over what we have now, but isn't it better to do this
change as a separate patch?
4)
Just a personal suggestion for the parallel-related function names. Since Andres
wanted a uniform naming pattern, maybe we can rename the following functions:
end|begin_parallel_vacuum => parallel_vacuum_end|begin
perform_parallel_index_bulkdel|cleanup => parallel_vacuum_index_bulkdel|cleanup
So that all the parallel-related functions are named like parallel_vacuum_xxx.
BTW, do we really need functions
perform_parallel_index_bulkdel|cleanup? Both do some minimal
assignments and then call parallel_vacuum_all_indexes() and there is
just one caller of each. Isn't it better to just do those assignments
in the caller and directly call parallel_vacuum_all_indexes()?
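One way that could look (an untested sketch; since ParallelVacuumState is opaque
outside vacuumparallel.c, the tuple counts would have to become arguments, and
the caller-side field names below are only illustrative):

    extern void parallel_vacuum_all_indexes(ParallelVacuumState *pvs,
                                            bool bulkdel,
                                            long num_table_tuples,
                                            bool estimated_count);

    /* in vacuumlazy.c, for the bulk-deletion pass (count is always an estimate) */
    parallel_vacuum_all_indexes(vacrel->pvs, true,
                                vacrel->old_live_tuples, true);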
In general, we are not following the convention to start the function
names with parallel_* in other places, so I think we should consider
such a convention on a case-by-case basis. In this case, if we can get
rid of perform_parallel_index_bulkdel|cleanup then we probably don't
need such a renaming.
--
With Regards,
Amit Kapila.
On Tue, Nov 16, 2021 at 11:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've incorporated these comments and attached an updated patch.
Review comments:
================
1.
index_can_participate_parallel_vacuum()
{
..
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). See the
+ * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
+ * when indexes support parallel cleanup conditionally.
+ */
+ if (num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
..
}
IIRC, we do this to avoid the need to invoke a worker when parallel
cleanup doesn't need to scan the index, which means the work required
to be performed by a worker would be minimal. If so, maybe we can
write that in the comments here or with
VACUUM_OPTION_PARALLEL_COND_CLEANUP.
If the above understanding is correct, then is it correct to check
num_index_scans here? AFAICS, this value is incremented in
parallel_vacuum_all_indexes irrespective of whether it is invoked for
bulk delete or cleanup. OTOH, previously, this was done based on
the first_time variable, which was in turn set based on
vacrel->num_index_scans, and that is incremented in
lazy_vacuum_all_indexes (both in the serial and parallel cases).
2. The structure ParallelVacuumState contains both PVIndVacStatus and
PVIndStats. Considering PVIndVacStatus is already present in
PVIndStats, does ParallelVacuumState need to have both?
3. Why is ParallelVacuumCtl declared in vacuum.h? It appears to be
used only in one function, begin_parallel_vacuum; can't we just declare
it in vacuumparallel.c? As it is only required for one function and it is
not that the number of individual parameters will be too huge, can't
we do without having that structure?
--
With Regards,
Amit Kapila.
On Mon, Nov 22, 2021 at 6:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 16, 2021 at 11:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've incorporated these comments and attached an updated patch.
Review comments:
================
1.
index_can_participate_parallel_vacuum()
{
..
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). See the
+ * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
+ * when indexes support parallel cleanup conditionally.
+ */
+ if (num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
..
}
IIRC, we do this to avoid the need to invoke worker when parallel
cleanup doesn't need to scan the index which means the work required
to be performed by a worker would be minimal. If so, maybe we can
write that in comments here or with
VACUUM_OPTION_PARALLEL_COND_CLEANUP.
Right. Will add the comments.
If the above understanding is correct then is it correct to check
num_index_scans here? AFAICS, this value is incremented in
parallel_vacuum_all_indexes irrespective of whether it is invoked for
bulk delete or clean up. OTOH, previously, this was done based on
first_time variable which was in turn set based on
vacrel->num_index_scans and that is incremented in
lazy_vacuum_all_indexes(both in serial and parallel case).
You're right. It's wrong to increment num_index_scans also for
vacuumcleanup; it should be incremented only for bulkdelete. Perhaps
the caller (i.e., the table AM) should pass num_index_scans to the
parallel vacuum code? I initially thought that ParallelVacuumState could
have num_index_scans and increment it only on parallel bulk-deletion. But
if we do that we will end up having the same thing in two places:
ParallelVacuumState and LVRelState. It would be clearer if we maintain
num_index_scans in LVRelState and pass it to the parallel index vacuum
code when calling parallel index bulk-deletion or cleanup. On the other
hand, the downside would be that a table AM could pass the wrong
num_index_scans. There is probably also a valid argument that, since a
table AM capable of parallel index vacuum would outsource index
bulkdelete/cleanup to parallel index vacuum for the whole vacuum
operation, it'd be better to have ParallelVacuumState maintain
num_index_scans.
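To illustrate the first idea, the cleanup call site in vacuumlazy.c would just
hand the counter over, roughly like this (a sketch only; the exact signature
isn't settled):

	/* Sketch: leader-side cleanup call, passing the caller-maintained counter */
	parallel_vacuum_index_cleanup(vacrel->pvs,
								  vacrel->new_rel_tuples,
								  (vacrel->tupcount_pages < vacrel->rel_pages),
								  vacrel->num_index_scans);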
2. The structure ParallelVacuumState contains both PVIndVacStatus and
PVIndStats. Considering PVIndVacStatus is already present in
PVIndStats, does ParallelVacuumState need to have both?
"PVIndVacStatus status” of ParallelVacuumState is used by the worker
in the error callback function,
parallel_index_vacuum_error_callback(), in order to know the status of
the index vacuum that the worker is working on. I think that without
PVIndVacStatus, the worker would need to know its index into the PVIndStats
array in order to get the status, like
errinfo->indstats[idx]->status. Do you prefer to do that?
3. Why ParallelVacuumCtl is declared in vacuum.h? It appears to be
only used in one function begin_parallel_vacuum, can't we just declare
in vacuumparallel.c?
ParallelVacuumCtl is a struct to begin the parallel vacuum and
therefore is expected to be passed by the table AM. If we declare it in
vacuumparallel.c, a table AM (e.g., vacuumlazy.c) cannot use it, no?
As it is only required for one function and it is
not that the number of individual parameters will be too huge, can't
we do without having that structure.
Yes, we can do that without having that structure. I was a bit
concerned that there are already 7 arguments.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Nov 22, 2021 at 1:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 19, 2021 at 7:55 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Tues, Nov 16, 2021 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've incorporated these comments and attached an updated patch.
2)
static void vacuum_error_callback(void *arg);
I noticed the patch changed the parallel worker's error callback function to
parallel_index_vacuum_error_callback(). The error message in the new callback
function seems a little different from the old one; was it intentional?
One more point related to this is that the new callback seems to be
invoked only by parallel workers, so the context displayed during
parallel vacuum will differ depending on whether the error happens during
processing by the leader or by a worker. I think if done correctly this would
be an improvement over what we have now, but isn't it better to do this
change as a separate patch?
Agreed.
4)
Just a personal suggestion for the parallel-related function names. Since Andres
wanted a uniform naming pattern, maybe we can rename the following functions:
end|begin_parallel_vacuum => parallel_vacuum_end|begin
perform_parallel_index_bulkdel|cleanup => parallel_vacuum_index_bulkdel|cleanup
so that all the parallel-related functions are named like parallel_vacuum_xxx.
BTW, do we really need functions
perform_parallel_index_bulkdel|cleanup? Both do some minimal
assignments and then call parallel_vacuum_all_indexes() and there is
just one caller of each. Isn't it better to just do those assignments
in the caller and directly call parallel_vacuum_all_indexes()?
The reasons why I declared these two functions are: (1) the fields of
ParallelVacuumState are not exposed, and (2) bulk-deletion and cleanup
require different arguments (estimated_count is required only by
cleanup). If we exposed the fields of ParallelVacuumState, the
caller could do those assignments and directly call
parallel_vacuum_all_indexes(). But I'm not sure it's good if those
assignments are the caller's responsibility.
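For reference, the bulk-deletion wrapper is essentially just the following
(a rough sketch; the shared-state field names and the boolean argument to
parallel_vacuum_all_indexes() are simplifications here, not the exact code):

void
parallel_vacuum_index_bulkdel(ParallelVacuumState *pvs, double num_table_tuples)
{
	Assert(!IsParallelWorker());

	/* Sketch only: tell workers to do index bulk-deletion ... */
	pvs->shared->reltuples = num_table_tuples;
	pvs->shared->estimated_count = true;

	/* ... then run the common parallel index processing path */
	parallel_vacuum_all_indexes(pvs, true);
}

The cleanup wrapper has the same shape, except that estimated_count and
num_index_scans come from the caller.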
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Nov 19, 2021 at 11:25 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Tues, Nov 16, 2021 1:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've incorporated these comments and attached an updated patch.
Thanks for updating the patch.
I read the latest patch and have a few comments.
Thank you for the comments! For comments (2) and (4), I replied in
a separate email to answer both your and Amit's comments.
1)
+/*
+ * lazy_vacuum_one_index() -- vacuum index relation.
...
+IndexBulkDeleteResult *
+vacuum_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ * vac_cleanup_one_index() -- do post-vacuum cleanup for index relation.
...
+IndexBulkDeleteResult *
+cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
The above function names differ from the names mentioned in their function
headers.
Will fix both.
3)
+ /*
+ * Reset all index status back to invalid (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvs->indstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_COMPLETED);
+ stats->status = INDVAC_STATUS_INITIAL;
+ }
Would it be safer to report an error if any index's status is not
INDVAC_STATUS_COMPLETED?
Agreed. It'd be safer since, if some indexes were skipped due to a
bug, vacuum would error out rather than continue (and possibly cause
index corruption).
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Wed, Nov 24, 2021 at 7:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Nov 19, 2021 at 11:25 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
3)
+ /*
+ * Reset all index status back to invalid (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvs->indstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_COMPLETED);
+ stats->status = INDVAC_STATUS_INITIAL;
+ }
Would it be safer to report an error if any index's status is not
INDVAC_STATUS_COMPLETED?
Agreed. It'd be safer since, if some indexes were skipped due to a
bug, vacuum would error out rather than continue (and possibly cause
index corruption).
I think if we want to report an error in this case, we should use elog
as this is an internal error.
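For example, something along these lines (a sketch only; the message text is
just a placeholder):

	/*
	 * Reset all index status back to initial (while checking that we have
	 * processed all indexes).  A skipped index is an internal error, so
	 * plain elog() is sufficient here.
	 */
	for (int i = 0; i < pvs->nindexes; i++)
	{
		PVIndStats *stats = &(pvs->indstats[i]);

		if (stats->status != INDVAC_STATUS_COMPLETED)
			elog(ERROR, "parallel vacuum did not process index %d", i);

		stats->status = INDVAC_STATUS_INITIAL;
	}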
--
With Regards,
Amit Kapila.
On Wed, Nov 24, 2021 at 7:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Nov 22, 2021 at 6:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 16, 2021 at 11:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've incorporated these comments and attached an updated patch.
Review comments:
================
1.
index_can_participate_parallel_vacuum()
{
..
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). See the
+ * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
+ * when indexes support parallel cleanup conditionally.
+ */
+ if (num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
..
}
IIRC, we do this to avoid the need to invoke worker when parallel
cleanup doesn't need to scan the index which means the work required
to be performed by a worker would be minimal. If so, maybe we can
write that in comments here or with
VACUUM_OPTION_PARALLEL_COND_CLEANUP.
Right. Will add the comments.
If the above understanding is correct then is it correct to check
num_index_scans here? AFAICS, this value is incremented in
parallel_vacuum_all_indexes irrespective of whether it is invoked for
bulk delete or clean up. OTOH, previously, this was done based on
first_time variable which was in turn set based on
vacrel->num_index_scans and that is incremented in
lazy_vacuum_all_indexes(both in serial and parallel case).
You're right. That's wrong to increment num_index_scans also when
vacuumcleanup. It should be incremented only when bulkdelete. Perhaps,
the caller (i.g., table AM) should pass num_index_scans to parallel
vacuum code? I initially thought that ParallelVacuumState can have
num_index_scans and increment it only when parallel bulk-deletion. But
if we do that we will end up having the same thing in two places:
ParallelVacuumState and LVRelState. It would be clearer if we maintain
num_index_scan in LVRelState and pass it to parallel index vacuum when
calling to parallel index bulk-deletion or cleanup.
That sounds reasonable.
On the other hand,
the downside would be that there is a possibility that a table AM
passes the wrong num_index_scans.
If that happens, there will be no real problem as such, because it
will just do some work via a worker where it would otherwise have been
done by the leader itself. I think it is better to have one source of
information for this, as we mainly need to consider whether bulkdelete
has already been performed or not; it doesn't matter whether that was
performed by the leader or a worker.
2. The structure ParallelVacuumState contains both PVIndVacStatus and
PVIndStats. Considering PVIndVacStatus is already present in
PVIndStats, does ParallelVacuumState need to have both?
"PVIndVacStatus status" of ParallelVacuumState is used by the worker
in the error callback function,
parallel_index_vacuum_error_callback(), in order to know the status of
the index vacuum that the worker is working on. I think that without
PVIndVacStatus, the worker needs to have the index of the PVIndStats
array in order to get the status by like
errinfo->indstats[idx]->status. Do you prefer to do that?
As mentioned in my other email, to which you agreed, we need to
re-design this callback, so I think it is better to consider it
separately. We can probably remove this parameter from
the main patch for now.
3. Why ParallelVacuumCtl is declared in vacuum.h? It appears to be
only used in one function begin_parallel_vacuum, can't we just declare
in vacuumparallel.c?
ParallelVacuumCtl is a struct to begin the parallel vacuum and
therefore is expected to be passed by table AM. If we declare it in
vacuumparallel.c a table AM (e.g., vacuumlazy.c) cannot use it, no?
As it is only required for one function and it is
not that the number of individual parameters will be too huge, can't
we do without having that structure.
Yes, we can do that without having that structure. I was a bit
concerned that there are already 7 arguments.
Yeah, it is better to have fewer arguments, but I don't think this
number is big enough to worry about. Also, I am not sure about the table
AM point, as there is no clear example in front of us that tells us how
any other table AM might use it and whether this structure is generic
enough. So I think it might be better to use plain arguments for this,
and if we later find some generic use then we can replace them with a
structure.
--
With Regards,
Amit Kapila.
On Wed, Nov 24, 2021 at 7:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Nov 22, 2021 at 1:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 19, 2021 at 7:55 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
4)
Just a personal suggestion for the parallel-related function names. Since Andres
wanted a uniform naming pattern, maybe we can rename the following functions:
end|begin_parallel_vacuum => parallel_vacuum_end|begin
perform_parallel_index_bulkdel|cleanup => parallel_vacuum_index_bulkdel|cleanup
so that all the parallel-related functions are named like parallel_vacuum_xxx.
BTW, do we really need functions
perform_parallel_index_bulkdel|cleanup? Both do some minimal
assignments and then call parallel_vacuum_all_indexes() and there is
just one caller of each. Isn't it better to just do those assignments
in the caller and directly call parallel_vacuum_all_indexes()?The reason why I declare these two functions are: (1) the fields of
ParallelVacuumState are not exposed and (2) bulk-deletion and cleanup
require different arguments (estimated_count is required only by
cleanup). So if we expose the fields of ParallelVacuumState, the
caller can do those assignments and directly call
parallel_vacuum_all_indexes(). But I'm not sure it's good if those
assignments are the caller's responsibility.
Okay, that makes sense. However, I am still not very comfortable with
the function naming suggested by Hou-San; do you have any thoughts on
that?
--
With Regards,
Amit Kapila.
On Wed, Nov 24, 2021 at 1:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Nov 24, 2021 at 7:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Nov 22, 2021 at 6:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 16, 2021 at 11:23 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've incorporated these comments and attached an updated patch.
Review comments:
================
1.
index_can_participate_parallel_vacuum()
{
..
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). See the
+ * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
+ * when indexes support parallel cleanup conditionally.
+ */
+ if (num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
..
}
IIRC, we do this to avoid the need to invoke worker when parallel
cleanup doesn't need to scan the index which means the work required
to be performed by a worker would be minimal. If so, maybe we can
write that in comments here or with
VACUUM_OPTION_PARALLEL_COND_CLEANUP.
Right. Will add the comments.
If the above understanding is correct then is it correct to check
num_index_scans here? AFAICS, this value is incremented in
parallel_vacuum_all_indexes irrespective of whether it is invoked for
bulk delete or clean up. OTOH, previously, this was done based on
first_time variable which was in turn set based on
vacrel->num_index_scans and that is incremented in
lazy_vacuum_all_indexes(both in serial and parallel case).
You're right. That's wrong to increment num_index_scans also when
vacuumcleanup. It should be incremented only when bulkdelete. Perhaps,
the caller (i.g., table AM) should pass num_index_scans to parallel
vacuum code? I initially thought that ParallelVacuumState can have
num_index_scans and increment it only when parallel bulk-deletion. But
if we do that we will end up having the same thing in two places:
ParallelVacuumState and LVRelState. It would be clearer if we maintain
num_index_scan in LVRelState and pass it to parallel index vacuum when
calling to parallel index bulk-deletion or cleanup.
That sounds reasonable.
On the other hand,
the downside would be that there is a possibility that a table AM
passes the wrong num_index_scans.
If that happens then also there will be no problem as such because it
will do some work via worker where it would have been done by the
leader itself. I think it is better to have one source of information
for this as we need to mainly consider whether bulkdelete has been
already performed or not, it doesn't matter whether that is performed
by leader or worker.
Agreed.
2. The structure ParallelVacuumState contains both PVIndVacStatus and
PVIndStats. Considering PVIndVacStatus is already present in
PVIndStats, does ParallelVacuumState need to have both?
"PVIndVacStatus status" of ParallelVacuumState is used by the worker
in the error callback function,
parallel_index_vacuum_error_callback(), in order to know the status of
the index vacuum that the worker is working on. I think that without
PVIndVacStatus, the worker needs to have the index of the PVIndStats
array in order to get the status by like
errinfo->indstats[idx]->status. Do you prefer to do that?
As mentioned in my another email to which you agreed that we need to
re-design this callback and do it separately, I think it is better to
consider it separately. So, we can probably remove this parameter from
the main patch as of now.
Yes, I'll remove it in the next version of the patch. With that, since
parallel vacuum workers won't set an errcontext, we will need to do
something for that in a separate patch.
3. Why ParallelVacuumCtl is declared in vacuum.h? It appears to be
only used in one function begin_parallel_vacuum, can't we just declare
in vacuumparallel.c?
ParallelVacuumCtl is a struct to begin the parallel vacuum and
therefore is expected to be passed by table AM. If we declare it in
vacuumparallel.c a table AM (e.g., vacuumlazy.c) cannot use it, no?
As it is only required for one function and it is
not that the number of individual parameters will be too huge, can't
we do without having that structure.
Yes, we can do that without having that structure. I was a bit
concerned that there are already 7 arguments.
Yeah, it is better to have fewer arguments but I don't think this number is
big enough to worry. Also, I am not sure about the table AM point as
there is no clear example in front of us which tells how any other
table AM might use it and whether this structure is generic enough. So
I think it might be better to use arguments for this and if we later
find some generic use then we can replace it with structure.
Makes sense. Will fix it in the next version of the patch.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Wed, Nov 24, 2021 at 1:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Nov 24, 2021 at 7:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Nov 22, 2021 at 1:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 19, 2021 at 7:55 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
4)
Just a personal suggestion for the parallel-related function names. Since Andres
wanted a uniform naming pattern, maybe we can rename the following functions:
end|begin_parallel_vacuum => parallel_vacuum_end|begin
perform_parallel_index_bulkdel|cleanup => parallel_vacuum_index_bulkdel|cleanup
so that all the parallel-related functions are named like parallel_vacuum_xxx.
BTW, do we really need functions
perform_parallel_index_bulkdel|cleanup? Both do some minimal
assignments and then call parallel_vacuum_all_indexes() and there is
just one caller of each. Isn't it better to just do those assignments
in the caller and directly call parallel_vacuum_all_indexes()?
The reason why I declare these two functions are: (1) the fields of
ParallelVacuumState are not exposed and (2) bulk-deletion and cleanup
require different arguments (estimated_count is required only by
cleanup). So if we expose the fields of ParallelVacuumState, the
caller can do those assignments and directly call
parallel_vacuum_all_indexes(). But I'm not sure it's good if those
assignments are the caller's responsibility.
Okay, that makes sense. However, I am still not very comfortable with
the function naming suggested by Hou-San, do you have any thoughts on
that?
I personally don't disagree with the names starting with
"parallel_vacuum_*". Alternative ideas would be names starting with
"vac_*", like other functions declared in vacuum.h, or, to distinguish
them from those, names starting with "pvac_*".
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Wed, Nov 24, 2021 at 12:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Nov 24, 2021 at 1:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Nov 24, 2021 at 7:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Nov 22, 2021 at 1:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 19, 2021 at 7:55 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
4)
Just a personal suggestion for the parallel-related function names. Since Andres
wanted a uniform naming pattern, maybe we can rename the following functions:
end|begin_parallel_vacuum => parallel_vacuum_end|begin
perform_parallel_index_bulkdel|cleanup => parallel_vacuum_index_bulkdel|cleanup
so that all the parallel-related functions are named like parallel_vacuum_xxx.
BTW, do we really need functions
perform_parallel_index_bulkdel|cleanup? Both do some minimal
assignments and then call parallel_vacuum_all_indexes() and there is
just one caller of each. Isn't it better to just do those assignments
in the caller and directly call parallel_vacuum_all_indexes()?
The reason why I declare these two functions are: (1) the fields of
ParallelVacuumState are not exposed and (2) bulk-deletion and cleanup
require different arguments (estimated_count is required only by
cleanup). So if we expose the fields of ParallelVacuumState, the
caller can do those assignments and directly call
parallel_vacuum_all_indexes(). But I'm not sure it's good if those
assignments are the caller's responsibility.
Okay, that makes sense. However, I am still not very comfortable with
the function naming suggested by Hou-San, do you have any thoughts on
that?
I personally don't disagree with the names starting with
"parallel_vacuum_*".
I don't have any strong opinion here, but I prefer the name that makes
more sense in the context in which it is being used. OTOH, I see there
is an argument that it will be easier to follow and might appear more
consistent if we use parallel_vacuum_*.
--
With Regards,
Amit Kapila.
On Wed, Nov 24, 2021 at 5:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Nov 24, 2021 at 12:16 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Nov 24, 2021 at 1:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Nov 24, 2021 at 7:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Nov 22, 2021 at 1:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Nov 19, 2021 at 7:55 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
4)
Just a personal suggestion for the parallel-related function names. Since Andres
wanted a uniform naming pattern, maybe we can rename the following functions:
end|begin_parallel_vacuum => parallel_vacuum_end|begin
perform_parallel_index_bulkdel|cleanup => parallel_vacuum_index_bulkdel|cleanup
so that all the parallel-related functions are named like parallel_vacuum_xxx.
BTW, do we really need functions
perform_parallel_index_bulkdel|cleanup? Both do some minimal
assignments and then call parallel_vacuum_all_indexes() and there is
just one caller of each. Isn't it better to just do those assignments
in the caller and directly call parallel_vacuum_all_indexes()?
The reason why I declare these two functions are: (1) the fields of
ParallelVacuumState are not exposed and (2) bulk-deletion and cleanup
require different arguments (estimated_count is required only by
cleanup). So if we expose the fields of ParallelVacuumState, the
caller can do those assignments and directly call
parallel_vacuum_all_indexes(). But I'm not sure it's good if those
assignments are the caller's responsibility.
Okay, that makes sense. However, I am still not very comfortable with
the function naming suggested by Hou-San, do you have any thoughts on
that?
I personally don't disagree with the names starting with
"parallel_vacuum_*".
I don't have any strong opinion here but I prefer the name which makes
more sense in the context it is being used. OTOH, I see there is an
argument that it will be easier to follow and might appear consistent
if we use parallel_vacuum_*.
Maybe we can start with using parallel_vacuum_*. We can change them
later if there is an argument.
I've attached an updated patch. I haven't updated the terminology in
vacuum that we're discussing on another thread[1].
Regards,
[1]: /messages/by-id/CAH2-WzktGBg4si6DEdmq3q6SoXSDqNi6MtmB8CmmTmvhsxDTLA@mail.gmail.com
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
parallel_vacuum_refactor_v4.patch (application/octet-stream)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ddd0bb9875..1f48d688a4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -21,6 +21,12 @@
* that there only needs to be one call to lazy_vacuum, after the initial pass
* completes.
*
+ * Lazy vacuum supports parallel execution with parallel worker processes. In
+ * a parallel vacuum, we perform both index vacuum and index cleanup with
+ * parallel worker processes. For updating the index statistics, we need to
+ * update the system table and since updates are not allowed during parallel
+ * mode we update the index statistics after exiting from the parallel mode.
+ *
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
@@ -40,7 +46,6 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
-#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -120,22 +125,11 @@
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
-/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -148,137 +142,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * LVDeadTuples stores the dead tuple TIDs collected during the heap scan.
- * This is allocated in the DSM segment in parallel mode and in local memory
- * in non-parallel mode.
- */
-typedef struct LVDeadTuples
-{
- int max_tuples; /* # slots allocated in array */
- int num_tuples; /* current # of entries */
- /* List of TIDs of tuples we intend to delete */
- /* NB: this list is ordered by TID address */
- ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER]; /* array of
- * ItemPointerData */
-} LVDeadTuples;
-
-/* The dead tuple space consists of LVDeadTuples and dead tuple TIDs */
-#define SizeOfDeadTuples(cnt) \
- add_size(offsetof(LVDeadTuples, itemptrs), \
- mul_size(sizeof(ItemPointerData), cnt))
-#define MAXDEADTUPLES(max_size) \
- (((max_size) - offsetof(LVDeadTuples, itemptrs)) / sizeof(ItemPointerData))
-
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * An indication for vacuum workers to perform either index vacuum or
- * index cleanup. first_time is true only if for_cleanup is true and
- * bulk-deletion is not performed yet.
- */
- bool for_cleanup;
- bool first_time;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /*
- * Variables to control parallel vacuum. We have a bitmap to indicate
- * which index has stats in shared memory. The set bit in the map
- * indicates that the particular index supports a parallel vacuum.
- */
- pg_atomic_uint32 idx; /* counter for vacuuming and clean up */
- uint32 offset; /* sizeof header incl. bitmap */
- bits8 bitmap[FLEXIBLE_ARRAY_MEMBER]; /* bit map of NULLs */
-
- /* Shared index statistics data follows at end of struct */
-} LVShared;
-
-#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
-#define GetSharedIndStats(s) \
- ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
-#define IndStatsIsNull(s, i) \
- (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
-
-/*
- * Struct for an index bulk-deletion statistic used for parallel vacuum. This
- * is allocated in the DSM segment.
- */
-typedef struct LVSharedIndStats
-{
- bool updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVSharedIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -298,7 +161,7 @@ typedef struct LVRelState
/* Buffer access strategy and parallel state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumState *pvs;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -322,7 +185,7 @@ typedef struct LVRelState
/*
* State managed by lazy_scan_heap() follows
*/
- LVDeadTuples *dead_tuples; /* items to vacuum from indexes */
+ VacDeadTuples *dead_tuples; /* items to vacuum from indexes */
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
@@ -393,18 +256,6 @@ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
-static void do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel);
-static void do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers);
-static void do_parallel_processing(LVRelState *vacrel,
- LVShared *lvshared);
-static void do_serial_processing_for_unsafe_indexes(LVRelState *vacrel,
- LVShared *lvshared);
-static IndexBulkDeleteResult *parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_indstats,
- LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
@@ -423,20 +274,9 @@ static long compute_max_dead_tuples(BlockNumber relblocks, bool hasindex);
static void lazy_space_alloc(LVRelState *vacrel, int nworkers,
BlockNumber relblocks);
static void lazy_space_free(LVRelState *vacrel);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int compute_parallel_vacuum_workers(LVRelState *vacrel,
- int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static LVParallelState *begin_parallel_vacuum(LVRelState *vacrel,
- BlockNumber nblocks,
- int nrequested);
-static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared, int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -894,7 +734,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
- LVDeadTuples *dead_tuples;
+ VacDeadTuples *dead_tuples;
BlockNumber nblocks,
blkno,
next_unskippable_block,
@@ -2030,7 +1870,7 @@ retry:
*/
if (lpdead_items > 0)
{
- LVDeadTuples *dead_tuples = vacrel->dead_tuples;
+ VacDeadTuples *dead_tuples = vacrel->dead_tuples;
ItemPointerData tmp;
Assert(!prunestate->all_visible);
@@ -2242,7 +2082,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_vacuum_all_indexes(vacrel);
+ parallel_vacuum_index_bulkdel(vacrel->pvs, vacrel->old_live_tuples);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2395,7 +2235,7 @@ static int
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
int tupindex, Buffer *vmbuffer)
{
- LVDeadTuples *dead_tuples = vacrel->dead_tuples;
+ VacDeadTuples *dead_tuples = vacrel->dead_tuples;
Page page = BufferGetPage(buffer);
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
@@ -2616,351 +2456,6 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform lazy_vacuum_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
-{
- /* Tell parallel workers to do index vacuuming */
- vacrel->lps->lvshared->for_cleanup = false;
- vacrel->lps->lvshared->first_time = false;
-
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- do_parallel_vacuum_or_cleanup(vacrel,
- vacrel->lps->nindexes_parallel_bulkdel);
-}
-
-/*
- * Perform lazy_cleanup_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel)
-{
- int nworkers;
-
- /*
- * If parallel vacuum is active we perform index cleanup with parallel
- * workers.
- *
- * Tell parallel workers to do index cleanup.
- */
- vacrel->lps->lvshared->for_cleanup = true;
- vacrel->lps->lvshared->first_time = (vacrel->num_index_scans == 0);
-
- /*
- * Now we can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- /* Determine the number of parallel workers to launch */
- if (vacrel->lps->lvshared->first_time)
- nworkers = vacrel->lps->nindexes_parallel_cleanup +
- vacrel->lps->nindexes_parallel_condcleanup;
- else
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- do_parallel_vacuum_or_cleanup(vacrel, nworkers);
-}
-
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process. The caller must set
- * lps->lvshared->for_cleanup to indicate whether to perform vacuum or
- * cleanup.
- */
-static void
-do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See compute_parallel_vacuum_workers.
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
- ReinitializeParallelDSM(lps->pcxt);
- }
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (lps->lvshared->for_cleanup)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
-
- /*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
- */
- do_parallel_processing(vacrel, lps->lvshared);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(lvshared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- /* Get the index statistics space from DSM, if any */
- shared_istat = parallel_stats_for_idx(lvshared, idx);
-
- /* Skip indexes not participating in parallelism */
- if (shared_istat == NULL)
- continue;
-
- indrel = vacrel->indrels[idx];
-
- /*
- * Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
- */
- if (!parallel_processing_is_safe(indrel, lvshared))
- continue;
-
- /* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing for_cleanup processing right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by compute_parallel_vacuum_workers(). These indexes never get a
- * slot for statistics in DSM.
- */
-static void
-do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
-{
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- shared_istat = parallel_stats_for_idx(lvshared, idx);
- indrel = vacrel->indrels[idx];
-
- /*
- * We're only here for the indexes that parallel workers won't
- * process. Note that the shared_istat test ensures that we process
- * indexes that fell under initial size cutoff.
- */
- if (shared_istat != NULL &&
- parallel_processing_is_safe(indrel, lvshared))
- continue;
-
- /* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static IndexBulkDeleteResult *
-parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_istat,
- LVRelState *vacrel)
-{
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (shared_istat && shared_istat->updated && istat == NULL)
- istat = &shared_istat->istat;
-
- /* Do vacuum or cleanup of the index */
- if (lvshared->for_cleanup)
- istat_res = lazy_cleanup_one_index(indrel, istat, lvshared->reltuples,
- lvshared->estimated_count, vacrel);
- else
- istat_res = lazy_vacuum_one_index(indrel, istat, lvshared->reltuples,
- vacrel);
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
- {
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
- }
-
- return istat_res;
-}
-
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
@@ -2993,7 +2488,9 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_cleanup_all_indexes(vacrel);
+ parallel_vacuum_index_cleanup(vacrel->pvs, vacrel->new_rel_tuples,
+ (vacrel->tupcount_pages < vacrel->rel_pages),
+ vacrel->num_index_scans);
}
}
@@ -3039,13 +2536,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_tuples);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_tuples->num_tuples),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ istat = vacuum_one_index(&ivinfo, istat, vacrel->dead_tuples);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3079,7 +2570,6 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
ivinfo.strategy = vacrel->bstrategy;
@@ -3095,24 +2585,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
- istat = index_vacuum_cleanup(&ivinfo, istat);
-
- if (istat)
- {
- ereport(elevel,
- (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
- RelationGetRelationName(indrel),
- istat->num_index_tuples,
- istat->num_pages),
- errdetail("%.0f index row versions were removed.\n"
- "%u index pages were newly deleted.\n"
- "%u index pages are currently deleted, of which %u are currently reusable.\n"
- "%s.",
- istat->tuples_removed,
- istat->pages_newly_deleted,
- istat->pages_deleted, istat->pages_free,
- pg_rusage_show(&ru0))));
- }
+ istat = cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3470,9 +2943,11 @@ compute_max_dead_tuples(BlockNumber relblocks, bool hasindex)
static void
lazy_space_alloc(LVRelState *vacrel, int nworkers, BlockNumber nblocks)
{
- LVDeadTuples *dead_tuples;
+ VacDeadTuples *dead_tuples;
long maxtuples;
+ maxtuples = compute_max_dead_tuples(nblocks, vacrel->nindexes > 0);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3496,16 +2971,20 @@ lazy_space_alloc(LVRelState *vacrel, int nworkers, BlockNumber nblocks)
vacrel->relname)));
}
else
- vacrel->lps = begin_parallel_vacuum(vacrel, nblocks, nworkers);
+ vacrel->pvs = parallel_vacuum_begin(vacrel->rel, vacrel->indrels,
+ vacrel->nindexes, nworkers,
+ maxtuples, elevel,
+ vacrel->bstrategy);
/* If parallel mode started, we're done */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_tuples = get_vacuum_dead_tuples(vacrel->pvs);
return;
+ }
}
- maxtuples = compute_max_dead_tuples(nblocks, vacrel->nindexes > 0);
-
- dead_tuples = (LVDeadTuples *) palloc(SizeOfDeadTuples(maxtuples));
+ dead_tuples = (VacDeadTuples *) palloc(SizeOfDeadTuples(maxtuples));
dead_tuples->num_tuples = 0;
dead_tuples->max_tuples = (int) maxtuples;
@@ -3525,75 +3004,8 @@ lazy_space_free(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- end_parallel_vacuum(vacrel);
-}
-
-/*
- * lazy_tid_reaped() -- is a particular tid deletable?
- *
- * This has the right signature to be an IndexBulkDeleteCallback.
- *
- * Assumes dead_tuples array is in sorted order.
- */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
-{
- LVDeadTuples *dead_tuples = (LVDeadTuples *) state;
- int64 litem,
- ritem,
- item;
- ItemPointer res;
-
- litem = itemptr_encode(&dead_tuples->itemptrs[0]);
- ritem = itemptr_encode(&dead_tuples->itemptrs[dead_tuples->num_tuples - 1]);
- item = itemptr_encode(itemptr);
-
- /*
- * Doing a simple bound check before bsearch() is useful to avoid the
- * extra cost of bsearch(), especially if dead tuples on the heap are
- * concentrated in a certain range. Since this function is called for
- * every index tuple, it pays to be really fast.
- */
- if (item < litem || item > ritem)
- return false;
-
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) dead_tuples->itemptrs,
- dead_tuples->num_tuples,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
-
- return (res != NULL);
-}
-
-/*
- * Comparator routines for use with qsort() and bsearch().
- */
-static int
-vac_cmp_itemptr(const void *left, const void *right)
-{
- BlockNumber lblk,
- rblk;
- OffsetNumber loff,
- roff;
-
- lblk = ItemPointerGetBlockNumber((ItemPointer) left);
- rblk = ItemPointerGetBlockNumber((ItemPointer) right);
-
- if (lblk < rblk)
- return -1;
- if (lblk > rblk)
- return 1;
-
- loff = ItemPointerGetOffsetNumber((ItemPointer) left);
- roff = ItemPointerGetOffsetNumber((ItemPointer) right);
-
- if (loff < roff)
- return -1;
- if (loff > roff)
- return 1;
-
- return 0;
+ parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
+ vacrel->pvs = NULL;
}
/*
@@ -3717,76 +3129,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
return all_visible;
}
-/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
- */
-static int
-compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- continue;
-
- will_parallel_vacuum[idx] = true;
-
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
@@ -3819,432 +3161,6 @@ update_index_statistics(LVRelState *vacrel)
}
}
-/*
- * This function prepares and returns parallel vacuum state if we can launch
- * even one worker. This function is responsible for entering parallel mode,
- * create a parallel context, and then initialize the DSM segment.
- */
-static LVParallelState *
-begin_parallel_vacuum(LVRelState *vacrel, BlockNumber nblocks,
- int nrequested)
-{
- LVParallelState *lps = NULL;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- LVDeadTuples *dead_tuples;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- bool *will_parallel_vacuum;
- long maxtuples;
- Size est_shared;
- Size est_deadtuples;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = compute_parallel_vacuum_workers(vacrel,
- nrequested,
- will_parallel_vacuum);
-
- /* Can't perform vacuum in parallel */
- if (parallel_workers <= 0)
- {
- pfree(will_parallel_vacuum);
- return lps;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- /* Skip indexes that don't participate in parallel vacuum */
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- est_shared = add_size(est_shared, sizeof(LVSharedIndStats));
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
- maxtuples = compute_max_dead_tuples(nblocks, true);
- est_deadtuples = MAXALIGN(SizeOfDeadTuples(maxtuples));
- shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared);
- MemSet(shared, 0, est_shared);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
- shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
-
- /*
- * Initialize variables for shared index statistics, set NULL bitmap and
- * the size of stats for each index.
- */
- memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
- for (int idx = 0; idx < nindexes; idx++)
- {
- if (!will_parallel_vacuum[idx])
- continue;
-
- /* Set NOT NULL as this index does support parallelism */
- shared->bitmap[idx >> 3] |= 1 << (idx & 0x07);
- }
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead tuple space */
- dead_tuples = (LVDeadTuples *) shm_toc_allocate(pcxt->toc, est_deadtuples);
- dead_tuples->max_tuples = maxtuples;
- dead_tuples->num_tuples = 0;
- MemSet(dead_tuples->itemptrs, 0, sizeof(ItemPointerData) * maxtuples);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
- vacrel->dead_tuples = dead_tuples;
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- pfree(will_parallel_vacuum);
- return lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-end_parallel_vacuum(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVSharedIndStats *shared_istat;
-
- shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
-
- /*
- * Skip index -- it must have been processed by the leader, from
- * inside do_serial_processing_for_unsafe_indexes()
- */
- if (shared_istat == NULL)
- continue;
-
- if (shared_istat->updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &shared_istat->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Return shared memory statistics for index at offset 'getidx', if any
- *
- * Returning NULL indicates that compute_parallel_vacuum_workers() determined
- * that the index is a totally unsuitable target for all parallel processing
- * up front. For example, the index could be < min_parallel_index_scan_size
- * cutoff.
- */
-static LVSharedIndStats *
-parallel_stats_for_idx(LVShared *lvshared, int getidx)
-{
- char *p;
-
- if (IndStatsIsNull(lvshared, getidx))
- return NULL;
-
- p = (char *) GetSharedIndStats(lvshared);
- for (int idx = 0; idx < getidx; idx++)
- {
- if (IndStatsIsNull(lvshared, idx))
- continue;
-
- p += sizeof(LVSharedIndStats);
- }
-
- return (LVSharedIndStats *) p;
-}
-
-/*
- * Returns false, if the given index can't participate in parallel index
- * vacuum or parallel index cleanup
- */
-static bool
-parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
-{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* first_time must be true only if for_cleanup is true */
- Assert(lvshared->for_cleanup || !lvshared->first_time);
-
- if (lvshared->for_cleanup)
- {
- /* Skip, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Skip, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). See the
- * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
- * when indexes support parallel cleanup conditionally.
- */
- if (!lvshared->first_time &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
- }
- else if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) == 0)
- {
- /* Skip if the index does not support parallel bulk deletion */
- return false;
- }
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVShared *lvshared;
- LVDeadTuples *dead_tuples;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- /*
- * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
- * don't support parallel vacuum for autovacuum as of now.
- */
- Assert(MyProc->statusFlags == PROC_IN_VACUUM);
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- if (lvshared->for_cleanup)
- elog(DEBUG1, "starting parallel vacuum worker for cleanup");
- else
- elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set dead tuple space */
- dead_tuples = (LVDeadTuples *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_TUPLES,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_tuples = dead_tuples;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- do_parallel_processing(&vacrel, lvshared);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
/*
* Error context callback for errors occurring during vacuum.
*/
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f573..6b427772d5 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..48f7348f91 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15b44..2722f3a121 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -32,6 +32,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
+#include "catalog/index.h"
#include "catalog/pg_database.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_namespace.h"
@@ -51,6 +52,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -90,6 +92,9 @@ static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
+static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_cmp_itemptr(const void *left, const void *right);
+
/*
* Primary entry point for manual VACUUM and ANALYZE commands
*
@@ -2258,3 +2263,140 @@ get_vacoptval_from_boolean(DefElem *def)
{
return defGetBoolean(def) ? VACOPTVALUE_ENABLED : VACOPTVALUE_DISABLED;
}
+
+/*
+ * vacuum_one_index() -- vacuum index relation.
+ *
+ * Delete all the index entries pointing to tuples listed in
+ * dead_tuples, and update running statistics.
+ *
+ * ivinfo->num_heap_tuples is the number of heap tuples to be passed to
+ * the bulkdelete callback. It's always assumed to be estimated.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+vacuum_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ VacDeadTuples *dead_tuples)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ /* Do bulk deletion */
+ istat = index_bulk_delete(ivinfo, istat, vac_tid_reaped,
+ (void *) dead_tuples);
+
+ ereport(ivinfo->message_level,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ RelationGetRelationName(ivinfo->index),
+ dead_tuples->num_tuples),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+ return istat;
+}
+
+/*
+ * cleanup_one_index() -- do post-vacuum cleanup for index relation.
+ *
+ * ivinfo->num_heap_tuples is the number of heap tuples and
+ * ivinfo->estimated_count is true if that value is an estimate.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ istat = index_vacuum_cleanup(ivinfo, istat);
+
+ if (istat)
+ {
+ ereport(ivinfo->message_level,
+ (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
+ RelationGetRelationName(ivinfo->index),
+ istat->num_index_tuples,
+ istat->num_pages),
+ errdetail("%.0f index row versions were removed.\n"
+ "%u index pages were newly deleted.\n"
+ "%u index pages are currently deleted, of which %u are currently reusable.\n"
+ "%s.",
+ istat->tuples_removed,
+ istat->pages_newly_deleted,
+ istat->pages_deleted, istat->pages_free,
+ pg_rusage_show(&ru0))));
+ }
+
+ return istat;
+}
+
+/*
+ * vac_tid_reaped() -- is a particular tid deletable?
+ *
+ * This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ * Assumes dead_tuples array is in sorted order.
+ */
+static bool
+vac_tid_reaped(ItemPointer itemptr, void *state)
+{
+ VacDeadTuples *dead_tuples = (VacDeadTuples *) state;
+ int64 litem,
+ ritem,
+ item;
+ ItemPointer res;
+
+ litem = itemptr_encode(&dead_tuples->itemptrs[0]);
+ ritem = itemptr_encode(&dead_tuples->itemptrs[dead_tuples->num_tuples - 1]);
+ item = itemptr_encode(itemptr);
+
+ /*
+ * Doing a simple bound check before bsearch() is useful to avoid the
+ * extra cost of bsearch(), especially if dead tuples on the heap are
+ * concentrated in a certain range. Since this function is called for
+ * every index tuple, it pays to be really fast.
+ */
+ if (item < litem || item > ritem)
+ return false;
+
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_tuples->itemptrs,
+ dead_tuples->num_tuples,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
+
+ return (res != NULL);
+}
+
+/*
+ * Comparator routines for use with qsort() and bsearch().
+ */
+static int
+vac_cmp_itemptr(const void *left, const void *right)
+{
+ BlockNumber lblk,
+ rblk;
+ OffsetNumber loff,
+ roff;
+
+ lblk = ItemPointerGetBlockNumber((ItemPointer) left);
+ rblk = ItemPointerGetBlockNumber((ItemPointer) right);
+
+ if (lblk < rblk)
+ return -1;
+ if (lblk > rblk)
+ return 1;
+
+ loff = ItemPointerGetOffsetNumber((ItemPointer) left);
+ roff = ItemPointerGetOffsetNumber((ItemPointer) right);
+
+ if (loff < roff)
+ return -1;
+ if (loff > roff)
+ return 1;
+
+ return 0;
+}
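
(Illustrative aside, not part of the patch: the bound-check-then-bsearch idea in vac_tid_reaped() can be shown with a tiny self-contained sketch. The encoding, types and names below are simplified stand-ins of mine, not the PostgreSQL definitions; the point is only that comparing against the first and last entries of the sorted dead-TID array lets most probes return without paying for bsearch().)

/* Illustrative sketch only: simplified stand-ins, not PostgreSQL types. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Fold (block, offset) into one comparable integer, conceptually like itemptr_encode */
static int64_t
encode_tid(uint32_t block, uint16_t offset)
{
    return ((int64_t) block << 16) | offset;
}

static int
cmp_tid(const void *a, const void *b)
{
    int64_t l = *(const int64_t *) a;
    int64_t r = *(const int64_t *) b;

    return (l > r) - (l < r);
}

/* Is tid among the sorted dead TIDs?  Cheap min/max check before bsearch(). */
static bool
tid_reaped(int64_t tid, const int64_t *dead, int ndead)
{
    if (ndead == 0 || tid < dead[0] || tid > dead[ndead - 1])
        return false;
    return bsearch(&tid, dead, ndead, sizeof(int64_t), cmp_tid) != NULL;
}

int
main(void)
{
    /* Dead TIDs collected by the heap scan, kept in TID order */
    int64_t dead[] = {encode_tid(10, 1), encode_tid(10, 7), encode_tid(42, 3)};

    printf("%d\n", tid_reaped(encode_tid(10, 7), dead, 3));  /* 1: found by bsearch */
    printf("%d\n", tid_reaped(encode_tid(99, 1), dead, 3));  /* 0: rejected by the bound check */
    return 0;
}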
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000000..6b0cf81153
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,979 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelVacuumState.
+ *
+ * In a parallel vacuum, we perform both index bulk-deletion and index cleanup
+ * with parallel worker processes. Individual indexes are processed by one
+ * vacuum process. ParallelVacuumState contains shared information as well
+ * as the memory space for storing dead tuples allocated in the DSM segment.
+ * When starting either parallel index bulk-deletion or index cleanup, we
+ * launch parallel worker processes. Once all indexes are processed, the
+ * parallel worker processes exit. The parallel context is then
+ * re-initialized so that the same DSM can be used for multiple passes of
+ * index bulk-deletion and index cleanup. At the end of a parallel vacuum,
+ * ParallelVacuumState is destroyed, returning the index statistics so
+ * that they can be updated after exiting parallel mode.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/genam.h"
+#include "access/parallel.h"
+#include "access/table.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/elog.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel workers. This is allocated in the DSM
+ * segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the parallel vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * num_table_tuples is the total number of input heap tuples. We set either
+ * the old live tuples in the index vacuum case or the new live tuples in
+ * the index cleanup case.
+ *
+ * estimated_count is true if num_table_tuples is an estimated value. (Note
+ * that it could be -1 in this case, indicating we have no idea.)
+ */
+ double num_table_tuples;
+ bool estimated_count;
+
+ /*
+ * In single process lazy vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for heap scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process lazy vacuum.
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
+} PVShared;
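
(Illustrative aside, not part of the patch: a made-up example of how maintenance_work_mem_worker ends up being set. Assume a 256MB maintenance_work_mem, 4 planned workers, and 2 indexes whose AMs use maintenance_work_mem; each worker then gets 128MB, so the parallel operation stays within the single-process budget. The numbers below are hypothetical.)

/* Hypothetical figures; mirrors the formula used in parallel_vacuum_begin(). */
#include <stdio.h>

#define Min(a, b) ((a) < (b) ? (a) : (b))

int
main(void)
{
    int maintenance_work_mem = 262144;  /* 256MB, expressed in kB */
    int parallel_workers = 4;           /* workers planned for this vacuum */
    int nindexes_mwm = 2;               /* indexes with amusemaintenanceworkmem */
    int worker_mem;

    worker_mem = (nindexes_mwm > 0) ?
        maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
        maintenance_work_mem;

    printf("maintenance_work_mem per worker: %d kB\n", worker_mem);  /* 131072 */
    return 0;
}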
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ INDVAC_STATUS_INITIAL = 0,
+ INDVAC_STATUS_NEED_BULKDELETE,
+ INDVAC_STATUS_NEED_CLEANUP,
+ INDVAC_STATUS_COMPLETED
+} PVIndVacStatus;
+
+/*
+ * Struct for an index bulk-deletion statistic used for parallel vacuum. This
+ * is allocated in the DSM segment.
+ */
+typedef struct PVIndStats
+{
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup.
+ *
+ * parallel_workers_can_process is true if both the leader and workers can
+ * process the index; otherwise only the leader can process it. This value
+ * is not fixed for the entire VACUUM operation; it is only fixed for an
+ * individual parallel index vacuum and cleanup pass.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/*
+ * Struct for maintaining a parallel vacuum state. This struct is used by
+ * both leader and worker processes. The parallel vacuum leader process
+ * uses it throughout a VACUUM operation, so the leader reuses the same
+ * state to perform index bulk-deletion and index cleanup multiple times.
+ * The workers use some fields of this structure.
+ */
+typedef struct ParallelVacuumState
+{
+ /* Always NULL for worker processes */
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /* Shared index statistics among parallel vacuum workers */
+ PVIndStats *indstats;
+
+ /* Shared dead tuple space among parallel vacuum workers */
+ VacDeadTuples *dead_tuples;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion and
+ * parallel index cleanup respectively. Used by only leader.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /* True if we need to reinitialize parallel DSM before launching workers */
+ bool first_time;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+} ParallelVacuumState;
+
+static int compute_parallel_vacuum_workers(Relation *indrels, int nindexes,
+ int nrequested);
+static void set_parallel_vacuum_index_status(ParallelVacuumState *pvs,
+ bool bulkdel,
+ int num_index_scans);
+static void parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
+ int num_index_scans);
+static void parallel_vacuum_indexes(ParallelVacuumState *pvs);
+static void vacuum_unsafe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *stats);
+static bool index_can_participate_parallel_vacuum(Relation indrel,
+ int num_index_scans,
+ bool bulkdel);
+
+/*
+ * This function prepares and returns parallel vacuum state if we can launch
+ * even one worker. It is responsible for entering parallel mode, creating a
+ * parallel context, and initializing the DSM segment.
+ */
+ParallelVacuumState *
+parallel_vacuum_begin(Relation rel, Relation *indrels, int nindexes,
+ int nrequested_workers, long maxtuples,
+ int elevel, BufferAccessStrategy bstrategy)
+{
+ ParallelVacuumState *pvs = NULL;
+ ParallelContext *pcxt;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ Size est_indstats = 0;
+ Size est_shared = 0;
+ Size est_deadtuples = 0;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(nrequested_workers >= 0);
+ Assert(nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ parallel_workers = compute_parallel_vacuum_workers(indrels, nindexes,
+ nrequested_workers);
+
+ /* Can't perform vacuum in parallel */
+ if (parallel_workers <= 0)
+ return pvs;
+
+ pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
+ pvs->indrels = indrels;
+ pvs->nindexes = nindexes;
+ pvs->bstrategy = bstrategy;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvs->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_INDEX_STATS */
+ est_indstats = mul_size(sizeof(PVIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared = MAXALIGN(sizeof(PVShared));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
+ est_deadtuples = MAXALIGN(SizeOfDeadTuples(maxtuples));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats);
+ MemSet(indstats, 0, est_indstats);
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ /* Skip indexes that don't participate in parallel vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvs->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvs->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvs->nindexes_parallel_condcleanup++;
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvs->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared);
+ MemSet(shared, 0, est_shared);
+ shared->relid = RelationGetRelid(rel);
+ shared->elevel = elevel;
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvs->shared = shared;
+
+ /* Prepare the dead tuple space */
+ dead_tuples = (VacDeadTuples *) shm_toc_allocate(pcxt->toc, est_deadtuples);
+ dead_tuples->max_tuples = maxtuples;
+ dead_tuples->num_tuples = 0;
+ MemSet(dead_tuples->itemptrs, 0,
+ sizeof(ItemPointerData) * maxtuples);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_TUPLES, dead_tuples);
+ pvs->dead_tuples = dead_tuples;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvs->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvs->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ return pvs;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed during parallel mode, copy updated index
+ * statistics from DSM into local memory so that the caller can use them to
+ * update the index statistics. One might think that we can exit from
+ * parallel mode, update the index statistics and then destroy parallel
+ * context, but that won't be safe (see ExitParallelMode).
+ */
+void
+parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **indstats)
+{
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvs->indstats[i]);
+
+ if (stats->istat_updated)
+ {
+ indstats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(indstats[i], &stats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ indstats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvs->pcxt);
+ ExitParallelMode();
+
+ pfree(pvs);
+}
+
+/* Returns the dead tuple space */
+VacDeadTuples *
+get_vacuum_dead_tuples(ParallelVacuumState *pvs)
+{
+ return pvs->dead_tuples;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_index_bulkdel(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now. Matches serial VACUUM case.
+ */
+ pvs->shared->num_table_tuples = num_table_tuples;
+ pvs->shared->estimated_count = true;
+
+ /* num_index_scans is not used in parallel bulkdel cases */
+ parallel_vacuum_all_indexes(pvs, true, 0);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ */
+void
+parallel_vacuum_index_cleanup(ParallelVacuumState *pvs, long num_table_tuples,
+ bool estimated_count, int num_index_scans)
+{
+ /*
+ * We can provide a better estimate of total number of surviving
+ * tuples (we assume indexes are more interested in that than in the
+ * number of nominally live tuples).
+ */
+ pvs->shared->num_table_tuples = num_table_tuples;
+ pvs->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_all_indexes(pvs, false, num_index_scans);
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index vacuum or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadTuples *dead_tuples;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ ParallelVacuumState pvs;
+ int nindexes;
+ char *sharedquery;
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
+ false);
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the leader process. It's
+ * okay because the lock mode does not conflict among the parallel
+ * workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should
+ * match the leader's ordering.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead tuple space */
+ dead_tuples = (VacDeadTuples *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_TUPLES,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Parallel vacuum state */
+ pvs.indrels = indrels;
+ pvs.nindexes = nindexes;
+ pvs.indstats = indstats;
+ pvs.shared = shared;
+ pvs.dead_tuples = dead_tuples;
+ pvs.bstrategy = GetAccessStrategy(BAS_VACUUM);
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Process indexes to perform bulk-deletion/cleanup */
+ parallel_vacuum_indexes(&pvs);
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvs.bstrategy);
+}
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * vacuum and index cleanup can be executed with parallel workers. An index
+ * is eligible for parallel vacuum only if its size is at least
+ * min_parallel_index_scan_size, as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that the user requested. If
+ * nrequested is 0, we compute the parallel degree based on the number of
+ * indexes that support parallel vacuum.
+ */
+static int
+compute_parallel_vacuum_workers(Relation *indrels, int nindexes, int nrequested)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow performing parallel operation in standalone backend or
+ * when parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
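
(Illustrative aside, not part of the patch: a worked example of the arithmetic above, with made-up inputs. Say four indexes are large enough and support parallel cleanup, three of them also support parallel bulk-deletion, the user requested no explicit degree, and max_parallel_maintenance_workers is 2.)

/* Hypothetical inputs; follows the same steps as compute_parallel_vacuum_workers(). */
#include <stdio.h>

#define Max(a, b) ((a) > (b) ? (a) : (b))
#define Min(a, b) ((a) < (b) ? (a) : (b))

int
main(void)
{
    int nindexes_parallel_bulkdel = 3;
    int nindexes_parallel_cleanup = 4;
    int nrequested = 0;                       /* PARALLEL without an explicit degree */
    int max_parallel_maintenance_workers = 2; /* GUC */
    int nindexes_parallel;
    int parallel_workers;

    nindexes_parallel = Max(nindexes_parallel_bulkdel, nindexes_parallel_cleanup);
    nindexes_parallel--;        /* the leader takes one index itself */

    parallel_workers = (nrequested > 0) ?
        Min(nrequested, nindexes_parallel) : nindexes_parallel;
    parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);

    printf("parallel_workers = %d\n", parallel_workers);   /* 2 */
    return 0;
}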
+
+static void
+set_parallel_vacuum_index_status(ParallelVacuumState *pvs, bool bulkdel,
+ int num_index_scans)
+{
+ PVIndVacStatus new_status = bulkdel
+ ? INDVAC_STATUS_NEED_BULKDELETE
+ : INDVAC_STATUS_NEED_CLEANUP;
+
+ /* Set index vacuum status and mark as parallel safe or not */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvs->indstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_INITIAL);
+
+ stats->status = new_status;
+ stats->parallel_workers_can_process =
+ index_can_participate_parallel_vacuum(pvs->indrels[i],
+ num_index_scans,
+ bulkdel);
+ }
+}
+
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
+ int num_index_scans)
+{
+ int nworkers;
+
+ /* Determine the number of parallel workers to launch */
+ if (bulkdel)
+ nworkers = pvs->nindexes_parallel_bulkdel;
+ else
+ {
+ nworkers = pvs->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes on the first call */
+ if (num_index_scans == 0)
+ nworkers += pvs->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * than the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See compute_parallel_vacuum_workers.
+ */
+ nworkers = Min(nworkers, pvs->pcxt->nworkers);
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvs->shared->idx), 0);
+
+ set_parallel_vacuum_index_status(pvs, bulkdel, num_index_scans);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize the parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ if (bulkdel)
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+
+ pvs->first_time = false;
+ }
+
+ /* Process all indexes that can be processed only by the leader */
+ vacuum_unsafe_indexes(pvs);
+
+ /*
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_indexes(pvs);
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index statuses back to initial (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvs->indstats[i]);
+
+ if (stats->status != INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(pvs->indrels[i]));
+
+ stats->status = INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+}
+
+/*
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to process the indexes in parallel.
+ */
+static void
+parallel_vacuum_indexes(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (;;)
+ {
+ int idx;
+ PVIndStats *stats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvs->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvs->nindexes)
+ break;
+
+ stats = &(pvs->indstats[idx]);
+
+ /*
+ * Parallel-unsafe indexes can be processed only by the leader (they are
+ * handled in vacuum_unsafe_indexes()).
+ */
+ if (!stats->parallel_workers_can_process)
+ continue;
+
+ parallel_vacuum_one_index(pvs, pvs->indrels[idx], stats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Serially process indexes in the leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe.
+ *
+ * Also performs processing of smaller indexes that fell under the size cutoff
+ * enforced by compute_parallel_vacuum_workers().
+ */
+static void
+vacuum_unsafe_indexes(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *stats = &(pvs->indstats[i]);
+
+ /* Skip parallel-safe indexes, as they are processed by workers */
+ if (stats->parallel_workers_can_process)
+ continue;
+
+ parallel_vacuum_one_index(pvs, pvs->indrels[i], stats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Bulk-delete or cleanup index either by leader process or by one of the worker
+ * process. After processing the index this function copies the index
+ * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel, PVIndStats *stats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /* Get the index statistics space, if already updated */
+ if (stats->istat_updated)
+ istat = &(stats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvs->shared->elevel;
+ ivinfo.estimated_count = pvs->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvs->shared->num_table_tuples;
+ ivinfo.strategy = pvs->bstrategy;
+
+ switch (stats->status)
+ {
+ case INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = vacuum_one_index(&ivinfo, istat, pvs->dead_tuples);
+ break;
+ case INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel index vacuum status %d of index \"%s\"",
+ stats->status, RelationGetRelationName(indrel));
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete and
+ * amvacuumcleanup to the DSM segment if it's the first cycle because they
+ * allocate locally and it's possible that an index will be vacuumed by a
+ * different vacuum process the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result on the DSM segment and
+ * pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result at different
+ * slots we can write them without locking.
+ */
+ if (!stats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(stats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ stats->istat_updated = true;
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each
+ * worker touches different indexes.
+ */
+ stats->status = INDVAC_STATUS_COMPLETED;
+}
+
+/*
+ * Returns false, if the given index can't participate in parallel index
+ * vacuum or parallel index cleanup.
+ */
+static bool
+index_can_participate_parallel_vacuum(Relation indrel, int num_index_scans,
+ bool bulkdel)
+{
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Check if the index is a totally unsuitable target for all parallel
+ * processing up front. For example, the index could be
+ * < min_parallel_index_scan_size cutoff.
+ */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ return false;
+
+ /* In the bulk-deletion case, check if it supports parallel bulk-deletion */
+ if (bulkdel)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). We do
+ * this to avoid the need to invoke workers when parallel index
+ * cleanup doesn't need to scan the index. See the comments for
+ * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
+ * support parallel cleanup conditionally.
+ */
+ if (num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 417dd288e5..f3fb1e93a5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4cfd52eaf4..35ed913b56 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -62,6 +64,9 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Abstract type for parallel vacuum state */
+typedef struct ParallelVacuumState ParallelVacuumState;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -230,6 +235,28 @@ typedef struct VacuumParams
int nworkers;
} VacuumParams;
+/*
+ * VacDeadTuples stores the dead tuple TIDs collected during the heap scan.
+ * This is allocated in the DSM segment in parallel mode and in local memory
+ * in non-parallel mode.
+ */
+typedef struct VacDeadTuples
+{
+ int max_tuples; /* # slots allocated in array */
+ int num_tuples; /* current # of entries */
+ /* List of TIDs of tuples we intend to delete */
+ /* NB: this list is ordered by TID address */
+ ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER]; /* array of
+ * ItemPointerData */
+} VacDeadTuples;
+
+/* The dead tuple space consists of VacDeadTuples and dead tuple TIDs */
+#define SizeOfDeadTuples(cnt) \
+ add_size(offsetof(VacDeadTuples, itemptrs), \
+ mul_size(sizeof(ItemPointerData), cnt))
+#define MAXDEADTUPLES(max_size) \
+ (((max_size) - offsetof(VacDeadTuples, itemptrs)) / sizeof(ItemPointerData))
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for PostGIS */
extern int vacuum_freeze_min_age;
@@ -282,6 +309,28 @@ extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
bits32 options, bool verbose,
LOCKMODE lmode);
+extern IndexBulkDeleteResult *vacuum_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat,
+ VacDeadTuples *dead_tuples);
+extern IndexBulkDeleteResult *cleanup_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat);
+
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumState *parallel_vacuum_begin(Relation rel, Relation *indrels,
+ int nindexes,
+ int nrequested_workers,
+ long maxtuples, int elevel,
+ BufferAccessStrategy bstrategy);
+extern void parallel_vacuum_end(ParallelVacuumState *pvs,
+ IndexBulkDeleteResult **indstats);
+extern VacDeadTuples *get_vacuum_dead_tuples(ParallelVacuumState *pvs);
+extern void parallel_vacuum_index_bulkdel(ParallelVacuumState *pvs,
+ long num_table_tuples);
+extern void parallel_vacuum_index_cleanup(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ bool estimated_count,
+ int num_index_scans);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
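
(Illustrative aside, not part of the patch: taken together, the entry points declared above are meant to be driven by a table AM's vacuum code roughly as sketched below. The control flow is condensed and the variable names and the 0 placeholders are mine; the real caller lives in vacuumlazy.c and is not reproduced here.)

/* Condensed, illustrative caller; assumes the declarations in commands/vacuum.h. */
static void
vacuum_indexes_sketch(Relation rel, Relation *indrels, int nindexes,
                      VacuumParams *params, long maxtuples, int elevel,
                      BufferAccessStrategy bstrategy,
                      IndexBulkDeleteResult **indstats)
{
    ParallelVacuumState *pvs;
    int         num_index_scans = 0;

    /* Returns NULL if not even one worker can be planned */
    pvs = parallel_vacuum_begin(rel, indrels, nindexes, params->nworkers,
                                maxtuples, elevel, bstrategy);
    if (pvs == NULL)
        return;                 /* caller falls back to serial index vacuuming */

    /* The heap scan fills the shared TID array obtained from the state */
    (void) get_vacuum_dead_tuples(pvs);

    /* Each time the dead tuple space fills up: one bulk-deletion pass */
    parallel_vacuum_index_bulkdel(pvs, /* old_live_tuples */ 0);
    num_index_scans++;

    /* After the last heap pass: one cleanup pass */
    parallel_vacuum_index_cleanup(pvs, /* new_rel_tuples */ 0,
                                  true, num_index_scans);

    /* Copies stats into indstats[] and exits parallel mode */
    parallel_vacuum_end(pvs, indstats);
}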
diff --git a/src/test/regress/expected/vacuum_parallel.out b/src/test/regress/expected/vacuum_parallel.out
index ddf0ee544b..a7d8a801e0 100644
--- a/src/test/regress/expected/vacuum_parallel.out
+++ b/src/test/regress/expected/vacuum_parallel.out
@@ -45,5 +45,29 @@ VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table;
INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
RESET max_parallel_maintenance_workers;
RESET min_parallel_index_scan_size;
+CREATE TABLE parallel_vacuum_table2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 10000) g;
+-- Create different types of indexes, i.e. having different parallelvacuumoptions.
+-- Also create a small index, same as above.
+CREATE INDEX pv_bt_index ON parallel_vacuum_table2 USING btree (a);
+CREATE INDEX pv_hash_index ON parallel_vacuum_table2 USING hash (a);
+CREATE INDEX pv_gin_index ON parallel_vacuum_table2 USING gin (b);
+CREATE INDEX pv_brin_index ON parallel_vacuum_table2 USING brin (a);
+CREATE INDEX pv_small_index ON parallel_vacuum_table2 USING btree ((1));
+-- Parallel index vacuum for various types of indexes.
+DELETE FROM parallel_vacuum_table2;
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- XXX: in order to execute index scan twice, we need about 200,000 garbage tuples
+-- with the minimum maintenance_work_mem. However, it takes a long time to load.
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 200000) g;
+DELETE FROM parallel_vacuum_table2;
+SET maintenance_work_mem TO 1024;
+-- Parallel index vacuum for various types of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+RESET maintenance_work_mem;
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios
diff --git a/src/test/regress/sql/vacuum_parallel.sql b/src/test/regress/sql/vacuum_parallel.sql
index 1d23f33e39..49f4f4ce6d 100644
--- a/src/test/regress/sql/vacuum_parallel.sql
+++ b/src/test/regress/sql/vacuum_parallel.sql
@@ -42,5 +42,40 @@ INSERT INTO parallel_vacuum_table SELECT i FROM generate_series(1, 10000) i;
RESET max_parallel_maintenance_workers;
RESET min_parallel_index_scan_size;
+CREATE TABLE parallel_vacuum_table2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 10000) g;
+
+-- Create different types of indexes, i.e. having different parallelvacuumoptions.
+-- Also create a small index, same as above.
+CREATE INDEX pv_bt_index ON parallel_vacuum_table2 USING btree (a);
+CREATE INDEX pv_hash_index ON parallel_vacuum_table2 USING hash (a);
+CREATE INDEX pv_gin_index ON parallel_vacuum_table2 USING gin (b);
+CREATE INDEX pv_brin_index ON parallel_vacuum_table2 USING brin (a);
+CREATE INDEX pv_small_index ON parallel_vacuum_table2 USING btree ((1));
+
+
+-- Parallel index vacuum for various types of indexes.
+DELETE FROM parallel_vacuum_table2;
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- XXX: in order to execute index scan twice, we need about 200,000 garbage tuples
+-- with the minimum maintenance_work_mem. However, it takes a long time to load.
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 200000) g;
+
+DELETE FROM parallel_vacuum_table2;
+
+SET maintenance_work_mem TO 1024;
+
+-- Parallel index vacuum for various types of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+RESET maintenance_work_mem;
+
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios
On Mon, Nov 29, 2021 11:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Maybe we can start with using parallel_vacuum_*. We can change them
later if there is an argument.
I've attached an updated patch. I don't update the terminology in
vacuum that we're discussing on another thread[1].
Hi,
I noticed the patch no longer applies to the latest source.
Also a few comments:
1)
+static void set_parallel_vacuum_index_status(ParallelVacuumState *pvs,
+ bool bulkdel,
+ int num_index_scans);
+static void parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
+ int num_index_scans);
...
+static bool index_can_participate_parallel_vacuum(Relation indrel,
+ int num_index_scans);
Maybe the parameter num_index_scans can be replaced by a bool flag since it is
only used in the checks "num_index_scans > 0" and "num_index_scans == 0".
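Just to illustrate, the suggested change might look roughly like the
following (a sketch only; the flag name "first_index_scan" is made up here
and not taken from the patch):
    static void set_parallel_vacuum_index_status(ParallelVacuumState *pvs,
                                                 bool bulkdel,
                                                 bool first_index_scan);
    static void parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
                                            bool first_index_scan);
    ...
    static bool index_can_participate_parallel_vacuum(Relation indrel,
                                                      bool first_index_scan);
with the callers passing "num_index_scans == 0" (or its negation) instead of
the counter itself.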
2)
+ /* Reinitialize the parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
It seems the ParallelVacuumState::first_time was not initialized before?
Best regards
Hou zj
On Tue, Nov 30, 2021 at 11:03 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Mon, Nov 29, 2021 11:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
2)
+ /* Reinitialize the parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
It seems the ParallelVacuumState::first_time was not initialized before?
Yeah, I also notice this while looking at the patch.
One more thing it seems the patch has removed even the existing error
callback from parallel_vacuum_main. I suggested that we can enhance or
add a new one if required in a separate patch but let's keep the
current one as it is.
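For reference, the existing registration in parallel_vacuum_main follows
roughly the usual error context callback pattern (a simplified sketch, not
the exact code from the tree):
    ErrorContextCallback errcallback;

    /* push our callback onto the error context stack */
    errcallback.callback = vacuum_error_callback;
    errcallback.arg = &vacrel;
    errcallback.previous = error_context_stack;
    error_context_stack = &errcallback;

    /* ... perform parallel index vacuum or cleanup ... */

    /* pop the error context stack before returning */
    error_context_stack = errcallback.previous;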
Can we think of splitting the patch in the following manner: (a) the
patch to get rid of bitmap to represent whether particular index
supports parallel vacuum and rename of functions (b) any other stuff
to improve the current implementation, (c) move the parallel vacuum
related code to a separate file?
I think if we can split the patch, it will be easier to review and
reduce the chances of introducing any bugs in this area.
--
With Regards,
Amit Kapila.
On Tue, Nov 30, 2021 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 30, 2021 at 11:03 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Mon, Nov 29, 2021 11:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
2)
+ /* Reinitialize the parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
It seems the ParallelVacuumState::first_time was not initialized before?
Yeah, I also notice this while looking at the patch.
Thank you for the comments, Amit and Hou.
One more thing it seems the patch has removed even the existing error
callback from parallel_vacuum_main. I suggested that we can enhance or
add a new one if required in a separate patch but let's keep the
current one as it is.
Understood.
Can we think of splitting the patch in the following manner: (a) the
patch to get rid of bitmap to represent whether particular index
supports parallel vacuum and rename of functions (b) any other stuff
to improve the current implementation, (c) move the parallel vacuum
related code to a separate file?
Okay, I'll split the patch and submit them.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Nov 30, 2021 at 4:45 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Nov 30, 2021 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 30, 2021 at 11:03 AM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Mon, Nov 29, 2021 11:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
2)
+ /* Reinitialize the parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
It seems the ParallelVacuumState::first_time was not initialized before?
Yeah, I also notice this while looking at the patch.
Thank you for the comments, Amit and Hou.
One more thing it seems the patch has removed even the existing error
callback from parallel_vacuum_main. I suggested that we can enhance or
add a new one if required in a separate patch but let's keep the
current one as it is.
Understood.
Can we think of splitting the patch in the following manner: (a) the
patch to get rid of bitmap to represent whether particular index
supports parallel vacuum and rename of functions (b) any other stuff
to improve the current implementation, (c) move the parallel vacuum
related code to a separate file?
Okay, I'll split the patch and submit them.
I've attached updated patches.
The first patch is the main patch for refactoring the parallel vacuum
code; it removes the bitmap-related code and renames functions for
consistency. The second patch moves the parallel-related code to
vacuumparallel.c, and also moves common functions used by both
vacuumlazy.c and vacuumparallel.c to vacuum.c. The third patch adds
regression tests for parallel vacuum on different kinds of indexes
with multiple index scans. Please review them.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v4-0003-Add-regression-test-cases-for-parallel-vacuum.patch (application/octet-stream)
From e0dea27f6c9025662d229445cf84cd656185b96b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 1 Dec 2021 14:55:26 +0900
Subject: [PATCH v4 3/3] Add regression test cases for parallel vacuum
The newly added test cases test parallel vacuum on a table with
different types of indexes and multiple index scans.
---
src/test/regress/expected/vacuum_parallel.out | 24 +++++++++++++
src/test/regress/sql/vacuum_parallel.sql | 35 +++++++++++++++++++
2 files changed, 59 insertions(+)
diff --git a/src/test/regress/expected/vacuum_parallel.out b/src/test/regress/expected/vacuum_parallel.out
index ddf0ee544b..b793c247f8 100644
--- a/src/test/regress/expected/vacuum_parallel.out
+++ b/src/test/regress/expected/vacuum_parallel.out
@@ -47,3 +47,27 @@ RESET max_parallel_maintenance_workers;
RESET min_parallel_index_scan_size;
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios
+CREATE TABLE parallel_vacuum_table2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 10000) g;
+-- Create different types of indexes, i.e. having different parallelvacuumoptions.
+-- Also create a small index, same as above.
+CREATE INDEX pv_bt_index ON parallel_vacuum_table2 USING btree (a);
+CREATE INDEX pv_hash_index ON parallel_vacuum_table2 USING hash (a);
+CREATE INDEX pv_gin_index ON parallel_vacuum_table2 USING gin (b);
+CREATE INDEX pv_brin_index ON parallel_vacuum_table2 USING brin (a);
+CREATE INDEX pv_small_index ON parallel_vacuum_table2 USING btree ((1));
+-- Parallel index vacuum for various types of indexes.
+DELETE FROM parallel_vacuum_table2;
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- XXX: in order to execute index scan twice, we need about 200,000 garbage tuples
+-- with the minimum maintenance_work_mem. However, it takes a long time to load.
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 200000) g;
+DELETE FROM parallel_vacuum_table2;
+SET maintenance_work_mem TO 1024;
+-- Parallel index vacuum for various types of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+RESET maintenance_work_mem;
diff --git a/src/test/regress/sql/vacuum_parallel.sql b/src/test/regress/sql/vacuum_parallel.sql
index 1d23f33e39..03eb8ef858 100644
--- a/src/test/regress/sql/vacuum_parallel.sql
+++ b/src/test/regress/sql/vacuum_parallel.sql
@@ -44,3 +44,38 @@ RESET min_parallel_index_scan_size;
-- Deliberately don't drop table, to get further coverage from tools like
-- pg_amcheck in some testing scenarios
+
+CREATE TABLE parallel_vacuum_table2 (a int, b int4[]) WITH (autovacuum_enabled = off);
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 10000) g;
+
+-- Create different types of indexes, i.e. having different parallelvacuumoptions.
+-- Also create a small index, same as above.
+CREATE INDEX pv_bt_index ON parallel_vacuum_table2 USING btree (a);
+CREATE INDEX pv_hash_index ON parallel_vacuum_table2 USING hash (a);
+CREATE INDEX pv_gin_index ON parallel_vacuum_table2 USING gin (b);
+CREATE INDEX pv_brin_index ON parallel_vacuum_table2 USING brin (a);
+CREATE INDEX pv_small_index ON parallel_vacuum_table2 USING btree ((1));
+
+
+-- Parallel index vacuum for various types of indexes.
+DELETE FROM parallel_vacuum_table2;
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- XXX: in order to execute index scan twice, we need about 200,000 garbage tuples
+-- with the minimum maintenance_work_mem. However, it takes a long time to load.
+INSERT INTO parallel_vacuum_table2 SELECT g, ARRAY[1, 2, g] FROM generate_series(1, 200000) g;
+
+DELETE FROM parallel_vacuum_table2;
+
+SET maintenance_work_mem TO 1024;
+
+-- Parallel index vacuum for various types of indexes.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+-- Parallel index cleanup.
+VACUUM (PARALLEL 4, INDEX_CLEANUP ON) parallel_vacuum_table2;
+
+RESET maintenance_work_mem;
--
2.24.3 (Apple Git-128)
v4-0001-Refactor-parallel-vacuum-to-remove-bitmap-related.patch (application/octet-stream)
From e7c96ce544fdc53cd8ccae48ed8238c35fa50673 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 30 Nov 2021 23:26:28 +0900
Subject: [PATCH v4 1/3] Refactor parallel vacuum to remove bitmap-related
code.
Previously, in parallel vacuum, we allocated the shmem area of
IndexBulkDeleteResult only for indexes where parallel index vacuuming
is safe, and kept a null-bitmap in the shmem area to access them. This
logic was too complicated for the small benefit of saving only a few
bits per index.
In this commit, we allocate a dedicated shmem area for the array of
LVParallelIndStats, which includes a parallel-safety flag, the index
vacuum status, and IndexBulkDeleteResult. There is one array element
for every index, even those indexes where parallel index vacuuming is
unsafe or not worthwhile. This commit makes the code clearer by
removing all bitmap-related code.
Also, add a check of each index's vacuum status after parallel index
vacuum to make sure that all indexes have been processed.
Finally, rename parallel vacuum functions to parallel_vacuum_* for
consistency.
An upcoming patch also refactors parallel vacuum further to make it
generic so that any table AM can utilize parallel vacuum functionality.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 612 +++++++++++++--------------
1 file changed, 288 insertions(+), 324 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 282b44f87b..2255861930 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -130,6 +130,7 @@
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -181,14 +182,6 @@ typedef struct LVShared
Oid relid;
int elevel;
- /*
- * An indication for vacuum workers to perform either index vacuum or
- * index cleanup. first_time is true only if for_cleanup is true and
- * bulk-deletion is not performed yet.
- */
- bool for_cleanup;
- bool first_time;
-
/*
* Fields for both index vacuum and cleanup.
*
@@ -226,33 +219,44 @@ typedef struct LVShared
*/
pg_atomic_uint32 active_nworkers;
- /*
- * Variables to control parallel vacuum. We have a bitmap to indicate
- * which index has stats in shared memory. The set bit in the map
- * indicates that the particular index supports a parallel vacuum.
- */
- pg_atomic_uint32 idx; /* counter for vacuuming and clean up */
- uint32 offset; /* sizeof header incl. bitmap */
- bits8 bitmap[FLEXIBLE_ARRAY_MEMBER]; /* bit map of NULLs */
-
- /* Shared index statistics data follows at end of struct */
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
} LVShared;
-#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
-#define GetSharedIndStats(s) \
- ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
-#define IndStatsIsNull(s, i) \
- (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
+/* Status used during parallel index vacuum or cleanup */
+typedef enum LVParallelIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} LVParallelIndVacStatus;
/*
- * Struct for an index bulk-deletion statistic used for parallel vacuum. This
- * is allocated in the DSM segment.
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
*/
-typedef struct LVSharedIndStats
+typedef struct LVParallelIndStats
{
- bool updated; /* are the stats updated? */
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both leader and worker can
+ * process the index, otherwise only leader can process it.
+ */
+ LVParallelIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
IndexBulkDeleteResult istat;
-} LVSharedIndStats;
+} LVParallelIndStats;
/* Struct for maintaining a parallel vacuum state. */
typedef struct LVParallelState
@@ -262,6 +266,9 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /* Shared index statistics among parallel vacuum workers */
+ LVParallelIndStats *lvpindstats;
+
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
@@ -391,18 +398,6 @@ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
-static void do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel);
-static void do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers);
-static void do_parallel_processing(LVRelState *vacrel,
- LVShared *lvshared);
-static void do_serial_processing_for_unsafe_indexes(LVRelState *vacrel,
- LVShared *lvshared);
-static IndexBulkDeleteResult *parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_indstats,
- LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
@@ -425,14 +420,22 @@ static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int compute_parallel_vacuum_workers(LVRelState *vacrel,
- int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void begin_parallel_vacuum(LVRelState *vacrel, int nrequested);
-static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared, int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
+
+static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested);
+static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
+static void parallel_vacuum_end(LVRelState *vacrel);
+static bool parallel_vacuum_should_skip_index(Relation indrel);
+static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum);
+static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
+static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
+static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats);
+static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared,
+ LVParallelIndStats *pindstats);
+
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -2237,7 +2240,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_vacuum_all_indexes(vacrel);
+ parallel_vacuum_process_all_indexes(vacrel, true);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2610,99 +2613,91 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform lazy_vacuum_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
-{
- /* Tell parallel workers to do index vacuuming */
- vacrel->lps->lvshared->for_cleanup = false;
- vacrel->lps->lvshared->first_time = false;
-
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- do_parallel_vacuum_or_cleanup(vacrel,
- vacrel->lps->nindexes_parallel_bulkdel);
-}
-
-/*
- * Perform lazy_cleanup_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel)
-{
- int nworkers;
-
- /*
- * If parallel vacuum is active we perform index cleanup with parallel
- * workers.
- *
- * Tell parallel workers to do index cleanup.
- */
- vacrel->lps->lvshared->for_cleanup = true;
- vacrel->lps->lvshared->first_time = (vacrel->num_index_scans == 0);
-
- /*
- * Now we can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- /* Determine the number of parallel workers to launch */
- if (vacrel->lps->lvshared->first_time)
- nworkers = vacrel->lps->nindexes_parallel_cleanup +
- vacrel->lps->nindexes_parallel_condcleanup;
- else
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- do_parallel_vacuum_or_cleanup(vacrel, nworkers);
-}
-
/*
* Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process. The caller must set
- * lps->lvshared->for_cleanup to indicate whether to perform vacuum or
- * cleanup.
+ * must be used by the parallel vacuum leader process.
*/
static void
-do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
+parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
{
LVParallelState *lps = vacrel->lps;
+ LVParallelIndVacStatus new_status;
+ int nworkers;
Assert(!IsParallelWorker());
Assert(ParallelVacuumIsActive(vacrel));
Assert(vacrel->nindexes > 0);
+ /* Determine the number of parallel workers to launch */
+ if (vacuum)
+ nworkers = vacrel->lps->nindexes_parallel_bulkdel;
+ else
+ {
+ nworkers = vacrel->lps->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if in the first time call */
+ if (vacrel->num_index_scans == 0)
+ nworkers += vacrel->lps->nindexes_parallel_condcleanup;
+ }
+
/* The leader process will participate */
nworkers--;
/*
* It is possible that parallel context is initialized with fewer workers
* than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See compute_parallel_vacuum_workers.
+ * phase, so we need to consider it. See parallel_vacuum_compute_workers().
*/
nworkers = Min(nworkers, lps->pcxt->nworkers);
+ if (vacuum)
+ {
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now. Matches serial VACUUM case.
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
+ vacrel->lps->lvshared->estimated_count = true;
+
+ new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
+ }
+ else
+ {
+ /*
+ * We can provide a better estimate of total number of surviving
+ * tuples (we assume indexes are more interested in that than in the
+ * number of nominally live tuples).
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
+ vacrel->lps->lvshared->estimated_count =
+ (vacrel->tupcount_pages < vacrel->rel_pages);
+
+ new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
+ }
+
+ /* Set index vacuum status and mark as parallel safe or not */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
+
+ Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+
+ pindstats->status = new_status;
+ pindstats->parallel_workers_can_process =
+ parallel_vacuum_index_is_parallel_safe(vacrel,
+ vacrel->indrels[i],
+ vacuum);
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(lps->lvshared->idx), 0);
+
/* Setup the shared cost-based vacuum delay and launch workers */
if (nworkers > 0)
{
+ /* Reinitialize parallel context to relaunch parallel workers */
if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
ReinitializeParallelDSM(lps->pcxt);
- }
/*
* Set up shared cost balance and the number of active workers for
@@ -2735,28 +2730,28 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
}
- if (lps->lvshared->for_cleanup)
+ if (vacuum)
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
else
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
}
/* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
+ parallel_vacuum_process_unsafe_indexes(vacrel);
/*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
*/
- do_parallel_processing(vacrel, lps->lvshared);
+ parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
/*
* Next, accumulate buffer and WAL usage. (This must wait for the workers
@@ -2771,6 +2766,21 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
+
+ if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(vacrel->indrels[i]));
+
+ pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
/*
* Carry the shared balance value to heap scan and disable shared costing
*/
@@ -2787,7 +2797,8 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
* vacuum worker processes to process the indexes in parallel.
*/
static void
-do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
+parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats)
{
/*
* Increment the active worker count if we are able to launch any worker.
@@ -2799,39 +2810,27 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
for (;;)
{
int idx;
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
+ LVParallelIndStats *pis;
/* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(lvshared->idx), 1);
+ idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
/* Done for all indexes? */
if (idx >= vacrel->nindexes)
break;
- /* Get the index statistics space from DSM, if any */
- shared_istat = parallel_stats_for_idx(lvshared, idx);
-
- /* Skip indexes not participating in parallelism */
- if (shared_istat == NULL)
- continue;
-
- indrel = vacrel->indrels[idx];
+ pis = &(pindstats[idx]);
/*
* Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
+ * processed in parallel_vacuum_process_unsafe_indexes() by leader)
*/
- if (!parallel_processing_is_safe(indrel, lvshared))
+ if (!pis->parallel_workers_can_process)
continue;
/* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ shared, pis);
}
/*
@@ -2847,15 +2846,16 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
*
* Handles index vacuuming (or index cleanup) for indexes that are not
* parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing for_cleanup processing right now.
+ * on details like whether we're performing index cleanup right now.
*
* Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by compute_parallel_vacuum_workers(). These indexes never get a
- * slot for statistics in DSM.
+ * enforced by parallel_vacuum_compute_workers().
*/
static void
-do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
+parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
{
+ LVParallelState *lps = vacrel->lps;
+
Assert(!IsParallelWorker());
/*
@@ -2866,28 +2866,15 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- shared_istat = parallel_stats_for_idx(lvshared, idx);
- indrel = vacrel->indrels[idx];
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
- /*
- * We're only here for the indexes that parallel workers won't
- * process. Note that the shared_istat test ensures that we process
- * indexes that fell under initial size cutoff.
- */
- if (shared_istat != NULL &&
- parallel_processing_is_safe(indrel, lvshared))
+ /* Skip, indexes that are safe for workers */
+ if (pindstats->parallel_workers_can_process)
continue;
/* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ lps->lvshared, pindstats);
}
/*
@@ -2904,29 +2891,37 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
* statistics returned from ambulkdelete and amvacuumcleanup to the DSM
* segment.
*/
-static IndexBulkDeleteResult *
-parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_istat,
- LVRelState *vacrel)
+static void
+parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared, LVParallelIndStats *pindstats)
{
+ IndexBulkDeleteResult *istat = NULL;
IndexBulkDeleteResult *istat_res;
/*
* Update the pointer to the corresponding bulk-deletion result if someone
* has already updated it
*/
- if (shared_istat && shared_istat->updated && istat == NULL)
- istat = &shared_istat->istat;
+ if (pindstats->istat_updated)
+ istat = &(pindstats->istat);
- /* Do vacuum or cleanup of the index */
- if (lvshared->for_cleanup)
- istat_res = lazy_cleanup_one_index(indrel, istat, lvshared->reltuples,
- lvshared->estimated_count, vacrel);
- else
- istat_res = lazy_vacuum_one_index(indrel, istat, lvshared->reltuples,
- vacrel);
+ switch (pindstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = lazy_vacuum_one_index(indrel, istat,
+ shared->reltuples, vacrel);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = lazy_cleanup_one_index(indrel, istat,
+ shared->reltuples,
+ shared->estimated_count,
+ vacrel);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ pindstats->status,
+ RelationGetRelationName(indrel));
+ }
/*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
/* Free the locally-allocated bulk-deletion result */
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
- return istat_res;
+ /*
+ * Update the status to completed. No need to lock here since each
+ * worker touches different indexes.
+ */
+ pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
}
/*
@@ -2987,7 +2983,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_cleanup_all_indexes(vacrel);
+ parallel_vacuum_process_all_indexes(vacrel, false);
}
}
@@ -3520,7 +3516,7 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- begin_parallel_vacuum(vacrel, nworkers);
+ parallel_vacuum_begin(vacrel, nworkers);
/* If parallel mode started, vacrel->dead_items allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
@@ -3552,7 +3548,7 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- end_parallel_vacuum(vacrel);
+ parallel_vacuum_end(vacrel);
}
/*
@@ -3753,13 +3749,10 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*
* nrequested is the number of parallel workers that user requested. If
* nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
+ * the number of indexes that support parallel vacuum.
*/
static int
-compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
+parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested)
{
int nindexes_parallel = 0;
int nindexes_parallel_bulkdel = 0;
@@ -3779,13 +3772,13 @@ compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ uint8 vacoptions;
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ /* Skip indexes that are unsuitable target for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
continue;
- will_parallel_vacuum[idx] = true;
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
nindexes_parallel_bulkdel++;
@@ -3855,7 +3848,7 @@ update_index_statistics(LVRelState *vacrel)
* VACUUM is currently active.
*/
static void
-begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
+parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
{
LVParallelState *lps;
Relation *indrels = vacrel->indrels;
@@ -3863,10 +3856,11 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
ParallelContext *pcxt;
LVShared *shared;
LVDeadItems *dead_items;
+ LVParallelIndStats *pindstats;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
- bool *will_parallel_vacuum;
int max_items;
+ Size est_pindstats_len;
Size est_shared_len;
Size est_dead_items_len;
int nindexes_mwm = 0;
@@ -3883,14 +3877,10 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
/*
* Compute the number of parallel vacuum workers to launch
*/
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = compute_parallel_vacuum_workers(vacrel,
- nrequested,
- will_parallel_vacuum);
+ parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested);
if (parallel_workers <= 0)
{
/* Can't perform vacuum in parallel -- lps not set in vacrel */
- pfree(will_parallel_vacuum);
return;
}
@@ -3902,41 +3892,13 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
Assert(pcxt->nworkers > 0);
lps->pcxt = pcxt;
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- /* Skip indexes that don't participate in parallel vacuum */
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- est_shared_len = add_size(est_shared_len, sizeof(LVSharedIndStats));
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
+ est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = MAXALIGN(sizeof(LVShared));
shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
@@ -3973,6 +3935,42 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
InitializeParallelDSM(pcxt);
+ /* Prepare index vacuum stats */
+ pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ /* Skip indexes that are unsuitable target for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ lps->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ lps->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ lps->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
+ lps->lvpindstats = pindstats;
+
/* Prepare shared information */
shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
MemSet(shared, 0, est_shared_len);
@@ -3986,21 +3984,6 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
pg_atomic_init_u32(&(shared->cost_balance), 0);
pg_atomic_init_u32(&(shared->active_nworkers), 0);
pg_atomic_init_u32(&(shared->idx), 0);
- shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
-
- /*
- * Initialize variables for shared index statistics, set NULL bitmap and
- * the size of stats for each index.
- */
- memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
- for (int idx = 0; idx < nindexes; idx++)
- {
- if (!will_parallel_vacuum[idx])
- continue;
-
- /* Set NOT NULL as this index does support parallelism */
- shared->bitmap[idx >> 3] |= 1 << (idx & 0x07);
- }
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
lps->lvshared = shared;
@@ -4038,8 +4021,6 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
- pfree(will_parallel_vacuum);
-
/* Success -- set dead_items and lps in leader's vacrel state */
vacrel->dead_items = dead_items;
vacrel->lps = lps;
@@ -4055,7 +4036,7 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
* context, but that won't be safe (see ExitParallelMode).
*/
static void
-end_parallel_vacuum(LVRelState *vacrel)
+parallel_vacuum_end(LVRelState *vacrel)
{
IndexBulkDeleteResult **indstats = vacrel->indstats;
LVParallelState *lps = vacrel->lps;
@@ -4066,21 +4047,12 @@ end_parallel_vacuum(LVRelState *vacrel)
/* Copy the updated statistics */
for (int idx = 0; idx < nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
-
- shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
- /*
- * Skip index -- it must have been processed by the leader, from
- * inside do_serial_processing_for_unsafe_indexes()
- */
- if (shared_istat == NULL)
- continue;
-
- if (shared_istat->updated)
+ if (pindstats->istat_updated)
{
indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &shared_istat->istat, sizeof(IndexBulkDeleteResult));
+ memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
}
else
indstats[idx] = NULL;
@@ -4095,67 +4067,56 @@ end_parallel_vacuum(LVRelState *vacrel)
}
/*
- * Return shared memory statistics for index at offset 'getidx', if any
- *
- * Returning NULL indicates that compute_parallel_vacuum_workers() determined
- * that the index is a totally unsuitable target for all parallel processing
- * up front. For example, the index could be < min_parallel_index_scan_size
- * cutoff.
+ * Check if the index is a totally unsuitable target for all parallel
+ * processing up front. For example, the index could be
+ * < min_parallel_index_scan_size cutoff.
*/
-static LVSharedIndStats *
-parallel_stats_for_idx(LVShared *lvshared, int getidx)
+static bool
+parallel_vacuum_should_skip_index(Relation indrel)
{
- char *p;
-
- if (IndStatsIsNull(lvshared, getidx))
- return NULL;
-
- p = (char *) GetSharedIndStats(lvshared);
- for (int idx = 0; idx < getidx; idx++)
- {
- if (IndStatsIsNull(lvshared, idx))
- continue;
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
- p += sizeof(LVSharedIndStats);
- }
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ return true;
- return (LVSharedIndStats *) p;
+ return false;
}
/*
- * Returns false, if the given index can't participate in parallel index
- * vacuum or parallel index cleanup
+ * Returns false, if the given index can't participate in the next execution of
+ * parallel index vacuum or parallel index cleanup.
*/
static bool
-parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
+parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum)
{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ uint8 vacoptions;
- /* first_time must be true only if for_cleanup is true */
- Assert(lvshared->for_cleanup || !lvshared->first_time);
+ /* Skip indexes that are unsuitable target for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
+ return false;
- if (lvshared->for_cleanup)
- {
- /* Skip, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
- /*
- * Skip, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). See the
- * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
- * when indexes support parallel cleanup conditionally.
- */
- if (!lvshared->first_time &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
- }
- else if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) == 0)
- {
- /* Skip if the index does not support parallel bulk deletion */
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
return false;
- }
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). See the
+ * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
+ * when indexes support parallel cleanup conditionally.
+ */
+ if (vacrel->num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
return true;
}
@@ -4171,6 +4132,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
{
Relation rel;
Relation *indrels;
+ LVParallelIndStats *lvpindstats;
LVShared *lvshared;
LVDeadItems *dead_items;
BufferUsage *buffer_usage;
@@ -4190,10 +4152,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
false);
elevel = lvshared->elevel;
- if (lvshared->for_cleanup)
- elog(DEBUG1, "starting parallel vacuum worker for cleanup");
- else
- elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
+ elog(DEBUG1, "starting parallel vacuum worker");
/* Set debug_query_string for individual workers */
sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
@@ -4214,6 +4173,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
Assert(nindexes > 0);
+ /* Set index statistics */
+ lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
/* Set dead_items space (set as worker's vacrel dead_items below) */
dead_items = (LVDeadItems *) shm_toc_lookup(toc,
PARALLEL_VACUUM_KEY_DEAD_ITEMS,
@@ -4259,7 +4223,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
InstrStartParallelQuery();
/* Process indexes to perform vacuum/cleanup */
- do_parallel_processing(&vacrel, lvshared);
+ parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
--
2.24.3 (Apple Git-128)
v4-0002-Move-parallel-vacuum-code-to-vacuumparallel.c.patch (application/octet-stream)
From 9a7fb99ea81e2a62d244b507dcca8929ea22b79b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 1 Dec 2021 14:35:05 +0900
Subject: [PATCH v4 2/3] Move parallel vacuum code to vacuumparallel.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previously, parallel vacuum was specific to lazy vacuum, i.e., the heap
table AM. But the job that parallel vacuum does isn’t really specific
to heap.
This commit moves the parallel vacuum related code to a new file,
commands/vacuumparallel.c, so that any table AM supporting indexes can
utilize parallel vacuum in order to call index AM
callbacks (ambulkdelete and amvacuumcleanup) with parallel workers.
It also moves some vacuum-related functions and structures to
commands/vacuum.c so that both lazy vacuum and parallel vacuum can
refer to them.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 1162 ++-----------------------
src/backend/access/transam/parallel.c | 2 +-
src/backend/commands/Makefile | 1 +
src/backend/commands/vacuum.c | 156 ++++
src/backend/commands/vacuumparallel.c | 1092 +++++++++++++++++++++++
src/include/access/heapam.h | 1 -
src/include/commands/vacuum.h | 42 +
7 files changed, 1355 insertions(+), 1101 deletions(-)
create mode 100644 src/backend/commands/vacuumparallel.c
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2255861930..7c5240fa2e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -40,7 +40,6 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
-#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -120,23 +119,11 @@
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
-/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
-
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -149,141 +136,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * LVDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
- * Each TID points to an LP_DEAD line pointer from a heap page that has been
- * processed by lazy_scan_prune.
- *
- * Also needed by lazy_vacuum_heap_rel, which marks the same LP_DEAD line
- * pointers as LP_UNUSED during second heap pass.
- */
-typedef struct LVDeadItems
-{
- int max_items; /* # slots allocated in array */
- int num_items; /* current # of entries */
-
- /* Sorted array of TIDs to delete from indexes */
- ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
-} LVDeadItems;
-
-#define MAXDEADITEMS(avail_mem) \
- (((avail_mem) - offsetof(LVDeadItems, items)) / sizeof(ItemPointerData))
-
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /* Counter for vacuuming and cleanup */
- pg_atomic_uint32 idx;
-} LVShared;
-
-/* Status used during parallel index vacuum or cleanup */
-typedef enum LVParallelIndVacStatus
-{
- PARALLEL_INDVAC_STATUS_INITIAL = 0,
- PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
- PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
- PARALLEL_INDVAC_STATUS_COMPLETED
-} LVParallelIndVacStatus;
-
-/*
- * Struct for index vacuum statistics of an index that is used for parallel vacuum.
- * This includes the status of parallel index vacuum as well as index statistics.
- */
-typedef struct LVParallelIndStats
-{
- /*
- * The following two fields are set by leader process before executing
- * parallel index vacuum or parallel index cleanup. These fields are not
- * fixed for the entire VACUUM operation. They are only fixed for an
- * individual parallel index vacuum and cleanup.
- *
- * parallel_workers_can_process is true if both leader and worker can
- * process the index, otherwise only leader can process it.
- */
- LVParallelIndVacStatus status;
- bool parallel_workers_can_process;
-
- /*
- * Individual worker or leader stores the result of index vacuum or
- * cleanup.
- */
- bool istat_updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVParallelIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /* Shared index statistics among parallel vacuum workers */
- LVParallelIndStats *lvpindstats;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -301,9 +153,9 @@ typedef struct LVRelState
bool do_index_cleanup;
bool do_rel_truncate;
- /* Buffer access strategy and parallel state */
+ /* Buffer access strategy and parallel vacuum state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumState *pvs;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -325,9 +177,14 @@ typedef struct LVRelState
VacErrPhase phase;
/*
- * State managed by lazy_scan_heap() follows
+ * State managed by lazy_scan_heap() follows.
+ *
+ * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
+ * Each TID points to an LP_DEAD line pointer from a heap page that has been
+ * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
+ * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
@@ -413,29 +270,12 @@ static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
static int dead_items_max_items(LVRelState *vacrel);
-static inline Size max_items_to_alloc_size(int max_items);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static void update_index_statistics(LVRelState *vacrel);
-static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested);
-static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
-static void parallel_vacuum_end(LVRelState *vacrel);
-static bool parallel_vacuum_should_skip_index(Relation indrel);
-static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum);
-static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
-static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats);
-static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared,
- LVParallelIndStats *pindstats);
-
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -893,7 +733,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
BlockNumber nblocks,
blkno,
next_unskippable_block,
@@ -2028,7 +1868,7 @@ retry:
*/
if (lpdead_items > 0)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
Assert(!prunestate->all_visible);
@@ -2071,7 +1911,6 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(!IsParallelWorker());
Assert(vacrel->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
@@ -2200,7 +2039,6 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
{
bool allindexes = true;
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -2239,8 +2077,21 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
}
else
{
- /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to parallel variant. Since parallel vacuum will
+ * set the error context on an error we temporarily disable setting our
+ * error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2392,7 +2243,7 @@ static int
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
int index, Buffer *vmbuffer)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
@@ -2613,351 +2464,12 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process.
- */
-static void
-parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
-{
- LVParallelState *lps = vacrel->lps;
- LVParallelIndVacStatus new_status;
- int nworkers;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- /* Determine the number of parallel workers to launch */
- if (vacuum)
- nworkers = vacrel->lps->nindexes_parallel_bulkdel;
- else
- {
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- /* Add conditionally parallel-aware indexes if in the first time call */
- if (vacrel->num_index_scans == 0)
- nworkers += vacrel->lps->nindexes_parallel_condcleanup;
- }
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See parallel_vacuum_compute_workers().
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- if (vacuum)
- {
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
- }
- else
- {
- /*
- * We can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
- }
-
- /* Set index vacuum status and mark as parallel safe or not */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
-
- Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
-
- pindstats->status = new_status;
- pindstats->parallel_workers_can_process =
- parallel_vacuum_index_is_parallel_safe(vacrel,
- vacrel->indrels[i],
- vacuum);
- }
-
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- /* Reinitialize parallel context to relaunch parallel workers */
- if (vacrel->num_index_scans > 0)
- ReinitializeParallelDSM(lps->pcxt);
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (vacuum)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- parallel_vacuum_process_unsafe_indexes(vacrel);
-
- /*
- * Join as a parallel worker. The leader process alone processes all
- * parallel-safe indexes in the case where no workers are launched.
- */
- parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Reset all index status back to initial (while checking that we have
- * processed all indexes).
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
-
- if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
- elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
- RelationGetRelationName(vacrel->indrels[i]));
-
- pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVParallelIndStats *pis;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- pis = &(pindstats[idx]);
-
- /*
- * Skip processing indexes that are unsafe for workers (these are
- * processed in parallel_vacuum_process_unsafe_indexes() by leader)
- */
- if (!pis->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- shared, pis);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing index cleanup right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by parallel_vacuum_compute_workers().
- */
-static void
-parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- /* Skip, indexes that are safe for workers */
- if (pindstats->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- lps->lvshared, pindstats);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static void
-parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared, LVParallelIndStats *pindstats)
-{
- IndexBulkDeleteResult *istat = NULL;
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (pindstats->istat_updated)
- istat = &(pindstats->istat);
-
- switch (pindstats->status)
- {
- case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
- istat_res = lazy_vacuum_one_index(indrel, istat,
- shared->reltuples, vacrel);
- break;
- case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
- istat_res = lazy_cleanup_one_index(indrel, istat,
- shared->reltuples,
- shared->estimated_count,
- vacrel);
- break;
- default:
- elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
- pindstats->status,
- RelationGetRelationName(indrel));
- }
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (!pindstats->istat_updated && istat_res != NULL)
- {
- memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
- pindstats->istat_updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
- }
-
- /*
- * Update the status to completed. No need to lock here since each
- * worker touches different indexes.
- */
- pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
-}
-
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
/* Report that we are now cleaning up indexes */
@@ -2982,8 +2494,23 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
}
else
{
- /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, false);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+	 * Outsource everything to the parallel variant. Since parallel vacuum
+	 * will set its own error context on an error, we temporarily disable
+	 * setting our error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_cleanup_all_indexes(vacrel->pvs, vacrel->new_rel_tuples,
+ (vacrel->tupcount_pages < vacrel->rel_pages),
+ vacrel->num_index_scans == 0);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
}
}
@@ -3031,13 +2558,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_items);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_items->num_items),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ istat = vacuum_one_index(&ivinfo, istat, vacrel->dead_items);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3072,7 +2593,6 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
ivinfo.strategy = vacrel->bstrategy;
@@ -3088,24 +2608,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
- istat = index_vacuum_cleanup(&ivinfo, istat);
-
- if (istat)
- {
- ereport(elevel,
- (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
- RelationGetRelationName(indrel),
- istat->num_index_tuples,
- istat->num_pages),
- errdetail("%.0f index row versions were removed.\n"
- "%u index pages were newly deleted.\n"
- "%u index pages are currently deleted, of which %u are currently reusable.\n"
- "%s.",
- istat->tuples_removed,
- istat->pages_newly_deleted,
- istat->pages_deleted, istat->pages_free,
- pg_rusage_show(&ru0))));
- }
+ istat = cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3467,19 +2970,6 @@ dead_items_max_items(LVRelState *vacrel)
return (int) max_items;
}
-/*
- * Returns the total required space for VACUUM's dead_items array given a
- * max_items value returned by dead_items_max_items
- */
-static inline Size
-max_items_to_alloc_size(int max_items)
-{
- Assert(max_items >= MaxHeapTuplesPerPage);
- Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
-
- return offsetof(LVDeadItems, items) + sizeof(ItemPointerData) * max_items;
-}
-
/*
* Allocate dead_items (either using palloc, or in dynamic shared memory).
* Sets dead_items in vacrel for caller.
@@ -3490,9 +2980,12 @@ max_items_to_alloc_size(int max_items)
static void
dead_items_alloc(LVRelState *vacrel, int nworkers)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
int max_items;
+ max_items = dead_items_max_items(vacrel);
+ Assert(max_items >= MaxHeapTuplesPerPage);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3516,16 +3009,22 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- parallel_vacuum_begin(vacrel, nworkers);
+ vacrel->pvs = parallel_vacuum_begin(vacrel->rel, vacrel->indrels,
+ vacrel->nindexes, nworkers,
+ max_items, elevel,
+ vacrel->bstrategy);
- /* If parallel mode started, vacrel->dead_items allocated in DSM */
+ /* If parallel mode started, dead_items space is allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs);
return;
+ }
}
/* Serial VACUUM case */
max_items = dead_items_max_items(vacrel);
- dead_items = (LVDeadItems *) palloc(max_items_to_alloc_size(max_items));
+ dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3548,75 +3047,8 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- parallel_vacuum_end(vacrel);
-}
-
-/*
- * lazy_tid_reaped() -- is a particular tid deletable?
- *
- * This has the right signature to be an IndexBulkDeleteCallback.
- *
- * Assumes dead_items array is sorted (in ascending TID order).
- */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
-{
- LVDeadItems *dead_items = (LVDeadItems *) state;
- int64 litem,
- ritem,
- item;
- ItemPointer res;
-
- litem = itemptr_encode(&dead_items->items[0]);
- ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
- item = itemptr_encode(itemptr);
-
- /*
- * Doing a simple bound check before bsearch() is useful to avoid the
- * extra cost of bsearch(), especially if dead items on the heap are
- * concentrated in a certain range. Since this function is called for
- * every index tuple, it pays to be really fast.
- */
- if (item < litem || item > ritem)
- return false;
-
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) dead_items->items,
- dead_items->num_items,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
-
- return (res != NULL);
-}
-
-/*
- * Comparator routines for use with qsort() and bsearch().
- */
-static int
-vac_cmp_itemptr(const void *left, const void *right)
-{
- BlockNumber lblk,
- rblk;
- OffsetNumber loff,
- roff;
-
- lblk = ItemPointerGetBlockNumber((ItemPointer) left);
- rblk = ItemPointerGetBlockNumber((ItemPointer) right);
-
- if (lblk < rblk)
- return -1;
- if (lblk > rblk)
- return 1;
-
- loff = ItemPointerGetOffsetNumber((ItemPointer) left);
- roff = ItemPointerGetOffsetNumber((ItemPointer) right);
-
- if (loff < roff)
- return -1;
- if (loff > roff)
- return 1;
-
- return 0;
+ parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
+ vacrel->pvs = NULL;
}
/*
@@ -3740,73 +3172,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
return all_visible;
}
-/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum.
- */
-static int
-parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions;
-
- /* Skip indexes that are unsuitable target for parallel index vacuum */
- if (parallel_vacuum_should_skip_index(indrel))
- continue;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
@@ -3817,7 +3182,7 @@ update_index_statistics(LVRelState *vacrel)
int nindexes = vacrel->nindexes;
IndexBulkDeleteResult **indstats = vacrel->indstats;
- Assert(!IsInParallelMode());
+ Assert(!ParallelVacuumIsActive(vacrel));
for (int idx = 0; idx < nindexes; idx++)
{
@@ -3839,407 +3204,6 @@ update_index_statistics(LVRelState *vacrel)
}
}
-/*
- * Try to enter parallel mode and create a parallel context. Then initialize
- * shared memory state.
- *
- * On success (when we can launch one or more workers), will set dead_items and
- * lps in vacrel for caller. A set lps in vacrel state indicates that parallel
- * VACUUM is currently active.
- */
-static void
-parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
-{
- LVParallelState *lps;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- LVDeadItems *dead_items;
- LVParallelIndStats *pindstats;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int max_items;
- Size est_pindstats_len;
- Size est_shared_len;
- Size est_dead_items_len;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested);
- if (parallel_workers <= 0)
- {
- /* Can't perform vacuum in parallel -- lps not set in vacrel */
- return;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
-
- /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
- est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
- shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = MAXALIGN(sizeof(LVShared));
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
- max_items = dead_items_max_items(vacrel);
- est_dead_items_len = MAXALIGN(max_items_to_alloc_size(max_items));
- shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare index vacuum stats */
- pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- /* Skip indexes that are unsuitable target for parallel index vacuum */
- if (parallel_vacuum_should_skip_index(indrel))
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
- lps->lvpindstats = pindstats;
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
- MemSet(shared, 0, est_shared_len);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead_items space */
- dead_items = (LVDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
- MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- /* Success -- set dead_items and lps in leader's vacrel state */
- vacrel->dead_items = dead_items;
- vacrel->lps = lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-parallel_vacuum_end(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- if (pindstats->istat_updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Check if the index is a totally unsuitable target for all parallel
- * processing up front. For example, the index could be
- * < min_parallel_index_scan_size cutoff.
- */
-static bool
-parallel_vacuum_should_skip_index(Relation indrel)
-{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- return true;
-
- return false;
-}
-
-/*
- * Returns false, if the given index can't participate in the next execution of
- * parallel index vacuum or parallel index cleanup.
- */
-static bool
-parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum)
-{
- uint8 vacoptions;
-
- /* Skip indexes that are unsuitable target for parallel index vacuum */
- if (parallel_vacuum_should_skip_index(indrel))
- return false;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* In parallel vacuum case, check if it supports parallel bulk-deletion */
- if (vacuum)
- return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
-
- /* Not safe, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Not safe, if the index supports parallel cleanup conditionally,
- * but we have already processed the index (for bulkdelete). See the
- * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
- * when indexes support parallel cleanup conditionally.
- */
- if (vacrel->num_index_scans > 0 &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVParallelIndStats *lvpindstats;
- LVShared *lvshared;
- LVDeadItems *dead_items;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- /*
- * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
- * don't support parallel vacuum for autovacuum as of now.
- */
- Assert(MyProc->statusFlags == PROC_IN_VACUUM);
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- elog(DEBUG1, "starting parallel vacuum worker");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set index statistics */
- lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_INDEX_STATS,
- false);
-
- /* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (LVDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_items = dead_items;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
/*
* Error context callback for errors occurring during vacuum.
*/
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f573..ae7c7133dd 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -14,7 +14,6 @@
#include "postgres.h"
-#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/parallel.h"
#include "access/session.h"
@@ -25,6 +24,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..48f7348f91 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15b44..d043d4f8f1 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -32,6 +32,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
+#include "catalog/index.h"
#include "catalog/pg_database.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_namespace.h"
@@ -51,6 +52,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -89,6 +91,8 @@ static void vac_truncate_clog(TransactionId frozenXID,
static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
+static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_cmp_itemptr(const void *left, const void *right);
/*
* Primary entry point for manual VACUUM and ANALYZE commands
@@ -2258,3 +2262,155 @@ get_vacoptval_from_boolean(DefElem *def)
{
return defGetBoolean(def) ? VACOPTVALUE_ENABLED : VACOPTVALUE_DISABLED;
}
+
+/*
+ * vacuum_one_index() -- vacuum index relation.
+ *
+ * Delete all the index tuples containing a TID collected in
+ * dead_items. Also update running statistics. Exact details depend
+ * on index AM's ambulkdelete routine.
+ *
+ * reltuples is the number of table tuples to be passed to the
+ * bulkdelete callback. It's always assumed to be estimated.
+ * See indexam.sgml for more info.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+vacuum_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ /* Do bulk deletion */
+ istat = index_bulk_delete(ivinfo, istat, vac_tid_reaped,
+ (void *) dead_items);
+
+ ereport(ivinfo->message_level,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ RelationGetRelationName(ivinfo->index),
+ dead_items->num_items),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+ return istat;
+}
+
+/*
+ * cleanup_one_index() -- do post-vacuum cleanup for index relation.
+ *
+ * Calls index AM's amvacuumcleanup routine. reltuples is the number
+ * of table tuples and estimated_count is true if reltuples is an
+ * estimated value. See indexam.sgml for more info.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ istat = index_vacuum_cleanup(ivinfo, istat);
+
+ if (istat)
+ {
+ ereport(ivinfo->message_level,
+ (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
+ RelationGetRelationName(ivinfo->index),
+ istat->num_index_tuples,
+ istat->num_pages),
+ errdetail("%.0f index row versions were removed.\n"
+ "%u index pages were newly deleted.\n"
+ "%u index pages are currently deleted, of which %u are currently reusable.\n"
+ "%s.",
+ istat->tuples_removed,
+ istat->pages_newly_deleted,
+ istat->pages_deleted, istat->pages_free,
+ pg_rusage_show(&ru0))));
+ }
+
+ return istat;
+}
+
+/*
+ * vac_tid_reaped() -- is a particular tid deletable?
+ *
+ * This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ * Assumes dead_items array is sorted (in ascending TID order).
+ */
+static bool
+vac_tid_reaped(ItemPointer itemptr, void *state)
+{
+ VacDeadItems *dead_items = (VacDeadItems *) state;
+ int64 litem,
+ ritem,
+ item;
+ ItemPointer res;
+
+ litem = itemptr_encode(&dead_items->items[0]);
+ ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
+ item = itemptr_encode(itemptr);
+
+ /*
+ * Doing a simple bound check before bsearch() is useful to avoid the
+ * extra cost of bsearch(), especially if dead items on the heap are
+ * concentrated in a certain range. Since this function is called for
+ * every index tuple, it pays to be really fast.
+ */
+ if (item < litem || item > ritem)
+ return false;
+
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_items->items,
+ dead_items->num_items,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
+
+ return (res != NULL);
+}
+
+/*
+ * Comparator routines for use with qsort() and bsearch().
+ */
+static int
+vac_cmp_itemptr(const void *left, const void *right)
+{
+ BlockNumber lblk,
+ rblk;
+ OffsetNumber loff,
+ roff;
+
+ lblk = ItemPointerGetBlockNumber((ItemPointer) left);
+ rblk = ItemPointerGetBlockNumber((ItemPointer) right);
+
+ if (lblk < rblk)
+ return -1;
+ if (lblk > rblk)
+ return 1;
+
+ loff = ItemPointerGetOffsetNumber((ItemPointer) left);
+ roff = ItemPointerGetOffsetNumber((ItemPointer) right);
+
+ if (loff < roff)
+ return -1;
+ if (loff > roff)
+ return 1;
+
+ return 0;
+}
+
+/*
+ * Returns the total required space for VACUUM's dead_items array given a
+ * max_items value.
+ */
+inline Size
+vac_max_items_to_alloc_size(int max_items)
+{
+ Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
+
+ return offsetof(VacDeadItems, items) + sizeof(ItemPointerData) * max_items;
+}
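
To make the intended call pattern for these relocated helpers concrete, here is a minimal, hypothetical sketch (not part of the patch). It assumes the IndexVacuumInfo fields declared in access/genam.h and a caller that already has an open index relation, a populated VacDeadItems array, and a buffer access strategy; names such as bulkdel_then_cleanup are placeholders:

    /* Illustrative only -- one bulk-deletion pass followed by cleanup */
    static IndexBulkDeleteResult *
    bulkdel_then_cleanup(Relation indrel, VacDeadItems *dead_items,
                         double reltuples, BufferAccessStrategy strategy)
    {
        IndexVacuumInfo ivinfo;
        IndexBulkDeleteResult *istat = NULL;

        ivinfo.index = indrel;
        ivinfo.analyze_only = false;
        ivinfo.report_progress = false;
        ivinfo.estimated_count = true;
        ivinfo.message_level = DEBUG2;
        ivinfo.num_heap_tuples = reltuples;
        ivinfo.strategy = strategy;

        /* ambulkdelete: remove index entries whose TIDs appear in dead_items */
        istat = vacuum_one_index(&ivinfo, istat, dead_items);

        /* amvacuumcleanup: post-vacuum bookkeeping; may return NULL stats */
        istat = cleanup_one_index(&ivinfo, istat);

        return istat;
    }
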
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000000..9392c25478
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,1092 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelVacuumState.
+ *
+ * In a parallel vacuum, we perform both index bulk-deletion and index cleanup
+ * with parallel worker processes. Individual indexes are processed by one
+ * vacuum process. ParallelVacuumState contains shared information as well
+ * as the memory space for storing dead items allocated in the DSM segment.
+ * When starting either parallel index bulk-deletion or index cleanup, we
+ * launch parallel worker processes. Once all indexes are processed, the
+ * parallel worker processes exit. On the next pass, the parallel context
+ * is re-initialized so that the same DSM can be used for multiple passes of
+ * index bulk-deletion and index cleanup. At the end of a parallel vacuum,
+ * ParallelVacuumState is destroyed while returning the index statistics so
+ * that we can update them after exiting from parallel mode.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/genam.h"
+#include "access/parallel.h"
+#include "access/table.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/elog.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel workers, so it is allocated in the DSM
+ * segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the parallel vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * reltuples is the total number of input table tuples. We set either old
+ * live tuples in the index vacuum case or the new live tuples in the
+ * index cleanup case.
+ *
+ * estimated_count is true if reltuples is an estimated value. (Note that
+ * reltuples could be -1 in this case, indicating we have no idea.)
+ */
+ double reltuples;
+ bool estimated_count;
+
+ /*
+ * In single process vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for table scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process vacuum.
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
+} PVShared;
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} PVIndVacStatus;
+
+/*
+ * Struct for per-index vacuum statistics used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as the index statistics.
+ */
+typedef struct PVIndStats
+{
+ /*
+	 * The following two fields are set by the leader process before executing
+	 * parallel index vacuum or parallel index cleanup. These fields are not
+	 * fixed for the entire VACUUM operation. They are only fixed for an
+	 * individual parallel index vacuum and cleanup.
+	 *
+	 * parallel_workers_can_process is true if both the leader and workers can
+	 * process the index; otherwise only the leader can process it.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/*
+ * Struct for maintaining a parallel vacuum state. This struct is used
+ * by both the leader and worker processes. The parallel vacuum leader process
+ * uses it throughout a VACUUM operation; the leader therefore uses the same
+ * state to perform index bulk-deletion and index cleanup multiple times.
+ * The workers use some fields of this structure.
+ */
+typedef struct ParallelVacuumState
+{
+ /* NULL for worker processes */
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /* Shared index statistics among parallel vacuum workers */
+ PVIndStats *indstats;
+
+ /* Shared dead items space among parallel vacuum workers */
+ VacDeadItems *dead_items;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion and
+ * parallel index cleanup respectively.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /* True if we need to reinitialize parallel DSM before launching workers */
+ bool first_time;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+
+ /* Error reporting state */
+ char *relnamespace;
+ char *relname;
+ char *indname;
+ PVIndVacStatus status;
+} ParallelVacuumState;
+
+static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested);
+static bool parallel_vacuum_should_skip_index(Relation indrel);
+static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum,
+ bool no_bulkdel_call);
+static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, bool vacuum,
+ bool no_bulkdel_call);
+static void parallel_vacuum_process_unsafe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats);
+static void parallel_vacuum_error_callback(void *arg);
+
+/*
+ * Try to enter parallel mode and create a parallel context. Then initialize
+ * shared memory state.
+ *
+ * On success (when we can launch one or more workers), return a parallel
+ * vacuum state. Otherwise, return NULL.
+ */
+ParallelVacuumState *
+parallel_vacuum_begin(Relation rel, Relation *indrels, int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel, BufferAccessStrategy bstrategy)
+{
+ ParallelVacuumState *pvs;
+ ParallelContext *pcxt;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ PVIndStats *indstats;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ Size est_indstats_len;
+ Size est_shared_len;
+ Size est_dead_items_len;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(nrequested_workers >= 0);
+ Assert(nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
+ nrequested_workers);
+ if (parallel_workers <= 0)
+ {
+ /* Can't perform vacuum in parallel -- return NULL */
+ return NULL;
+ }
+
+ pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
+ pvs->indrels = indrels;
+ pvs->nindexes = nindexes;
+ pvs->first_time = true;
+ pvs->bstrategy = bstrategy;
+
+ /*
+	 * Set error traceback information. Other fields will be filled while
+	 * processing indexes.
+ */
+ pvs->relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs->relname = pstrdup(RelationGetRelationName(rel));
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvs->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
+ est_indstats_len = mul_size(sizeof(PVIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = MAXALIGN(sizeof(PVShared));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
+ est_dead_items_len = MAXALIGN(vac_max_items_to_alloc_size(max_items));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats_len);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+		/* Skip indexes that are unsuitable target for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvs->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvs->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvs->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvs->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
+ MemSet(shared, 0, est_shared_len);
+ shared->relid = RelationGetRelid(rel);
+ shared->elevel = elevel;
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvs->shared = shared;
+
+ /* Prepare the dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
+ dead_items->max_items = max_items;
+ dead_items->num_items = 0;
+ MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
+ pvs->dead_items = dead_items;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvs->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvs->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ /* Success -- return parallel vacuum state */
+ return pvs;
+}
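
/*
 * A condensed, illustrative sketch (not part of the patch) of the leader-side
 * call sequence that vacuumlazy.c follows with this API, as seen in the heap
 * hunks earlier in this patch.  Variable names are placeholders.
 */
    pvs = parallel_vacuum_begin(rel, indrels, nindexes, nrequested_workers,
                                max_items, elevel, bstrategy);
    if (pvs != NULL)
    {
        /* the dead_items space now lives in the DSM segment */
        dead_items = parallel_vacuum_get_dead_items(pvs);

        /* zero or more bulk-deletion passes while scanning the table */
        parallel_vacuum_bulkdel_all_indexes(pvs, old_live_tuples);

        /* one cleanup pass at the end of the scan */
        parallel_vacuum_cleanup_all_indexes(pvs, new_rel_tuples,
                                            estimated_count,
                                            num_index_scans == 0);

        /* copies per-index stats out of DSM, then exits parallel mode */
        parallel_vacuum_end(pvs, indstats);
    }
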
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * vacuum and index cleanup can be executed with parallel workers. The index
+ * is eligible for parallel vacuum iff its size is greater than
+ * min_parallel_index_scan_size as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that user requested. If
+ * nrequested is 0, we compute the parallel degree based on nindexes, that is
+ * the number of indexes that support parallel vacuum.
+ */
+static int
+parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow performing parallel operation in standalone backend or
+ * when parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ /*
+ * Compute the number of indexes that can participate in parallel vacuum.
+ */
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* Skip indexes that are unsuitable target for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
+ continue;
+
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
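
/*
 * A worked example of the computation above (hypothetical numbers): with four
 * indexes of which three support parallel bulk-deletion and two support
 * (conditional) parallel cleanup, nindexes_parallel = Max(3, 2) - 1 = 2.
 * For VACUUM (PARALLEL 8) under the default max_parallel_maintenance_workers
 * of 2, the result is Min(Min(8, 2), 2) = 2 workers.
 */
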
+
+/*
+ * Check if the index is a totally unsuitable target for all parallel
+ * processing up front. For example, the index could be
+ * < min_parallel_index_scan_size cutoff.
+ */
+static bool
+parallel_vacuum_should_skip_index(Relation indrel)
+{
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ return true;
+
+ return false;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed during parallel mode, copy the
+ * updated index statistics from DSM into local memory and then later use that
+ * to update the index statistics. One might think that we can exit from
+ * parallel mode, update the index statistics and then destroy parallel
+ * context, but that won't be safe (see ExitParallelMode).
+ */
+void
+parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats)
+{
+ Assert(!IsParallelWorker());
+
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->istat_updated)
+ {
+ istats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(istats[i], &indstats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ istats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvs->pcxt);
+ ExitParallelMode();
+
+ pfree(pvs->relnamespace);
+ pfree(pvs->relname);
+ pfree(pvs);
+}
+
+/* Returns the dead items space */
+VacDeadItems *
+parallel_vacuum_get_dead_items(ParallelVacuumState *pvs)
+{
+ return pvs->dead_items;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now.
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = true;
+
+ /* no_bulkdel_call is not used in parallel bulkdel cases */
+ parallel_vacuum_process_all_indexes(pvs, true, false);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ *
+ * no_bulkdel_call must be true if there was no parallel_vacuum_bulkdel_all_indexes
+ * call in the vacuum execution.
+ */
+void
+parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
+ bool estimated_count, bool no_bulkdel_call)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can provide a better estimate of total number of surviving
+ * tuples (we assume indexes are more interested in that than in the
+ * number of nominally live tuples).
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_process_all_indexes(pvs, false, no_bulkdel_call);
+}
+
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum,
+ bool no_bulkdel_call)
+{
+ int nworkers;
+ ErrorContextCallback errcallback;
+ PVIndVacStatus new_status = vacuum
+ ? PARALLEL_INDVAC_STATUS_NEED_BULKDELETE
+ : PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
+
+ Assert(!IsParallelWorker());
+
+ /* Determine the number of parallel workers to launch */
+ if (vacuum)
+ nworkers = pvs->nindexes_parallel_bulkdel;
+ else
+ {
+ nworkers = pvs->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if there has been no bulk-deletion call */
+ if (no_bulkdel_call)
+ nworkers += pvs->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * than the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See parallel_vacuum_compute_workers().
+ */
+ nworkers = Min(nworkers, pvs->pcxt->nworkers);
+
+ /* Set index vacuum status and mark as parallel safe or not */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+
+ indstats->status = new_status;
+ indstats->parallel_workers_can_process =
+ parallel_vacuum_index_is_parallel_safe(pvs->indrels[i],
+ vacuum,
+ no_bulkdel_call);
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvs->shared->idx), 0);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of table.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ if (vacuum)
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+
+ pvs->first_time = false;
+ }
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Process the indexes that can be processed by only leader process */
+ parallel_vacuum_process_unsafe_indexes(pvs);
+
+ /*
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_process_safe_indexes(pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(pvs->indrels[i]));
+
+ indstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to table scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+}
+
+/*
+ * Returns false if the given index can't participate in parallel index
+ * vacuum or parallel index cleanup.
+ */
+static bool
+parallel_vacuum_index_is_parallel_safe(Relation indrel, bool vacuum,
+ bool no_bulkdel_call)
+{
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* Return false if the index is an unsuitable target for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
+ return false;
+
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). See the
+ * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
+ * when indexes support parallel cleanup conditionally.
+ */
+ if (!no_bulkdel_call &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
+
+/*
+ * Perform serial processing of indexes in the leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe. It's possible that this will vary for a given index, based
+ * on details like whether we're performing index cleanup right now.
+ *
+ * Also performs processing of smaller indexes that fell under the size cutoff
+ * enforced by parallel_vacuum_compute_workers().
+ */
+static void
+parallel_vacuum_process_unsafe_indexes(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ /* Skip safe indexes, as they are processed by workers */
+ if (indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_process_one_index(pvs, pvs->indrels[i], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to process the indexes in parallel.
+ */
+static void
+parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+ PVIndStats *indstats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvs->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvs->nindexes)
+ break;
+
+ indstats = &(pvs->indstats[idx]);
+
+ /*
+ * Skip processing indexes that are unsafe for workers (these are
+ * processed in parallel_vacuum_process_unsafe_indexes() by leader)
+ */
+ if (!indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do bulkdelete or cleanup of the index */
+ parallel_vacuum_process_one_index(pvs, pvs->indrels[idx], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Vacuum or cleanup index either by the leader process or by one of the worker
+ * processes. After processing the index, this function copies the index
+ * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result if someone
+ * has already updated it
+ */
+ if (indstats->istat_updated)
+ istat = &(indstats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvs->shared->elevel;
+ ivinfo.estimated_count = pvs->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvs->shared->reltuples;
+ ivinfo.strategy = pvs->bstrategy;
+
+ /* Update error traceback information */
+ pvs->indname = pstrdup(RelationGetRelationName(indrel));
+ pvs->status = indstats->status;
+
+ switch (indstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = vacuum_one_index(&ivinfo, istat, pvs->dead_items);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ indstats->status,
+ RelationGetRelationName(indrel));
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete and
+ * amvacuumcleanup to the DSM segment if it's the first cycle because they
+ * allocate locally and it's possible that an index will be vacuumed by a
+ * different vacuum process the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result on the DSM segment and
+ * pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result at different
+ * slots we can write them without locking.
+ */
+ if (!indstats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(indstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ indstats->istat_updated = true;
+
+ /* Free the locally-allocated bulk-deletion result */
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each
+ * worker touches different indexes.
+ */
+ indstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+
+ /* Reset error traceback information */
+ pvs->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ pfree(pvs->indname);
+ pvs->indname = NULL;
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index vacuum or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ ParallelVacuumState pvs;
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ int nindexes;
+ char *sharedquery;
+ ErrorContextCallback errcallback;
+
+ /*
+ * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
+ * don't support parallel vacuum for autovacuum as of now.
+ */
+ Assert(MyProc->statusFlags == PROC_IN_VACUUM);
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
+ false);
+
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the leader process. It's
+ * okay because the lock mode does not conflict among the parallel
+ * workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should match
+ * the leader's ordering.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead_items space (set as worker's dead_items below) */
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Set parallel vacuum state */
+ pvs.indrels = indrels;
+ pvs.nindexes = nindexes;
+ pvs.indstats = indstats;
+ pvs.shared = shared;
+ pvs.dead_items = dead_items;
+ pvs.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs.relname = pstrdup(RelationGetRelationName(rel));
+
+ /* These fields will be filled during index vacuum or cleanup */
+ pvs.indname = NULL;
+ pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
+
+ /* Each parallel VACUUM worker gets its own access strategy */
+ pvs.bstrategy = GetAccessStrategy(BAS_VACUUM);
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvs.bstrategy);
+}
+
+/*
+ * Error context callback for errors occurring during parallel index vacuum.
+ */
+static void
+parallel_vacuum_error_callback(void *arg)
+{
+ ParallelVacuumState *errinfo = arg;
+
+ switch (errinfo->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while cleanup index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_INITIAL:
+ case PARALLEL_INDVAC_STATUS_COMPLETED:
+ default:
+ break;
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 417dd288e5..f3fb1e93a5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4cfd52eaf4..506d649a6a 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -62,6 +64,9 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Abstract type for parallel vacuum state */
+typedef struct ParallelVacuumState ParallelVacuumState;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -230,6 +235,21 @@ typedef struct VacuumParams
int nworkers;
} VacuumParams;
+/*
+ * VacDeadItems stores dead TIDs collected during the heap scan.
+ */
+typedef struct VacDeadItems
+{
+ int max_items; /* # slots allocated in array */
+ int num_items; /* current # of entries */
+
+ /* Sorted array of TIDs to delete from indexes */
+ ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
+} VacDeadItems;
+
+#define MAXDEADITEMS(avail_mem) \
+ (((avail_mem) - offsetof(VacDeadItems, items)) / sizeof(ItemPointerData))
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for PostGIS */
extern int vacuum_freeze_min_age;
@@ -282,6 +302,28 @@ extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
bits32 options, bool verbose,
LOCKMODE lmode);
+extern IndexBulkDeleteResult *vacuum_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items);
+extern IndexBulkDeleteResult *cleanup_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat);
+extern Size vac_max_items_to_alloc_size(int max_items);
+
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumState *parallel_vacuum_begin(Relation rel, Relation *indrels,
+ int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel,
+ BufferAccessStrategy bstrategy);
+extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
+extern VacDeadItems *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs);
+extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples);
+extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ bool estimated_count,
+ bool no_bulkdel_call);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
--
2.24.3 (Apple Git-128)
On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches.
I have a few comments on v4-0001.
1.
In parallel_vacuum_process_all_indexes(), we can combine the two
checks for vacuum/cleanup at the beginning of the function and I think
it is better to keep the variable name as bulkdel or cleanup instead
of vacuum as that is more specific and clear.
2. The patch seems to be calling parallel_vacuum_should_skip_index
thrice even before starting parallel vacuum. It includes a call to find
the number of blocks, which has to be performed for each index. I
understand it might not be too costly to call this, but it seems better
to remember this info as we do in the current code; see the sketch
below. We can probably set parallel_workers_can_process in
parallel_vacuum_begin and then update it again in
parallel_vacuum_process_all_indexes. Won't doing something like that be
better?
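To illustrate, a minimal sketch of what remembering the decision could look
like (the names here are only illustrative; the pre-refactoring code kept a
similar will_vacuum_parallel array):

/*
 * Hypothetical sketch: compute the per-index skip decision once in
 * parallel_vacuum_begin() and reuse the cached flag later, instead of
 * calling parallel_vacuum_should_skip_index() repeatedly.
 */
bool	   *will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);

for (int i = 0; i < nindexes; i++)
	will_parallel_vacuum[i] = !parallel_vacuum_should_skip_index(indrels[i]);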
3. /*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
/* Free the locally-allocated bulk-deletion result */
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
I think we now copy the results for both local and parallel cleanup.
Isn't it better to write something about that in the comments, as it is
not clear from the current ones?
4.
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = MAXALIGN(sizeof(LVShared));
shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
Do we need MAXALIGN here? I think we previously required it because
immediately after this we were writing the index stats, but now those
are allocated separately. Normally, shm_toc_estimate_chunk() aligns the
size, but sometimes we need to do it ourselves to avoid unaligned access
to shared memory. I am really not sure whether we require it for
dead_items; do you remember why it is there in the first place? A rough
sketch of the alternative is below.
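Sketch of the alternative being asked about, based on the quoted hunk
(LVShared as in v4; note that shm_toc_estimate_chunk() already
buffer-aligns the requested size):

/* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
est_shared_len = sizeof(LVShared);	/* no explicit MAXALIGN */
shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);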
5. The below-updated comment based on one of my previous suggestions
seems to be missing in this version.
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). We do
+ * this to avoid the need to invoke workers when parallel index
+ * cleanup doesn't need to scan the index. See the comments for
+ * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
+ * support parallel cleanup conditionally.
+ */
--
With Regards,
Amit Kapila.
On Thur, Dec 2, 2021 8:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches.
The first patch is the main patch for refactoring parallel vacuum code; removes
bitmap-related code and renames function names for consistency. The second
patch moves these parallel-related codes to vacuumparallel.c as well as
common functions that are used by both lazyvacuum.c and vacuumparallel.c to
vacuum.c. The third patch adds regression tests for parallel vacuum on
different kinds of indexes with multiple index scans. Please review them.
Thanks for updating the patch.
I reviewed the 0001 patch and didn't find any big issues in it.
I only have a personal suggestion about the following function names:
parallel_vacuum_process_unsafe_indexes
parallel_vacuum_index_is_parallel_safe
It seems not only "unsafe" index are processed in the above functions,
but also index which is unsuitable(based on parallel_vacuum_should_skip_index).
So, it might be clear to avoid "unsafe" in the name. Maybe we can use: "xxin_leader"
or " can_participate".
Best regards,
Hou zj
On Fri, Dec 3, 2021 at 3:01 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Thur, Dec 2, 2021 8:31 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches.
The first patch is the main patch for refactoring parallel vacuum code; removes
bitmap-related code and renames function names for consistency. The second
patch moves these parallel-related codes to vacuumparallel.c as well as
common functions that are used by both lazyvacuum.c and vacuumparallel.c to
vacuum.c. The third patch adds regression tests for parallel vacuum on
different kinds of indexes with multiple index scans. Please review them.
Thanks for updating the patch.
I reviewed the 0001 patch and didn’t find some big issues in the patch.
I only have a personally suggestion for the following function name:
parallel_vacuum_process_unsafe_indexes
parallel_vacuum_index_is_parallel_safe
It seems not only "unsafe" index are processed in the above functions,
but also index which is unsuitable(based on parallel_vacuum_should_skip_index).
I have given one comment to remove the call to
parallel_vacuum_should_skip_index() from
parallel_vacuum_index_is_parallel_safe(). If Sawada-San follows that
then maybe your point will be addressed.
--
With Regards,
Amit Kapila.
On Fri, Dec 3, 2021 at 2:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches.
I have a few comments on v4-0001.
The new test proposed by v4-0003 increases the vacuum_parallel.sql
timing by more than 10 times. It appears to take the longest time
among all the tests in make check. I think that is because of the large
amount of data processed by the test. It is good for testing patches
during development, but it won't be a good idea to commit it. Can we
reduce its timing?
--
With Regards,
Amit Kapila.
On Fri, Dec 3, 2021 at 6:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 3, 2021 at 2:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches.
I have a few comments on v4-0001.
The new test proposed by v4-0003 is increasing the vacuum_parallel.sql
timing by more than 10 times. It appears to be taking the highest time
among all the tests in make check. I think it is because of a large
amount of data being processed by the test.
Right.
I think it is good to use
it for testing of patches during development but won't be a good idea
to commit. Can we reduce its timings?
On reflection, we already have test cases for:
* a parallel vacuum does bulkdeletion and cleanup
* a parallel vacuum does only cleanup
and the case that the new tests add is that a vacuum does bulkdeletion
twice and cleanup. Given the increase in the regression test time, the
third patch might not be worth adding.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Dec 3, 2021 at 6:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches.
I have a few comments on v4-0001.
Thank you for the comments!
1.
In parallel_vacuum_process_all_indexes(), we can combine the two
checks for vacuum/cleanup at the beginning of the function
Agreed.
and I think
it is better to keep the variable name as bulkdel or cleanup instead
of vacuum as that is more specific and clear.
I was thinking of using the terms "bulkdel" and "cleanup" instead of
"vacuum" and "cleanup" for the same reason. That way, we could probably
use "bulkdel" and "cleanup" when doing index bulk-deletion (i.e.,
calling ambulkdelete) and index cleanup (i.e., calling amvacuumcleanup),
respectively, and use "vacuum" when processing an index, i.e., doing
either bulk-delete or cleanup, instead of just "processing". But we
already use "vacuum" and "cleanup" in many places, e.g.,
lazy_vacuum_index() and lazy_cleanup_index(). If we want to use
"bulkdel" instead of "vacuum", I think it would be better to change the
terminology in vacuumlazy.c thoroughly, probably in a separate patch.
2. The patch seems to be calling parallel_vacuum_should_skip_index
thrice even before starting parallel vacuum. It has a call to find the
number of blocks which has to be performed for each index. I
understand it might not be too costly to call this but it seems better
to remember this info like we are doing in the current code.
Yes, we can bring the will_vacuum_parallel array back to the code. That
way, we can remove the call to parallel_vacuum_should_skip_index() in
parallel_vacuum_begin().
We can
probably set parallel_workers_can_process in parallel_vacuum_begin and
then again update in parallel_vacuum_process_all_indexes. Won't doing
something like that be better?
parallel_workers_can_process can vary depending on whether we are doing
bulk-deletion, the first cleanup, or a second (or later) cleanup. What
could we base parallel_workers_can_process on in parallel_vacuum_begin()?
3. /*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
/* Free the locally-allocated bulk-deletion result */
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
I think here now we copy the results both for local and parallel
cleanup. Isn't it better to write something about that in comments as
it is not clear from current comments?
What do you mean by "local cleanup"?
4. + /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = MAXALIGN(sizeof(LVShared));
shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
Do we need MAXALIGN here? I think previously we required it here
because immediately after this we were writing index stats but now
those are allocated separately. Normally, shm_toc_estimate_chunk()
aligns the size but sometimes we need to do it so as to avoid
unaligned accessing of shared mem. I am really not sure whether we
require it for dead_items, do you remember why it is there in the
first place.
Indeed. I don't remember that. Probably it's an oversight.
5. The below-updated comment based on one of my previous suggestions
seems to be missing in this version.
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). We do
+ * this to avoid the need to invoke workers when parallel index
+ * cleanup doesn't need to scan the index. See the comments for
+ * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
+ * support parallel cleanup conditionally.
+ */
Added.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Dec 3, 2021 at 6:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Dec 3, 2021 at 6:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches.
I have a few comments on v4-0001.
Thank you for the comments!
1.
In parallel_vacuum_process_all_indexes(), we can combine the two
checks for vacuum/cleanup at the beginning of the function
Agreed.
and I think
it is better to keep the variable name as bulkdel or cleanup instead
of vacuum as that is more specific and clear.
I was thinking to use the terms "bulkdel" and "cleanup" instead of
"vacuum" and "cleanup" for the same reason. That way, probably we can
use “bulkdel" and “cleanup" when doing index bulk-deletion (i.g.,
calling to ambulkdelete) and index cleanup (calling to
amvacuumcleanup), respectively, and use "vacuum" when processing an
index, i.g., doing either bulk-delete or cleanup, instead of using
just "processing" . But we already use “vacuum” and “cleanup” in many
places, e.g., lazy_vacuum_index() and lazy_cleanup_index(). If we want
to use “bulkdel” instead of “vacuum”, I think it would be better to
change the terminology in vacuumlazy.c thoroughly, probably in a
separate patch.
Okay.
2. The patch seems to be calling parallel_vacuum_should_skip_index
thrice even before starting parallel vacuum. It has a call to find the
number of blocks which has to be performed for each index. I
understand it might not be too costly to call this but it seems better
to remember this info like we are doing in the current code.
Yes, we can bring will_vacuum_parallel array back to the code. That
way, we can remove the call to parallel_vacuum_should_skip_index() in
parallel_vacuum_begin().
We can
probably set parallel_workers_can_process in parallel_vacuum_begin and
then again update in parallel_vacuum_process_all_indexes. Won't doing
something like that be better?
parallel_workers_can_process can vary depending on bulk-deletion, the
first time cleanup, or the second time (or more) cleanup. What can we
set parallel_workers_can_process based on in parallel_vacuum_begin()?
I was thinking to set the results of will_vacuum_parallel in
parallel_vacuum_begin().
3. /*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
/* Free the locally-allocated bulk-deletion result */
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
I think here now we copy the results both for local and parallel
cleanup. Isn't it better to write something about that in comments as
it is not clear from current comments?
What do you mean by "local cleanup"?
Clean-up performed by leader backend.
4. + /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = MAXALIGN(sizeof(LVShared));
shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
Do we need MAXALIGN here? I think previously we required it here
because immediately after this we were writing index stats but now
those are allocated separately. Normally, shm_toc_estimate_chunk()
aligns the size but sometimes we need to do it so as to avoid
unaligned accessing of shared mem. I am really not sure whether we
require it for dead_items, do you remember why it is there in the
first place.
Indeed. I don't remember that. Probably it's an oversight.
Yeah, I also think so.
--
With Regards,
Amit Kapila.
On Mon, Dec 6, 2021 at 1:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 3, 2021 at 6:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Dec 3, 2021 at 6:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches.
I have a few comments on v4-0001.
Thank you for the comments!
1.
In parallel_vacuum_process_all_indexes(), we can combine the two
checks for vacuum/cleanup at the beginning of the function
Agreed.
and I think
it is better to keep the variable name as bulkdel or cleanup instead
of vacuum as that is more specific and clear.
I was thinking to use the terms "bulkdel" and "cleanup" instead of
"vacuum" and "cleanup" for the same reason. That way, probably we can
use “bulkdel" and “cleanup" when doing index bulk-deletion (i.g.,
calling to ambulkdelete) and index cleanup (calling to
amvacuumcleanup), respectively, and use "vacuum" when processing an
index, i.g., doing either bulk-delete or cleanup, instead of using
just "processing" . But we already use “vacuum” and “cleanup” in many
places, e.g., lazy_vacuum_index() and lazy_cleanup_index(). If we want
to use “bulkdel” instead of “vacuum”, I think it would be better to
change the terminology in vacuumlazy.c thoroughly, probably in a
separate patch.
Okay.
2. The patch seems to be calling parallel_vacuum_should_skip_index
thrice even before starting parallel vacuum. It has a call to find the
number of blocks which has to be performed for each index. I
understand it might not be too costly to call this but it seems better
to remember this info like we are doing in the current code.
Yes, we can bring will_vacuum_parallel array back to the code. That
way, we can remove the call to parallel_vacuum_should_skip_index() in
parallel_vacuum_begin().
We can
probably set parallel_workers_can_process in parallel_vacuum_begin and
then again update in parallel_vacuum_process_all_indexes. Won't doing
something like that be better?
parallel_workers_can_process can vary depending on bulk-deletion, the
first time cleanup, or the second time (or more) cleanup. What can we
set parallel_workers_can_process based on in parallel_vacuum_begin()?
I was thinking to set the results of will_vacuum_parallel in
parallel_vacuum_begin().
3. /*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
/* Free the locally-allocated bulk-deletion result */
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
I think here now we copy the results both for local and parallel
cleanup. Isn't it better to write something about that in comments as
it is not clear from current comments?
What do you mean by "local cleanup"?
Clean-up performed by leader backend.
I might be missing your points but I think the patch doesn't change
the behavior around these codes. With the patch, we allocate
IndexBulkDeleteResult on DSM for every index but the patch doesn't
change the fact that in parallel vacuum/cleanup cases, we copy
IndexBulkDeleteResult returned from ambulkdelete() or amvacuumcleanup
to DSM space. Non-parallel vacuum doesn't use this function. Do you
have any suggestions on better comments here?
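For reference, the hunk under discussion in condensed form (field names
as in the v5 patch); each index has its own PVIndStats slot in DSM, so
the copy is done only on the first pass for that index and without
locking:

if (!indstats->istat_updated && istat_res != NULL)
{
	/* First pass: copy the locally allocated result into the DSM slot. */
	memcpy(&(indstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
	indstats->istat_updated = true;

	/* Free the locally-allocated bulk-deletion result */
	pfree(istat_res);
}
/* Later passes point the index AM directly at &indstats->istat. */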
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Dec 7, 2021 at 6:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Dec 6, 2021 at 1:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 3, 2021 at 6:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
3. /*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
/* Free the locally-allocated bulk-deletion result */
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
I think here now we copy the results both for local and parallel
cleanup. Isn't it better to write something about that in comments as
it is not clear from current comments?
What do you mean by "local cleanup"?
Clean-up performed by leader backend.
I might be missing your points but I think the patch doesn't change
the behavior around these codes. With the patch, we allocate
IndexBulkDeleteResult on DSM for every index but the patch doesn't
change the fact that in parallel vacuum/cleanup cases, we copy
IndexBulkDeleteResult returned from ambulkdelete() or amvacuumcleanup
to DSM space. Non-parallel vacuum doesn't use this function.
I was talking about when we call parallel_vacuum_process_one_index()
via parallel_vacuum_process_unsafe_indexes() where the leader
processes the indexes that will be skipped by workers. Isn't that case
slightly different now because previously in that case we would not
have done the copy but now we will copy the stats in that case as
well? Am I missing something?
--
With Regards,
Amit Kapila.
On Tue, Dec 7, 2021 at 12:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Dec 7, 2021 at 6:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Dec 6, 2021 at 1:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 3, 2021 at 6:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
3. /*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
/* Free the locally-allocated bulk-deletion result */
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
I think here now we copy the results both for local and parallel
cleanup. Isn't it better to write something about that in comments as
it is not clear from current comments?
What do you mean by "local cleanup"?
Clean-up performed by leader backend.
I might be missing your points but I think the patch doesn't change
the behavior around these codes. With the patch, we allocate
IndexBulkDeleteResult on DSM for every index but the patch doesn't
change the fact that in parallel vacuum/cleanup cases, we copy
IndexBulkDeleteResult returned from ambulkdelete() or amvacuumcleanup
to DSM space. Non-parallel vacuum doesn't use this function.
I was talking about when we call parallel_vacuum_process_one_index()
via parallel_vacuum_process_unsafe_indexes() where the leader
processes the indexes that will be skipped by workers. Isn't that case
slightly different now because previously in that case we would not
have done the copy but now we will copy the stats in that case as
well? Am, I missing something?
I got your point. Yes, with the patch, we copy the stats to DSM even
if the index doesn't participate in parallel vacuum at all.
Previously, these statistics were allocated in local memory.
Updated the comments at the declaration of lvpindstats.
I've attached an updated patch. I've removed the 0003 patch that added
regression tests, as per the discussion. Regarding the terminology like
"bulkdel" and "cleanup" you pointed out, I've done that in the 0002 patch
while moving the code to vacuumparallel.c. In that file, we can
consistently use the terms "bulkdel" and "cleanup" instead of "vacuum"
and "cleanup".
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v5-0002-Move-parallel-vacuum-code-to-vacuumparallel.c.patch (application/octet-stream)
From 4db6254aab3126cfcc221e80e92c0b2a5a1d50a7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 1 Dec 2021 14:35:05 +0900
Subject: [PATCH v5 2/2] Move parallel vacuum code to vacuumparallel.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previously, parallel vacuum was specific to lazy vacuum, i.e., the heap
table AM. But the job that parallel vacuum does isn't really specific
to heap.
This commit moves the parallel vacuum related code to a new file,
commands/vacuumparallel.c, so that any table AM supporting indexes can
utilize parallel vacuum in order to call index AM
callbacks (ambulkdelete and amvacuumcleanup) with parallel workers.
It also moves some vacuum related functions and structures to
commands/vacuum.c so that both lazy vacuum and parallel vacuum can
refer to them.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 1179 ++-----------------------
src/backend/access/transam/parallel.c | 2 +-
src/backend/commands/Makefile | 1 +
src/backend/commands/vacuum.c | 156 ++++
src/backend/commands/vacuumparallel.c | 1119 +++++++++++++++++++++++
src/include/access/heapam.h | 1 -
src/include/commands/vacuum.h | 42 +
src/tools/pgindent/typedefs.list | 2 +
8 files changed, 1384 insertions(+), 1118 deletions(-)
create mode 100644 src/backend/commands/vacuumparallel.c
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ec9d8fedbe..6652094d99 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -40,7 +40,6 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
-#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -120,23 +119,11 @@
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
-/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
-
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -149,148 +136,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * LVDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
- * Each TID points to an LP_DEAD line pointer from a heap page that has been
- * processed by lazy_scan_prune.
- *
- * Also needed by lazy_vacuum_heap_rel, which marks the same LP_DEAD line
- * pointers as LP_UNUSED during second heap pass.
- */
-typedef struct LVDeadItems
-{
- int max_items; /* # slots allocated in array */
- int num_items; /* current # of entries */
-
- /* Sorted array of TIDs to delete from indexes */
- ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
-} LVDeadItems;
-
-#define MAXDEADITEMS(avail_mem) \
- (((avail_mem) - offsetof(LVDeadItems, items)) / sizeof(ItemPointerData))
-
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /* Counter for vacuuming and cleanup */
- pg_atomic_uint32 idx;
-} LVShared;
-
-/* Status used during parallel index vacuum or cleanup */
-typedef enum LVParallelIndVacStatus
-{
- PARALLEL_INDVAC_STATUS_INITIAL = 0,
- PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
- PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
- PARALLEL_INDVAC_STATUS_COMPLETED
-} LVParallelIndVacStatus;
-
-/*
- * Struct for index vacuum statistics of an index that is used for parallel vacuum.
- * This includes the status of parallel index vacuum as well as index statistics.
- */
-typedef struct LVParallelIndStats
-{
- /*
- * The following two fields are set by leader process before executing
- * parallel index vacuum or parallel index cleanup. These fields are not
- * fixed for the entire VACUUM operation. They are only fixed for an
- * individual parallel index vacuum and cleanup.
- *
- * parallel_workers_can_process is true if both leader and worker can
- * process the index, otherwise only leader can process it.
- */
- LVParallelIndVacStatus status;
- bool parallel_workers_can_process;
-
- /*
- * Individual worker or leader stores the result of index vacuum or
- * cleanup.
- */
- bool istat_updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVParallelIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /*
- * Shared index statistics among parallel vacuum workers. The array
- * element is allocated for every index, even those indexes where
- * parallel index vacuuming is unsafe or not worthwhile (i.g.,
- * parallel_vacuum_should_skip_index() returns true). During parallel
- * vacuum, IndexBulkDeleteResult of each index is kept in DSM and is
- * copied into local memory at the end of parallel vacuum.
- */
- LVParallelIndStats *lvpindstats;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -308,9 +153,9 @@ typedef struct LVRelState
bool do_index_cleanup;
bool do_rel_truncate;
- /* Buffer access strategy and parallel state */
+ /* Buffer access strategy and parallel vacuum state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumState *pvs;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -332,9 +177,14 @@ typedef struct LVRelState
VacErrPhase phase;
/*
- * State managed by lazy_scan_heap() follows
+ * State managed by lazy_scan_heap() follows.
+ *
+ * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
+ * Each TID points to an LP_DEAD line pointer from a heap page that has been
+ * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
+ * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
@@ -420,30 +270,12 @@ static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
static int dead_items_max_items(LVRelState *vacrel);
-static inline Size max_items_to_alloc_size(int max_items);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static void update_index_statistics(LVRelState *vacrel);
-static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum);
-static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
-static void parallel_vacuum_end(LVRelState *vacrel);
-static bool parallel_vacuum_should_skip_index(Relation indrel);
-static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum);
-static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
-static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats);
-static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared,
- LVParallelIndStats *pindstats);
-
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -901,7 +733,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
BlockNumber nblocks,
blkno,
next_unskippable_block,
@@ -2036,7 +1868,7 @@ retry:
*/
if (lpdead_items > 0)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
Assert(!prunestate->all_visible);
@@ -2079,7 +1911,6 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(!IsParallelWorker());
Assert(vacrel->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
@@ -2208,7 +2039,6 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
{
bool allindexes = true;
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -2247,8 +2077,21 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
}
else
{
- /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to parallel variant. Since parallel vacuum will
+ * set the error context on an error we temporarily disable setting our
+ * error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2400,7 +2243,7 @@ static int
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
int index, Buffer *vmbuffer)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
@@ -2621,349 +2464,12 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process.
- */
-static void
-parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
-{
- LVParallelState *lps = vacrel->lps;
- LVParallelIndVacStatus new_status;
- int nworkers;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- if (vacuum)
- {
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_bulkdel;
- }
- else
- {
- /*
- * We can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- /* Add conditionally parallel-aware indexes if in the first time call */
- if (vacrel->num_index_scans == 0)
- nworkers += vacrel->lps->nindexes_parallel_condcleanup;
- }
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See parallel_vacuum_compute_workers().
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- /* Set index vacuum status and mark as parallel safe or not */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
-
- Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
-
- pindstats->status = new_status;
- pindstats->parallel_workers_can_process =
- parallel_vacuum_index_is_parallel_safe(vacrel,
- vacrel->indrels[i],
- vacuum);
- }
-
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- /* Reinitialize parallel context to relaunch parallel workers */
- if (vacrel->num_index_scans > 0)
- ReinitializeParallelDSM(lps->pcxt);
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (vacuum)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- parallel_vacuum_process_unsafe_indexes(vacrel);
-
- /*
- * Join as a parallel worker. The leader process alone processes all
- * parallel-safe indexes in the case where no workers are launched.
- */
- parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Reset all index status back to initial (while checking that we have
- * processed all indexes).
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
-
- if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
- elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
- RelationGetRelationName(vacrel->indrels[i]));
-
- pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVParallelIndStats *pis;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- pis = &(pindstats[idx]);
-
- /*
- * Skip processing indexes that are unsafe for workers (these are
- * processed in parallel_vacuum_process_unsafe_indexes() by leader)
- */
- if (!pis->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- shared, pis);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing index cleanup right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by parallel_vacuum_compute_workers().
- */
-static void
-parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- /* Skip, indexes that are safe for workers */
- if (pindstats->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- lps->lvshared, pindstats);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static void
-parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared, LVParallelIndStats *pindstats)
-{
- IndexBulkDeleteResult *istat = NULL;
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (pindstats->istat_updated)
- istat = &(pindstats->istat);
-
- switch (pindstats->status)
- {
- case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
- istat_res = lazy_vacuum_one_index(indrel, istat,
- shared->reltuples, vacrel);
- break;
- case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
- istat_res = lazy_cleanup_one_index(indrel, istat,
- shared->reltuples,
- shared->estimated_count,
- vacrel);
- break;
- default:
- elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
- pindstats->status,
- RelationGetRelationName(indrel));
- }
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (!pindstats->istat_updated && istat_res != NULL)
- {
- memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
- pindstats->istat_updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
- }
-
- /*
- * Update the status to completed. No need to lock here since each
- * worker touches different indexes.
- */
- pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
-}
-
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
/* Report that we are now cleaning up indexes */
@@ -2988,8 +2494,23 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
}
else
{
- /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, false);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to the parallel variant. Since parallel vacuum
+ * will set the error context on an error, we temporarily disable setting
+ * our error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_cleanup_all_indexes(vacrel->pvs, vacrel->new_rel_tuples,
+ (vacrel->tupcount_pages < vacrel->rel_pages),
+ (vacrel->num_index_scans == 0));
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
}
}
@@ -3037,13 +2558,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_items);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_items->num_items),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ istat = bulkdel_one_index(&ivinfo, istat, vacrel->dead_items);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3078,7 +2593,6 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
ivinfo.strategy = vacrel->bstrategy;
@@ -3094,24 +2608,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
- istat = index_vacuum_cleanup(&ivinfo, istat);
-
- if (istat)
- {
- ereport(elevel,
- (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
- RelationGetRelationName(indrel),
- istat->num_index_tuples,
- istat->num_pages),
- errdetail("%.0f index row versions were removed.\n"
- "%u index pages were newly deleted.\n"
- "%u index pages are currently deleted, of which %u are currently reusable.\n"
- "%s.",
- istat->tuples_removed,
- istat->pages_newly_deleted,
- istat->pages_deleted, istat->pages_free,
- pg_rusage_show(&ru0))));
- }
+ istat = cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3473,19 +2970,6 @@ dead_items_max_items(LVRelState *vacrel)
return (int) max_items;
}
-/*
- * Returns the total required space for VACUUM's dead_items array given a
- * max_items value returned by dead_items_max_items
- */
-static inline Size
-max_items_to_alloc_size(int max_items)
-{
- Assert(max_items >= MaxHeapTuplesPerPage);
- Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
-
- return offsetof(LVDeadItems, items) + sizeof(ItemPointerData) * max_items;
-}
-
/*
* Allocate dead_items (either using palloc, or in dynamic shared memory).
* Sets dead_items in vacrel for caller.
@@ -3496,9 +2980,12 @@ max_items_to_alloc_size(int max_items)
static void
dead_items_alloc(LVRelState *vacrel, int nworkers)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
int max_items;
+ max_items = dead_items_max_items(vacrel);
+ Assert(max_items >= MaxHeapTuplesPerPage);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3522,16 +3009,22 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- parallel_vacuum_begin(vacrel, nworkers);
+ vacrel->pvs = parallel_vacuum_begin(vacrel->rel, vacrel->indrels,
+ vacrel->nindexes, nworkers,
+ max_items, elevel,
+ vacrel->bstrategy);
- /* If parallel mode started, vacrel->dead_items allocated in DSM */
+ /* If parallel mode started, dead_items space is allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs);
return;
+ }
}
/* Serial VACUUM case */
max_items = dead_items_max_items(vacrel);
- dead_items = (LVDeadItems *) palloc(max_items_to_alloc_size(max_items));
+ dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3554,75 +3047,8 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- parallel_vacuum_end(vacrel);
-}
-
-/*
- * lazy_tid_reaped() -- is a particular tid deletable?
- *
- * This has the right signature to be an IndexBulkDeleteCallback.
- *
- * Assumes dead_items array is sorted (in ascending TID order).
- */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
-{
- LVDeadItems *dead_items = (LVDeadItems *) state;
- int64 litem,
- ritem,
- item;
- ItemPointer res;
-
- litem = itemptr_encode(&dead_items->items[0]);
- ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
- item = itemptr_encode(itemptr);
-
- /*
- * Doing a simple bound check before bsearch() is useful to avoid the
- * extra cost of bsearch(), especially if dead items on the heap are
- * concentrated in a certain range. Since this function is called for
- * every index tuple, it pays to be really fast.
- */
- if (item < litem || item > ritem)
- return false;
-
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) dead_items->items,
- dead_items->num_items,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
-
- return (res != NULL);
-}
-
-/*
- * Comparator routines for use with qsort() and bsearch().
- */
-static int
-vac_cmp_itemptr(const void *left, const void *right)
-{
- BlockNumber lblk,
- rblk;
- OffsetNumber loff,
- roff;
-
- lblk = ItemPointerGetBlockNumber((ItemPointer) left);
- rblk = ItemPointerGetBlockNumber((ItemPointer) right);
-
- if (lblk < rblk)
- return -1;
- if (lblk > rblk)
- return 1;
-
- loff = ItemPointerGetOffsetNumber((ItemPointer) left);
- roff = ItemPointerGetOffsetNumber((ItemPointer) right);
-
- if (loff < roff)
- return -1;
- if (loff > roff)
- return 1;
-
- return 0;
+ parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
+ vacrel->pvs = NULL;
}
/*
@@ -3746,77 +3172,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
return all_visible;
}
-/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
- */
-static int
-parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions;
-
- /* Skip indexes that are unsuitable target for parallel index vacuum */
- if (parallel_vacuum_should_skip_index(indrel))
- continue;
-
- will_parallel_vacuum[idx] = true;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
@@ -3827,7 +3182,7 @@ update_index_statistics(LVRelState *vacrel)
int nindexes = vacrel->nindexes;
IndexBulkDeleteResult **indstats = vacrel->indstats;
- Assert(!IsInParallelMode());
+ Assert(!ParallelVacuumIsActive(vacrel));
for (int idx = 0; idx < nindexes; idx++)
{
@@ -3849,414 +3204,6 @@ update_index_statistics(LVRelState *vacrel)
}
}
-/*
- * Try to enter parallel mode and create a parallel context. Then initialize
- * shared memory state.
- *
- * On success (when we can launch one or more workers), will set dead_items and
- * lps in vacrel for caller. A set lps in vacrel state indicates that parallel
- * VACUUM is currently active.
- */
-static void
-parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
-{
- LVParallelState *lps;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- LVDeadItems *dead_items;
- LVParallelIndStats *pindstats;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- bool *will_parallel_vacuum;
- int max_items;
- Size est_pindstats_len;
- Size est_shared_len;
- Size est_dead_items_len;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
- will_parallel_vacuum);
- if (parallel_workers <= 0)
- {
- /* Can't perform vacuum in parallel -- lps not set in vacrel */
- pfree(will_parallel_vacuum);
- return;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
-
- /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
- est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
- shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = sizeof(LVShared);
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
- max_items = dead_items_max_items(vacrel);
- est_dead_items_len = max_items_to_alloc_size(max_items);
- shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare index vacuum stats */
- pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
- lps->lvpindstats = pindstats;
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
- MemSet(shared, 0, est_shared_len);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead_items space */
- dead_items = (LVDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
- MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- pfree(will_parallel_vacuum);
-
- /* Success -- set dead_items and lps in leader's vacrel state */
- vacrel->dead_items = dead_items;
- vacrel->lps = lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-parallel_vacuum_end(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- if (pindstats->istat_updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Check if the index is a totally unsuitable target for all parallel
- * processing up front. For example, the index could be
- * < min_parallel_index_scan_size cutoff.
- */
-static bool
-parallel_vacuum_should_skip_index(Relation indrel)
-{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- return true;
-
- return false;
-}
-
-/*
- * Returns false, if the given index can't participate in the next execution of
- * parallel index vacuum or parallel index cleanup.
- */
-static bool
-parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum)
-{
- uint8 vacoptions;
-
- /* Skip indexes that are unsuitable target for parallel index vacuum */
- if (parallel_vacuum_should_skip_index(indrel))
- return false;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* In parallel vacuum case, check if it supports parallel bulk-deletion */
- if (vacuum)
- return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
-
- /* Not safe, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Not safe, if the index supports parallel cleanup conditionally,
- * but we have already processed the index (for bulkdelete). We do
- * this to avoid the need to invoke workers when parallel index
- * cleanup doesn't need to scan the index. See the comments for
- * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
- * support parallel cleanup conditionally.
- */
- if (vacrel->num_index_scans > 0 &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVParallelIndStats *lvpindstats;
- LVShared *lvshared;
- LVDeadItems *dead_items;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- /*
- * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
- * don't support parallel vacuum for autovacuum as of now.
- */
- Assert(MyProc->statusFlags == PROC_IN_VACUUM);
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- elog(DEBUG1, "starting parallel vacuum worker");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set index statistics */
- lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_INDEX_STATS,
- false);
-
- /* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (LVDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_items = dead_items;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
/*
* Error context callback for errors occurring during vacuum.
*/
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f573..ae7c7133dd 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -14,7 +14,6 @@
#include "postgres.h"
-#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/parallel.h"
#include "access/session.h"
@@ -25,6 +24,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..48f7348f91 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15b44..70a719f16c 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -32,6 +32,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
+#include "catalog/index.h"
#include "catalog/pg_database.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_namespace.h"
@@ -51,6 +52,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -89,6 +91,8 @@ static void vac_truncate_clog(TransactionId frozenXID,
static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
+static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_cmp_itemptr(const void *left, const void *right);
/*
* Primary entry point for manual VACUUM and ANALYZE commands
@@ -2258,3 +2262,155 @@ get_vacoptval_from_boolean(DefElem *def)
{
return defGetBoolean(def) ? VACOPTVALUE_ENABLED : VACOPTVALUE_DISABLED;
}
+
+/*
+ * bulkdel_one_index() -- bulk-deletion for index relation.
+ *
+ * Delete all the index tuples whose heap TIDs are collected in
+ * dead_items. Also update running statistics. Exact details depend
+ * on the index AM's ambulkdelete routine.
+ *
+ * The reltuples passed via ivinfo->num_heap_tuples is the number of table
+ * tuples to be passed to the bulkdelete callback. It's always assumed to
+ * be estimated. See indexam.sgml for more info.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+bulkdel_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ /* Do bulk deletion */
+ istat = index_bulk_delete(ivinfo, istat, vac_tid_reaped,
+ (void *) dead_items);
+
+ ereport(ivinfo->message_level,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ RelationGetRelationName(ivinfo->index),
+ dead_items->num_items),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+ return istat;
+}
+
+/*
+ * cleanup_one_index() -- do post-vacuum cleanup for index relation.
+ *
+ * Calls index AM's amvacuumcleanup routine. reltuples is the number
+ * of table tuples and estimated_count is true if reltuples is an
+ * estimated value. See indexam.sgml for more info.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ istat = index_vacuum_cleanup(ivinfo, istat);
+
+ if (istat)
+ {
+ ereport(ivinfo->message_level,
+ (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
+ RelationGetRelationName(ivinfo->index),
+ istat->num_index_tuples,
+ istat->num_pages),
+ errdetail("%.0f index row versions were removed.\n"
+ "%u index pages were newly deleted.\n"
+ "%u index pages are currently deleted, of which %u are currently reusable.\n"
+ "%s.",
+ istat->tuples_removed,
+ istat->pages_newly_deleted,
+ istat->pages_deleted, istat->pages_free,
+ pg_rusage_show(&ru0))));
+ }
+
+ return istat;
+}
+
+/*
+ * vac_tid_reaped() -- is a particular tid deletable?
+ *
+ * This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ * Assumes dead_items array is sorted (in ascending TID order).
+ */
+static bool
+vac_tid_reaped(ItemPointer itemptr, void *state)
+{
+ VacDeadItems *dead_items = (VacDeadItems *) state;
+ int64 litem,
+ ritem,
+ item;
+ ItemPointer res;
+
+ litem = itemptr_encode(&dead_items->items[0]);
+ ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
+ item = itemptr_encode(itemptr);
+
+ /*
+ * Doing a simple bound check before bsearch() is useful to avoid the
+ * extra cost of bsearch(), especially if dead items on the heap are
+ * concentrated in a certain range. Since this function is called for
+ * every index tuple, it pays to be really fast.
+ */
+ if (item < litem || item > ritem)
+ return false;
+
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_items->items,
+ dead_items->num_items,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
+
+ return (res != NULL);
+}
+
+/*
+ * Comparator routines for use with qsort() and bsearch().
+ */
+static int
+vac_cmp_itemptr(const void *left, const void *right)
+{
+ BlockNumber lblk,
+ rblk;
+ OffsetNumber loff,
+ roff;
+
+ lblk = ItemPointerGetBlockNumber((ItemPointer) left);
+ rblk = ItemPointerGetBlockNumber((ItemPointer) right);
+
+ if (lblk < rblk)
+ return -1;
+ if (lblk > rblk)
+ return 1;
+
+ loff = ItemPointerGetOffsetNumber((ItemPointer) left);
+ roff = ItemPointerGetOffsetNumber((ItemPointer) right);
+
+ if (loff < roff)
+ return -1;
+ if (loff > roff)
+ return 1;
+
+ return 0;
+}
+
+/*
+ * Returns the total required space for VACUUM's dead_items array given a
+ * max_items value.
+ */
+inline Size
+vac_max_items_to_alloc_size(int max_items)
+{
+ Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
+
+ return offsetof(VacDeadItems, items) + sizeof(ItemPointerData) * max_items;
+}
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000000..e4024b7e67
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,1119 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelVacuumState.
+ *
+ * In a parallel vacuum, we perform both index bulk-deletion and index cleanup
+ * with parallel worker processes. Each individual index is processed by one
+ * vacuum process. ParallelVacuumState contains shared information as well
+ * as the memory space for storing dead items allocated in the DSM segment.
+ * When starting either parallel index bulk-deletion or index cleanup, we
+ * launch parallel worker processes. Once all indexes are processed, the
+ * parallel worker processes exit. On the next pass, the parallel context
+ * is re-initialized so that the same DSM can be used for multiple passes of
+ * index bulk-deletion and index cleanup. At the end of a parallel vacuum,
+ * ParallelVacuumState is destroyed while returning index statistics so
+ * that we can update them after exiting from parallel mode.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/genam.h"
+#include "access/parallel.h"
+#include "access/table.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/elog.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel workers, allocated in the DSM
+ * segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the parallel vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * reltuples is the total number of input table tuples. We set it to the
+ * old live tuples in the index vacuum case and to the new live tuples in
+ * the index cleanup case.
+ *
+ * estimated_count is true if reltuples is an estimated value. (Note that
+ * reltuples could be -1 in this case, indicating we have no idea.)
+ */
+ double reltuples;
+ bool estimated_count;
+
+ /*
+ * In single process vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for table scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process vacuum.
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Counter of the next index to process during vacuuming and cleanup */
+ pg_atomic_uint32 idx;
+} PVShared;
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} PVIndVacStatus;
+
+/*
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as the index statistics.
+ */
+typedef struct PVIndStats
+{
+ /*
+ * The following two fields are set by the leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both the leader and workers can
+ * process the index, otherwise only the leader can process it.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/*
+ * Struct for maintaining a parallel vacuum state. This struct is used by
+ * both leader and worker processes. The parallel vacuum leader process
+ * uses it throughout a VACUUM operation, so the leader uses the same state
+ * to perform index bulk-deletion and index cleanup multiple times. The
+ * workers use only some fields of this structure.
+ */
+typedef struct ParallelVacuumState
+{
+ /* NULL for worker processes */
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /*
+ * Shared index statistics among parallel vacuum workers. An array
+ * element is allocated for every index, even those indexes where parallel
+ * index vacuuming is unsafe or not worthwhile (i.e.,
+ * parallel_vacuum_should_skip_index() returns true). During parallel
+ * vacuum, IndexBulkDeleteResult of each index is kept in DSM and is
+ * copied into local memory at the end of parallel vacuum.
+ */
+ PVIndStats *indstats;
+
+ /* Shared dead items space among parallel vacuum workers */
+ VacDeadItems *dead_items;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion and
+ * parallel index cleanup respectively.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /* True if we need to reinitialize parallel DSM before launching workers */
+ bool first_time;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+
+ /* Error reporting state */
+ char *relnamespace;
+ char *relname;
+ char *indname;
+ PVIndVacStatus status;
+} ParallelVacuumState;
+
+static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum);
+static bool parallel_vacuum_should_skip_index(Relation indrel);
+static void parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
+ bool no_bulkdel_call);
+static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, bool bulkdel,
+ bool no_bulkdel_call);
+static void parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_safe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats);
+static void parallel_vacuum_error_callback(void *arg);
+
+/*
+ * Try to enter parallel mode and create a parallel context. Then initialize
+ * shared memory state.
+ *
+ * On success (when we can launch one or more workers), return the parallel
+ * vacuum state. Otherwise, return NULL.
+ */
+ParallelVacuumState *
+parallel_vacuum_begin(Relation rel, Relation *indrels, int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel, BufferAccessStrategy bstrategy)
+{
+ ParallelVacuumState *pvs;
+ ParallelContext *pcxt;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ PVIndStats *indstats;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ bool *will_parallel_vacuum;
+ Size est_indstats_len;
+ Size est_shared_len;
+ Size est_dead_items_len;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(nrequested_workers >= 0);
+ Assert(nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
+ parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
+ nrequested_workers,
+ will_parallel_vacuum);
+ if (parallel_workers <= 0)
+ {
+ /* Can't perform vacuum in parallel -- return NULL */
+ pfree(will_parallel_vacuum);
+ return NULL;
+ }
+
+ pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
+ pvs->indrels = indrels;
+ pvs->nindexes = nindexes;
+ pvs->first_time = true;
+ pvs->bstrategy = bstrategy;
+
+ /*
+ * Set error traceback information. The remaining fields are filled in
+ * while vacuuming indexes.
+ */
+ pvs->relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs->relname = pstrdup(RelationGetRelationName(rel));
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvs->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
+ est_indstats_len = mul_size(sizeof(PVIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = MAXALIGN(sizeof(PVShared));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
+ est_dead_items_len = MAXALIGN(vac_max_items_to_alloc_size(max_items));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats_len);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ /*
+ * Skip indexes that are unsuitable targets for parallel index vacuum
+ */
+ if (parallel_vacuum_should_skip_index(indrel))
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvs->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvs->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvs->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvs->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
+ MemSet(shared, 0, est_shared_len);
+ shared->relid = RelationGetRelid(rel);
+ shared->elevel = elevel;
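+
+ /*
+ * Split maintenance_work_mem between the workers that will actually use
+ * it. For illustration (numbers are hypothetical): with
+ * maintenance_work_mem = 256MB, two parallel workers, and three indexes
+ * whose AM sets amusemaintenanceworkmem, each worker gets
+ * 256MB / Min(2, 3) = 128MB.
+ */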
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvs->shared = shared;
+
+ /* Prepare the dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
+ dead_items->max_items = max_items;
+ dead_items->num_items = 0;
+ MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
+ pvs->dead_items = dead_items;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvs->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvs->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ pfree(will_parallel_vacuum);
+
+ /* Success -- return parallel vacuum state */
+ return pvs;
+}
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * vacuum and index cleanup can be executed with parallel workers. The index
+ * is eligible for parallel vacuum iff its size is greater than
+ * min_parallel_index_scan_size as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that user requested. If
+ * nrequested is 0, we compute the parallel degree based on nindexes, that is
+ * the number of indexes that support parallel vacuum. This function also
+ * sets will_parallel_vacuum to remember indexes that participate in parallel
+ * vacuum.
+ */
+static int
+parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow performing parallel operation in standalone backend or
+ * when parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ /*
+ * Compute the number of indexes that can participate in parallel vacuum.
+ */
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions;
+
+ /* Skip indexes that are unsuitable targets for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
+ continue;
+
+ will_parallel_vacuum[i] = true;
+
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
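+ /*
+ * For illustration: with three indexes supporting parallel bulk-deletion,
+ * two supporting (conditional) parallel cleanup, and nrequested = 0, we
+ * end up requesting Max(3, 2) - 1 = 2 workers, further capped by
+ * max_parallel_maintenance_workers below.
+ */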
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
+
+/*
+ * Check if the index is a totally unsuitable target for all parallel
+ * processing up front. For example, the index could be smaller than the
+ * min_parallel_index_scan_size cutoff.
+ */
+static bool
+parallel_vacuum_should_skip_index(Relation indrel)
+{
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ return true;
+
+ return false;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed during parallel mode, copy the
+ * updated index statistics from DSM into local memory and then later use that
+ * to update the index statistics. One might think that we can exit from
+ * parallel mode, update the index statistics and then destroy parallel
+ * context, but that won't be safe (see ExitParallelMode).
+ */
+void
+parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats)
+{
+ Assert(!IsParallelWorker());
+
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->istat_updated)
+ {
+ istats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(istats[i], &indstats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ istats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvs->pcxt);
+ ExitParallelMode();
+
+ pfree(pvs->relnamespace);
+ pfree(pvs->relname);
+ pfree(pvs);
+}
+
+/* Returns the dead items space */
+VacDeadItems *
+parallel_vacuum_get_dead_items(ParallelVacuumState *pvs)
+{
+ return pvs->dead_items;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now.
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = true;
+
+ /* no_bulkdel_call is not used in parallel bulkdel cases */
+ parallel_vacuum_all_indexes(pvs, true, false);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ *
+ * no_bulkdel_call must be true if there was no parallel_vacuum_bulkdel_all_indexes
+ * call in the vacuum execution.
+ */
+void
+parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
+ bool estimated_count, bool no_bulkdel_call)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can provide a better estimate of total number of surviving tuples
+ * (we assume indexes are more interested in that than in the number of
+ * nominally live tuples).
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_all_indexes(pvs, false, no_bulkdel_call);
+}
+
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
+ bool no_bulkdel_call)
+{
+ int nworkers;
+ ErrorContextCallback errcallback;
+ PVIndVacStatus new_status = bulkdel
+ ? PARALLEL_INDVAC_STATUS_NEED_BULKDELETE
+ : PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
+
+ Assert(!IsParallelWorker());
+
+ /* Determine the number of parallel workers to launch */
+ if (bulkdel)
+ nworkers = pvs->nindexes_parallel_bulkdel;
+ else
+ {
+ nworkers = pvs->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if there was no bulk-deletion call */
+ if (no_bulkdel_call)
+ nworkers += pvs->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * than the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See
+ * parallel_vacuum_compute_workers().
+ */
+ nworkers = Min(nworkers, pvs->pcxt->nworkers);
+
+ /* Set index vacuum status and mark as parallel safe or not */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+
+ indstats->status = new_status;
+ indstats->parallel_workers_can_process =
+ parallel_vacuum_index_is_parallel_safe(pvs->indrels[i],
+ bulkdel,
+ no_bulkdel_call);
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvs->shared->idx), 0);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of table.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ if (bulkdel)
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+
+ pvs->first_time = false;
+ }
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Vacuum the indexes that can be processed only by the leader process */
+ parallel_vacuum_unsafe_indexes(pvs);
+
+ /*
+ * Join as a parallel worker. The leader process alone vacuums all
+ * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_safe_indexes(pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * vacuumed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(pvs->indrels[i]));
+
+ indstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to table scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+}
+
+/*
+ * Returns false if the given index can't participate in parallel index
+ * vacuum or parallel index cleanup.
+ */
+static bool
+parallel_vacuum_index_is_parallel_safe(Relation indrel, bool bulkdel,
+ bool no_bulkdel_call)
+{
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Return false if the index is unsuitable target for parallel index
+ * vacuum
+ */
+ if (parallel_vacuum_should_skip_index(indrel))
+ return false;
+
+ /* In bulk-deletion case, check if it supports parallel bulk-deletion */
+ if (bulkdel)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally, but we
+ * have already processed the index (for bulkdelete). We do this to avoid
+ * the need to invoke workers when parallel index cleanup doesn't need to
+ * scan the index. See the comments for option
+ * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
+ * parallel cleanup conditionally.
+ */
+ if (!no_bulkdel_call &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
+
+/*
+ * Perform parallel vacuuming of indexes in leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe. It's possible that this will vary for a given index, based
+ * on details like whether we're performing index cleanup right now.
+ *
+ * Also performs vacuuming of smaller indexes that fell under the size cutoff
+ * enforced by parallel_vacuum_compute_workers().
+ */
+static void
+parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ /* Skip parallel-safe indexes, as they are vacuumed by workers */
+ if (indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[i], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to vacuum the indexes in parallel.
+ */
+static void
+parallel_vacuum_safe_indexes(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+ PVIndStats *indstats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvs->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvs->nindexes)
+ break;
+
+ indstats = &(pvs->indstats[idx]);
+
+ /*
+ * Skip vacuuming indexes that are unsafe for workers (these are
+ * processed in parallel_vacuum_unsafe_indexes() by leader)
+ */
+ if (!indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do bulkdelete or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[idx], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Vacuum or cleanup index either by leader process or by one of the worker
+ * processes. After vacuuming the index this function copies the index
+ * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel, PVIndStats *indstats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result if someone
+ * has already updated it
+ */
+ if (indstats->istat_updated)
+ istat = &(indstats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvs->shared->elevel;
+ ivinfo.estimated_count = pvs->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvs->shared->reltuples;
+ ivinfo.strategy = pvs->bstrategy;
+
+ /* Update error traceback information */
+ pvs->indname = pstrdup(RelationGetRelationName(indrel));
+ pvs->status = indstats->status;
+
+ switch (indstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = bulkdel_one_index(&ivinfo, istat, pvs->dead_items);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ indstats->status,
+ RelationGetRelationName(indrel));
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete and
+ * amvacuumcleanup to the DSM segment if it's the first cycle because they
+ * allocate locally and it's possible that an index will be vacuumed by a
+ * different vacuum process the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result on the DSM segment and
+ * pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result at different
+ * slots we can write them without locking.
+ */
+ if (!indstats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(indstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ indstats->istat_updated = true;
+
+ /* Free the locally-allocated bulk-deletion result */
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each worker
+ * touches different indexes.
+ */
+ indstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+
+ /* Reset error traceback information */
+ pvs->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ pfree(pvs->indname);
+ pvs->indname = NULL;
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index vacuum or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ ParallelVacuumState pvs;
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ int nindexes;
+ char *sharedquery;
+ ErrorContextCallback errcallback;
+
+ /*
+ * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
+ * don't support parallel vacuum for autovacuum as of now.
+ */
+ Assert(MyProc->statusFlags == PROC_IN_VACUUM);
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
+ false);
+
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the leader process. It's
+ * okay because the lock mode does not conflict among the parallel
+ * workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should
+ * match the leader's order.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead_items space (set as worker's dead_items below) */
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Set parallel vacuum state */
+ pvs.indrels = indrels;
+ pvs.nindexes = nindexes;
+ pvs.indstats = indstats;
+ pvs.shared = shared;
+ pvs.dead_items = dead_items;
+ pvs.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs.relname = pstrdup(RelationGetRelationName(rel));
+
+ /* These fields will be filled during index vacuum or cleanup */
+ pvs.indname = NULL;
+ pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
+
+ /* Each parallel VACUUM worker gets its own access strategy */
+ pvs.bstrategy = GetAccessStrategy(BAS_VACUUM);
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_safe_indexes(&pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvs.bstrategy);
+}
+
+/*
+ * Error context callback for errors occurring during parallel index vacuum.
+ */
+static void
+parallel_vacuum_error_callback(void *arg)
+{
+ ParallelVacuumState *errinfo = arg;
+
+ switch (errinfo->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while cleanup index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_INITIAL:
+ case PARALLEL_INDVAC_STATUS_COMPLETED:
+ default:
+ break;
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 417dd288e5..f3fb1e93a5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4cfd52eaf4..88e0154d60 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -62,6 +64,9 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Abstract type for parallel vacuum state */
+typedef struct ParallelVacuumState ParallelVacuumState;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -230,6 +235,21 @@ typedef struct VacuumParams
int nworkers;
} VacuumParams;
+/*
+ * VacDeadItems stores dead TIDs collected during the heap scan.
+ */
+typedef struct VacDeadItems
+{
+ int max_items; /* # slots allocated in array */
+ int num_items; /* current # of entries */
+
+ /* Sorted array of TIDs to delete from indexes */
+ ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
+} VacDeadItems;
+
+#define MAXDEADITEMS(avail_mem) \
+ (((avail_mem) - offsetof(VacDeadItems, items)) / sizeof(ItemPointerData))
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for PostGIS */
extern int vacuum_freeze_min_age;
@@ -282,6 +302,28 @@ extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
bits32 options, bool verbose,
LOCKMODE lmode);
+extern IndexBulkDeleteResult *bulkdel_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items);
+extern IndexBulkDeleteResult *cleanup_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat);
+extern Size vac_max_items_to_alloc_size(int max_items);
+
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumState *parallel_vacuum_begin(Relation rel, Relation *indrels,
+ int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel,
+ BufferAccessStrategy bstrategy);
+extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
+extern VacDeadItems *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs);
+extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples);
+extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ bool estimated_count,
+ bool no_bulkdel_call);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f41ef0d2bc..017ea7091c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1808,6 +1808,7 @@ ParallelSlotResultHandler
ParallelState
ParallelTableScanDesc
ParallelTableScanDescData
+ParallelVacuumState
ParallelWorkerContext
ParallelWorkerInfo
Param
@@ -2798,6 +2799,7 @@ UserMapping
UserOpts
VacAttrStats
VacAttrStatsP
+VacDeadItems
VacErrPhase
VacOptValue
VacuumParams
--
2.24.3 (Apple Git-128)
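To make the new API easier to review, here is roughly how a caller (e.g. vacuumlazy.c) is expected to drive the vacuumparallel.c entry points. This is only an illustrative sketch pieced together from the declarations added to vacuum.h, not code taken verbatim from the patch; the surrounding variables (rel, indrels, nindexes, max_items, istats and the tuple counts) are assumed to be set up by the caller.

ParallelVacuumState *pvs;
VacDeadItems *dead_items;

/* Enter parallel mode and allocate the DSM areas */
pvs = parallel_vacuum_begin(rel, indrels, nindexes,
                            nrequested_workers, max_items,
                            elevel, bstrategy);

/* The heap scan records dead TIDs into the shared dead-items space */
dead_items = parallel_vacuum_get_dead_items(pvs);

/* Index bulk-deletion, run with parallel workers where supported */
parallel_vacuum_bulkdel_all_indexes(pvs, num_table_tuples);

/* Index cleanup; pass false since bulk-deletion was called above */
parallel_vacuum_cleanup_all_indexes(pvs, num_table_tuples,
                                    estimated_count, false);

/* Copy index stats back into istats[] and exit parallel mode */
parallel_vacuum_end(pvs, istats);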
v5-0001-Refactor-parallel-vacuum-to-remove-bitmap-related.patch
From b6d8530a1dc16de2bae6179bd2a4dfc33858baf2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 30 Nov 2021 23:26:28 +0900
Subject: [PATCH v5 1/2] Refactor parallel vacuum to remove bitmap-related
code.
Previously, in parallel vacuum, we allocated a shmem area of
IndexBulkDeleteResult only for indexes where parallel index vacuuming
is safe, and used a null-bitmap in the shmem area to access them. This
logic was too complicated for the small benefit of saving only a few
bits per index.
In this commit, we allocate a dedicated shmem area for the array of
LVParallelIndStats, which includes a parallel-safety flag, the index
vacuum status, and IndexBulkDeleteResult. There is one array element
for every index, even those indexes where parallel index vacuuming is
unsafe or not worthwhile. This commit makes the code clearer by
removing all bitmap-related code.
Also, check each index's vacuum status after parallel index vacuum to
make sure that all indexes have been processed.
Finally, rename parallel vacuum functions to parallel_vacuum_* for
consistency.
An upcoming patch also refactors parallel vacuum further to make it
generic so that any table AM can utilize parallel vacuum functionality.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 605 +++++++++++++--------------
1 file changed, 293 insertions(+), 312 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 282b44f87b..ec9d8fedbe 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -130,6 +130,7 @@
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -181,14 +182,6 @@ typedef struct LVShared
Oid relid;
int elevel;
- /*
- * An indication for vacuum workers to perform either index vacuum or
- * index cleanup. first_time is true only if for_cleanup is true and
- * bulk-deletion is not performed yet.
- */
- bool for_cleanup;
- bool first_time;
-
/*
* Fields for both index vacuum and cleanup.
*
@@ -226,33 +219,44 @@ typedef struct LVShared
*/
pg_atomic_uint32 active_nworkers;
- /*
- * Variables to control parallel vacuum. We have a bitmap to indicate
- * which index has stats in shared memory. The set bit in the map
- * indicates that the particular index supports a parallel vacuum.
- */
- pg_atomic_uint32 idx; /* counter for vacuuming and clean up */
- uint32 offset; /* sizeof header incl. bitmap */
- bits8 bitmap[FLEXIBLE_ARRAY_MEMBER]; /* bit map of NULLs */
-
- /* Shared index statistics data follows at end of struct */
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
} LVShared;
-#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
-#define GetSharedIndStats(s) \
- ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
-#define IndStatsIsNull(s, i) \
- (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
+/* Status used during parallel index vacuum or cleanup */
+typedef enum LVParallelIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} LVParallelIndVacStatus;
/*
- * Struct for an index bulk-deletion statistic used for parallel vacuum. This
- * is allocated in the DSM segment.
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
*/
-typedef struct LVSharedIndStats
+typedef struct LVParallelIndStats
{
- bool updated; /* are the stats updated? */
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both leader and worker can
+ * process the index, otherwise only leader can process it.
+ */
+ LVParallelIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
IndexBulkDeleteResult istat;
-} LVSharedIndStats;
+} LVParallelIndStats;
/* Struct for maintaining a parallel vacuum state. */
typedef struct LVParallelState
@@ -262,6 +266,16 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /*
+ * Shared index statistics among parallel vacuum workers. The array
+ * element is allocated for every index, even those indexes where
+ * parallel index vacuuming is unsafe or not worthwhile (i.e.,
+ * parallel_vacuum_should_skip_index() returns true). During parallel
+ * vacuum, IndexBulkDeleteResult of each index is kept in DSM and is
+ * copied into local memory at the end of parallel vacuum.
+ */
+ LVParallelIndStats *lvpindstats;
+
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
@@ -391,18 +405,6 @@ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
-static void do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel);
-static void do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers);
-static void do_parallel_processing(LVRelState *vacrel,
- LVShared *lvshared);
-static void do_serial_processing_for_unsafe_indexes(LVRelState *vacrel,
- LVShared *lvshared);
-static IndexBulkDeleteResult *parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_indstats,
- LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
@@ -425,14 +427,23 @@ static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int compute_parallel_vacuum_workers(LVRelState *vacrel,
- int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void begin_parallel_vacuum(LVRelState *vacrel, int nrequested);
-static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared, int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
+
+static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
+ bool *will_parallel_vacuum);
+static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
+static void parallel_vacuum_end(LVRelState *vacrel);
+static bool parallel_vacuum_should_skip_index(Relation indrel);
+static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum);
+static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
+static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
+static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats);
+static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared,
+ LVParallelIndStats *pindstats);
+
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -2237,7 +2248,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_vacuum_all_indexes(vacrel);
+ parallel_vacuum_process_all_indexes(vacrel, true);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2611,76 +2622,54 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
}
/*
- * Perform lazy_vacuum_all_indexes() steps in parallel
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
*/
static void
-do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
+parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
{
- /* Tell parallel workers to do index vacuuming */
- vacrel->lps->lvshared->for_cleanup = false;
- vacrel->lps->lvshared->first_time = false;
-
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
+ LVParallelState *lps = vacrel->lps;
+ LVParallelIndVacStatus new_status;
+ int nworkers;
- do_parallel_vacuum_or_cleanup(vacrel,
- vacrel->lps->nindexes_parallel_bulkdel);
-}
+ Assert(!IsParallelWorker());
+ Assert(ParallelVacuumIsActive(vacrel));
+ Assert(vacrel->nindexes > 0);
-/*
- * Perform lazy_cleanup_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel)
-{
- int nworkers;
+ if (vacuum)
+ {
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now. Matches serial VACUUM case.
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
+ vacrel->lps->lvshared->estimated_count = true;
- /*
- * If parallel vacuum is active we perform index cleanup with parallel
- * workers.
- *
- * Tell parallel workers to do index cleanup.
- */
- vacrel->lps->lvshared->for_cleanup = true;
- vacrel->lps->lvshared->first_time = (vacrel->num_index_scans == 0);
+ new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
- /*
- * Now we can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- /* Determine the number of parallel workers to launch */
- if (vacrel->lps->lvshared->first_time)
- nworkers = vacrel->lps->nindexes_parallel_cleanup +
- vacrel->lps->nindexes_parallel_condcleanup;
+ /* Determine the number of parallel workers to launch */
+ nworkers = vacrel->lps->nindexes_parallel_bulkdel;
+ }
else
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
+ {
+ /*
+ * We can provide a better estimate of total number of surviving
+ * tuples (we assume indexes are more interested in that than in the
+ * number of nominally live tuples).
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
+ vacrel->lps->lvshared->estimated_count =
+ (vacrel->tupcount_pages < vacrel->rel_pages);
- do_parallel_vacuum_or_cleanup(vacrel, nworkers);
-}
+ new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process. The caller must set
- * lps->lvshared->for_cleanup to indicate whether to perform vacuum or
- * cleanup.
- */
-static void
-do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
-{
- LVParallelState *lps = vacrel->lps;
+ /* Determine the number of parallel workers to launch */
+ nworkers = vacrel->lps->nindexes_parallel_cleanup;
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
+ /* Add conditionally parallel-aware indexes if this is the first time through */
+ if (vacrel->num_index_scans == 0)
+ nworkers += vacrel->lps->nindexes_parallel_condcleanup;
+ }
/* The leader process will participate */
nworkers--;
@@ -2688,21 +2677,33 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
/*
* It is possible that parallel context is initialized with fewer workers
* than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See compute_parallel_vacuum_workers.
+ * phase, so we need to consider it. See parallel_vacuum_compute_workers().
*/
nworkers = Min(nworkers, lps->pcxt->nworkers);
+ /* Set index vacuum status and mark as parallel safe or not */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
+
+ Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+
+ pindstats->status = new_status;
+ pindstats->parallel_workers_can_process =
+ parallel_vacuum_index_is_parallel_safe(vacrel,
+ vacrel->indrels[i],
+ vacuum);
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(lps->lvshared->idx), 0);
+
/* Setup the shared cost-based vacuum delay and launch workers */
if (nworkers > 0)
{
+ /* Reinitialize parallel context to relaunch parallel workers */
if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
ReinitializeParallelDSM(lps->pcxt);
- }
/*
* Set up shared cost balance and the number of active workers for
@@ -2735,28 +2736,28 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
}
- if (lps->lvshared->for_cleanup)
+ if (vacuum)
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
else
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
}
/* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
+ parallel_vacuum_process_unsafe_indexes(vacrel);
/*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
*/
- do_parallel_processing(vacrel, lps->lvshared);
+ parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
/*
* Next, accumulate buffer and WAL usage. (This must wait for the workers
@@ -2771,6 +2772,21 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
+
+ if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(vacrel->indrels[i]));
+
+ pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
/*
* Carry the shared balance value to heap scan and disable shared costing
*/
@@ -2787,7 +2803,8 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
* vacuum worker processes to process the indexes in parallel.
*/
static void
-do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
+parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats)
{
/*
* Increment the active worker count if we are able to launch any worker.
@@ -2799,39 +2816,27 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
for (;;)
{
int idx;
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
+ LVParallelIndStats *pis;
/* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(lvshared->idx), 1);
+ idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
/* Done for all indexes? */
if (idx >= vacrel->nindexes)
break;
- /* Get the index statistics space from DSM, if any */
- shared_istat = parallel_stats_for_idx(lvshared, idx);
-
- /* Skip indexes not participating in parallelism */
- if (shared_istat == NULL)
- continue;
-
- indrel = vacrel->indrels[idx];
+ pis = &(pindstats[idx]);
/*
* Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
+ * processed in parallel_vacuum_process_unsafe_indexes() by leader)
*/
- if (!parallel_processing_is_safe(indrel, lvshared))
+ if (!pis->parallel_workers_can_process)
continue;
/* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ shared, pis);
}
/*
@@ -2847,15 +2852,16 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
*
* Handles index vacuuming (or index cleanup) for indexes that are not
* parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing for_cleanup processing right now.
+ * on details like whether we're performing index cleanup right now.
*
* Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by compute_parallel_vacuum_workers(). These indexes never get a
- * slot for statistics in DSM.
+ * enforced by parallel_vacuum_compute_workers().
*/
static void
-do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
+parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
{
+ LVParallelState *lps = vacrel->lps;
+
Assert(!IsParallelWorker());
/*
@@ -2866,28 +2872,15 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- shared_istat = parallel_stats_for_idx(lvshared, idx);
- indrel = vacrel->indrels[idx];
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
- /*
- * We're only here for the indexes that parallel workers won't
- * process. Note that the shared_istat test ensures that we process
- * indexes that fell under initial size cutoff.
- */
- if (shared_istat != NULL &&
- parallel_processing_is_safe(indrel, lvshared))
+ /* Skip indexes that are safe for workers */
+ if (pindstats->parallel_workers_can_process)
continue;
/* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ lps->lvshared, pindstats);
}
/*
@@ -2904,29 +2897,37 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
* statistics returned from ambulkdelete and amvacuumcleanup to the DSM
* segment.
*/
-static IndexBulkDeleteResult *
-parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_istat,
- LVRelState *vacrel)
+static void
+parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared, LVParallelIndStats *pindstats)
{
+ IndexBulkDeleteResult *istat = NULL;
IndexBulkDeleteResult *istat_res;
/*
* Update the pointer to the corresponding bulk-deletion result if someone
* has already updated it
*/
- if (shared_istat && shared_istat->updated && istat == NULL)
- istat = &shared_istat->istat;
+ if (pindstats->istat_updated)
+ istat = &(pindstats->istat);
- /* Do vacuum or cleanup of the index */
- if (lvshared->for_cleanup)
- istat_res = lazy_cleanup_one_index(indrel, istat, lvshared->reltuples,
- lvshared->estimated_count, vacrel);
- else
- istat_res = lazy_vacuum_one_index(indrel, istat, lvshared->reltuples,
- vacrel);
+ switch (pindstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = lazy_vacuum_one_index(indrel, istat,
+ shared->reltuples, vacrel);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = lazy_cleanup_one_index(indrel, istat,
+ shared->reltuples,
+ shared->estimated_count,
+ vacrel);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ pindstats->status,
+ RelationGetRelationName(indrel));
+ }
/*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2941,20 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
/* Free the locally-allocated bulk-deletion result */
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
- return istat_res;
+ /*
+ * Update the status to completed. No need to lock here since each
+ * worker touches different indexes.
+ */
+ pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
}
/*
@@ -2987,7 +2989,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_cleanup_all_indexes(vacrel);
+ parallel_vacuum_process_all_indexes(vacrel, false);
}
}
@@ -3520,7 +3522,7 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- begin_parallel_vacuum(vacrel, nworkers);
+ parallel_vacuum_begin(vacrel, nworkers);
/* If parallel mode started, vacrel->dead_items allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
@@ -3552,7 +3554,7 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- end_parallel_vacuum(vacrel);
+ parallel_vacuum_end(vacrel);
}
/*
@@ -3758,7 +3760,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* vacuum.
*/
static int
-compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
+parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
@@ -3779,14 +3781,15 @@ compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ uint8 vacoptions;
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ /* Skip indexes that are unsuitable target for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
continue;
will_parallel_vacuum[idx] = true;
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
nindexes_parallel_bulkdel++;
if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
@@ -3855,7 +3858,7 @@ update_index_statistics(LVRelState *vacrel)
* VACUUM is currently active.
*/
static void
-begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
+parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
{
LVParallelState *lps;
Relation *indrels = vacrel->indrels;
@@ -3863,10 +3866,12 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
ParallelContext *pcxt;
LVShared *shared;
LVDeadItems *dead_items;
+ LVParallelIndStats *pindstats;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
bool *will_parallel_vacuum;
int max_items;
+ Size est_pindstats_len;
Size est_shared_len;
Size est_dead_items_len;
int nindexes_mwm = 0;
@@ -3884,8 +3889,7 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
* Compute the number of parallel vacuum workers to launch
*/
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = compute_parallel_vacuum_workers(vacrel,
- nrequested,
+ parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
will_parallel_vacuum);
if (parallel_workers <= 0)
{
@@ -3902,47 +3906,19 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
Assert(pcxt->nworkers > 0);
lps->pcxt = pcxt;
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- /* Skip indexes that don't participate in parallel vacuum */
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- est_shared_len = add_size(est_shared_len, sizeof(LVSharedIndStats));
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
+ est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = sizeof(LVShared);
shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
max_items = dead_items_max_items(vacrel);
- est_dead_items_len = MAXALIGN(max_items_to_alloc_size(max_items));
+ est_dead_items_len = max_items_to_alloc_size(max_items);
shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
@@ -3973,6 +3949,41 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
InitializeParallelDSM(pcxt);
+ /* Prepare index vacuum stats */
+ pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ if (!will_parallel_vacuum[idx])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ lps->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ lps->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ lps->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
+ lps->lvpindstats = pindstats;
+
/* Prepare shared information */
shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
MemSet(shared, 0, est_shared_len);
@@ -3986,21 +3997,6 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
pg_atomic_init_u32(&(shared->cost_balance), 0);
pg_atomic_init_u32(&(shared->active_nworkers), 0);
pg_atomic_init_u32(&(shared->idx), 0);
- shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
-
- /*
- * Initialize variables for shared index statistics, set NULL bitmap and
- * the size of stats for each index.
- */
- memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
- for (int idx = 0; idx < nindexes; idx++)
- {
- if (!will_parallel_vacuum[idx])
- continue;
-
- /* Set NOT NULL as this index does support parallelism */
- shared->bitmap[idx >> 3] |= 1 << (idx & 0x07);
- }
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
lps->lvshared = shared;
@@ -4055,7 +4051,7 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
* context, but that won't be safe (see ExitParallelMode).
*/
static void
-end_parallel_vacuum(LVRelState *vacrel)
+parallel_vacuum_end(LVRelState *vacrel)
{
IndexBulkDeleteResult **indstats = vacrel->indstats;
LVParallelState *lps = vacrel->lps;
@@ -4066,21 +4062,12 @@ end_parallel_vacuum(LVRelState *vacrel)
/* Copy the updated statistics */
for (int idx = 0; idx < nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
-
- shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
- /*
- * Skip index -- it must have been processed by the leader, from
- * inside do_serial_processing_for_unsafe_indexes()
- */
- if (shared_istat == NULL)
- continue;
-
- if (shared_istat->updated)
+ if (pindstats->istat_updated)
{
indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &shared_istat->istat, sizeof(IndexBulkDeleteResult));
+ memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
}
else
indstats[idx] = NULL;
@@ -4095,67 +4082,58 @@ end_parallel_vacuum(LVRelState *vacrel)
}
/*
- * Return shared memory statistics for index at offset 'getidx', if any
- *
- * Returning NULL indicates that compute_parallel_vacuum_workers() determined
- * that the index is a totally unsuitable target for all parallel processing
- * up front. For example, the index could be < min_parallel_index_scan_size
- * cutoff.
+ * Check if the index is a totally unsuitable target for all parallel
+ * processing up front. For example, the index could be
+ * < min_parallel_index_scan_size cutoff.
*/
-static LVSharedIndStats *
-parallel_stats_for_idx(LVShared *lvshared, int getidx)
+static bool
+parallel_vacuum_should_skip_index(Relation indrel)
{
- char *p;
-
- if (IndStatsIsNull(lvshared, getidx))
- return NULL;
-
- p = (char *) GetSharedIndStats(lvshared);
- for (int idx = 0; idx < getidx; idx++)
- {
- if (IndStatsIsNull(lvshared, idx))
- continue;
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
- p += sizeof(LVSharedIndStats);
- }
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ return true;
- return (LVSharedIndStats *) p;
+ return false;
}
/*
- * Returns false, if the given index can't participate in parallel index
- * vacuum or parallel index cleanup
+ * Returns false, if the given index can't participate in the next execution of
+ * parallel index vacuum or parallel index cleanup.
*/
static bool
-parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
+parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum)
{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ uint8 vacoptions;
- /* first_time must be true only if for_cleanup is true */
- Assert(lvshared->for_cleanup || !lvshared->first_time);
+ /* Skip indexes that are unsuitable target for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
+ return false;
- if (lvshared->for_cleanup)
- {
- /* Skip, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
- /*
- * Skip, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). See the
- * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
- * when indexes support parallel cleanup conditionally.
- */
- if (!lvshared->first_time &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
- }
- else if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) == 0)
- {
- /* Skip if the index does not support parallel bulk deletion */
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
return false;
- }
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). We do
+ * this to avoid the need to invoke workers when parallel index
+ * cleanup doesn't need to scan the index. See the comments for
+ * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
+ * support parallel cleanup conditionally.
+ */
+ if (vacrel->num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
return true;
}
@@ -4171,6 +4149,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
{
Relation rel;
Relation *indrels;
+ LVParallelIndStats *lvpindstats;
LVShared *lvshared;
LVDeadItems *dead_items;
BufferUsage *buffer_usage;
@@ -4190,10 +4169,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
false);
elevel = lvshared->elevel;
- if (lvshared->for_cleanup)
- elog(DEBUG1, "starting parallel vacuum worker for cleanup");
- else
- elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
+ elog(DEBUG1, "starting parallel vacuum worker");
/* Set debug_query_string for individual workers */
sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
@@ -4214,6 +4190,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
Assert(nindexes > 0);
+ /* Set index statistics */
+ lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
/* Set dead_items space (set as worker's vacrel dead_items below) */
dead_items = (LVDeadItems *) shm_toc_lookup(toc,
PARALLEL_VACUUM_KEY_DEAD_ITEMS,
@@ -4259,7 +4240,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
InstrStartParallelQuery();
/* Process indexes to perform vacuum/cleanup */
- do_parallel_processing(&vacrel, lvshared);
+ parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
--
2.24.3 (Apple Git-128)
On Tuesday, December 7, 2021 1:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached an updated patch. I've removed 0003 patch that added
regression tests as per discussion. Regarding the terminology like "bulkdel"
and "cleanup" you pointed out, I've done that in 0002 patch while moving the
code to vacuumparallel.c. In that file, we can consistently use the terms
"bulkdel" and "cleanup" instead of "vacuum"
and "cleanup".
Hi,
Thanks for updating the patch.
I noticed a few minor things.
0001
1)
* Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
+ * processed in parallel_vacuum_process_unsafe_indexes() by leader)
It might be clearer to mention that the indexes to be skipped are unsafe OR not
worthwhile.
2)
+ /* Set index vacuum status and mark as parallel safe or not */
+ for (int i = 0; i < pvc->nindexes; i++)
+ {
...
+ pindstats->parallel_workers_can_process =
+ parallel_vacuum_index_is_parallel_safe(vacrel,
+ vacrel->indrels[i],
+ vacuum);
For the comments above the loop, maybe better to mention we are marking whether
the worker can process the index (not only safe/unsafe).
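For example, something along these lines would make that explicit (the wording is
only a suggestion):

	/*
	 * Set index vacuum status and mark whether a parallel vacuum worker
	 * can process it.
	 */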
0002
3)
+ /*
+ * Skip indexes that are unsuitable target for parallel index vacuum
+ */
+ if (parallel_vacuum_should_skip_index(indrel))
+ continue;
+
It seems we can use will_parallel_vacuum[] here instead of invoking the function
again.
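To illustrate, a minimal sketch of the suggestion, assuming the will_parallel_vacuum
array (already filled in by parallel_vacuum_compute_workers()) and the loop counter
(call it idx; the name here is only a placeholder) are in scope at that point:

	/* Skip indexes that are unsuitable targets for parallel index vacuum */
	if (!will_parallel_vacuum[idx])
		continue;

This reuses the decision already made in parallel_vacuum_compute_workers() instead
of calling parallel_vacuum_should_skip_index() a second time.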
Best regards,
Hou zj
On Wed, Dec 8, 2021 at 12:22 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Tuesday, December 7, 2021 1:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached an updated patch. I've removed 0003 patch that added
regression tests as per discussion. Regarding the terminology like "bulkdel"
and "cleanup" you pointed out, I've done that in 0002 patch while moving the
code to vacuumparallel.c. In that file, we can consistently use the terms
"bulkdel" and "cleanup" instead of "vacuum"
and "cleanup".Hi,
Thanks for updating the patch.
I noticed a few minor things.
Thank you for the comments!
0001
1)
* Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
+ * processed in parallel_vacuum_process_unsafe_indexes() by leader)
It might be clearer to mention that the indexes to be skipped are unsafe OR not
worthwhile.
Agreed. Will add the comments.
2)
+ /* Set index vacuum status and mark as parallel safe or not */
+ for (int i = 0; i < pvc->nindexes; i++)
+ {
...
+ pindstats->parallel_workers_can_process =
+ parallel_vacuum_index_is_parallel_safe(vacrel,
+ vacrel->indrels[i],
+ vacuum);
For the comments above the loop, maybe better to mention we are marking whether
the worker can process the index (not only safe/unsafe).
Right. Will fix.
0002
3)
+ /*
+ * Skip indexes that are unsuitable target for parallel index vacuum
+ */
+ if (parallel_vacuum_should_skip_index(indrel))
+ continue;
+
It seems we can use will_parallel_vacuum[] here instead of invoking the function
again.
Oops, I missed updating it in 0002 patch. Will fix.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Wed, Dec 8, 2021 at 1:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Dec 8, 2021 at 12:22 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Tuesday, December 7, 2021 1:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached an updated patch. I've removed 0003 patch that added
regression tests as per discussion. Regarding the terminology like "bulkdel"
and "cleanup" you pointed out, I've done that in 0002 patch while moving the
code to vacuumparallel.c. In that file, we can consistently use the terms
"bulkdel" and "cleanup" instead of "vacuum"
and "cleanup".Hi,
Thanks for updating the patch.
I noticed a few minor things.
Thank you for the comments!
0001
1)
* Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
+ * processed in parallel_vacuum_process_unsafe_indexes() by leader)
It might be clearer to mention that the indexes to be skipped are unsafe OR not
worthwhile.
Agreed. Will add the comments.
2)
+ /* Set index vacuum status and mark as parallel safe or not */
+ for (int i = 0; i < pvc->nindexes; i++)
+ {
...
+ pindstats->parallel_workers_can_process =
+ parallel_vacuum_index_is_parallel_safe(vacrel,
+ vacrel->indrels[i],
+ vacuum);
For the comments above the loop, maybe better to mention we are marking whether
the worker can process the index (not only safe/unsafe).
Right. Will fix.
0002
3)
+ /*
+ * Skip indexes that are unsuitable target for parallel index vacuum
+ */
+ if (parallel_vacuum_should_skip_index(indrel))
+ continue;
+
It seems we can use will_parallel_vacuum[] here instead of invoking the function
again.
Oops, I missed updating it in 0002 patch. Will fix.
I've attached updated patches. Please review them.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v6-0002-Move-parallel-vacuum-code-to-vacuumparallel.c.patch
From e21096e690340db04c0c8a7fef22b2b6265a4d8b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 1 Dec 2021 14:35:05 +0900
Subject: [PATCH v6 2/2] Move parallel vacuum code to vacuumparallel.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previously, parallel vacuum was specific to lazy vacuum, i.e., heap
table AM. But the job that parallel vacuum does isn’t really specific
to heap.
This commit moves parallel vacuum related code to a new file
commands/vacuumparallel.c so that any table AM supporting indexes can
utilize parallel vacuum in order to call index AM
callbacks (ambulkdelete and amvacuumcleanup) with parallel workers.
With that, it also moves some vacuum-related functions and structures to
commands/vacuum.c so that both lazy vacuum and parallel vacuum can
refer to them.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 1183 ++-----------------------
src/backend/access/transam/parallel.c | 2 +-
src/backend/commands/Makefile | 1 +
src/backend/commands/vacuum.c | 156 ++++
src/backend/commands/vacuumparallel.c | 1123 +++++++++++++++++++++++
src/include/access/heapam.h | 1 -
src/include/commands/vacuum.h | 42 +
src/tools/pgindent/typedefs.list | 2 +
8 files changed, 1388 insertions(+), 1122 deletions(-)
create mode 100644 src/backend/commands/vacuumparallel.c
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index eff6b1cfed..6652094d99 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -40,7 +40,6 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
-#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -120,23 +119,11 @@
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
-/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
-
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -149,148 +136,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * LVDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
- * Each TID points to an LP_DEAD line pointer from a heap page that has been
- * processed by lazy_scan_prune.
- *
- * Also needed by lazy_vacuum_heap_rel, which marks the same LP_DEAD line
- * pointers as LP_UNUSED during second heap pass.
- */
-typedef struct LVDeadItems
-{
- int max_items; /* # slots allocated in array */
- int num_items; /* current # of entries */
-
- /* Sorted array of TIDs to delete from indexes */
- ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
-} LVDeadItems;
-
-#define MAXDEADITEMS(avail_mem) \
- (((avail_mem) - offsetof(LVDeadItems, items)) / sizeof(ItemPointerData))
-
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /* Counter for vacuuming and cleanup */
- pg_atomic_uint32 idx;
-} LVShared;
-
-/* Status used during parallel index vacuum or cleanup */
-typedef enum LVParallelIndVacStatus
-{
- PARALLEL_INDVAC_STATUS_INITIAL = 0,
- PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
- PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
- PARALLEL_INDVAC_STATUS_COMPLETED
-} LVParallelIndVacStatus;
-
-/*
- * Struct for index vacuum statistics of an index that is used for parallel vacuum.
- * This includes the status of parallel index vacuum as well as index statistics.
- */
-typedef struct LVParallelIndStats
-{
- /*
- * The following two fields are set by leader process before executing
- * parallel index vacuum or parallel index cleanup. These fields are not
- * fixed for the entire VACUUM operation. They are only fixed for an
- * individual parallel index vacuum and cleanup.
- *
- * parallel_workers_can_process is true if both leader and worker can
- * process the index, otherwise only leader can process it.
- */
- LVParallelIndVacStatus status;
- bool parallel_workers_can_process;
-
- /*
- * Individual worker or leader stores the result of index vacuum or
- * cleanup.
- */
- bool istat_updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVParallelIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /*
- * Shared index statistics among parallel vacuum workers. The array
- * element is allocated for every index, even those indexes where
- * parallel index vacuuming is unsafe or not worthwhile (i.g.,
- * parallel_vacuum_should_skip_index() returns true). During parallel
- * vacuum, IndexBulkDeleteResult of each index is kept in DSM and is
- * copied into local memory at the end of parallel vacuum.
- */
- LVParallelIndStats *lvpindstats;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -308,9 +153,9 @@ typedef struct LVRelState
bool do_index_cleanup;
bool do_rel_truncate;
- /* Buffer access strategy and parallel state */
+ /* Buffer access strategy and parallel vacuum state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumState *pvs;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -332,9 +177,14 @@ typedef struct LVRelState
VacErrPhase phase;
/*
- * State managed by lazy_scan_heap() follows
+ * State managed by lazy_scan_heap() follows.
+ *
+ * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
+ * Each TID points to an LP_DEAD line pointer from a heap page that has been
+ * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
+ * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
@@ -420,30 +270,12 @@ static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
static int dead_items_max_items(LVRelState *vacrel);
-static inline Size max_items_to_alloc_size(int max_items);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static void update_index_statistics(LVRelState *vacrel);
-static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum);
-static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
-static void parallel_vacuum_end(LVRelState *vacrel);
-static bool parallel_vacuum_should_skip_index(Relation indrel);
-static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum);
-static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
-static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats);
-static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared,
- LVParallelIndStats *pindstats);
-
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -901,7 +733,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
BlockNumber nblocks,
blkno,
next_unskippable_block,
@@ -2036,7 +1868,7 @@ retry:
*/
if (lpdead_items > 0)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
Assert(!prunestate->all_visible);
@@ -2079,7 +1911,6 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(!IsParallelWorker());
Assert(vacrel->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
@@ -2208,7 +2039,6 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
{
bool allindexes = true;
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -2247,8 +2077,21 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
}
else
{
- /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to parallel variant. Since parallel vacuum will
+ * set the error context on an error we temporarily disable setting our
+ * error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2400,7 +2243,7 @@ static int
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
int index, Buffer *vmbuffer)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
@@ -2621,353 +2464,12 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process.
- */
-static void
-parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
-{
- LVParallelState *lps = vacrel->lps;
- LVParallelIndVacStatus new_status;
- int nworkers;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- if (vacuum)
- {
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_bulkdel;
- }
- else
- {
- /*
- * We can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- /* Add conditionally parallel-aware indexes if in the first time call */
- if (vacrel->num_index_scans == 0)
- nworkers += vacrel->lps->nindexes_parallel_condcleanup;
- }
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See parallel_vacuum_compute_workers().
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- /*
- * Set index vacuum status and mark whether parallel vacuum worker can
- * process it.
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
-
- Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
-
- pindstats->status = new_status;
- pindstats->parallel_workers_can_process =
- parallel_vacuum_index_is_parallel_safe(vacrel,
- vacrel->indrels[i],
- vacuum);
- }
-
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- /* Reinitialize parallel context to relaunch parallel workers */
- if (vacrel->num_index_scans > 0)
- ReinitializeParallelDSM(lps->pcxt);
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (vacuum)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- parallel_vacuum_process_unsafe_indexes(vacrel);
-
- /*
- * Join as a parallel worker. The leader process alone processes all
- * parallel-safe indexes in the case where no workers are launched.
- */
- parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Reset all index status back to initial (while checking that we have
- * processed all indexes).
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
-
- if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
- elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
- RelationGetRelationName(vacrel->indrels[i]));
-
- pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVParallelIndStats *pis;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- pis = &(pindstats[idx]);
-
- /*
- * Skip processing indexes that are unsafe for workers or unsuitable
- * target for parallel index vacuum (these are processed in
- * parallel_vacuum_process_unsafe_indexes() by leader)
- */
- if (!pis->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- shared, pis);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing index cleanup right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by parallel_vacuum_compute_workers().
- */
-static void
-parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- /* Skip, indexes that are safe for workers */
- if (pindstats->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- lps->lvshared, pindstats);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static void
-parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared, LVParallelIndStats *pindstats)
-{
- IndexBulkDeleteResult *istat = NULL;
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (pindstats->istat_updated)
- istat = &(pindstats->istat);
-
- switch (pindstats->status)
- {
- case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
- istat_res = lazy_vacuum_one_index(indrel, istat,
- shared->reltuples, vacrel);
- break;
- case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
- istat_res = lazy_cleanup_one_index(indrel, istat,
- shared->reltuples,
- shared->estimated_count,
- vacrel);
- break;
- default:
- elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
- pindstats->status,
- RelationGetRelationName(indrel));
- }
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (!pindstats->istat_updated && istat_res != NULL)
- {
- memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
- pindstats->istat_updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
- }
-
- /*
- * Update the status to completed. No need to lock here since each
- * worker touches different indexes.
- */
- pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
-}
-
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
/* Report that we are now cleaning up indexes */
@@ -2992,8 +2494,23 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
}
else
{
- /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, false);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to parallel variant. Since parallel vacuum will
+ * set the error context on an error we temporarily disable setting our
+ * error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_cleanup_all_indexes(vacrel->pvs, vacrel->new_rel_tuples,
+ (vacrel->tupcount_pages < vacrel->rel_pages),
+ vacrel->num_index_scans);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
}
}
@@ -3041,13 +2558,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_items);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_items->num_items),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ istat = bulkdel_one_index(&ivinfo, istat, vacrel->dead_items);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3082,7 +2593,6 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
ivinfo.strategy = vacrel->bstrategy;
@@ -3098,24 +2608,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
- istat = index_vacuum_cleanup(&ivinfo, istat);
-
- if (istat)
- {
- ereport(elevel,
- (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
- RelationGetRelationName(indrel),
- istat->num_index_tuples,
- istat->num_pages),
- errdetail("%.0f index row versions were removed.\n"
- "%u index pages were newly deleted.\n"
- "%u index pages are currently deleted, of which %u are currently reusable.\n"
- "%s.",
- istat->tuples_removed,
- istat->pages_newly_deleted,
- istat->pages_deleted, istat->pages_free,
- pg_rusage_show(&ru0))));
- }
+ istat = cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3477,19 +2970,6 @@ dead_items_max_items(LVRelState *vacrel)
return (int) max_items;
}
-/*
- * Returns the total required space for VACUUM's dead_items array given a
- * max_items value returned by dead_items_max_items
- */
-static inline Size
-max_items_to_alloc_size(int max_items)
-{
- Assert(max_items >= MaxHeapTuplesPerPage);
- Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
-
- return offsetof(LVDeadItems, items) + sizeof(ItemPointerData) * max_items;
-}
-
/*
* Allocate dead_items (either using palloc, or in dynamic shared memory).
* Sets dead_items in vacrel for caller.
@@ -3500,9 +2980,12 @@ max_items_to_alloc_size(int max_items)
static void
dead_items_alloc(LVRelState *vacrel, int nworkers)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
int max_items;
+ max_items = dead_items_max_items(vacrel);
+ Assert(max_items >= MaxHeapTuplesPerPage);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3526,16 +3009,22 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- parallel_vacuum_begin(vacrel, nworkers);
+ vacrel->pvs = parallel_vacuum_begin(vacrel->rel, vacrel->indrels,
+ vacrel->nindexes, nworkers,
+ max_items, elevel,
+ vacrel->bstrategy);
- /* If parallel mode started, vacrel->dead_items allocated in DSM */
+ /* If parallel mode started, dead_items space is allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs);
return;
+ }
}
/* Serial VACUUM case */
max_items = dead_items_max_items(vacrel);
- dead_items = (LVDeadItems *) palloc(max_items_to_alloc_size(max_items));
+ dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3558,75 +3047,8 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- parallel_vacuum_end(vacrel);
-}
-
-/*
- * lazy_tid_reaped() -- is a particular tid deletable?
- *
- * This has the right signature to be an IndexBulkDeleteCallback.
- *
- * Assumes dead_items array is sorted (in ascending TID order).
- */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
-{
- LVDeadItems *dead_items = (LVDeadItems *) state;
- int64 litem,
- ritem,
- item;
- ItemPointer res;
-
- litem = itemptr_encode(&dead_items->items[0]);
- ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
- item = itemptr_encode(itemptr);
-
- /*
- * Doing a simple bound check before bsearch() is useful to avoid the
- * extra cost of bsearch(), especially if dead items on the heap are
- * concentrated in a certain range. Since this function is called for
- * every index tuple, it pays to be really fast.
- */
- if (item < litem || item > ritem)
- return false;
-
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) dead_items->items,
- dead_items->num_items,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
-
- return (res != NULL);
-}
-
-/*
- * Comparator routines for use with qsort() and bsearch().
- */
-static int
-vac_cmp_itemptr(const void *left, const void *right)
-{
- BlockNumber lblk,
- rblk;
- OffsetNumber loff,
- roff;
-
- lblk = ItemPointerGetBlockNumber((ItemPointer) left);
- rblk = ItemPointerGetBlockNumber((ItemPointer) right);
-
- if (lblk < rblk)
- return -1;
- if (lblk > rblk)
- return 1;
-
- loff = ItemPointerGetOffsetNumber((ItemPointer) left);
- roff = ItemPointerGetOffsetNumber((ItemPointer) right);
-
- if (loff < roff)
- return -1;
- if (loff > roff)
- return 1;
-
- return 0;
+ parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
+ vacrel->pvs = NULL;
}
/*
@@ -3750,77 +3172,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
return all_visible;
}
-/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
- */
-static int
-parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions;
-
- /* Skip indexes that are unsuitable target for parallel index vacuum */
- if (parallel_vacuum_should_skip_index(indrel))
- continue;
-
- will_parallel_vacuum[idx] = true;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
@@ -3831,7 +3182,7 @@ update_index_statistics(LVRelState *vacrel)
int nindexes = vacrel->nindexes;
IndexBulkDeleteResult **indstats = vacrel->indstats;
- Assert(!IsInParallelMode());
+ Assert(!ParallelVacuumIsActive(vacrel));
for (int idx = 0; idx < nindexes; idx++)
{
@@ -3853,414 +3204,6 @@ update_index_statistics(LVRelState *vacrel)
}
}
-/*
- * Try to enter parallel mode and create a parallel context. Then initialize
- * shared memory state.
- *
- * On success (when we can launch one or more workers), will set dead_items and
- * lps in vacrel for caller. A set lps in vacrel state indicates that parallel
- * VACUUM is currently active.
- */
-static void
-parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
-{
- LVParallelState *lps;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- LVDeadItems *dead_items;
- LVParallelIndStats *pindstats;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- bool *will_parallel_vacuum;
- int max_items;
- Size est_pindstats_len;
- Size est_shared_len;
- Size est_dead_items_len;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
- will_parallel_vacuum);
- if (parallel_workers <= 0)
- {
- /* Can't perform vacuum in parallel -- lps not set in vacrel */
- pfree(will_parallel_vacuum);
- return;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
-
- /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
- est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
- shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = sizeof(LVShared);
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
- max_items = dead_items_max_items(vacrel);
- est_dead_items_len = max_items_to_alloc_size(max_items);
- shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare index vacuum stats */
- pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
- lps->lvpindstats = pindstats;
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
- MemSet(shared, 0, est_shared_len);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead_items space */
- dead_items = (LVDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
- MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- pfree(will_parallel_vacuum);
-
- /* Success -- set dead_items and lps in leader's vacrel state */
- vacrel->dead_items = dead_items;
- vacrel->lps = lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-parallel_vacuum_end(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- if (pindstats->istat_updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Check if the index is a totally unsuitable target for all parallel
- * processing up front. For example, the index could be
- * < min_parallel_index_scan_size cutoff.
- */
-static bool
-parallel_vacuum_should_skip_index(Relation indrel)
-{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- return true;
-
- return false;
-}
-
-/*
- * Returns false, if the given index can't participate in the next execution of
- * parallel index vacuum or parallel index cleanup.
- */
-static bool
-parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum)
-{
- uint8 vacoptions;
-
- /* Skip indexes that are unsuitable target for parallel index vacuum */
- if (parallel_vacuum_should_skip_index(indrel))
- return false;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* In parallel vacuum case, check if it supports parallel bulk-deletion */
- if (vacuum)
- return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
-
- /* Not safe, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Not safe, if the index supports parallel cleanup conditionally,
- * but we have already processed the index (for bulkdelete). We do
- * this to avoid the need to invoke workers when parallel index
- * cleanup doesn't need to scan the index. See the comments for
- * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
- * support parallel cleanup conditionally.
- */
- if (vacrel->num_index_scans > 0 &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVParallelIndStats *lvpindstats;
- LVShared *lvshared;
- LVDeadItems *dead_items;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- /*
- * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
- * don't support parallel vacuum for autovacuum as of now.
- */
- Assert(MyProc->statusFlags == PROC_IN_VACUUM);
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- elog(DEBUG1, "starting parallel vacuum worker");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set index statistics */
- lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_INDEX_STATS,
- false);
-
- /* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (LVDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_items = dead_items;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
/*
* Error context callback for errors occurring during vacuum.
*/
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f573..ae7c7133dd 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -14,7 +14,6 @@
#include "postgres.h"
-#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/parallel.h"
#include "access/session.h"
@@ -25,6 +24,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..48f7348f91 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15b44..70a719f16c 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -32,6 +32,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
+#include "catalog/index.h"
#include "catalog/pg_database.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_namespace.h"
@@ -51,6 +52,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -89,6 +91,8 @@ static void vac_truncate_clog(TransactionId frozenXID,
static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
+static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_cmp_itemptr(const void *left, const void *right);
/*
* Primary entry point for manual VACUUM and ANALYZE commands
@@ -2258,3 +2262,155 @@ get_vacoptval_from_boolean(DefElem *def)
{
return defGetBoolean(def) ? VACOPTVALUE_ENABLED : VACOPTVALUE_DISABLED;
}
+
+/*
+ * bulkdel_one_index() -- bulk-deletion for index relation.
+ *
+ * Delete all the index tuples containing a TID collected in
+ * dead_items. Also update running statistics. Exact details depend
+ * on index AM's ambulkdelete routine.
+ *
+ * reltuples is the number of table tuples to be passed to the
+ * bulkdelete callback. It's always assumed to be estimated.
+ * See indexam.sgml for more info.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+bulkdel_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ /* Do bulk deletion */
+ istat = index_bulk_delete(ivinfo, istat, vac_tid_reaped,
+ (void *) dead_items);
+
+ ereport(ivinfo->message_level,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ RelationGetRelationName(ivinfo->index),
+ dead_items->num_items),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+ return istat;
+}
+
+/*
+ * cleanup_one_index() -- do post-vacuum cleanup for index relation.
+ *
+ * Calls index AM's amvacuumcleanup routine. reltuples is the number
+ * of table tuples and estimated_count is true if reltuples is an
+ * estimated value. See indexam.sgml for more info.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ istat = index_vacuum_cleanup(ivinfo, istat);
+
+ if (istat)
+ {
+ ereport(ivinfo->message_level,
+ (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
+ RelationGetRelationName(ivinfo->index),
+ istat->num_index_tuples,
+ istat->num_pages),
+ errdetail("%.0f index row versions were removed.\n"
+ "%u index pages were newly deleted.\n"
+ "%u index pages are currently deleted, of which %u are currently reusable.\n"
+ "%s.",
+ istat->tuples_removed,
+ istat->pages_newly_deleted,
+ istat->pages_deleted, istat->pages_free,
+ pg_rusage_show(&ru0))));
+ }
+
+ return istat;
+}
+
+/*
+ * vac_tid_reaped() -- is a particular tid deletable?
+ *
+ * This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ * Assumes dead_items array is sorted (in ascending TID order).
+ */
+static bool
+vac_tid_reaped(ItemPointer itemptr, void *state)
+{
+ VacDeadItems *dead_items = (VacDeadItems *) state;
+ int64 litem,
+ ritem,
+ item;
+ ItemPointer res;
+
+ litem = itemptr_encode(&dead_items->items[0]);
+ ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
+ item = itemptr_encode(itemptr);
+
+ /*
+ * Doing a simple bound check before bsearch() is useful to avoid the
+ * extra cost of bsearch(), especially if dead items on the heap are
+ * concentrated in a certain range. Since this function is called for
+ * every index tuple, it pays to be really fast.
+ */
+ if (item < litem || item > ritem)
+ return false;
+
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_items->items,
+ dead_items->num_items,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
+
+ return (res != NULL);
+}
+
+/*
+ * Comparator routines for use with qsort() and bsearch().
+ */
+static int
+vac_cmp_itemptr(const void *left, const void *right)
+{
+ BlockNumber lblk,
+ rblk;
+ OffsetNumber loff,
+ roff;
+
+ lblk = ItemPointerGetBlockNumber((ItemPointer) left);
+ rblk = ItemPointerGetBlockNumber((ItemPointer) right);
+
+ if (lblk < rblk)
+ return -1;
+ if (lblk > rblk)
+ return 1;
+
+ loff = ItemPointerGetOffsetNumber((ItemPointer) left);
+ roff = ItemPointerGetOffsetNumber((ItemPointer) right);
+
+ if (loff < roff)
+ return -1;
+ if (loff > roff)
+ return 1;
+
+ return 0;
+}
+
+/*
+ * Returns the total required space for VACUUM's dead_items array given a
+ * max_items value.
+ */
+inline Size
+vac_max_items_to_alloc_size(int max_items)
+{
+ Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
+
+ return offsetof(VacDeadItems, items) + sizeof(ItemPointerData) * max_items;
+}
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000000..e0b361c949
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,1123 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelVacuumState.
+ *
+ * In a parallel vacuum, we perform both index bulk-deletion and index cleanup
+ * with parallel worker processes. Individual indexes are processed by one
+ * vacuum process. ParallelVacuumState contains shared information as well
+ * as the memory space for storing dead items allocated in the DSM segment.
+ * When starting either parallel index bulk-deletion or index cleanup, we
+ * launch parallel worker processes. Once all indexes are processed, the
+ * parallel worker processes exit. Each time we launch workers, the parallel
+ * context is re-initialized so that the same DSM can be used for multiple
+ * passes of index bulk-deletion and index cleanup. At the end of a parallel
+ * vacuum, ParallelVacuumState is destroyed while returning index statistics
+ * so that we can update them after exiting from parallel mode.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/genam.h"
+#include "access/parallel.h"
+#include "access/table.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/elog.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel workers, so this is allocated in the
+ * DSM segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the parallel vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * reltuples is the total number of input table tuples. It is set to the old
+ * live tuples in the index vacuum case and to the new live tuples in the
+ * index cleanup case.
+ *
+ * estimated_count is true if reltuples is an estimated value. (Note that
+ * reltuples could be -1 in this case, indicating we have no idea.)
+ */
+ double reltuples;
+ bool estimated_count;
+
+ /*
+ * In single process vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for table scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process vacuum.
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
+} PVShared;
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} PVIndVacStatus;
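
(My reading of the per-index status flow, which isn't spelled out in one
place in the patch: the leader moves every index from INITIAL to either
NEED_BULKDELETE or NEED_CLEANUP in parallel_vacuum_all_indexes(); whichever
process picks an index up sets it to COMPLETED in parallel_vacuum_one_index();
and the leader then verifies that every index reached COMPLETED before
resetting them all to INITIAL for the next pass.)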
+
+/*
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of the parallel index vacuum as well as the index statistics.
+ */
+typedef struct PVIndStats
+{
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both leader and worker can
+ * process the index, otherwise only leader can process it.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/*
+ * Struct for maintaining a parallel vacuum state. This struct is used
+ * by both leader and worker processes. The parallel vacuum leader process
+ * uses it throughout a VACUUM operation, so the same state is used to
+ * perform index bulk-deletion and index cleanup multiple times.
+ * The workers use only some fields of this structure.
+ */
+typedef struct ParallelVacuumState
+{
+ /* NULL for worker processes */
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /*
+ * Shared index statistics among parallel vacuum workers. An array
+ * element is allocated for every index, even those indexes where parallel
+ * index vacuuming is unsafe or not worthwhile (i.e.,
+ * parallel_vacuum_should_skip_index() returns true). During parallel
+ * vacuum, the IndexBulkDeleteResult of each index is kept in DSM and is
+ * copied into local memory at the end of the parallel vacuum.
+ */
+ PVIndStats *indstats;
+
+ /* Shared dead items space among parallel vacuum workers */
+ VacDeadItems *dead_items;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion,
+ * parallel index cleanup, and conditional parallel index cleanup,
+ * respectively.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /* True if we need to reinitialize parallel DSM before launching workers */
+ bool first_time;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+
+ /* Error reporting state */
+ char *relnamespace;
+ char *relname;
+ char *indname;
+ PVIndVacStatus status;
+} ParallelVacuumState;
+
+static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum);
+static bool parallel_vacuum_should_skip_index(Relation indrel);
+static void parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
+ bool no_bulkdel_call);
+static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, bool bulkdel,
+ bool no_bulkdel_call);
+static void parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_safe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats);
+static void parallel_vacuum_error_callback(void *arg);
+
+/*
+ * Try to enter parallel mode and create a parallel context. Then initialize
+ * shared memory state.
+ *
+ * On success (when we can launch one or more workers), return parallel vacuum
+ * state. Otherwise, return NULL.
+ */
+ParallelVacuumState *
+parallel_vacuum_begin(Relation rel, Relation *indrels, int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel, BufferAccessStrategy bstrategy)
+{
+ ParallelVacuumState *pvs;
+ ParallelContext *pcxt;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ PVIndStats *indstats;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ bool *will_parallel_vacuum;
+ Size est_indstats_len;
+ Size est_shared_len;
+ Size est_dead_items_len;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(nrequested_workers >= 0);
+ Assert(nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
+ parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
+ nrequested_workers,
+ will_parallel_vacuum);
+ if (parallel_workers <= 0)
+ {
+ /* Can't perform vacuum in parallel -- return NULL */
+ pfree(will_parallel_vacuum);
+ return NULL;
+ }
+
+ pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
+ pvs->indrels = indrels;
+ pvs->nindexes = nindexes;
+ pvs->first_time = true;
+ pvs->bstrategy = bstrategy;
+
+ /*
+ * Set error traceback information. The other fields will be filled in
+ * while vacuuming indexes.
+ */
+ pvs->relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs->relname = pstrdup(RelationGetRelationName(rel));
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvs->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_INDEX_STATS */
+ est_indstats_len = mul_size(sizeof(PVIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = MAXALIGN(sizeof(PVShared));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
+ est_dead_items_len = MAXALIGN(vac_max_items_to_alloc_size(max_items));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats_len);
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * The cleanup option should be either disabled, always performed in
+ * parallel, or conditionally performed in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ /*
+ * Skip indexes that are unsuitable targets for parallel index vacuum
+ */
+ if (!will_parallel_vacuum[i])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvs->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvs->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvs->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvs->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
+ MemSet(shared, 0, est_shared_len);
+ shared->relid = RelationGetRelid(rel);
+ shared->elevel = elevel;
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvs->shared = shared;
+
+ /* Prepare the dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
+ dead_items->max_items = max_items;
+ dead_items->num_items = 0;
+ MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
+ pvs->dead_items = dead_items;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvs->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvs->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ pfree(will_parallel_vacuum);
+
+ /* Success -- return parallel vacuum state */
+ return pvs;
+}
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * vacuum and index cleanup can be executed with parallel workers. The index
+ * is eligible for parallel vacuum iff its size is greater than
+ * min_parallel_index_scan_size as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that the user requested. If
+ * nrequested is 0, we compute the parallel degree from the number of indexes
+ * that support parallel vacuum. This function also sets will_parallel_vacuum
+ * to remember which indexes participate in parallel vacuum.
+ */
+static int
+parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow a parallel operation in a standalone backend or when
+ * parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ /*
+ * Compute the number of indexes that can participate in parallel vacuum.
+ */
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions;
+
+ /* Skip indexes that are unsuitable targets for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
+ continue;
+
+ will_parallel_vacuum[i] = true;
+
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
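
(A worked example, with hypothetical inputs: suppose a table has four
indexes, three of which are larger than min_parallel_index_scan_size and
advertise VACUUM_OPTION_PARALLEL_BULKDEL plus one of the cleanup options,
while the fourth is tiny and gets skipped. Then nindexes_parallel_bulkdel = 3
and nindexes_parallel_cleanup = 3, so nindexes_parallel = 3; subtracting one
for the leader leaves 2, and with nrequested = 0 and the default
max_parallel_maintenance_workers = 2, the function returns 2.)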
+
+/*
+ * Check up front whether the index is a totally unsuitable target for all
+ * parallel processing. For example, the index could be smaller than the
+ * min_parallel_index_scan_size cutoff.
+ */
+static bool
+parallel_vacuum_should_skip_index(Relation indrel)
+{
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ return true;
+
+ return false;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed while in parallel mode, copy the updated
+ * index statistics from DSM into local memory so that they can be applied
+ * after exiting parallel mode. One might think that we could exit from
+ * parallel mode, update the index statistics, and then destroy the parallel
+ * context, but that would not be safe (see ExitParallelMode).
+ */
+void
+parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats)
+{
+ Assert(!IsParallelWorker());
+
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->istat_updated)
+ {
+ istats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(istats[i], &indstats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ istats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvs->pcxt);
+ ExitParallelMode();
+
+ pfree(pvs->relnamespace);
+ pfree(pvs->relname);
+ pfree(pvs);
+}
+
+/* Returns the dead items space */
+VacDeadItems *
+parallel_vacuum_get_dead_items(ParallelVacuumState *pvs)
+{
+ return pvs->dead_items;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now.
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = true;
+
+ /* The no_bulkdel_call flag is irrelevant in the bulk-deletion case */
+ parallel_vacuum_all_indexes(pvs, true, false);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ *
+ * no_bulkdel_call must be true if there was no parallel_vacuum_bulkdel_all_indexes
+ * call in the vacuum execution.
+ */
+void
+parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
+ bool estimated_count, bool no_bulkdel_call)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can provide a better estimate of total number of surviving tuples
+ * (we assume indexes are more interested in that than in the number of
+ * nominally live tuples).
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_all_indexes(pvs, false, no_bulkdel_call);
+}
+
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
+ bool no_bulkdel_call)
+{
+ int nworkers;
+ ErrorContextCallback errcallback;
+ PVIndVacStatus new_status = bulkdel
+ ? PARALLEL_INDVAC_STATUS_NEED_BULKDELETE
+ : PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
+
+ Assert(!IsParallelWorker());
+
+ /* Determine the number of parallel workers to launch */
+ if (bulkdel)
+ nworkers = pvs->nindexes_parallel_bulkdel;
+ else
+ {
+ nworkers = pvs->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if bulk-deletion was never performed */
+ if (no_bulkdel_call)
+ nworkers += pvs->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * than the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See
+ * parallel_vacuum_compute_workers().
+ */
+ nworkers = Min(nworkers, pvs->pcxt->nworkers);
+
+ /*
+ * Set index vacuum status and mark whether parallel vacuum worker can
+ * process it.
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+
+ indstats->status = new_status;
+ indstats->parallel_workers_can_process =
+ parallel_vacuum_index_is_parallel_safe(pvs->indrels[i],
+ bulkdel,
+ no_bulkdel_call);
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvs->shared->idx), 0);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for the leader backend, as its balance
+ * has already been transferred to the shared counter above.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ if (bulkdel)
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+
+ pvs->first_time = false;
+ }
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Vacuum the indexes that can be processed only by the leader process */
+ parallel_vacuum_unsafe_indexes(pvs);
+
+ /*
+ * Join as a parallel worker. The leader process alone vacuums all
+ * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_safe_indexes(pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index statuses back to initial (while checking that we have
+ * vacuumed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(pvs->indrels[i]));
+
+ indstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to table scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+}
+
+/*
+ * Returns false if the given index can't participate in parallel index
+ * vacuum or parallel index cleanup.
+ */
+static bool
+parallel_vacuum_index_is_parallel_safe(Relation indrel, bool bulkdel,
+ bool no_bulkdel_call)
+{
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Return false if the index is an unsuitable target for parallel index
+ * vacuum
+ */
+ if (parallel_vacuum_should_skip_index(indrel))
+ return false;
+
+ /* In bulk-deletion case, check if it supports parallel bulk-deletion */
+ if (bulkdel)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe if the index supports parallel cleanup conditionally, but we
+ * have already processed the index (for bulkdelete). We do this to avoid
+ * the need to invoke workers when parallel index cleanup doesn't need to
+ * scan the index. See the comments for option
+ * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
+ * parallel cleanup conditionally.
+ */
+ if (!no_bulkdel_call &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
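
(Concretely, for a hypothetical index AM that advertises
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP: its
bulk-deletion is always eligible for a worker, but its cleanup is handed to
workers only when no bulk-deletion ran in this VACUUM; if bulk-deletion
already ran, the leader performs the presumably cheap cleanup itself.)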
+
+/*
+ * Perform index vacuum or cleanup serially in the leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe. It's possible that this will vary for a given index, based
+ * on details like whether we're performing index cleanup right now.
+ *
+ * Also performs vacuuming of smaller indexes that fell under the size cutoff
+ * enforced by parallel_vacuum_compute_workers().
+ */
+static void
+parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ /* Skip safe indexes, as they are vacuumed by workers */
+ if (indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[i], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to vacuum the indexes in parallel.
+ */
+static void
+parallel_vacuum_safe_indexes(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+ PVIndStats *indstats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvs->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvs->nindexes)
+ break;
+
+ indstats = &(pvs->indstats[idx]);
+
+ /*
+ * Skip processing indexes that are unsafe for workers or are unsuitable
+ * targets for parallel index vacuum (these are processed by the leader
+ * in parallel_vacuum_unsafe_indexes())
+ */
+ if (!indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do bulkdelete or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[idx], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Vacuum or clean up one index, either in the leader process or in one of
+ * the worker processes. After processing the index, this function copies
+ * the index statistics returned by ambulkdelete or amvacuumcleanup into
+ * the DSM segment.
+ */
+static void
+parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel, PVIndStats *indstats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result if someone
+ * has already updated it
+ */
+ if (indstats->istat_updated)
+ istat = &(indstats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvs->shared->elevel;
+ ivinfo.estimated_count = pvs->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvs->shared->reltuples;
+ ivinfo.strategy = pvs->bstrategy;
+
+ /* Update error traceback information */
+ pvs->indname = pstrdup(RelationGetRelationName(indrel));
+ pvs->status = indstats->status;
+
+ switch (indstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = bulkdel_one_index(&ivinfo, istat, pvs->dead_items);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ indstats->status,
+ RelationGetRelationName(indrel));
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete or
+ * amvacuumcleanup into the DSM segment if this is the first cycle, because
+ * the result is allocated locally and the index might be vacuumed by a
+ * different vacuum process in the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result in the DSM segment and
+ * pass it to the index vacuum APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result to different
+ * slots, we can write them without locking.
+ */
+ if (!indstats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(indstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ indstats->istat_updated = true;
+
+ /* Free the locally-allocated bulk-deletion result */
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each worker
+ * touches different indexes.
+ */
+ indstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+
+ /* Reset error traceback information */
+ pvs->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ pfree(pvs->indname);
+ pvs->indname = NULL;
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index vacuum or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ ParallelVacuumState pvs;
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ int nindexes;
+ char *sharedquery;
+ ErrorContextCallback errcallback;
+
+ /*
+ * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
+ * don't support parallel vacuum for autovacuum as of now.
+ */
+ Assert(MyProc->statusFlags == PROC_IN_VACUUM);
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
+ false);
+
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the leader process's. It's
+ * okay because the lock mode does not conflict among the parallel
+ * workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should
+ * match the order in the leader process.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead_items space (set as worker's dead_items below) */
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Set parallel vacuum state */
+ pvs.indrels = indrels;
+ pvs.nindexes = nindexes;
+ pvs.indstats = indstats;
+ pvs.shared = shared;
+ pvs.dead_items = dead_items;
+ pvs.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs.relname = pstrdup(RelationGetRelationName(rel));
+
+ /* These fields will be filled during index vacuum or cleanup */
+ pvs.indname = NULL;
+ pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
+
+ /* Each parallel VACUUM worker gets its own access strategy */
+ pvs.bstrategy = GetAccessStrategy(BAS_VACUUM);
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_safe_indexes(&pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvs.bstrategy);
+}
+
+/*
+ * Error context callback for errors occurring during parallel index vacuum.
+ */
+static void
+parallel_vacuum_error_callback(void *arg)
+{
+ ParallelVacuumState *errinfo = arg;
+
+ switch (errinfo->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while cleanup index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_INITIAL:
+ case PARALLEL_INDVAC_STATUS_COMPLETED:
+ default:
+ break;
+ }
+}
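
(To sketch how a caller is expected to drive this API from the leader side --
my own condensed reading, with invented local variable names, not code from
the patch; vacuumlazy.c presumably ends up calling it in roughly this shape:)

    /* once, before the first heap scan pass */
    pvs = parallel_vacuum_begin(rel, indrels, nindexes, params->nworkers,
                                max_items, elevel, bstrategy);
    if (pvs != NULL)
        dead_items = parallel_vacuum_get_dead_items(pvs);

    /* at each index bulk-deletion pass */
    parallel_vacuum_bulkdel_all_indexes(pvs, old_live_tuples);

    /* once, after the final heap pass */
    parallel_vacuum_cleanup_all_indexes(pvs, new_rel_tuples, estimated_count,
                                        num_index_scans == 0);

    /* tear down, copying per-index stats back out of the DSM segment */
    parallel_vacuum_end(pvs, indstats);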
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 417dd288e5..f3fb1e93a5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4cfd52eaf4..88e0154d60 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -62,6 +64,9 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Abstract type for parallel vacuum state */
+typedef struct ParallelVacuumState ParallelVacuumState;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -230,6 +235,21 @@ typedef struct VacuumParams
int nworkers;
} VacuumParams;
+/*
+ * VacDeadItems stores dead TIDs collected during the heap scan.
+ */
+typedef struct VacDeadItems
+{
+ int max_items; /* # slots allocated in array */
+ int num_items; /* current # of entries */
+
+ /* Sorted array of TIDs to delete from indexes */
+ ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
+} VacDeadItems;
+
+#define MAXDEADITEMS(avail_mem) \
+ (((avail_mem) - offsetof(VacDeadItems, items)) / sizeof(ItemPointerData))
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for PostGIS */
extern int vacuum_freeze_min_age;
@@ -282,6 +302,28 @@ extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
bits32 options, bool verbose,
LOCKMODE lmode);
+extern IndexBulkDeleteResult *bulkdel_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items);
+extern IndexBulkDeleteResult *cleanup_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat);
+extern Size vac_max_items_to_alloc_size(int max_items);
+
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumState *parallel_vacuum_begin(Relation rel, Relation *indrels,
+ int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel,
+ BufferAccessStrategy bstrategy);
+extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
+extern VacDeadItems *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs);
+extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples);
+extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ bool estimated_count,
+ bool no_bulkdel_call);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f41ef0d2bc..017ea7091c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1808,6 +1808,7 @@ ParallelSlotResultHandler
ParallelState
ParallelTableScanDesc
ParallelTableScanDescData
+ParallelVacuumState
ParallelWorkerContext
ParallelWorkerInfo
Param
@@ -2798,6 +2799,7 @@ UserMapping
UserOpts
VacAttrStats
VacAttrStatsP
+VacDeadItems
VacErrPhase
VacOptValue
VacuumParams
--
2.24.3 (Apple Git-128)
Attachment: v6-0001-Refactor-parallel-vacuum-to-remove-bitmap-related.patch
From 289600861a056012c70cb376c33e4d3d57397f25 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 30 Nov 2021 23:26:28 +0900
Subject: [PATCH v6 1/2] Refactor parallel vacuum to remove bitmap-related
code.
Previously, in parallel vacuum, we allocated a shmem area of
IndexBulkDeleteResult only for indexes where parallel index vacuuming
is safe, and used a null-bitmap in the shmem area to locate them. This
logic was too complicated for the small benefit of saving only a few
bits per index.
In this commit, we allocate a dedicated shmem area for an array of
LVParallelIndStats, which includes a parallel-safety flag, the index
vacuum status, and the IndexBulkDeleteResult. There is one array
element for every index, even those indexes where parallel index
vacuuming is unsafe or not worthwhile. This makes the code clearer by
removing all bitmap-related code.
Also, check each index's vacuum status after parallel index vacuum to
make sure that all indexes have been processed.
Finally, rename the parallel vacuum functions to parallel_vacuum_* for
consistency.
An upcoming patch also refactors parallel vacuum further to make it
generic so that any table AM can utilize parallel vacuum functionality.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
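(A before/after sketch of the access pattern being simplified -- mine, not
part of the commit message, using identifiers from the patch. With the old
scheme, touching an index's shared stats looked like

    shared_istat = parallel_stats_for_idx(lvshared, idx);
    if (shared_istat == NULL)
        continue;               /* index not participating in parallelism */

whereas with the per-index array every index simply has a slot:

    pindstats = &(lps->lvpindstats[idx]);
    if (!pindstats->parallel_workers_can_process)
        continue;

and having a slot for every index is what lets the leader verify afterwards
that each one reached PARALLEL_INDVAC_STATUS_COMPLETED.)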
---
src/backend/access/heap/vacuumlazy.c | 611 +++++++++++++--------------
1 file changed, 298 insertions(+), 313 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 282b44f87b..eff6b1cfed 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -130,6 +130,7 @@
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -181,14 +182,6 @@ typedef struct LVShared
Oid relid;
int elevel;
- /*
- * An indication for vacuum workers to perform either index vacuum or
- * index cleanup. first_time is true only if for_cleanup is true and
- * bulk-deletion is not performed yet.
- */
- bool for_cleanup;
- bool first_time;
-
/*
* Fields for both index vacuum and cleanup.
*
@@ -226,33 +219,44 @@ typedef struct LVShared
*/
pg_atomic_uint32 active_nworkers;
- /*
- * Variables to control parallel vacuum. We have a bitmap to indicate
- * which index has stats in shared memory. The set bit in the map
- * indicates that the particular index supports a parallel vacuum.
- */
- pg_atomic_uint32 idx; /* counter for vacuuming and clean up */
- uint32 offset; /* sizeof header incl. bitmap */
- bits8 bitmap[FLEXIBLE_ARRAY_MEMBER]; /* bit map of NULLs */
-
- /* Shared index statistics data follows at end of struct */
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
} LVShared;
-#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
-#define GetSharedIndStats(s) \
- ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
-#define IndStatsIsNull(s, i) \
- (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
+/* Status used during parallel index vacuum or cleanup */
+typedef enum LVParallelIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} LVParallelIndVacStatus;
/*
- * Struct for an index bulk-deletion statistic used for parallel vacuum. This
- * is allocated in the DSM segment.
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
*/
-typedef struct LVSharedIndStats
+typedef struct LVParallelIndStats
{
- bool updated; /* are the stats updated? */
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both leader and worker can
+ * process the index, otherwise only leader can process it.
+ */
+ LVParallelIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
IndexBulkDeleteResult istat;
-} LVSharedIndStats;
+} LVParallelIndStats;
/* Struct for maintaining a parallel vacuum state. */
typedef struct LVParallelState
@@ -262,6 +266,16 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /*
+ * Shared index statistics among parallel vacuum workers. The array
+ * element is allocated for every index, even those indexes where
+ * parallel index vacuuming is unsafe or not worthwhile (i.e.,
+ * parallel_vacuum_should_skip_index() returns true). During parallel
+ * vacuum, IndexBulkDeleteResult of each index is kept in DSM and is
+ * copied into local memory at the end of parallel vacuum.
+ */
+ LVParallelIndStats *lvpindstats;
+
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
@@ -391,18 +405,6 @@ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
-static void do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel);
-static void do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers);
-static void do_parallel_processing(LVRelState *vacrel,
- LVShared *lvshared);
-static void do_serial_processing_for_unsafe_indexes(LVRelState *vacrel,
- LVShared *lvshared);
-static IndexBulkDeleteResult *parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_indstats,
- LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
@@ -425,14 +427,23 @@ static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int compute_parallel_vacuum_workers(LVRelState *vacrel,
- int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void begin_parallel_vacuum(LVRelState *vacrel, int nrequested);
-static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared, int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
+
+static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
+ bool *will_parallel_vacuum);
+static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
+static void parallel_vacuum_end(LVRelState *vacrel);
+static bool parallel_vacuum_should_skip_index(Relation indrel);
+static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum);
+static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
+static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
+static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats);
+static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared,
+ LVParallelIndStats *pindstats);
+
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -2237,7 +2248,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_vacuum_all_indexes(vacrel);
+ parallel_vacuum_process_all_indexes(vacrel, true);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2611,76 +2622,54 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
}
/*
- * Perform lazy_vacuum_all_indexes() steps in parallel
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
*/
static void
-do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
+parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
{
- /* Tell parallel workers to do index vacuuming */
- vacrel->lps->lvshared->for_cleanup = false;
- vacrel->lps->lvshared->first_time = false;
-
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
+ LVParallelState *lps = vacrel->lps;
+ LVParallelIndVacStatus new_status;
+ int nworkers;
- do_parallel_vacuum_or_cleanup(vacrel,
- vacrel->lps->nindexes_parallel_bulkdel);
-}
+ Assert(!IsParallelWorker());
+ Assert(ParallelVacuumIsActive(vacrel));
+ Assert(vacrel->nindexes > 0);
-/*
- * Perform lazy_cleanup_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel)
-{
- int nworkers;
+ if (vacuum)
+ {
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now. Matches serial VACUUM case.
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
+ vacrel->lps->lvshared->estimated_count = true;
- /*
- * If parallel vacuum is active we perform index cleanup with parallel
- * workers.
- *
- * Tell parallel workers to do index cleanup.
- */
- vacrel->lps->lvshared->for_cleanup = true;
- vacrel->lps->lvshared->first_time = (vacrel->num_index_scans == 0);
+ new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
- /*
- * Now we can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- /* Determine the number of parallel workers to launch */
- if (vacrel->lps->lvshared->first_time)
- nworkers = vacrel->lps->nindexes_parallel_cleanup +
- vacrel->lps->nindexes_parallel_condcleanup;
+ /* Determine the number of parallel workers to launch */
+ nworkers = vacrel->lps->nindexes_parallel_bulkdel;
+ }
else
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
+ {
+ /*
+ * We can provide a better estimate of total number of surviving
+ * tuples (we assume indexes are more interested in that than in the
+ * number of nominally live tuples).
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
+ vacrel->lps->lvshared->estimated_count =
+ (vacrel->tupcount_pages < vacrel->rel_pages);
- do_parallel_vacuum_or_cleanup(vacrel, nworkers);
-}
+ new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process. The caller must set
- * lps->lvshared->for_cleanup to indicate whether to perform vacuum or
- * cleanup.
- */
-static void
-do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
-{
- LVParallelState *lps = vacrel->lps;
+ /* Determine the number of parallel workers to launch */
+ nworkers = vacrel->lps->nindexes_parallel_cleanup;
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
+ /* Add conditionally parallel-aware indexes on the first call */
+ if (vacrel->num_index_scans == 0)
+ nworkers += vacrel->lps->nindexes_parallel_condcleanup;
+ }
/* The leader process will participate */
nworkers--;
@@ -2688,21 +2677,36 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
/*
* It is possible that parallel context is initialized with fewer workers
* than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See compute_parallel_vacuum_workers.
+ * phase, so we need to consider it. See parallel_vacuum_compute_workers().
*/
nworkers = Min(nworkers, lps->pcxt->nworkers);
+ /*
+ * Set index vacuum status and mark whether parallel vacuum worker can
+ * process it.
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
+
+ Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+
+ pindstats->status = new_status;
+ pindstats->parallel_workers_can_process =
+ parallel_vacuum_index_is_parallel_safe(vacrel,
+ vacrel->indrels[i],
+ vacuum);
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(lps->lvshared->idx), 0);
+
/* Setup the shared cost-based vacuum delay and launch workers */
if (nworkers > 0)
{
+ /* Reinitialize parallel context to relaunch parallel workers */
if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
ReinitializeParallelDSM(lps->pcxt);
- }
/*
* Set up shared cost balance and the number of active workers for
@@ -2735,28 +2739,28 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
}
- if (lps->lvshared->for_cleanup)
+ if (vacuum)
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
else
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
}
/* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
+ parallel_vacuum_process_unsafe_indexes(vacrel);
/*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
*/
- do_parallel_processing(vacrel, lps->lvshared);
+ parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
/*
* Next, accumulate buffer and WAL usage. (This must wait for the workers
@@ -2771,6 +2775,21 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
+ /*
+ * Reset all index statuses back to initial (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
+
+ if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(vacrel->indrels[i]));
+
+ pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
/*
* Carry the shared balance value to heap scan and disable shared costing
*/
@@ -2787,7 +2806,8 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
* vacuum worker processes to process the indexes in parallel.
*/
static void
-do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
+parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats)
{
/*
* Increment the active worker count if we are able to launch any worker.
@@ -2799,39 +2819,28 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
for (;;)
{
int idx;
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
+ LVParallelIndStats *pis;
/* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(lvshared->idx), 1);
+ idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
/* Done for all indexes? */
if (idx >= vacrel->nindexes)
break;
- /* Get the index statistics space from DSM, if any */
- shared_istat = parallel_stats_for_idx(lvshared, idx);
-
- /* Skip indexes not participating in parallelism */
- if (shared_istat == NULL)
- continue;
-
- indrel = vacrel->indrels[idx];
+ pis = &(pindstats[idx]);
/*
- * Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
+ * Skip processing indexes that are unsafe for workers or are unsuitable
+ * targets for parallel index vacuum (these are processed in
+ * parallel_vacuum_process_unsafe_indexes() by leader)
*/
- if (!parallel_processing_is_safe(indrel, lvshared))
+ if (!pis->parallel_workers_can_process)
continue;
/* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ shared, pis);
}
/*
@@ -2847,15 +2856,16 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
*
* Handles index vacuuming (or index cleanup) for indexes that are not
* parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing for_cleanup processing right now.
+ * on details like whether we're performing index cleanup right now.
*
* Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by compute_parallel_vacuum_workers(). These indexes never get a
- * slot for statistics in DSM.
+ * enforced by parallel_vacuum_compute_workers().
*/
static void
-do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
+parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
{
+ LVParallelState *lps = vacrel->lps;
+
Assert(!IsParallelWorker());
/*
@@ -2866,28 +2876,15 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
- shared_istat = parallel_stats_for_idx(lvshared, idx);
- indrel = vacrel->indrels[idx];
-
- /*
- * We're only here for the indexes that parallel workers won't
- * process. Note that the shared_istat test ensures that we process
- * indexes that fell under initial size cutoff.
- */
- if (shared_istat != NULL &&
- parallel_processing_is_safe(indrel, lvshared))
+ /* Skip indexes that are safe for workers */
+ if (pindstats->parallel_workers_can_process)
continue;
/* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ lps->lvshared, pindstats);
}
/*
@@ -2904,29 +2901,37 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
* statistics returned from ambulkdelete and amvacuumcleanup to the DSM
* segment.
*/
-static IndexBulkDeleteResult *
-parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_istat,
- LVRelState *vacrel)
+static void
+parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared, LVParallelIndStats *pindstats)
{
+ IndexBulkDeleteResult *istat = NULL;
IndexBulkDeleteResult *istat_res;
/*
* Update the pointer to the corresponding bulk-deletion result if someone
* has already updated it
*/
- if (shared_istat && shared_istat->updated && istat == NULL)
- istat = &shared_istat->istat;
+ if (pindstats->istat_updated)
+ istat = &(pindstats->istat);
- /* Do vacuum or cleanup of the index */
- if (lvshared->for_cleanup)
- istat_res = lazy_cleanup_one_index(indrel, istat, lvshared->reltuples,
- lvshared->estimated_count, vacrel);
- else
- istat_res = lazy_vacuum_one_index(indrel, istat, lvshared->reltuples,
- vacrel);
+ switch (pindstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = lazy_vacuum_one_index(indrel, istat,
+ shared->reltuples, vacrel);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = lazy_cleanup_one_index(indrel, istat,
+ shared->reltuples,
+ shared->estimated_count,
+ vacrel);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ pindstats->status,
+ RelationGetRelationName(indrel));
+ }
/*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2945,20 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
/* Free the locally-allocated bulk-deletion result */
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
- return istat_res;
+ /*
+ * Update the status to completed. No need to lock here since each
+ * worker touches different indexes.
+ */
+ pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
}
/*
@@ -2987,7 +2993,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_cleanup_all_indexes(vacrel);
+ parallel_vacuum_process_all_indexes(vacrel, false);
}
}
@@ -3520,7 +3526,7 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- begin_parallel_vacuum(vacrel, nworkers);
+ parallel_vacuum_begin(vacrel, nworkers);
/* If parallel mode started, vacrel->dead_items allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
@@ -3552,7 +3558,7 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- end_parallel_vacuum(vacrel);
+ parallel_vacuum_end(vacrel);
}
/*
@@ -3758,7 +3764,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* vacuum.
*/
static int
-compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
+parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
@@ -3779,14 +3785,15 @@ compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ uint8 vacoptions;
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ /* Skip indexes that are unsuitable target for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
continue;
will_parallel_vacuum[idx] = true;
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
nindexes_parallel_bulkdel++;
if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
@@ -3855,7 +3862,7 @@ update_index_statistics(LVRelState *vacrel)
* VACUUM is currently active.
*/
static void
-begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
+parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
{
LVParallelState *lps;
Relation *indrels = vacrel->indrels;
@@ -3863,10 +3870,12 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
ParallelContext *pcxt;
LVShared *shared;
LVDeadItems *dead_items;
+ LVParallelIndStats *pindstats;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
bool *will_parallel_vacuum;
int max_items;
+ Size est_pindstats_len;
Size est_shared_len;
Size est_dead_items_len;
int nindexes_mwm = 0;
@@ -3884,8 +3893,7 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
* Compute the number of parallel vacuum workers to launch
*/
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = compute_parallel_vacuum_workers(vacrel,
- nrequested,
+ parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
will_parallel_vacuum);
if (parallel_workers <= 0)
{
@@ -3902,47 +3910,19 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
Assert(pcxt->nworkers > 0);
lps->pcxt = pcxt;
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- /* Skip indexes that don't participate in parallel vacuum */
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- est_shared_len = add_size(est_shared_len, sizeof(LVSharedIndStats));
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
+ est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = sizeof(LVShared);
shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
max_items = dead_items_max_items(vacrel);
- est_dead_items_len = MAXALIGN(max_items_to_alloc_size(max_items));
+ est_dead_items_len = max_items_to_alloc_size(max_items);
shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
@@ -3973,6 +3953,41 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
InitializeParallelDSM(pcxt);
+ /* Prepare index vacuum stats */
+ pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ if (!will_parallel_vacuum[idx])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ lps->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ lps->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ lps->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
+ lps->lvpindstats = pindstats;
+
/* Prepare shared information */
shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
MemSet(shared, 0, est_shared_len);
@@ -3986,21 +4001,6 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
pg_atomic_init_u32(&(shared->cost_balance), 0);
pg_atomic_init_u32(&(shared->active_nworkers), 0);
pg_atomic_init_u32(&(shared->idx), 0);
- shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
-
- /*
- * Initialize variables for shared index statistics, set NULL bitmap and
- * the size of stats for each index.
- */
- memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
- for (int idx = 0; idx < nindexes; idx++)
- {
- if (!will_parallel_vacuum[idx])
- continue;
-
- /* Set NOT NULL as this index does support parallelism */
- shared->bitmap[idx >> 3] |= 1 << (idx & 0x07);
- }
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
lps->lvshared = shared;
@@ -4055,7 +4055,7 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
* context, but that won't be safe (see ExitParallelMode).
*/
static void
-end_parallel_vacuum(LVRelState *vacrel)
+parallel_vacuum_end(LVRelState *vacrel)
{
IndexBulkDeleteResult **indstats = vacrel->indstats;
LVParallelState *lps = vacrel->lps;
@@ -4066,21 +4066,12 @@ end_parallel_vacuum(LVRelState *vacrel)
/* Copy the updated statistics */
for (int idx = 0; idx < nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
-
- shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
- /*
- * Skip index -- it must have been processed by the leader, from
- * inside do_serial_processing_for_unsafe_indexes()
- */
- if (shared_istat == NULL)
- continue;
-
- if (shared_istat->updated)
+ if (pindstats->istat_updated)
{
indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &shared_istat->istat, sizeof(IndexBulkDeleteResult));
+ memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
}
else
indstats[idx] = NULL;
@@ -4095,67 +4086,58 @@ end_parallel_vacuum(LVRelState *vacrel)
}
/*
- * Return shared memory statistics for index at offset 'getidx', if any
- *
- * Returning NULL indicates that compute_parallel_vacuum_workers() determined
- * that the index is a totally unsuitable target for all parallel processing
- * up front. For example, the index could be < min_parallel_index_scan_size
- * cutoff.
+ * Check if the index is a totally unsuitable target for all parallel
+ * processing up front. For example, the index could be
+ * < min_parallel_index_scan_size cutoff.
*/
-static LVSharedIndStats *
-parallel_stats_for_idx(LVShared *lvshared, int getidx)
+static bool
+parallel_vacuum_should_skip_index(Relation indrel)
{
- char *p;
-
- if (IndStatsIsNull(lvshared, getidx))
- return NULL;
-
- p = (char *) GetSharedIndStats(lvshared);
- for (int idx = 0; idx < getidx; idx++)
- {
- if (IndStatsIsNull(lvshared, idx))
- continue;
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
- p += sizeof(LVSharedIndStats);
- }
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ return true;
- return (LVSharedIndStats *) p;
+ return false;
}
/*
- * Returns false, if the given index can't participate in parallel index
- * vacuum or parallel index cleanup
+ * Returns false, if the given index can't participate in the next execution of
+ * parallel index vacuum or parallel index cleanup.
*/
static bool
-parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
+parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum)
{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ uint8 vacoptions;
- /* first_time must be true only if for_cleanup is true */
- Assert(lvshared->for_cleanup || !lvshared->first_time);
+ /* Skip indexes that are unsuitable target for parallel index vacuum */
+ if (parallel_vacuum_should_skip_index(indrel))
+ return false;
- if (lvshared->for_cleanup)
- {
- /* Skip, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
- /*
- * Skip, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). See the
- * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
- * when indexes support parallel cleanup conditionally.
- */
- if (!lvshared->first_time &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
- }
- else if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) == 0)
- {
- /* Skip if the index does not support parallel bulk deletion */
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
return false;
- }
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). We do
+ * this to avoid the need to invoke workers when parallel index
+ * cleanup doesn't need to scan the index. See the comments for
+ * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
+ * support parallel cleanup conditionally.
+ */
+ if (vacrel->num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
return true;
}
@@ -4171,6 +4153,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
{
Relation rel;
Relation *indrels;
+ LVParallelIndStats *lvpindstats;
LVShared *lvshared;
LVDeadItems *dead_items;
BufferUsage *buffer_usage;
@@ -4190,10 +4173,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
false);
elevel = lvshared->elevel;
- if (lvshared->for_cleanup)
- elog(DEBUG1, "starting parallel vacuum worker for cleanup");
- else
- elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
+ elog(DEBUG1, "starting parallel vacuum worker");
/* Set debug_query_string for individual workers */
sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
@@ -4214,6 +4194,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
Assert(nindexes > 0);
+ /* Set index statistics */
+ lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
/* Set dead_items space (set as worker's vacrel dead_items below) */
dead_items = (LVDeadItems *) shm_toc_lookup(toc,
PARALLEL_VACUUM_KEY_DEAD_ITEMS,
@@ -4259,7 +4244,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
InstrStartParallelQuery();
/* Process indexes to perform vacuum/cleanup */
- do_parallel_processing(&vacrel, lvshared);
+ parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
--
2.24.3 (Apple Git-128)
On Mon, Dec 6, 2021 at 10:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 3, 2021 at 6:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Dec 3, 2021 at 6:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches.
I have a few comments on v4-0001.
Thank you for the comments!
1.
In parallel_vacuum_process_all_indexes(), we can combine the two
checks for vacuum/cleanup at the beginning of the function
Agreed.
and I think
it is better to keep the variable name as bulkdel or cleanup instead
of vacuum as that is more specific and clear.
I was thinking to use the terms "bulkdel" and "cleanup" instead of
"vacuum" and "cleanup" for the same reason. That way, we could probably
use "bulkdel" and "cleanup" when doing index bulk-deletion (i.e.,
calling ambulkdelete) and index cleanup (calling amvacuumcleanup),
respectively, and use "vacuum" when processing an index, i.e., doing
either bulk-delete or cleanup, instead of just "processing". But we
already use "vacuum" and "cleanup" in many places, e.g.,
lazy_vacuum_index() and lazy_cleanup_index(). If we want to use
"bulkdel" instead of "vacuum", I think it would be better to change the
terminology in vacuumlazy.c thoroughly, probably in a separate patch.
Okay.
2. The patch seems to be calling parallel_vacuum_should_skip_index
thrice even before starting parallel vacuum. It has a call to find the
number of blocks which has to be performed for each index. I
understand it might not be too costly to call this but it seems better
to remember this info like we are doing in the current code.
Yes, we can bring will_vacuum_parallel array back to the code. That
way, we can remove the call to parallel_vacuum_should_skip_index() in
parallel_vacuum_begin().
We can
probably set parallel_workers_can_process in parallel_vacuum_begin and
then again update in parallel_vacuum_process_all_indexes. Won't doing
something like that be better?
parallel_workers_can_process can vary depending on bulk-deletion, the
first time cleanup, or the second time (or more) cleanup. What can we
set parallel_workers_can_process based on in parallel_vacuum_begin()?
I was thinking to set the results of will_vacuum_parallel in
parallel_vacuum_begin().
This point doesn't seem to be addressed in the latest version (v6). Is
there a reason for not doing it? If we do this, then we don't need to
call parallel_vacuum_should_skip_index() from
parallel_vacuum_index_is_parallel_safe().
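To make the shape being discussed concrete, here is a small self-contained sketch in plain C, with illustrative names rather than the ones in the patch: the per-index suitability is computed once up front, the way will_parallel_vacuum[] would be filled in parallel_vacuum_begin(), and each bulk-delete or cleanup pass then only combines that cached flag with the phase-dependent AM capability bits, which is roughly what parallel_vacuum_index_is_parallel_safe() is being asked to do without re-checking the index size.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-ins for the AM's parallel-vacuum capability bits. */
#define OPT_PARALLEL_BULKDEL        (1 << 0)
#define OPT_PARALLEL_CLEANUP        (1 << 1)
#define OPT_PARALLEL_COND_CLEANUP   (1 << 2)

typedef struct IndexInfo
{
	uint8_t		vacoptions;				/* capability bits of the index AM */
	int			nblocks;				/* index size in blocks */
	bool		suitable_for_parallel;	/* cached result, computed once */
} IndexInfo;

/* Computed once before any pass, like filling will_parallel_vacuum[]. */
static void
compute_suitability(IndexInfo *indexes, int nindexes, int min_blocks)
{
	for (int i = 0; i < nindexes; i++)
		indexes[i].suitable_for_parallel =
			indexes[i].vacoptions != 0 && indexes[i].nblocks >= min_blocks;
}

/* Per-pass check: cached suitability plus the phase-dependent bits only. */
static bool
workers_can_process(const IndexInfo *ind, bool bulkdel, int num_index_scans)
{
	if (!ind->suitable_for_parallel)
		return false;

	if (bulkdel)
		return (ind->vacoptions & OPT_PARALLEL_BULKDEL) != 0;

	/* cleanup pass */
	if ((ind->vacoptions & (OPT_PARALLEL_CLEANUP | OPT_PARALLEL_COND_CLEANUP)) == 0)
		return false;

	/* conditional cleanup only applies before any bulk-delete pass has run */
	if (num_index_scans > 0 && (ind->vacoptions & OPT_PARALLEL_COND_CLEANUP) != 0)
		return false;

	return true;
}

int
main(void)
{
	IndexInfo	indexes[] = {
		{OPT_PARALLEL_BULKDEL | OPT_PARALLEL_COND_CLEANUP, 5000, false},
		{OPT_PARALLEL_BULKDEL | OPT_PARALLEL_CLEANUP, 5000, false},
		{OPT_PARALLEL_BULKDEL, 10, false},	/* too small for parallelism */
	};

	compute_suitability(indexes, 3, 1000);

	for (int i = 0; i < 3; i++)
		printf("index %d: bulkdel=%d first-cleanup=%d later-cleanup=%d\n", i,
			   workers_can_process(&indexes[i], true, 0),
			   workers_can_process(&indexes[i], false, 0),
			   workers_can_process(&indexes[i], false, 1));
	return 0;
}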
--
With Regards,
Amit Kapila.
On Thu, Dec 9, 2021 at 3:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 6, 2021 at 10:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 3, 2021 at 6:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
2. The patch seems to be calling parallel_vacuum_should_skip_index
thrice even before starting parallel vacuum. It has a call to find the
number of blocks which has to be performed for each index. I
understand it might not be too costly to call this but it seems better
to remember this info like we are doing in the current code.
Yes, we can bring will_vacuum_parallel array back to the code. That
way, we can remove the call to parallel_vacuum_should_skip_index() in
parallel_vacuum_begin().
We can
probably set parallel_workers_can_process in parallel_vacuum_begin and
then again update in parallel_vacuum_process_all_indexes. Won't doing
something like that be better?
parallel_workers_can_process can vary depending on bulk-deletion, the
first time cleanup, or the second time (or more) cleanup. What can we
set parallel_workers_can_process based on in parallel_vacuum_begin()?
I was thinking to set the results of will_vacuum_parallel in
parallel_vacuum_begin().
This point doesn't seem to be addressed in the latest version (v6). Is
there a reason for not doing it? If we do this, then we don't need to
call parallel_vacuum_should_skip_index() from
parallel_vacuum_index_is_parallel_safe().
Few minor comments on v6-0001
==========================
1.
The array
+ * element is allocated for every index, even those indexes where
+ * parallel index vacuuming is unsafe or not worthwhile (i.g.,
+ * parallel_vacuum_should_skip_index() returns true).
/i.g/e.g
2.
static void update_index_statistics(LVRelState *vacrel);
-static void begin_parallel_vacuum(LVRelState *vacrel, int nrequested);
-static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared,
int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
+
+static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
+ bool *will_parallel_vacuum);
In the declarations, parallel_vacuum_compute_workers() is declared after
update_index_statistics() but later defined in reverse order. I suggest
making the order of definitions the same as their declarations. Similarly,
the order of definition of parallel_vacuum_process_all_indexes(),
parallel_vacuum_process_unsafe_indexes(),
parallel_vacuum_process_safe_indexes(),
parallel_vacuum_process_one_index() doesn't match the order of their
declaration. Can we change that as well?
--
With Regards,
Amit Kapila.
On Thu, Dec 9, 2021 at 7:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 6, 2021 at 10:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 3, 2021 at 6:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Dec 3, 2021 at 6:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches.
I have a few comments on v4-0001.
Thank you for the comments!
1.
In parallel_vacuum_process_all_indexes(), we can combine the two
checks for vacuum/cleanup at the beginning of the function
Agreed.
and I think
it is better to keep the variable name as bulkdel or cleanup instead
of vacuum as that is more specific and clear.
I was thinking to use the terms "bulkdel" and "cleanup" instead of
"vacuum" and "cleanup" for the same reason. That way, we could probably
use "bulkdel" and "cleanup" when doing index bulk-deletion (i.e.,
calling ambulkdelete) and index cleanup (calling amvacuumcleanup),
respectively, and use "vacuum" when processing an index, i.e., doing
either bulk-delete or cleanup, instead of just "processing". But we
already use "vacuum" and "cleanup" in many places, e.g.,
lazy_vacuum_index() and lazy_cleanup_index(). If we want to use
"bulkdel" instead of "vacuum", I think it would be better to change the
terminology in vacuumlazy.c thoroughly, probably in a separate patch.
Okay.
2. The patch seems to be calling parallel_vacuum_should_skip_index
thrice even before starting parallel vacuum. It has a call to find the
number of blocks which has to be performed for each index. I
understand it might not be too costly to call this but it seems better
to remember this info like we are doing in the current code.
Yes, we can bring will_vacuum_parallel array back to the code. That
way, we can remove the call to parallel_vacuum_should_skip_index() in
parallel_vacuum_begin().
We can
probably set parallel_workers_can_process in parallel_vacuum_begin and
then again update in parallel_vacuum_process_all_indexes. Won't doing
something like that be better?
parallel_workers_can_process can vary depending on bulk-deletion, the
first time cleanup, or the second time (or more) cleanup. What can we
set parallel_workers_can_process based on in parallel_vacuum_begin()?
I was thinking to set the results of will_vacuum_parallel in
parallel_vacuum_begin().
This point doesn't seem to be addressed in the latest version (v6). Is
there a reason for not doing it? If we do this, then we don't need to
call parallel_vacuum_should_skip_index() from
parallel_vacuum_index_is_parallel_safe().
Probably I had misunderstood your point. I'll fix it in the next
version patch and send it soon.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Dec 9, 2021 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 9, 2021 at 3:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 6, 2021 at 10:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 3, 2021 at 6:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
2. The patch seems to be calling parallel_vacuum_should_skip_index
thrice even before starting parallel vacuum. It has a call to find the
number of blocks which has to be performed for each index. I
understand it might not be too costly to call this but it seems better
to remember this info like we are doing in the current code.
Yes, we can bring will_vacuum_parallel array back to the code. That
way, we can remove the call to parallel_vacuum_should_skip_index() in
parallel_vacuum_begin().
We can
probably set parallel_workers_can_process in parallel_vacuum_begin and
then again update in parallel_vacuum_process_all_indexes. Won't doing
something like that be better?
parallel_workers_can_process can vary depending on bulk-deletion, the
first time cleanup, or the second time (or more) cleanup. What can we
set parallel_workers_can_process based on in parallel_vacuum_begin()?
I was thinking to set the results of will_vacuum_parallel in
parallel_vacuum_begin().
This point doesn't seem to be addressed in the latest version (v6). Is
there a reason for not doing it? If we do this, then we don't need to
call parallel_vacuum_should_skip_index() from
parallel_vacuum_index_is_parallel_safe().
Few minor comments on v6-0001
==========================
1.
The array
+ * element is allocated for every index, even those indexes where
+ * parallel index vacuuming is unsafe or not worthwhile (i.g.,
+ * parallel_vacuum_should_skip_index() returns true).
/i.g/e.g
2.
static void update_index_statistics(LVRelState *vacrel);
-static void begin_parallel_vacuum(LVRelState *vacrel, int nrequested);
-static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared,
int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
+
+static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
+ bool *will_parallel_vacuum);
In the declarations, parallel_vacuum_compute_workers() is declared after
update_index_statistics() but later defined in reverse order. I suggest
making the order of definitions the same as their declarations. Similarly,
the order of definition of parallel_vacuum_process_all_indexes(),
parallel_vacuum_process_unsafe_indexes(),
parallel_vacuum_process_safe_indexes(),
parallel_vacuum_process_one_index() doesn't match the order of their
declaration. Can we change that as well?
Agreed with the above two points.
I've attached updated patches that incorporated the above comments
too. Please review them.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v7-0001-Refactor-parallel-vacuum-to-remove-bitmap-related.patch
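The patch below replaces the bitmap machinery with a per-index state machine kept in DSM. As a rough orientation before reading the diff, the following is a small single-process model of that lifecycle (the statuses mirror the patch's PARALLEL_INDVAC_STATUS_* values, but everything else here is simplified and not the patch itself): the leader sets each index's status for the pass, indexes are marked completed as they are processed, and the leader then verifies that nothing was skipped and resets the statuses for the next pass.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Mirrors PARALLEL_INDVAC_STATUS_* from the patch, in a toy setting. */
typedef enum IndVacStatus
{
	STATUS_INITIAL = 0,
	STATUS_NEED_BULKDELETE,
	STATUS_NEED_CLEANUP,
	STATUS_COMPLETED
} IndVacStatus;

/* Stand-in for the per-index entry the patch keeps in DSM. */
typedef struct IndStats
{
	IndVacStatus status;
} IndStats;

/* Stand-in for the real bulk-delete/cleanup call on one index. */
static void
process_one_index(IndStats *st, int idx)
{
	printf("processing index %d (%s)\n", idx,
		   st->status == STATUS_NEED_BULKDELETE ? "bulkdelete" : "cleanup");
	st->status = STATUS_COMPLETED;
}

static void
run_pass(IndStats *stats, int nindexes, bool bulkdel)
{
	/* Leader: announce which kind of processing this pass needs. */
	for (int i = 0; i < nindexes; i++)
		stats[i].status = bulkdel ? STATUS_NEED_BULKDELETE : STATUS_NEED_CLEANUP;

	/* Leader and workers would now claim indexes; modeled serially here. */
	for (int i = 0; i < nindexes; i++)
		process_one_index(&stats[i], i);

	/* Leader: verify nothing was skipped, then reset for the next pass. */
	for (int i = 0; i < nindexes; i++)
	{
		if (stats[i].status != STATUS_COMPLETED)
		{
			fprintf(stderr, "index %d was not processed\n", i);
			exit(EXIT_FAILURE);
		}
		stats[i].status = STATUS_INITIAL;
	}
}

int
main(void)
{
	IndStats	stats[3] = {{STATUS_INITIAL}, {STATUS_INITIAL}, {STATUS_INITIAL}};

	run_pass(stats, 3, true);	/* index bulk-delete pass */
	run_pass(stats, 3, false);	/* index cleanup pass */
	return 0;
}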
From 0bf0b20a78f8ad420a56b82a02ba4340d9f786f5 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 30 Nov 2021 23:26:28 +0900
Subject: [PATCH v7 1/2] Refactor parallel vacuum to remove bitmap-related
code.
Previously, in parallel vacuum, we allocated shmem area of
IndexBulkDeleteResult only for indexes where parallel index vacuuming
is safe, and kept a null-bitmap in the shmem area to access them. This
logic was too complicated for the small benefit of saving only a few
bits per index.
In this commit, we allocate a dedicated shmem area for the array of
LVParallelIndStats, which includes a parallel-safety flag, the index
vacuum status, and IndexBulkDeleteResult. There is one array element for
every index, even those indexes where parallel index vacuuming is
unsafe or not worthwhile. This commit makes the code clear by removing
all bitmap-related code.
Also, add a check of each index's vacuum status after parallel index
vacuum to make sure that all indexes have been processed.
Finally, rename parallel vacuum functions to parallel_vacuum_* for
consistency.
An upcoming patch also refactors parallel vacuum further to make it
generic so that any table AM can utilize parallel vacuum functionality.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 609 +++++++++++++--------------
1 file changed, 287 insertions(+), 322 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 282b44f87b..d66f0fbd41 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -130,6 +130,7 @@
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -181,14 +182,6 @@ typedef struct LVShared
Oid relid;
int elevel;
- /*
- * An indication for vacuum workers to perform either index vacuum or
- * index cleanup. first_time is true only if for_cleanup is true and
- * bulk-deletion is not performed yet.
- */
- bool for_cleanup;
- bool first_time;
-
/*
* Fields for both index vacuum and cleanup.
*
@@ -226,33 +219,44 @@ typedef struct LVShared
*/
pg_atomic_uint32 active_nworkers;
- /*
- * Variables to control parallel vacuum. We have a bitmap to indicate
- * which index has stats in shared memory. The set bit in the map
- * indicates that the particular index supports a parallel vacuum.
- */
- pg_atomic_uint32 idx; /* counter for vacuuming and clean up */
- uint32 offset; /* sizeof header incl. bitmap */
- bits8 bitmap[FLEXIBLE_ARRAY_MEMBER]; /* bit map of NULLs */
-
- /* Shared index statistics data follows at end of struct */
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
} LVShared;
-#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
-#define GetSharedIndStats(s) \
- ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
-#define IndStatsIsNull(s, i) \
- (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
+/* Status used during parallel index vacuum or cleanup */
+typedef enum LVParallelIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} LVParallelIndVacStatus;
/*
- * Struct for an index bulk-deletion statistic used for parallel vacuum. This
- * is allocated in the DSM segment.
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
*/
-typedef struct LVSharedIndStats
+typedef struct LVParallelIndStats
{
- bool updated; /* are the stats updated? */
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both leader and worker can
+ * process the index, otherwise only leader can process it.
+ */
+ LVParallelIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
IndexBulkDeleteResult istat;
-} LVSharedIndStats;
+} LVParallelIndStats;
/* Struct for maintaining a parallel vacuum state. */
typedef struct LVParallelState
@@ -262,12 +266,29 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /*
+ * Shared index statistics among parallel vacuum workers. The array
+ * element is allocated for every index, even those indexes where
+ * parallel index vacuuming is unsafe or not worthwhile (e.g.,
+ * will_parallel_vacuum[] is false). During parallel vacuum,
+ * IndexBulkDeleteResult of each index is kept in DSM and is copied into local
+ * memory at the end of parallel vacuum.
+ */
+ LVParallelIndStats *lvpindstats;
+
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
/* Points to WAL usage area in DSM */
WalUsage *wal_usage;
+ /*
+ * False if the index is totally unsuitable target for all parallel
+ * processing. For example, the index could be
+ * < min_parallel_index_scan_size cutoff.
+ */
+ bool *will_parallel_vacuum;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -391,18 +412,13 @@ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
-static void do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel);
-static void do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers);
-static void do_parallel_processing(LVRelState *vacrel,
- LVShared *lvshared);
-static void do_serial_processing_for_unsafe_indexes(LVRelState *vacrel,
- LVShared *lvshared);
-static IndexBulkDeleteResult *parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_indstats,
- LVRelState *vacrel);
+static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
+static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats);
+static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
+static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared,
+ LVParallelIndStats *pindstats);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
@@ -425,14 +441,13 @@ static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int compute_parallel_vacuum_workers(LVRelState *vacrel,
- int nrequested,
- bool *will_parallel_vacuum);
+static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
+ bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void begin_parallel_vacuum(LVRelState *vacrel, int nrequested);
-static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared, int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
+static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
+static void parallel_vacuum_end(LVRelState *vacrel);
+static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -2237,7 +2252,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_vacuum_all_indexes(vacrel);
+ parallel_vacuum_process_all_indexes(vacrel, true);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2611,76 +2626,54 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
}
/*
- * Perform lazy_vacuum_all_indexes() steps in parallel
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
*/
static void
-do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
+parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
{
- /* Tell parallel workers to do index vacuuming */
- vacrel->lps->lvshared->for_cleanup = false;
- vacrel->lps->lvshared->first_time = false;
-
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
+ LVParallelState *lps = vacrel->lps;
+ LVParallelIndVacStatus new_status;
+ int nworkers;
- do_parallel_vacuum_or_cleanup(vacrel,
- vacrel->lps->nindexes_parallel_bulkdel);
-}
+ Assert(!IsParallelWorker());
+ Assert(ParallelVacuumIsActive(vacrel));
+ Assert(vacrel->nindexes > 0);
-/*
- * Perform lazy_cleanup_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel)
-{
- int nworkers;
+ if (vacuum)
+ {
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now. Matches serial VACUUM case.
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
+ vacrel->lps->lvshared->estimated_count = true;
- /*
- * If parallel vacuum is active we perform index cleanup with parallel
- * workers.
- *
- * Tell parallel workers to do index cleanup.
- */
- vacrel->lps->lvshared->for_cleanup = true;
- vacrel->lps->lvshared->first_time = (vacrel->num_index_scans == 0);
+ new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
- /*
- * Now we can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- /* Determine the number of parallel workers to launch */
- if (vacrel->lps->lvshared->first_time)
- nworkers = vacrel->lps->nindexes_parallel_cleanup +
- vacrel->lps->nindexes_parallel_condcleanup;
+ /* Determine the number of parallel workers to launch */
+ nworkers = vacrel->lps->nindexes_parallel_bulkdel;
+ }
else
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
+ {
+ /*
+ * We can provide a better estimate of total number of surviving
+ * tuples (we assume indexes are more interested in that than in the
+ * number of nominally live tuples).
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
+ vacrel->lps->lvshared->estimated_count =
+ (vacrel->tupcount_pages < vacrel->rel_pages);
- do_parallel_vacuum_or_cleanup(vacrel, nworkers);
-}
+ new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process. The caller must set
- * lps->lvshared->for_cleanup to indicate whether to perform vacuum or
- * cleanup.
- */
-static void
-do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
-{
- LVParallelState *lps = vacrel->lps;
+ /* Determine the number of parallel workers to launch */
+ nworkers = vacrel->lps->nindexes_parallel_cleanup;
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
+ /* Add conditionally parallel-aware indexes when called for the first time */
+ if (vacrel->num_index_scans == 0)
+ nworkers += vacrel->lps->nindexes_parallel_condcleanup;
+ }
/* The leader process will participate */
nworkers--;
@@ -2688,21 +2681,35 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
/*
* It is possible that parallel context is initialized with fewer workers
* than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See compute_parallel_vacuum_workers.
+ * phase, so we need to consider it. See parallel_vacuum_compute_workers().
*/
nworkers = Min(nworkers, lps->pcxt->nworkers);
+ /*
+ * Set index vacuum status and mark whether parallel vacuum worker can
+ * process it.
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
+
+ Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+ pindstats->status = new_status;
+ pindstats->parallel_workers_can_process =
+ (lps->will_parallel_vacuum[i] &
+ parallel_vacuum_index_is_parallel_safe(vacrel, vacrel->indrels[i],
+ vacuum));
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(lps->lvshared->idx), 0);
+
/* Setup the shared cost-based vacuum delay and launch workers */
if (nworkers > 0)
{
+ /* Reinitialize parallel context to relaunch parallel workers */
if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
ReinitializeParallelDSM(lps->pcxt);
- }
/*
* Set up shared cost balance and the number of active workers for
@@ -2735,28 +2742,28 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
}
- if (lps->lvshared->for_cleanup)
+ if (vacuum)
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
else
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
}
/* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
+ parallel_vacuum_process_unsafe_indexes(vacrel);
/*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
*/
- do_parallel_processing(vacrel, lps->lvshared);
+ parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
/*
* Next, accumulate buffer and WAL usage. (This must wait for the workers
@@ -2771,6 +2778,21 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
+
+ if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(vacrel->indrels[i]));
+
+ pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
/*
* Carry the shared balance value to heap scan and disable shared costing
*/
@@ -2787,7 +2809,8 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
* vacuum worker processes to process the indexes in parallel.
*/
static void
-do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
+parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats)
{
/*
* Increment the active worker count if we are able to launch any worker.
@@ -2799,39 +2822,28 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
for (;;)
{
int idx;
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
+ LVParallelIndStats *pis;
/* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(lvshared->idx), 1);
+ idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
/* Done for all indexes? */
if (idx >= vacrel->nindexes)
break;
- /* Get the index statistics space from DSM, if any */
- shared_istat = parallel_stats_for_idx(lvshared, idx);
-
- /* Skip indexes not participating in parallelism */
- if (shared_istat == NULL)
- continue;
-
- indrel = vacrel->indrels[idx];
+ pis = &(pindstats[idx]);
/*
- * Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
+ * Skip processing indexes that are unsafe for workers or unsuitable
+ * target for parallel index vacuum (these are processed in
+ * parallel_vacuum_process_unsafe_indexes() by leader)
*/
- if (!parallel_processing_is_safe(indrel, lvshared))
+ if (!pis->parallel_workers_can_process)
continue;
/* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ shared, pis);
}
/*
@@ -2847,15 +2859,16 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
*
* Handles index vacuuming (or index cleanup) for indexes that are not
* parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing for_cleanup processing right now.
+ * on details like whether we're performing index cleanup right now.
*
* Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by compute_parallel_vacuum_workers(). These indexes never get a
- * slot for statistics in DSM.
+ * enforced by parallel_vacuum_compute_workers().
*/
static void
-do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
+parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
{
+ LVParallelState *lps = vacrel->lps;
+
Assert(!IsParallelWorker());
/*
@@ -2866,28 +2879,15 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- shared_istat = parallel_stats_for_idx(lvshared, idx);
- indrel = vacrel->indrels[idx];
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
- /*
- * We're only here for the indexes that parallel workers won't
- * process. Note that the shared_istat test ensures that we process
- * indexes that fell under initial size cutoff.
- */
- if (shared_istat != NULL &&
- parallel_processing_is_safe(indrel, lvshared))
+ /* Skip indexes that are safe for workers */
+ if (pindstats->parallel_workers_can_process)
continue;
/* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ lps->lvshared, pindstats);
}
/*
@@ -2904,29 +2904,37 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
* statistics returned from ambulkdelete and amvacuumcleanup to the DSM
* segment.
*/
-static IndexBulkDeleteResult *
-parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_istat,
- LVRelState *vacrel)
+static void
+parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared, LVParallelIndStats *pindstats)
{
+ IndexBulkDeleteResult *istat = NULL;
IndexBulkDeleteResult *istat_res;
/*
* Update the pointer to the corresponding bulk-deletion result if someone
* has already updated it
*/
- if (shared_istat && shared_istat->updated && istat == NULL)
- istat = &shared_istat->istat;
+ if (pindstats->istat_updated)
+ istat = &(pindstats->istat);
- /* Do vacuum or cleanup of the index */
- if (lvshared->for_cleanup)
- istat_res = lazy_cleanup_one_index(indrel, istat, lvshared->reltuples,
- lvshared->estimated_count, vacrel);
- else
- istat_res = lazy_vacuum_one_index(indrel, istat, lvshared->reltuples,
- vacrel);
+ switch (pindstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = lazy_vacuum_one_index(indrel, istat,
+ shared->reltuples, vacrel);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = lazy_cleanup_one_index(indrel, istat,
+ shared->reltuples,
+ shared->estimated_count,
+ vacrel);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ pindstats->status,
+ RelationGetRelationName(indrel));
+ }
/*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2948,20 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
/* Free the locally-allocated bulk-deletion result */
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
- return istat_res;
+ /*
+ * Update the status to completed. No need to lock here since each
+ * worker touches different indexes.
+ */
+ pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
}
/*
@@ -2987,7 +2996,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_cleanup_all_indexes(vacrel);
+ parallel_vacuum_process_all_indexes(vacrel, false);
}
}
@@ -3445,8 +3454,6 @@ dead_items_max_items(LVRelState *vacrel)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- Assert(!IsParallelWorker());
-
if (vacrel->nindexes > 0)
{
BlockNumber rel_pages = vacrel->rel_pages;
@@ -3520,7 +3527,7 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- begin_parallel_vacuum(vacrel, nworkers);
+ parallel_vacuum_begin(vacrel, nworkers);
/* If parallel mode started, vacrel->dead_items allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
@@ -3552,7 +3559,7 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- end_parallel_vacuum(vacrel);
+ parallel_vacuum_end(vacrel);
}
/*
@@ -3758,7 +3765,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* vacuum.
*/
static int
-compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
+parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
@@ -3779,8 +3786,9 @@ compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ /* Skip indexes that are unsuitable target for parallel index vacuum */
if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
continue;
@@ -3855,7 +3863,7 @@ update_index_statistics(LVRelState *vacrel)
* VACUUM is currently active.
*/
static void
-begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
+parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
{
LVParallelState *lps;
Relation *indrels = vacrel->indrels;
@@ -3863,10 +3871,12 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
ParallelContext *pcxt;
LVShared *shared;
LVDeadItems *dead_items;
+ LVParallelIndStats *pindstats;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
bool *will_parallel_vacuum;
int max_items;
+ Size est_pindstats_len;
Size est_shared_len;
Size est_dead_items_len;
int nindexes_mwm = 0;
@@ -3884,8 +3894,7 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
* Compute the number of parallel vacuum workers to launch
*/
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = compute_parallel_vacuum_workers(vacrel,
- nrequested,
+ parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
will_parallel_vacuum);
if (parallel_workers <= 0)
{
@@ -3901,48 +3910,21 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
parallel_workers);
Assert(pcxt->nworkers > 0);
lps->pcxt = pcxt;
+ lps->will_parallel_vacuum = will_parallel_vacuum;
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- /* Skip indexes that don't participate in parallel vacuum */
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- est_shared_len = add_size(est_shared_len, sizeof(LVSharedIndStats));
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
+ est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = sizeof(LVShared);
shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
max_items = dead_items_max_items(vacrel);
- est_dead_items_len = MAXALIGN(max_items_to_alloc_size(max_items));
+ est_dead_items_len = max_items_to_alloc_size(max_items);
shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
@@ -3973,6 +3955,41 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
InitializeParallelDSM(pcxt);
+ /* Prepare index vacuum stats */
+ pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ if (!will_parallel_vacuum[idx])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ lps->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ lps->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ lps->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
+ lps->lvpindstats = pindstats;
+
/* Prepare shared information */
shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
MemSet(shared, 0, est_shared_len);
@@ -3986,21 +4003,6 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
pg_atomic_init_u32(&(shared->cost_balance), 0);
pg_atomic_init_u32(&(shared->active_nworkers), 0);
pg_atomic_init_u32(&(shared->idx), 0);
- shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
-
- /*
- * Initialize variables for shared index statistics, set NULL bitmap and
- * the size of stats for each index.
- */
- memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
- for (int idx = 0; idx < nindexes; idx++)
- {
- if (!will_parallel_vacuum[idx])
- continue;
-
- /* Set NOT NULL as this index does support parallelism */
- shared->bitmap[idx >> 3] |= 1 << (idx & 0x07);
- }
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
lps->lvshared = shared;
@@ -4038,8 +4040,6 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
- pfree(will_parallel_vacuum);
-
/* Success -- set dead_items and lps in leader's vacrel state */
vacrel->dead_items = dead_items;
vacrel->lps = lps;
@@ -4055,7 +4055,7 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
* context, but that won't be safe (see ExitParallelMode).
*/
static void
-end_parallel_vacuum(LVRelState *vacrel)
+parallel_vacuum_end(LVRelState *vacrel)
{
IndexBulkDeleteResult **indstats = vacrel->indstats;
LVParallelState *lps = vacrel->lps;
@@ -4066,21 +4066,12 @@ end_parallel_vacuum(LVRelState *vacrel)
/* Copy the updated statistics */
for (int idx = 0; idx < nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
-
- shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
- /*
- * Skip index -- it must have been processed by the leader, from
- * inside do_serial_processing_for_unsafe_indexes()
- */
- if (shared_istat == NULL)
- continue;
-
- if (shared_istat->updated)
+ if (pindstats->istat_updated)
{
indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &shared_istat->istat, sizeof(IndexBulkDeleteResult));
+ memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
}
else
indstats[idx] = NULL;
@@ -4090,72 +4081,43 @@ end_parallel_vacuum(LVRelState *vacrel)
ExitParallelMode();
/* Deactivate parallel vacuum */
+ pfree(lps->will_parallel_vacuum);
pfree(lps);
vacrel->lps = NULL;
}
/*
- * Return shared memory statistics for index at offset 'getidx', if any
- *
- * Returning NULL indicates that compute_parallel_vacuum_workers() determined
- * that the index is a totally unsuitable target for all parallel processing
- * up front. For example, the index could be < min_parallel_index_scan_size
- * cutoff.
- */
-static LVSharedIndStats *
-parallel_stats_for_idx(LVShared *lvshared, int getidx)
-{
- char *p;
-
- if (IndStatsIsNull(lvshared, getidx))
- return NULL;
-
- p = (char *) GetSharedIndStats(lvshared);
- for (int idx = 0; idx < getidx; idx++)
- {
- if (IndStatsIsNull(lvshared, idx))
- continue;
-
- p += sizeof(LVSharedIndStats);
- }
-
- return (LVSharedIndStats *) p;
-}
-
-/*
- * Returns false, if the given index can't participate in parallel index
- * vacuum or parallel index cleanup
+ * Returns false, if the given index can't participate in the next execution of
+ * parallel index vacuum or parallel index cleanup.
*/
static bool
-parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
+parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum)
{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ uint8 vacoptions;
- /* first_time must be true only if for_cleanup is true */
- Assert(lvshared->for_cleanup || !lvshared->first_time);
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
- if (lvshared->for_cleanup)
- {
- /* Skip, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
- /*
- * Skip, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). See the
- * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
- * when indexes support parallel cleanup conditionally.
- */
- if (!lvshared->first_time &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
- }
- else if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) == 0)
- {
- /* Skip if the index does not support parallel bulk deletion */
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
return false;
- }
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete). We do
+ * this to avoid the need to invoke workers when parallel index
+ * cleanup doesn't need to scan the index. See the comments for
+ * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
+ * support parallel cleanup conditionally.
+ */
+ if (vacrel->num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
return true;
}
@@ -4171,6 +4133,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
{
Relation rel;
Relation *indrels;
+ LVParallelIndStats *lvpindstats;
LVShared *lvshared;
LVDeadItems *dead_items;
BufferUsage *buffer_usage;
@@ -4190,10 +4153,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
false);
elevel = lvshared->elevel;
- if (lvshared->for_cleanup)
- elog(DEBUG1, "starting parallel vacuum worker for cleanup");
- else
- elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
+ elog(DEBUG1, "starting parallel vacuum worker");
/* Set debug_query_string for individual workers */
sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
@@ -4214,6 +4174,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
Assert(nindexes > 0);
+ /* Set index statistics */
+ lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
/* Set dead_items space (set as worker's vacrel dead_items below) */
dead_items = (LVDeadItems *) shm_toc_lookup(toc,
PARALLEL_VACUUM_KEY_DEAD_ITEMS,
@@ -4259,7 +4224,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
InstrStartParallelQuery();
/* Process indexes to perform vacuum/cleanup */
- do_parallel_processing(&vacrel, lvshared);
+ parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
--
2.24.3 (Apple Git-128)
Attachment: v7-0002-Move-parallel-vacuum-code-to-vacuumparallel.c.patch
From b121332f9e619be3b407fc2ec1328f61da9eaac8 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 1 Dec 2021 14:35:05 +0900
Subject: [PATCH v7 2/2] Move parallel vacuum code to vacuumparallel.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previously, parallel vacuum was specific to lazy vacuum, i.e., the heap
table AM. But the job that parallel vacuum does isn't really specific
to heap.
This commit moves the parallel vacuum related code to a new file,
commands/vacuumparallel.c, so that any table AM supporting indexes can
utilize parallel vacuum to call the index AM
callbacks (ambulkdelete and amvacuumcleanup) with parallel workers.
With that, it also moves some vacuum-related functions and structures to
commands/vacuum.c so that both lazy vacuum and parallel vacuum can
refer to them.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 1166 ++-----------------------
src/backend/access/transam/parallel.c | 2 +-
src/backend/commands/Makefile | 1 +
src/backend/commands/vacuum.c | 156 ++++
src/backend/commands/vacuumparallel.c | 1106 +++++++++++++++++++++++
src/include/access/heapam.h | 1 -
src/include/commands/vacuum.h | 42 +
src/tools/pgindent/typedefs.list | 2 +
8 files changed, 1371 insertions(+), 1105 deletions(-)
create mode 100644 src/backend/commands/vacuumparallel.c
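As an illustrative sketch (not part of the patch): based only on the call
sites visible in the vacuumlazy.c hunks below, this is roughly how another
table AM could drive the relocated API. The parallel_vacuum_* function names
and the ParallelVacuumState/VacDeadItems types come from the patch itself;
the my_am_vacuum_indexes wrapper, its parameters, and the exact set of
includes are assumptions made for illustration only.

/*
 * Illustrative sketch only -- not part of the patch.  The parallel_vacuum_*
 * calls mirror the call sites in the vacuumlazy.c hunks below; the
 * my_am_vacuum_indexes wrapper, its arguments, and the includes are
 * hypothetical.
 */
#include "postgres.h"

#include "access/genam.h"
#include "commands/vacuum.h"
#include "utils/rel.h"

static void
my_am_vacuum_indexes(Relation rel, Relation *indrels, int nindexes,
                     int nworkers_requested, int max_dead_items,
                     int elevel, BufferAccessStrategy bstrategy,
                     IndexBulkDeleteResult **indstats,
                     double old_live_tuples, double new_rel_tuples,
                     bool reltuples_estimated)
{
    ParallelVacuumState *pvs;
    VacDeadItems *dead_items;
    int         num_index_scans = 0;

    /* Enter parallel mode, create the DSM segment, and set up shared state */
    pvs = parallel_vacuum_begin(rel, indrels, nindexes, nworkers_requested,
                                max_dead_items, elevel, bstrategy);
    if (pvs == NULL)
        return;                 /* not worthwhile; fall back to serial index vacuuming */

    /* dead_items lives in DSM so that workers can see it */
    dead_items = parallel_vacuum_get_dead_items(pvs);

    /* ... AM-specific table scan fills dead_items->items / num_items ... */

    /* ambulkdelete on all indexes, using workers where that is safe */
    parallel_vacuum_bulkdel_all_indexes(pvs, old_live_tuples);
    num_index_scans++;

    /* amvacuumcleanup on all indexes */
    parallel_vacuum_cleanup_all_indexes(pvs, new_rel_tuples,
                                        reltuples_estimated, num_index_scans);

    /* Copy per-index stats out of DSM, destroy the context, exit parallel mode */
    parallel_vacuum_end(pvs, indstats);
}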
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d66f0fbd41..53d6523281 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -40,7 +40,6 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
-#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -120,23 +119,11 @@
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
-/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
-
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -149,155 +136,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * LVDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
- * Each TID points to an LP_DEAD line pointer from a heap page that has been
- * processed by lazy_scan_prune.
- *
- * Also needed by lazy_vacuum_heap_rel, which marks the same LP_DEAD line
- * pointers as LP_UNUSED during second heap pass.
- */
-typedef struct LVDeadItems
-{
- int max_items; /* # slots allocated in array */
- int num_items; /* current # of entries */
-
- /* Sorted array of TIDs to delete from indexes */
- ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
-} LVDeadItems;
-
-#define MAXDEADITEMS(avail_mem) \
- (((avail_mem) - offsetof(LVDeadItems, items)) / sizeof(ItemPointerData))
-
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /* Counter for vacuuming and cleanup */
- pg_atomic_uint32 idx;
-} LVShared;
-
-/* Status used during parallel index vacuum or cleanup */
-typedef enum LVParallelIndVacStatus
-{
- PARALLEL_INDVAC_STATUS_INITIAL = 0,
- PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
- PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
- PARALLEL_INDVAC_STATUS_COMPLETED
-} LVParallelIndVacStatus;
-
-/*
- * Struct for index vacuum statistics of an index that is used for parallel vacuum.
- * This includes the status of parallel index vacuum as well as index statistics.
- */
-typedef struct LVParallelIndStats
-{
- /*
- * The following two fields are set by leader process before executing
- * parallel index vacuum or parallel index cleanup. These fields are not
- * fixed for the entire VACUUM operation. They are only fixed for an
- * individual parallel index vacuum and cleanup.
- *
- * parallel_workers_can_process is true if both leader and worker can
- * process the index, otherwise only leader can process it.
- */
- LVParallelIndVacStatus status;
- bool parallel_workers_can_process;
-
- /*
- * Individual worker or leader stores the result of index vacuum or
- * cleanup.
- */
- bool istat_updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVParallelIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /*
- * Shared index statistics among parallel vacuum workers. The array
- * element is allocated for every index, even those indexes where
- * parallel index vacuuming is unsafe or not worthwhile (e.g.,
- * will_parallel_vacuum[] is false). During parallel vacuum,
- * IndexBulkDeleteResult of each index is kept in DSM and is copied into local
- * memory at the end of parallel vacuum.
- */
- LVParallelIndStats *lvpindstats;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * False if the index is totally unsuitable target for all parallel
- * processing. For example, the index could be
- * < min_parallel_index_scan_size cutoff.
- */
- bool *will_parallel_vacuum;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -315,9 +153,9 @@ typedef struct LVRelState
bool do_index_cleanup;
bool do_rel_truncate;
- /* Buffer access strategy and parallel state */
+ /* Buffer access strategy and parallel vacuum state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumState *pvs;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -339,9 +177,14 @@ typedef struct LVRelState
VacErrPhase phase;
/*
- * State managed by lazy_scan_heap() follows
+ * State managed by lazy_scan_heap() follows.
+ *
+ * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
+ * Each TID points to an LP_DEAD line pointer from a heap page that has been
+ * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
+ * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
@@ -412,13 +255,6 @@ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
-static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
-static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats);
-static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared,
- LVParallelIndStats *pindstats);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
@@ -434,20 +270,11 @@ static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
static int dead_items_max_items(LVRelState *vacrel);
-static inline Size max_items_to_alloc_size(int max_items);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
-static void parallel_vacuum_end(LVRelState *vacrel);
-static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -905,7 +732,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
BlockNumber nblocks,
blkno,
next_unskippable_block,
@@ -2040,7 +1867,7 @@ retry:
*/
if (lpdead_items > 0)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
Assert(!prunestate->all_visible);
@@ -2083,7 +1910,6 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(!IsParallelWorker());
Assert(vacrel->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
@@ -2212,7 +2038,6 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
{
bool allindexes = true;
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -2251,8 +2076,21 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
}
else
{
- /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to parallel variant. Since parallel vacuum will
+ * set the error context on an error we temporarily disable setting our
+ * error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2404,7 +2242,7 @@ static int
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
int index, Buffer *vmbuffer)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
@@ -2625,352 +2463,12 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process.
- */
-static void
-parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
-{
- LVParallelState *lps = vacrel->lps;
- LVParallelIndVacStatus new_status;
- int nworkers;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- if (vacuum)
- {
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_bulkdel;
- }
- else
- {
- /*
- * We can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- /* Add conditionally parallel-aware indexes if in the first time call */
- if (vacrel->num_index_scans == 0)
- nworkers += vacrel->lps->nindexes_parallel_condcleanup;
- }
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See parallel_vacuum_compute_workers().
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- /*
- * Set index vacuum status and mark whether parallel vacuum worker can
- * process it.
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
-
- Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
- pindstats->status = new_status;
- pindstats->parallel_workers_can_process =
- (lps->will_parallel_vacuum[i] &
- parallel_vacuum_index_is_parallel_safe(vacrel, vacrel->indrels[i],
- vacuum));
- }
-
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- /* Reinitialize parallel context to relaunch parallel workers */
- if (vacrel->num_index_scans > 0)
- ReinitializeParallelDSM(lps->pcxt);
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (vacuum)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- parallel_vacuum_process_unsafe_indexes(vacrel);
-
- /*
- * Join as a parallel worker. The leader process alone processes all
- * parallel-safe indexes in the case where no workers are launched.
- */
- parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Reset all index status back to initial (while checking that we have
- * processed all indexes).
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
-
- if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
- elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
- RelationGetRelationName(vacrel->indrels[i]));
-
- pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVParallelIndStats *pis;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- pis = &(pindstats[idx]);
-
- /*
- * Skip processing indexes that are unsafe for workers or unsuitable
- * target for parallel index vacuum (these are processed in
- * parallel_vacuum_process_unsafe_indexes() by leader)
- */
- if (!pis->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- shared, pis);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing index cleanup right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by parallel_vacuum_compute_workers().
- */
-static void
-parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- /* Skip, indexes that are safe for workers */
- if (pindstats->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- lps->lvshared, pindstats);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static void
-parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared, LVParallelIndStats *pindstats)
-{
- IndexBulkDeleteResult *istat = NULL;
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (pindstats->istat_updated)
- istat = &(pindstats->istat);
-
- switch (pindstats->status)
- {
- case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
- istat_res = lazy_vacuum_one_index(indrel, istat,
- shared->reltuples, vacrel);
- break;
- case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
- istat_res = lazy_cleanup_one_index(indrel, istat,
- shared->reltuples,
- shared->estimated_count,
- vacrel);
- break;
- default:
- elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
- pindstats->status,
- RelationGetRelationName(indrel));
- }
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (!pindstats->istat_updated && istat_res != NULL)
- {
- memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
- pindstats->istat_updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
- }
-
- /*
- * Update the status to completed. No need to lock here since each
- * worker touches different indexes.
- */
- pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
-}
-
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
/* Report that we are now cleaning up indexes */
@@ -2995,8 +2493,23 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
}
else
{
- /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, false);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to parallel variant. Since parallel vacuum will
+ * set the error context on an error we temporarily disable setting our
+ * error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_cleanup_all_indexes(vacrel->pvs, vacrel->new_rel_tuples,
+ (vacrel->tupcount_pages < vacrel->rel_pages),
+ vacrel->num_index_scans);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
}
}
@@ -3044,13 +2557,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_items);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_items->num_items),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ istat = bulkdel_one_index(&ivinfo, istat, vacrel->dead_items);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3085,7 +2592,6 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
ivinfo.strategy = vacrel->bstrategy;
@@ -3101,24 +2607,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
- istat = index_vacuum_cleanup(&ivinfo, istat);
-
- if (istat)
- {
- ereport(elevel,
- (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
- RelationGetRelationName(indrel),
- istat->num_index_tuples,
- istat->num_pages),
- errdetail("%.0f index row versions were removed.\n"
- "%u index pages were newly deleted.\n"
- "%u index pages are currently deleted, of which %u are currently reusable.\n"
- "%s.",
- istat->tuples_removed,
- istat->pages_newly_deleted,
- istat->pages_deleted, istat->pages_free,
- pg_rusage_show(&ru0))));
- }
+ istat = cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3478,19 +2967,6 @@ dead_items_max_items(LVRelState *vacrel)
return (int) max_items;
}
-/*
- * Returns the total required space for VACUUM's dead_items array given a
- * max_items value returned by dead_items_max_items
- */
-static inline Size
-max_items_to_alloc_size(int max_items)
-{
- Assert(max_items >= MaxHeapTuplesPerPage);
- Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
-
- return offsetof(LVDeadItems, items) + sizeof(ItemPointerData) * max_items;
-}
-
/*
* Allocate dead_items (either using palloc, or in dynamic shared memory).
* Sets dead_items in vacrel for caller.
@@ -3501,9 +2977,12 @@ max_items_to_alloc_size(int max_items)
static void
dead_items_alloc(LVRelState *vacrel, int nworkers)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
int max_items;
+ max_items = dead_items_max_items(vacrel);
+ Assert(max_items >= MaxHeapTuplesPerPage);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3527,16 +3006,22 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- parallel_vacuum_begin(vacrel, nworkers);
+ vacrel->pvs = parallel_vacuum_begin(vacrel->rel, vacrel->indrels,
+ vacrel->nindexes, nworkers,
+ max_items, elevel,
+ vacrel->bstrategy);
- /* If parallel mode started, vacrel->dead_items allocated in DSM */
+ /* If parallel mode started, dead_items space is allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs);
return;
+ }
}
/* Serial VACUUM case */
max_items = dead_items_max_items(vacrel);
- dead_items = (LVDeadItems *) palloc(max_items_to_alloc_size(max_items));
+ dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3559,75 +3044,8 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- parallel_vacuum_end(vacrel);
-}
-
-/*
- * lazy_tid_reaped() -- is a particular tid deletable?
- *
- * This has the right signature to be an IndexBulkDeleteCallback.
- *
- * Assumes dead_items array is sorted (in ascending TID order).
- */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
-{
- LVDeadItems *dead_items = (LVDeadItems *) state;
- int64 litem,
- ritem,
- item;
- ItemPointer res;
-
- litem = itemptr_encode(&dead_items->items[0]);
- ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
- item = itemptr_encode(itemptr);
-
- /*
- * Doing a simple bound check before bsearch() is useful to avoid the
- * extra cost of bsearch(), especially if dead items on the heap are
- * concentrated in a certain range. Since this function is called for
- * every index tuple, it pays to be really fast.
- */
- if (item < litem || item > ritem)
- return false;
-
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) dead_items->items,
- dead_items->num_items,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
-
- return (res != NULL);
-}
-
-/*
- * Comparator routines for use with qsort() and bsearch().
- */
-static int
-vac_cmp_itemptr(const void *left, const void *right)
-{
- BlockNumber lblk,
- rblk;
- OffsetNumber loff,
- roff;
-
- lblk = ItemPointerGetBlockNumber((ItemPointer) left);
- rblk = ItemPointerGetBlockNumber((ItemPointer) right);
-
- if (lblk < rblk)
- return -1;
- if (lblk > rblk)
- return 1;
-
- loff = ItemPointerGetOffsetNumber((ItemPointer) left);
- roff = ItemPointerGetOffsetNumber((ItemPointer) right);
-
- if (loff < roff)
- return -1;
- if (loff > roff)
- return 1;
-
- return 0;
+ parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
+ vacrel->pvs = NULL;
}
/*
@@ -3751,77 +3169,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
return all_visible;
}
-/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
- */
-static int
-parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;;
-
- /* Skip indexes that are unsuitable target for parallel index vacuum */
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- continue;
-
- will_parallel_vacuum[idx] = true;
-
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
@@ -3832,7 +3179,7 @@ update_index_statistics(LVRelState *vacrel)
int nindexes = vacrel->nindexes;
IndexBulkDeleteResult **indstats = vacrel->indstats;
- Assert(!IsInParallelMode());
+ Assert(!ParallelVacuumIsActive(vacrel));
for (int idx = 0; idx < nindexes; idx++)
{
@@ -3854,393 +3201,6 @@ update_index_statistics(LVRelState *vacrel)
}
}
-/*
- * Try to enter parallel mode and create a parallel context. Then initialize
- * shared memory state.
- *
- * On success (when we can launch one or more workers), will set dead_items and
- * lps in vacrel for caller. A set lps in vacrel state indicates that parallel
- * VACUUM is currently active.
- */
-static void
-parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
-{
- LVParallelState *lps;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- LVDeadItems *dead_items;
- LVParallelIndStats *pindstats;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- bool *will_parallel_vacuum;
- int max_items;
- Size est_pindstats_len;
- Size est_shared_len;
- Size est_dead_items_len;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
- will_parallel_vacuum);
- if (parallel_workers <= 0)
- {
- /* Can't perform vacuum in parallel -- lps not set in vacrel */
- pfree(will_parallel_vacuum);
- return;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
- lps->will_parallel_vacuum = will_parallel_vacuum;
-
- /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
- est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
- shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = sizeof(LVShared);
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
- max_items = dead_items_max_items(vacrel);
- est_dead_items_len = max_items_to_alloc_size(max_items);
- shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare index vacuum stats */
- pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
- lps->lvpindstats = pindstats;
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
- MemSet(shared, 0, est_shared_len);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead_items space */
- dead_items = (LVDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
- MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- /* Success -- set dead_items and lps in leader's vacrel state */
- vacrel->dead_items = dead_items;
- vacrel->lps = lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-parallel_vacuum_end(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- if (pindstats->istat_updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps->will_parallel_vacuum);
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Returns false, if the given index can't participate in the next execution of
- * parallel index vacuum or parallel index cleanup.
- */
-static bool
-parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum)
-{
- uint8 vacoptions;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* In parallel vacuum case, check if it supports parallel bulk-deletion */
- if (vacuum)
- return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
-
- /* Not safe, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Not safe, if the index supports parallel cleanup conditionally,
- * but we have already processed the index (for bulkdelete). We do
- * this to avoid the need to invoke workers when parallel index
- * cleanup doesn't need to scan the index. See the comments for
- * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
- * support parallel cleanup conditionally.
- */
- if (vacrel->num_index_scans > 0 &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVParallelIndStats *lvpindstats;
- LVShared *lvshared;
- LVDeadItems *dead_items;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- /*
- * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
- * don't support parallel vacuum for autovacuum as of now.
- */
- Assert(MyProc->statusFlags == PROC_IN_VACUUM);
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- elog(DEBUG1, "starting parallel vacuum worker");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set index statistics */
- lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_INDEX_STATS,
- false);
-
- /* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (LVDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_items = dead_items;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
/*
* Error context callback for errors occurring during vacuum.
*/
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f573..ae7c7133dd 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -14,7 +14,6 @@
#include "postgres.h"
-#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/parallel.h"
#include "access/session.h"
@@ -25,6 +24,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..48f7348f91 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15b44..70a719f16c 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -32,6 +32,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
+#include "catalog/index.h"
#include "catalog/pg_database.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_namespace.h"
@@ -51,6 +52,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -89,6 +91,8 @@ static void vac_truncate_clog(TransactionId frozenXID,
static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
+static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_cmp_itemptr(const void *left, const void *right);
/*
* Primary entry point for manual VACUUM and ANALYZE commands
@@ -2258,3 +2262,155 @@ get_vacoptval_from_boolean(DefElem *def)
{
return defGetBoolean(def) ? VACOPTVALUE_ENABLED : VACOPTVALUE_DISABLED;
}
+
+/*
+ * bulkdel_one_index() -- bulk-deletion for index relation.
+ *
+ * Delete all the index tuples containing a TID collected in
+ * dead_items. Also update running statistics. Exact details depend
+ * on index AM's ambulkdelete routine.
+ *
+ * reltuples is the number of table tuples to be passed to the
+ * bulkdelete callback. It's always assumed to be estimated.
+ * See indexam.sgml for more info.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+bulkdel_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ /* Do bulk deletion */
+ istat = index_bulk_delete(ivinfo, istat, vac_tid_reaped,
+ (void *) dead_items);
+
+ ereport(ivinfo->message_level,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ RelationGetRelationName(ivinfo->index),
+ dead_items->num_items),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+ return istat;
+}
+
+/*
+ * cleanup_one_index() -- do post-vacuum cleanup for index relation.
+ *
+ * Calls index AM's amvacuumcleanup routine. reltuples is the number
+ * of table tuples and estimated_count is true if reltuples is an
+ * estimated value. See indexam.sgml for more info.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ istat = index_vacuum_cleanup(ivinfo, istat);
+
+ if (istat)
+ {
+ ereport(ivinfo->message_level,
+ (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
+ RelationGetRelationName(ivinfo->index),
+ istat->num_index_tuples,
+ istat->num_pages),
+ errdetail("%.0f index row versions were removed.\n"
+ "%u index pages were newly deleted.\n"
+ "%u index pages are currently deleted, of which %u are currently reusable.\n"
+ "%s.",
+ istat->tuples_removed,
+ istat->pages_newly_deleted,
+ istat->pages_deleted, istat->pages_free,
+ pg_rusage_show(&ru0))));
+ }
+
+ return istat;
+}
+
+/*
+ * vac_tid_reaped() -- is a particular tid deletable?
+ *
+ * This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ * Assumes dead_items array is sorted (in ascending TID order).
+ */
+static bool
+vac_tid_reaped(ItemPointer itemptr, void *state)
+{
+ VacDeadItems *dead_items = (VacDeadItems *) state;
+ int64 litem,
+ ritem,
+ item;
+ ItemPointer res;
+
+ litem = itemptr_encode(&dead_items->items[0]);
+ ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
+ item = itemptr_encode(itemptr);
+
+ /*
+ * Doing a simple bound check before bsearch() is useful to avoid the
+ * extra cost of bsearch(), especially if dead items on the heap are
+ * concentrated in a certain range. Since this function is called for
+ * every index tuple, it pays to be really fast.
+ */
+ if (item < litem || item > ritem)
+ return false;
+
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_items->items,
+ dead_items->num_items,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
+
+ return (res != NULL);
+}
+
+/*
+ * Comparator routines for use with qsort() and bsearch().
+ */
+static int
+vac_cmp_itemptr(const void *left, const void *right)
+{
+ BlockNumber lblk,
+ rblk;
+ OffsetNumber loff,
+ roff;
+
+ lblk = ItemPointerGetBlockNumber((ItemPointer) left);
+ rblk = ItemPointerGetBlockNumber((ItemPointer) right);
+
+ if (lblk < rblk)
+ return -1;
+ if (lblk > rblk)
+ return 1;
+
+ loff = ItemPointerGetOffsetNumber((ItemPointer) left);
+ roff = ItemPointerGetOffsetNumber((ItemPointer) right);
+
+ if (loff < roff)
+ return -1;
+ if (loff > roff)
+ return 1;
+
+ return 0;
+}
+
+/*
+ * Returns the total required space for VACUUM's dead_items array given a
+ * max_items value.
+ */
+inline Size
+vac_max_items_to_alloc_size(int max_items)
+{
+ Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
+
+ return offsetof(VacDeadItems, items) + sizeof(ItemPointerData) * max_items;
+}
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000000..c8a04527bc
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,1106 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelVacuumState.
+ *
+ * In a parallel vacuum, we perform both index bulk-deletion and index cleanup
+ * with parallel worker processes. Each individual index is processed by one
+ * vacuum process. ParallelVacuumState contains shared information as well
+ * as the memory space for storing dead items allocated in the DSM segment.
+ * When starting either parallel index bulk-deletion or index cleanup, we
+ * launch parallel worker processes. Once all indexes are processed, the
+ * parallel worker processes exit. The next time, the parallel context
+ * is re-initialized so that the same DSM can be used for multiple passes of
+ * index bulk-deletion and index cleanup. At the end of a parallel vacuum,
+ * ParallelVacuumState is destroyed while returning index statistics so
+ * that we can update them after exiting parallel mode.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/genam.h"
+#include "access/parallel.h"
+#include "access/table.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/elog.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel workers. This is allocated in the DSM
+ * segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the parallel vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * reltuples is the total number of input table tuples. We set it to the old
+ * live tuples in the index vacuum case and the new live tuples in the
+ * index cleanup case.
+ *
+ * estimated_count is true if reltuples is an estimated value. (Note that
+ * reltuples could be -1 in this case, indicating we have no idea.)
+ */
+ double reltuples;
+ bool estimated_count;
+
+ /*
+ * In single process vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for table scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process vacuum.
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
+} PVShared;
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} PVIndVacStatus;
+
+/*
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
+ */
+typedef struct PVIndStats
+{
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both the leader and workers can
+ * process the index; otherwise only the leader can process it.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/*
+ * Struct for maintaining a parallel vacuum state. This struct is used
+ * by both leader and worker processes. The parallel vacuum leader process
+ * uses it throughout a VACUUM operation; therefore, the leader uses the
+ * same state to perform index bulk-deletion and index cleanup multiple times.
+ * Workers use some fields of this structure.
+ */
+typedef struct ParallelVacuumState
+{
+ /* NULL for worker processes */
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /*
+ * Shared index statistics among parallel vacuum workers. The array
+ * element is allocated for every index, even those indexes where parallel
+ * index vacuuming is unsafe or not worthwhile (e.g.,
+ * will_parallel_vacuum[] is false). During parallel vacuum,
+ * IndexBulkDeleteResult of each index is kept in DSM and is copied into
+ * local memory at the end of parallel vacuum.
+ */
+ PVIndStats *indstats;
+
+ /* Shared dead items space among parallel vacuum workers */
+ VacDeadItems *dead_items;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * False if the index is a totally unsuitable target for all parallel
+ * processing. For example, the index could be smaller than the
+ * min_parallel_index_scan_size cutoff.
+ */
+ bool *will_parallel_vacuum;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion and
+ * parallel index cleanup respectively.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /* True if we need to reinitialize parallel DSM before launching workers */
+ bool first_time;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+
+ /* Error reporting state */
+ char *relnamespace;
+ char *relname;
+ char *indname;
+ PVIndVacStatus status;
+} ParallelVacuumState;
+
+static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum);
+static void parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
+ bool no_bulkdel_call);
+static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, bool bulkdel,
+ bool no_bulkdel_call);
+static void parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_safe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats);
+static void parallel_vacuum_error_callback(void *arg);
+
+/*
+ * Try to enter parallel mode and create a parallel context. Then initialize
+ * shared memory state.
+ *
+ * On success (when we can launch one or more workers), return parallel vacuum
+ * state. Otherwise, return NULL.
+ */
+ParallelVacuumState *
+parallel_vacuum_begin(Relation rel, Relation *indrels, int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel, BufferAccessStrategy bstrategy)
+{
+ ParallelVacuumState *pvs;
+ ParallelContext *pcxt;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ PVIndStats *indstats;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ bool *will_parallel_vacuum;
+ Size est_indstats_len;
+ Size est_shared_len;
+ Size est_dead_items_len;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(nrequested_workers >= 0);
+ Assert(nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
+ parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
+ nrequested_workers,
+ will_parallel_vacuum);
+ if (parallel_workers <= 0)
+ {
+ /* Can't perform vacuum in parallel -- return NULL */
+ pfree(will_parallel_vacuum);
+ return NULL;
+ }
+
+ pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
+ pvs->indrels = indrels;
+ pvs->nindexes = nindexes;
+ pvs->will_parallel_vacuum = will_parallel_vacuum;
+ pvs->first_time = true;
+ pvs->bstrategy = bstrategy;
+
+ /*
+ * Set error traceback information. The other fields will be filled while
+ * vacuuming indexes.
+ */
+ pvs->relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs->relname = pstrdup(RelationGetRelationName(rel));
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvs->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_INDEX_STATS */
+ est_indstats_len = mul_size(sizeof(PVIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = MAXALIGN(sizeof(PVShared));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
+ est_dead_items_len = MAXALIGN(vac_max_items_to_alloc_size(max_items));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats_len);
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performed in
+ * parallel, or conditionally performed in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ /*
+ * Skip indexes that are unsuitable targets for parallel index vacuum
+ */
+ if (!will_parallel_vacuum[i])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvs->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvs->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvs->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvs->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
+ MemSet(shared, 0, est_shared_len);
+ shared->relid = RelationGetRelid(rel);
+ shared->elevel = elevel;
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvs->shared = shared;
+
+ /* Prepare the dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
+ dead_items->max_items = max_items;
+ dead_items->num_items = 0;
+ MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
+ pvs->dead_items = dead_items;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvs->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvs->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ /* Success -- return parallel vacuum state */
+ return pvs;
+}
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * vacuum and index cleanup can be executed with parallel workers. The index
+ * is eligible for parallel vacuum iff its size is greater than
+ * min_parallel_index_scan_size as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that the user requested. If
+ * nrequested is 0, we compute the parallel degree based on the number of
+ * indexes that support parallel vacuum. This function also
+ * sets will_parallel_vacuum to remember indexes that participate in parallel
+ * vacuum.
+ */
+static int
+parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow performing parallel operation in a standalone backend or
+ * when parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ /*
+ * Compute the number of indexes that can participate in parallel vacuum.
+ */
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* Skip indexes that are unsuitable targets for parallel index vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ will_parallel_vacuum[i] = true;
+
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed during parallel mode, copy the
+ * updated index statistics from DSM into local memory and then later use that
+ * to update the index statistics. One might think that we can exit from
+ * parallel mode, update the index statistics and then destroy parallel
+ * context, but that won't be safe (see ExitParallelMode).
+ */
+void
+parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats)
+{
+ Assert(!IsParallelWorker());
+
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->istat_updated)
+ {
+ istats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(istats[i], &indstats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ istats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvs->pcxt);
+ ExitParallelMode();
+
+ pfree(pvs->will_parallel_vacuum);
+ pfree(pvs->relnamespace);
+ pfree(pvs->relname);
+ pfree(pvs);
+}
+
+/* Returns the dead items space */
+VacDeadItems *
+parallel_vacuum_get_dead_items(ParallelVacuumState *pvs)
+{
+ return pvs->dead_items;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can only provide an approximate value of num_table_tuples, at least
+ * for now.
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = true;
+
+ /* no_bulkdel_call is not used in parallel bulkdel cases */
+ parallel_vacuum_all_indexes(pvs, true, false);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ *
+ * no_bulkdel_call must be true if there was no parallel_vacuum_bulkdel_all_indexes
+ * call in the vacuum execution.
+ */
+void
+parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
+ bool estimated_count, bool no_bulkdel_call)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can provide a better estimate of total number of surviving tuples
+ * (we assume indexes are more interested in that than in the number of
+ * nominally live tuples).
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_all_indexes(pvs, false, no_bulkdel_call);
+}
+
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
+ bool no_bulkdel_call)
+{
+ int nworkers;
+ ErrorContextCallback errcallback;
+ PVIndVacStatus new_status = bulkdel
+ ? PARALLEL_INDVAC_STATUS_NEED_BULKDELETE
+ : PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
+
+ Assert(!IsParallelWorker());
+
+ /* Determine the number of parallel workers to launch */
+ if (bulkdel)
+ nworkers = pvs->nindexes_parallel_bulkdel;
+ else
+ {
+ nworkers = pvs->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if bulk-deletion was never performed */
+ if (no_bulkdel_call)
+ nworkers += pvs->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * than the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See
+ * parallel_vacuum_compute_workers().
+ */
+ nworkers = Min(nworkers, pvs->pcxt->nworkers);
+
+ /*
+ * Set index vacuum status and mark whether parallel vacuum worker can
+ * process it.
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+
+ indstats->status = new_status;
+ indstats->parallel_workers_can_process =
+ (pvs->will_parallel_vacuum[i] &&
+ parallel_vacuum_index_is_parallel_safe(pvs->indrels[i],
+ bulkdel,
+ no_bulkdel_call));
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvs->shared->idx), 0);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of the table.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ if (bulkdel)
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+
+ pvs->first_time = false;
+ }
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Vacuum the indexes that can be processed only by the leader process */
+ parallel_vacuum_unsafe_indexes(pvs);
+
+ /*
+ * Join as a parallel worker. The leader process alone vacuums all
+ * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_safe_indexes(pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index status back to invalid (while checking that we have
+ * vacuumed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(pvs->indrels[i]));
+
+ indstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to table scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+}
+
+/*
+ * Returns false if the given index can't participate in parallel index
+ * vacuum or parallel index cleanup.
+ */
+static bool
+parallel_vacuum_index_is_parallel_safe(Relation indrel, bool bulkdel,
+ bool no_bulkdel_call)
+{
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* In bulk-deletion case, check if it supports parallel bulk-deletion */
+ if (bulkdel)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally, but we
+ * have already processed the index (for bulkdelete). We do this to avoid
+ * the need to invoke workers when parallel index cleanup doesn't need to
+ * scan the index. See the comments for option
+ * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
+ * parallel cleanup conditionally.
+ */
+ if (!no_bulkdel_call &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
+
+/*
+ * Vacuum indexes that can be processed only by the leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe. It's possible that this will vary for a given index, based
+ * on details like whether we're performing index cleanup right now.
+ *
+ * Also performs vacuuming of smaller indexes that fell under the size cutoff
+ * enforced by parallel_vacuum_compute_workers().
+ */
+static void
+parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ /* Skip safe indexes, as they are vacuumed by workers */
+ if (indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[i], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to vacuum the indexes in parallel.
+ */
+static void
+parallel_vacuum_safe_indexes(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+ PVIndStats *indstats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvs->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvs->nindexes)
+ break;
+
+ indstats = &(pvs->indstats[idx]);
+
+ /*
+ * Skip processing indexes that are unsafe for workers or unsuitable
+ * targets for parallel index vacuum (these are processed in
+ * parallel_vacuum_unsafe_indexes() by the leader)
+ */
+ if (!indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do bulkdelete or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[idx], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Vacuum or cleanup one index, either by the leader process or by one of the
+ * worker processes. After vacuuming the index, this function copies the index
+ * statistics returned from ambulkdelete or amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel, PVIndStats *indstats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result if someone
+ * has already updated it
+ */
+ if (indstats->istat_updated)
+ istat = &(indstats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvs->shared->elevel;
+ ivinfo.estimated_count = pvs->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvs->shared->reltuples;
+ ivinfo.strategy = pvs->bstrategy;
+
+ /* Update error traceback information */
+ pvs->indname = pstrdup(RelationGetRelationName(indrel));
+ pvs->status = indstats->status;
+
+ switch (indstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = bulkdel_one_index(&ivinfo, istat, pvs->dead_items);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ indstats->status,
+ RelationGetRelationName(indrel));
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete or
+ * amvacuumcleanup to the DSM segment if it's the first cycle, because these
+ * callbacks allocate it locally and it's possible that the index will be
+ * vacuumed by a different vacuum process in the next cycle. Copying the
+ * result normally happens only the first time an index is vacuumed. For any
+ * additional vacuum pass, we directly point to the result on the DSM segment
+ * and pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result to different
+ * slots, we can write them without locking.
+ */
+ if (!indstats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(indstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ indstats->istat_updated = true;
+
+ /* Free the locally-allocated bulk-deletion result */
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each worker
+ * touches different indexes.
+ */
+ indstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+
+ /* Reset error traceback information */
+ pvs->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ pfree(pvs->indname);
+ pvs->indname = NULL;
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index vacuum or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ ParallelVacuumState pvs;
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ int nindexes;
+ char *sharedquery;
+ ErrorContextCallback errcallback;
+
+ /*
+ * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
+ * don't support parallel vacuum for autovacuum as of now.
+ */
+ Assert(MyProc->statusFlags == PROC_IN_VACUUM);
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
+ false);
+
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the leader process's. It's
+ * okay because the lock mode does not conflict among the parallel
+ * workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should
+ * match the leader's order.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead_items space (set as worker's dead_items below) */
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Set parallel vacuum state */
+ pvs.indrels = indrels;
+ pvs.nindexes = nindexes;
+ pvs.indstats = indstats;
+ pvs.shared = shared;
+ pvs.dead_items = dead_items;
+ pvs.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs.relname = pstrdup(RelationGetRelationName(rel));
+
+ /* These fields will be filled during index vacuum or cleanup */
+ pvs.indname = NULL;
+ pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
+
+ /* Each parallel VACUUM worker gets its own access strategy */
+ pvs.bstrategy = GetAccessStrategy(BAS_VACUUM);
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_safe_indexes(&pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvs.bstrategy);
+}
+
+/*
+ * Error context callback for errors occurring during parallel index vacuum.
+ */
+static void
+parallel_vacuum_error_callback(void *arg)
+{
+ ParallelVacuumState *errinfo = arg;
+
+ switch (errinfo->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while cleanup index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_INITIAL:
+ case PARALLEL_INDVAC_STATUS_COMPLETED:
+ default:
+ break;
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 417dd288e5..f3fb1e93a5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4cfd52eaf4..88e0154d60 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -62,6 +64,9 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Abstract type for parallel vacuum state */
+typedef struct ParallelVacuumState ParallelVacuumState;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -230,6 +235,21 @@ typedef struct VacuumParams
int nworkers;
} VacuumParams;
+/*
+ * VacDeadItems stores dead TIDs collected during the heap scan.
+ */
+typedef struct VacDeadItems
+{
+ int max_items; /* # slots allocated in array */
+ int num_items; /* current # of entries */
+
+ /* Sorted array of TIDs to delete from indexes */
+ ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
+} VacDeadItems;
+
+#define MAXDEADITEMS(avail_mem) \
+ (((avail_mem) - offsetof(VacDeadItems, items)) / sizeof(ItemPointerData))
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for PostGIS */
extern int vacuum_freeze_min_age;
@@ -282,6 +302,28 @@ extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
bits32 options, bool verbose,
LOCKMODE lmode);
+extern IndexBulkDeleteResult *bulkdel_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items);
+extern IndexBulkDeleteResult *cleanup_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat);
+extern Size vac_max_items_to_alloc_size(int max_items);
+
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumState *parallel_vacuum_begin(Relation rel, Relation *indrels,
+ int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel,
+ BufferAccessStrategy bstrategy);
+extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
+extern VacDeadItems *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs);
+extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples);
+extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ bool estimated_count,
+ bool no_bulkdel_call);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f41ef0d2bc..017ea7091c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1808,6 +1808,7 @@ ParallelSlotResultHandler
ParallelState
ParallelTableScanDesc
ParallelTableScanDescData
+ParallelVacuumState
ParallelWorkerContext
ParallelWorkerInfo
Param
@@ -2798,6 +2799,7 @@ UserMapping
UserOpts
VacAttrStats
VacAttrStatsP
+VacDeadItems
VacErrPhase
VacOptValue
VacuumParams
--
2.24.3 (Apple Git-128)
On Thu, Dec 9, 2021 at 6:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Dec 9, 2021 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Agreed with the above two points.
I've attached updated patches that incorporated the above comments
too. Please review them.
I have made the following minor changes to the 0001 patch: (a) added
back an assert that had been removed from dead_items_max_items(), (b)
removed an unnecessary semicolon from one of the statements in
compute_parallel_vacuum_workers(), (c) changed comments in a few
places, (d) moved all parallel_vacuum_*-related functions together,
and (e) ran pgindent and slightly modified the commit message.
Let me know what you think of the attached?
--
With Regards,
Amit Kapila.
Attachments:
v8-0001-Improve-parallel-vacuum-implementation.patch
From 71b5041da4eef5ad2007a1a49f04ecaf2391bd5b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 30 Nov 2021 23:26:28 +0900
Subject: [PATCH v8] Improve parallel vacuum implementation.
Previously, in parallel vacuum, we allocated a shmem area of
IndexBulkDeleteResult only for indexes where parallel index vacuuming is
safe, and used a null-bitmap in the shmem area to access them. This logic was
too complicated for the small benefit of saving only a few bits per index.
In this commit, we allocate a dedicated shmem area for the array of
LVParallelIndStats, which includes a parallel-safety flag, the index vacuum
status, and IndexBulkDeleteResult. There is one array element for every
index, even those indexes where parallel index vacuuming is unsafe or not
worthwhile. This commit makes the code clearer by removing all
bitmap-related code.
Also, add a check of each index's vacuum status after parallel index vacuum
to make sure that all indexes have been processed.
Finally, rename parallel vacuum functions to parallel_vacuum_* for
consistency.
Author: Masahiko Sawada, based on suggestions by Andres Freund
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
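As a rough sketch of the per-index completeness check this commit describes (illustrative only; check_all_indexes_processed is a hypothetical name, and LVParallelIndStats and the PARALLEL_INDVAC_STATUS_* values are as defined in the diff below):

#include "postgres.h"

/*
 * Verify that every index was processed in the just-finished parallel
 * bulk-deletion or cleanup pass, and reset the statuses for the next pass.
 */
static void
check_all_indexes_processed(LVParallelIndStats *pindstats, int nindexes)
{
	for (int i = 0; i < nindexes; i++)
	{
		if (pindstats[i].status != PARALLEL_INDVAC_STATUS_COMPLETED)
			elog(ERROR, "parallel index vacuum on index %d is not completed", i);

		pindstats[i].status = PARALLEL_INDVAC_STATUS_INITIAL;
	}
}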
---
src/backend/access/heap/vacuumlazy.c | 1222 +++++++++++++++++-----------------
src/tools/pgindent/typedefs.list | 2 +
2 files changed, 597 insertions(+), 627 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 282b44f..6d9f890 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -130,6 +130,7 @@
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -182,14 +183,6 @@ typedef struct LVShared
int elevel;
/*
- * An indication for vacuum workers to perform either index vacuum or
- * index cleanup. first_time is true only if for_cleanup is true and
- * bulk-deletion is not performed yet.
- */
- bool for_cleanup;
- bool first_time;
-
- /*
* Fields for both index vacuum and cleanup.
*
* reltuples is the total number of input heap tuples. We set either old
@@ -226,33 +219,44 @@ typedef struct LVShared
*/
pg_atomic_uint32 active_nworkers;
- /*
- * Variables to control parallel vacuum. We have a bitmap to indicate
- * which index has stats in shared memory. The set bit in the map
- * indicates that the particular index supports a parallel vacuum.
- */
- pg_atomic_uint32 idx; /* counter for vacuuming and clean up */
- uint32 offset; /* sizeof header incl. bitmap */
- bits8 bitmap[FLEXIBLE_ARRAY_MEMBER]; /* bit map of NULLs */
-
- /* Shared index statistics data follows at end of struct */
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
} LVShared;
-#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
-#define GetSharedIndStats(s) \
- ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
-#define IndStatsIsNull(s, i) \
- (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
+/* Status used during parallel index vacuum or cleanup */
+typedef enum LVParallelIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} LVParallelIndVacStatus;
/*
- * Struct for an index bulk-deletion statistic used for parallel vacuum. This
- * is allocated in the DSM segment.
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
*/
-typedef struct LVSharedIndStats
+typedef struct LVParallelIndStats
{
- bool updated; /* are the stats updated? */
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both leader and worker can
+ * process the index, otherwise only leader can process it.
+ */
+ LVParallelIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
IndexBulkDeleteResult istat;
-} LVSharedIndStats;
+} LVParallelIndStats;
/* Struct for maintaining a parallel vacuum state. */
typedef struct LVParallelState
@@ -262,6 +266,16 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /*
+ * Shared index statistics among parallel vacuum workers. The array
+ * element is allocated for every index, even those indexes where parallel
+ * index vacuuming is unsafe or not worthwhile (e.g.,
+ * will_parallel_vacuum[] is false). During parallel vacuum,
+ * IndexBulkDeleteResult of each index is kept in DSM and is copied into
+ * local memory at the end of parallel vacuum.
+ */
+ LVParallelIndStats *lvpindstats;
+
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
@@ -269,6 +283,13 @@ typedef struct LVParallelState
WalUsage *wal_usage;
/*
+ * False if the index is a totally unsuitable target for all parallel
+ * processing. For example, the index could be smaller than the
+ * min_parallel_index_scan_size cutoff.
+ */
+ bool *will_parallel_vacuum;
+
+ /*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
*/
@@ -391,18 +412,6 @@ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
-static void do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel);
-static void do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers);
-static void do_parallel_processing(LVRelState *vacrel,
- LVShared *lvshared);
-static void do_serial_processing_for_unsafe_indexes(LVRelState *vacrel,
- LVShared *lvshared);
-static IndexBulkDeleteResult *parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_indstats,
- LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
@@ -425,14 +434,20 @@ static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int compute_parallel_vacuum_workers(LVRelState *vacrel,
- int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void begin_parallel_vacuum(LVRelState *vacrel, int nrequested);
-static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared, int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
+static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
+ bool *will_parallel_vacuum);
+static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
+static void parallel_vacuum_end(LVRelState *vacrel);
+static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum);
+static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
+static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats);
+static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
+static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared,
+ LVParallelIndStats *pindstats);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -2237,7 +2252,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_vacuum_all_indexes(vacrel);
+ parallel_vacuum_process_all_indexes(vacrel, true);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2611,465 +2626,120 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
}
/*
- * Perform lazy_vacuum_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
-{
- /* Tell parallel workers to do index vacuuming */
- vacrel->lps->lvshared->for_cleanup = false;
- vacrel->lps->lvshared->first_time = false;
-
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- do_parallel_vacuum_or_cleanup(vacrel,
- vacrel->lps->nindexes_parallel_bulkdel);
-}
-
-/*
- * Perform lazy_cleanup_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel)
-{
- int nworkers;
-
- /*
- * If parallel vacuum is active we perform index cleanup with parallel
- * workers.
- *
- * Tell parallel workers to do index cleanup.
- */
- vacrel->lps->lvshared->for_cleanup = true;
- vacrel->lps->lvshared->first_time = (vacrel->num_index_scans == 0);
-
- /*
- * Now we can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- /* Determine the number of parallel workers to launch */
- if (vacrel->lps->lvshared->first_time)
- nworkers = vacrel->lps->nindexes_parallel_cleanup +
- vacrel->lps->nindexes_parallel_condcleanup;
- else
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- do_parallel_vacuum_or_cleanup(vacrel, nworkers);
-}
-
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process. The caller must set
- * lps->lvshared->for_cleanup to indicate whether to perform vacuum or
- * cleanup.
+ * lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
static void
-do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
+lazy_cleanup_all_indexes(LVRelState *vacrel)
{
- LVParallelState *lps = vacrel->lps;
-
Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
Assert(vacrel->nindexes > 0);
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See compute_parallel_vacuum_workers.
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
+ /* Report that we are now cleaning up indexes */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
+ if (!ParallelVacuumIsActive(vacrel))
{
- if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
- ReinitializeParallelDSM(lps->pcxt);
- }
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
+ double reltuples = vacrel->new_rel_tuples;
+ bool estimated_count =
+ vacrel->tupcount_pages < vacrel->rel_pages;
- if (lps->pcxt->nworkers_launched > 0)
+ for (int idx = 0; idx < vacrel->nindexes; idx++)
{
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
+ Relation indrel = vacrel->indrels[idx];
+ IndexBulkDeleteResult *istat = vacrel->indstats[idx];
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
+ vacrel->indstats[idx] =
+ lazy_cleanup_one_index(indrel, istat, reltuples,
+ estimated_count, vacrel);
}
-
- if (lps->lvshared->for_cleanup)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
-
- /*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
- */
- do_parallel_processing(vacrel, lps->lvshared);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
+ else
{
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
+ /* Outsource everything to parallel variant */
+ parallel_vacuum_process_all_indexes(vacrel, false);
}
}
/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
+ * lazy_vacuum_one_index() -- vacuum index relation.
+ *
+ * Delete all the index tuples containing a TID collected in
+ * vacrel->dead_items array. Also update running statistics.
+ * Exact details depend on index AM's ambulkdelete routine.
+ *
+ * reltuples is the number of heap tuples to be passed to the
+ * bulkdelete callback. It's always assumed to be estimated.
+ * See indexam.sgml for more info.
+ *
+ * Returns bulk delete stats derived from input stats
*/
-static void
-do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
+static IndexBulkDeleteResult *
+lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
+ double reltuples, LVRelState *vacrel)
{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(lvshared->idx), 1);
+ IndexVacuumInfo ivinfo;
+ PGRUsage ru0;
+ LVSavedErrInfo saved_err_info;
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
+ pg_rusage_init(&ru0);
- /* Get the index statistics space from DSM, if any */
- shared_istat = parallel_stats_for_idx(lvshared, idx);
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.estimated_count = true;
+ ivinfo.message_level = elevel;
+ ivinfo.num_heap_tuples = reltuples;
+ ivinfo.strategy = vacrel->bstrategy;
- /* Skip indexes not participating in parallelism */
- if (shared_istat == NULL)
- continue;
+ /*
+ * Update error traceback information.
+ *
+ * The index name is saved during this phase and restored immediately
+ * after this phase. See vacuum_error_callback.
+ */
+ Assert(vacrel->indname == NULL);
+ vacrel->indname = pstrdup(RelationGetRelationName(indrel));
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_VACUUM_INDEX,
+ InvalidBlockNumber, InvalidOffsetNumber);
- indrel = vacrel->indrels[idx];
+ /* Do bulk deletion */
+ istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
+ (void *) vacrel->dead_items);
- /*
- * Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
- */
- if (!parallel_processing_is_safe(indrel, lvshared))
- continue;
+ ereport(elevel,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ vacrel->indname, vacrel->dead_items->num_items),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
- /* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
- }
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
+ pfree(vacrel->indname);
+ vacrel->indname = NULL;
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+ return istat;
}
/*
- * Perform parallel processing of indexes in leader process.
+ * lazy_cleanup_one_index() -- do post-vacuum cleanup for index relation.
*
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing for_cleanup processing right now.
+ * Calls index AM's amvacuumcleanup routine. reltuples is the number
+ * of heap tuples and estimated_count is true if reltuples is an
+ * estimated value. See indexam.sgml for more info.
*
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by compute_parallel_vacuum_workers(). These indexes never get a
- * slot for statistics in DSM.
+ * Returns bulk delete stats derived from input stats
*/
-static void
-do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
+static IndexBulkDeleteResult *
+lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
+ double reltuples, bool estimated_count,
+ LVRelState *vacrel)
{
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+ IndexVacuumInfo ivinfo;
+ PGRUsage ru0;
+ LVSavedErrInfo saved_err_info;
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- shared_istat = parallel_stats_for_idx(lvshared, idx);
- indrel = vacrel->indrels[idx];
-
- /*
- * We're only here for the indexes that parallel workers won't
- * process. Note that the shared_istat test ensures that we process
- * indexes that fell under initial size cutoff.
- */
- if (shared_istat != NULL &&
- parallel_processing_is_safe(indrel, lvshared))
- continue;
-
- /* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static IndexBulkDeleteResult *
-parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_istat,
- LVRelState *vacrel)
-{
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (shared_istat && shared_istat->updated && istat == NULL)
- istat = &shared_istat->istat;
-
- /* Do vacuum or cleanup of the index */
- if (lvshared->for_cleanup)
- istat_res = lazy_cleanup_one_index(indrel, istat, lvshared->reltuples,
- lvshared->estimated_count, vacrel);
- else
- istat_res = lazy_vacuum_one_index(indrel, istat, lvshared->reltuples,
- vacrel);
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
- {
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
- }
-
- return istat_res;
-}
-
-/*
- * lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
- */
-static void
-lazy_cleanup_all_indexes(LVRelState *vacrel)
-{
- Assert(!IsParallelWorker());
- Assert(vacrel->nindexes > 0);
-
- /* Report that we are now cleaning up indexes */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
-
- if (!ParallelVacuumIsActive(vacrel))
- {
- double reltuples = vacrel->new_rel_tuples;
- bool estimated_count =
- vacrel->tupcount_pages < vacrel->rel_pages;
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- IndexBulkDeleteResult *istat = vacrel->indstats[idx];
-
- vacrel->indstats[idx] =
- lazy_cleanup_one_index(indrel, istat, reltuples,
- estimated_count, vacrel);
- }
- }
- else
- {
- /* Outsource everything to parallel variant */
- do_parallel_lazy_cleanup_all_indexes(vacrel);
- }
-}
-
-/*
- * lazy_vacuum_one_index() -- vacuum index relation.
- *
- * Delete all the index tuples containing a TID collected in
- * vacrel->dead_items array. Also update running statistics.
- * Exact details depend on index AM's ambulkdelete routine.
- *
- * reltuples is the number of heap tuples to be passed to the
- * bulkdelete callback. It's always assumed to be estimated.
- * See indexam.sgml for more info.
- *
- * Returns bulk delete stats derived from input stats
- */
-static IndexBulkDeleteResult *
-lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
- double reltuples, LVRelState *vacrel)
-{
- IndexVacuumInfo ivinfo;
- PGRUsage ru0;
- LVSavedErrInfo saved_err_info;
-
- pg_rusage_init(&ru0);
-
- ivinfo.index = indrel;
- ivinfo.analyze_only = false;
- ivinfo.report_progress = false;
- ivinfo.estimated_count = true;
- ivinfo.message_level = elevel;
- ivinfo.num_heap_tuples = reltuples;
- ivinfo.strategy = vacrel->bstrategy;
-
- /*
- * Update error traceback information.
- *
- * The index name is saved during this phase and restored immediately
- * after this phase. See vacuum_error_callback.
- */
- Assert(vacrel->indname == NULL);
- vacrel->indname = pstrdup(RelationGetRelationName(indrel));
- update_vacuum_error_info(vacrel, &saved_err_info,
- VACUUM_ERRCB_PHASE_VACUUM_INDEX,
- InvalidBlockNumber, InvalidOffsetNumber);
-
- /* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_items);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_items->num_items),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
-
- /* Revert to the previous phase information for error traceback */
- restore_vacuum_error_info(vacrel, &saved_err_info);
- pfree(vacrel->indname);
- vacrel->indname = NULL;
-
- return istat;
-}
-
-/*
- * lazy_cleanup_one_index() -- do post-vacuum cleanup for index relation.
- *
- * Calls index AM's amvacuumcleanup routine. reltuples is the number
- * of heap tuples and estimated_count is true if reltuples is an
- * estimated value. See indexam.sgml for more info.
- *
- * Returns bulk delete stats derived from input stats
- */
-static IndexBulkDeleteResult *
-lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
- double reltuples, bool estimated_count,
- LVRelState *vacrel)
-{
- IndexVacuumInfo ivinfo;
- PGRUsage ru0;
- LVSavedErrInfo saved_err_info;
-
- pg_rusage_init(&ru0);
+ pg_rusage_init(&ru0);
ivinfo.index = indrel;
ivinfo.analyze_only = false;
@@ -3520,7 +3190,7 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- begin_parallel_vacuum(vacrel, nworkers);
+ parallel_vacuum_begin(vacrel, nworkers);
/* If parallel mode started, vacrel->dead_items allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
@@ -3552,7 +3222,7 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- end_parallel_vacuum(vacrel);
+ parallel_vacuum_end(vacrel);
}
/*
@@ -3745,6 +3415,38 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
}
/*
+ * Update index statistics in pg_class if the statistics are accurate.
+ */
+static void
+update_index_statistics(LVRelState *vacrel)
+{
+ Relation *indrels = vacrel->indrels;
+ int nindexes = vacrel->nindexes;
+ IndexBulkDeleteResult **indstats = vacrel->indstats;
+
+ Assert(!IsInParallelMode());
+
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ IndexBulkDeleteResult *istat = indstats[idx];
+
+ if (istat == NULL || istat->estimated_count)
+ continue;
+
+ /* Update index statistics */
+ vac_update_relstats(indrel,
+ istat->num_pages,
+ istat->num_index_tuples,
+ 0,
+ false,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ false);
+ }
+}
+
+/*
* Compute the number of parallel worker processes to request. Both index
* vacuum and index cleanup can be executed with parallel workers. The index
* is eligible for parallel vacuum iff its size is greater than
@@ -3758,7 +3460,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* vacuum.
*/
static int
-compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
+parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
@@ -3781,6 +3483,7 @@ compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
Relation indrel = vacrel->indrels[idx];
uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ /* Skip index that is not a suitable target for parallel index vacuum */
if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
continue;
@@ -3815,38 +3518,6 @@ compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
}
/*
- * Update index statistics in pg_class if the statistics are accurate.
- */
-static void
-update_index_statistics(LVRelState *vacrel)
-{
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- IndexBulkDeleteResult **indstats = vacrel->indstats;
-
- Assert(!IsInParallelMode());
-
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- IndexBulkDeleteResult *istat = indstats[idx];
-
- if (istat == NULL || istat->estimated_count)
- continue;
-
- /* Update index statistics */
- vac_update_relstats(indrel,
- istat->num_pages,
- istat->num_index_tuples,
- 0,
- false,
- InvalidTransactionId,
- InvalidMultiXactId,
- false);
- }
-}
-
-/*
* Try to enter parallel mode and create a parallel context. Then initialize
* shared memory state.
*
@@ -3855,7 +3526,7 @@ update_index_statistics(LVRelState *vacrel)
* VACUUM is currently active.
*/
static void
-begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
+parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
{
LVParallelState *lps;
Relation *indrels = vacrel->indrels;
@@ -3863,10 +3534,12 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
ParallelContext *pcxt;
LVShared *shared;
LVDeadItems *dead_items;
+ LVParallelIndStats *pindstats;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
bool *will_parallel_vacuum;
int max_items;
+ Size est_pindstats_len;
Size est_shared_len;
Size est_dead_items_len;
int nindexes_mwm = 0;
@@ -3884,8 +3557,7 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
* Compute the number of parallel vacuum workers to launch
*/
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = compute_parallel_vacuum_workers(vacrel,
- nrequested,
+ parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
will_parallel_vacuum);
if (parallel_workers <= 0)
{
@@ -3901,50 +3573,23 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
parallel_workers);
Assert(pcxt->nworkers > 0);
lps->pcxt = pcxt;
+ lps->will_parallel_vacuum = will_parallel_vacuum;
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
+ est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = sizeof(LVShared);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
- /* Skip indexes that don't participate in parallel vacuum */
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- est_shared_len = add_size(est_shared_len, sizeof(LVSharedIndStats));
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
- max_items = dead_items_max_items(vacrel);
- est_dead_items_len = MAXALIGN(max_items_to_alloc_size(max_items));
- shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
+ max_items = dead_items_max_items(vacrel);
+ est_dead_items_len = max_items_to_alloc_size(max_items);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
/*
* Estimate space for BufferUsage and WalUsage --
@@ -3973,6 +3618,41 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
InitializeParallelDSM(pcxt);
+ /* Prepare index vacuum stats */
+ pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ if (!will_parallel_vacuum[idx])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ lps->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ lps->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ lps->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
+ lps->lvpindstats = pindstats;
+
/* Prepare shared information */
shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
MemSet(shared, 0, est_shared_len);
@@ -3986,21 +3666,6 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
pg_atomic_init_u32(&(shared->cost_balance), 0);
pg_atomic_init_u32(&(shared->active_nworkers), 0);
pg_atomic_init_u32(&(shared->idx), 0);
- shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
-
- /*
- * Initialize variables for shared index statistics, set NULL bitmap and
- * the size of stats for each index.
- */
- memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
- for (int idx = 0; idx < nindexes; idx++)
- {
- if (!will_parallel_vacuum[idx])
- continue;
-
- /* Set NOT NULL as this index does support parallelism */
- shared->bitmap[idx >> 3] |= 1 << (idx & 0x07);
- }
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
lps->lvshared = shared;
@@ -4038,8 +3703,6 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
- pfree(will_parallel_vacuum);
-
/* Success -- set dead_items and lps in leader's vacrel state */
vacrel->dead_items = dead_items;
vacrel->lps = lps;
@@ -4055,7 +3718,7 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
* context, but that won't be safe (see ExitParallelMode).
*/
static void
-end_parallel_vacuum(LVRelState *vacrel)
+parallel_vacuum_end(LVRelState *vacrel)
{
IndexBulkDeleteResult **indstats = vacrel->indstats;
LVParallelState *lps = vacrel->lps;
@@ -4066,21 +3729,12 @@ end_parallel_vacuum(LVRelState *vacrel)
/* Copy the updated statistics */
for (int idx = 0; idx < nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
-
- shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
-
- /*
- * Skip index -- it must have been processed by the leader, from
- * inside do_serial_processing_for_unsafe_indexes()
- */
- if (shared_istat == NULL)
- continue;
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
- if (shared_istat->updated)
+ if (pindstats->istat_updated)
{
indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &shared_istat->istat, sizeof(IndexBulkDeleteResult));
+ memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
}
else
indstats[idx] = NULL;
@@ -4090,74 +3744,385 @@ end_parallel_vacuum(LVRelState *vacrel)
ExitParallelMode();
/* Deactivate parallel vacuum */
+ pfree(lps->will_parallel_vacuum);
pfree(lps);
vacrel->lps = NULL;
}
/*
- * Return shared memory statistics for index at offset 'getidx', if any
- *
- * Returning NULL indicates that compute_parallel_vacuum_workers() determined
- * that the index is a totally unsuitable target for all parallel processing
- * up front. For example, the index could be < min_parallel_index_scan_size
- * cutoff.
+ * Returns false, if the given index can't participate in the next execution of
+ * parallel index vacuum or parallel index cleanup.
*/
-static LVSharedIndStats *
-parallel_stats_for_idx(LVShared *lvshared, int getidx)
+static bool
+parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum)
{
- char *p;
+ uint8 vacoptions;
- if (IndStatsIsNull(lvshared, getidx))
- return NULL;
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally, but we
+ * have already processed the index (for bulkdelete). We do this to avoid
+ * the need to invoke workers when parallel index cleanup doesn't need to
+ * scan the index. See the comments for option
+ * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
+ * parallel cleanup conditionally.
+ */
+ if (vacrel->num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
- p = (char *) GetSharedIndStats(lvshared);
- for (int idx = 0; idx < getidx; idx++)
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
+{
+ LVParallelState *lps = vacrel->lps;
+ LVParallelIndVacStatus new_status;
+ int nworkers;
+
+ Assert(!IsParallelWorker());
+ Assert(ParallelVacuumIsActive(vacrel));
+ Assert(vacrel->nindexes > 0);
+
+ if (vacuum)
{
- if (IndStatsIsNull(lvshared, idx))
- continue;
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at
+ * least for now. Matches serial VACUUM case.
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
+ vacrel->lps->lvshared->estimated_count = true;
+
+ new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = vacrel->lps->nindexes_parallel_bulkdel;
+ }
+ else
+ {
+ /*
+ * We can provide a better estimate of total number of surviving
+ * tuples (we assume indexes are more interested in that than in the
+ * number of nominally live tuples).
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
+ vacrel->lps->lvshared->estimated_count =
+ (vacrel->tupcount_pages < vacrel->rel_pages);
+
+ new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = vacrel->lps->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if in the first time call */
+ if (vacrel->num_index_scans == 0)
+ nworkers += vacrel->lps->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
- p += sizeof(LVSharedIndStats);
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * than the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See
+ * parallel_vacuum_compute_workers().
+ */
+ nworkers = Min(nworkers, lps->pcxt->nworkers);
+
+ /*
+ * Set index vacuum status and mark whether parallel vacuum worker can
+ * process it.
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
+
+ Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+ pindstats->status = new_status;
+ pindstats->parallel_workers_can_process =
+ (lps->will_parallel_vacuum[i] &
+ parallel_vacuum_index_is_parallel_safe(vacrel, vacrel->indrels[i],
+ vacuum));
}
- return (LVSharedIndStats *) p;
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(lps->lvshared->idx), 0);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize parallel context to relaunch parallel workers */
+ if (vacrel->num_index_scans > 0)
+ ReinitializeParallelDSM(lps->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(lps->pcxt, nworkers);
+
+ LaunchParallelWorkers(lps->pcxt);
+
+ if (lps->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
+ VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
+ }
+
+ if (vacuum)
+ ereport(elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ lps->pcxt->nworkers_launched),
+ lps->pcxt->nworkers_launched, nworkers)));
+ }
+
+ /* Process the indexes that can be processed by only leader process */
+ parallel_vacuum_process_unsafe_indexes(vacrel);
+
+ /*
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(lps->pcxt);
+
+ for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
+
+ if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(vacrel->indrels[i]));
+
+ pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
}
/*
- * Returns false, if the given index can't participate in parallel index
- * vacuum or parallel index cleanup
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to process the indexes in parallel.
*/
-static bool
-parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
+static void
+parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats)
{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* first_time must be true only if for_cleanup is true */
- Assert(lvshared->for_cleanup || !lvshared->first_time);
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
- if (lvshared->for_cleanup)
+ /* Loop until all indexes are vacuumed */
+ for (;;)
{
- /* Skip, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
+ int idx;
+ LVParallelIndStats *pis;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= vacrel->nindexes)
+ break;
+
+ pis = &(pindstats[idx]);
/*
- * Skip, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). See the
- * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
- * when indexes support parallel cleanup conditionally.
+ * Skip processing index that is unsafe for workers or has an
+ * unsuitable target for parallel index vacuum (this is processed in
+ * parallel_vacuum_process_unsafe_indexes() by the leader).
*/
- if (!lvshared->first_time &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
+ if (!pis->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ shared, pis);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Perform parallel processing of indexes in leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe. It's possible that this will vary for a given index, based
+ * on details like whether we're performing index cleanup right now.
+ *
+ * Also performs processing of smaller indexes that fell under the size cutoff
+ * enforced by parallel_vacuum_compute_workers().
+ */
+static void
+parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
+{
+ LVParallelState *lps = vacrel->lps;
+
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int idx = 0; idx < vacrel->nindexes; idx++)
+ {
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
+
+ /* Skip, indexes that are safe for workers */
+ if (pindstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ lps->lvshared, pindstats);
}
- else if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) == 0)
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Vacuum or cleanup index either by leader process or by one of the worker
+ * process. After processing the index this function copies the index
+ * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared, LVParallelIndStats *pindstats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result if someone
+ * has already updated it
+ */
+ if (pindstats->istat_updated)
+ istat = &(pindstats->istat);
+
+ switch (pindstats->status)
{
- /* Skip if the index does not support parallel bulk deletion */
- return false;
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = lazy_vacuum_one_index(indrel, istat,
+ shared->reltuples, vacrel);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = lazy_cleanup_one_index(indrel, istat,
+ shared->reltuples,
+ shared->estimated_count,
+ vacrel);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ pindstats->status,
+ RelationGetRelationName(indrel));
}
- return true;
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete and
+ * amvacuumcleanup to the DSM segment if it's the first cycle because they
+ * allocate locally and it's possible that an index will be vacuumed by a
+ * different vacuum process the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result on the DSM segment and
+ * pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result at different
+ * slots we can write them without locking.
+ */
+ if (!pindstats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
+
+ /* Free the locally-allocated bulk-deletion result */
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each worker
+ * touches different indexes.
+ */
+ pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
}
/*
@@ -4171,6 +4136,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
{
Relation rel;
Relation *indrels;
+ LVParallelIndStats *lvpindstats;
LVShared *lvshared;
LVDeadItems *dead_items;
BufferUsage *buffer_usage;
@@ -4190,10 +4156,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
false);
elevel = lvshared->elevel;
- if (lvshared->for_cleanup)
- elog(DEBUG1, "starting parallel vacuum worker for cleanup");
- else
- elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
+ elog(DEBUG1, "starting parallel vacuum worker");
/* Set debug_query_string for individual workers */
sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
@@ -4214,6 +4177,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
Assert(nindexes > 0);
+ /* Set index statistics */
+ lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
/* Set dead_items space (set as worker's vacrel dead_items below) */
dead_items = (LVDeadItems *) shm_toc_lookup(toc,
PARALLEL_VACUUM_KEY_DEAD_ITEMS,
@@ -4259,7 +4227,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
InstrStartParallelQuery();
/* Process indexes to perform vacuum/cleanup */
- do_parallel_processing(&vacrel, lvshared);
+ parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f41ef0d..0c61ccb 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1307,6 +1307,8 @@ LSEG
LUID
LVDeadTuples
LVPagePruneState
+LVParallelIndStats
+LVParallelIndVacStatus
LVParallelState
LVRelState
LVSavedErrInfo
--
1.8.3.1
Hi,
On 2021-10-30 14:21:01 -0700, Andres Freund wrote:
Due to bug #17245: [1] I spent a considerably amount of time looking at vacuum
related code. And I found a few things that I think could stand improvement:
While working on the fix for #17255 (more specifically, some cleanup that Peter
suggested in that context), I noticed another thing: initializing parallelism
as part of dead_items_alloc() is a bad idea, even though there are comments
noting that oddity.
I don't really see why we should do it this way? There's no "no-parallelism"
path in begin_parallel_vacuum() besides compute_parallel_vacuum_workers(). So
it's not like we might just discover the inability to do parallelism during
parallel initialization?
It's also not particularly helpful to have a begin_parallel_vacuum() that
might not actually begin a parallel vacuum...
Minor nit:
begin_parallel_vacuum()'s comment says:
* On success (when we can launch one or more workers), will set dead_items and
* lps in vacrel for caller.
But it actually doesn't know whether we can start workers. It just checks
max_parallel_maintenance_workers, no?
Greetings,
Andres Freund
On Sat, Dec 11, 2021 at 2:32 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-10-30 14:21:01 -0700, Andres Freund wrote:
Due to bug #17245: [1] I spent a considerably amount of time looking at vacuum
related code. And I found a few things that I think could stand improvement:
Thank you for the comments.
While working on the fix for #17255 (more specifically some cleanup that Peter
suggested in the context), I noticed another thing: Initializing parallelism
as part of dead_items_alloc() is a bad idea. Even if there are comments noting
that oddity.I don't really see why we should do it this way? There's no "no-parallelism"
path in begin_parallel_vacuum() besides compute_parallel_vacuum_workers(). So
it's not like we might just discover the inability to do parallelism during
parallel initialization?
Right. Also, in the parallel vacuum case, it allocates space not only
for dead items but also for other data required for parallelism, such as
the shared bulk-deletion results. Originally, in PG13,
begin_parallel_vacuum() was called by lazy_scan_heap(), but in PG14 it
became part of dead_items_alloc() (see b4af70cb2). I agree with changing
this part so that lazy_scan_heap() calls begin_parallel_vacuum()
(or whatever we rename it to). I'll incorporate this change in the
refactoring patch barring any objections.
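To illustrate, here is a rough and entirely untested sketch of what the call
site could look like if dead_items_alloc() no longer did the parallel setup
(assuming its nworkers parameter would then go away; the names are just the
existing ones):

/* Sketch only: the caller decides about parallel vacuum explicitly */
if (nworkers >= 0 && vacrel->nindexes > 1)
    begin_parallel_vacuum(vacrel, nworkers);   /* or whatever we rename it to */

/* dead_items_alloc() would then only ever allocate local memory */
if (!ParallelVacuumIsActive(vacrel))
    dead_items_alloc(vacrel);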
It's also not particularly helpful to have a begin_parallel_vacuum() that
might not actually begin a parallel vacuum...
During development, I found that we have some begin_* functions that don't
start the actual parallel job but only prepare state data for starting it;
I referred to _bt_begin_parallel(), so I named it begin_parallel_vacuum().
But I admit that, considering what the function actually does, something
like create_parallel_vacuum_context() would be clearer.
Minor nit:
begin_parallel_vacuum()'s comment says:
* On success (when we can launch one or more workers), will set dead_items and
* lps in vacrel for caller.But it actually doesn't know whether we can start workers. It just checks
max_parallel_maintenance_workers, no?
Yes, we cannot know whether we can actually start workers at the point where
we set up parallel index vacuuming. It returns non-NULL if we request
one or more workers.
Regards
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Sat, Dec 11, 2021 at 8:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Dec 11, 2021 at 2:32 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-10-30 14:21:01 -0700, Andres Freund wrote:
Due to bug #17245: [1] I spent a considerably amount of time looking at vacuum
related code. And I found a few things that I think could stand improvement:Thank you for the comments.
While working on the fix for #17255 (more specifically some cleanup that Peter
suggested in the context), I noticed another thing: Initializing parallelism
as part of dead_items_alloc() is a bad idea. Even if there are comments noting
that oddity.I don't really see why we should do it this way? There's no "no-parallelism"
path in begin_parallel_vacuum() besides compute_parallel_vacuum_workers(). So
it's not like we might just discover the inability to do parallelism during
parallel initialization?Right. Also, in parallel vacuum case, it allocates the space not only
for dead items but also other data required to do parallelism like
shared bulkdeletion results etc. Originally, in PG13,
begin_parallel_vacuum() was called by lazy_scan_heap() but in PG14 it
became part of dead_items_alloc() (see b4af70cb2). I agree to change
this part so that lazy_scan_heap() calls begin_parallel_vacuum()
(whatever we rename it). I'll incorporate this change in the
refactoring patch barring any objections.It's also not particularly helpful to have a begin_parallel_vacuum() that
might not actually begin a parallel vacuum...During the development, I found that we have some begin_* functions
that don't start the actual parallel job but prepare state data for
starting parallel job and referred to _bt_begin_parallel() so I named
begin_parallel_vacuum(). But I admit that considering what the
function actually does, something like
create_parallel_vacuum_context() would be clearer.
How about naming it parallel_vacuum_init(), which would be similar to
InitializeParallelDSM() and ExecInitParallelPlan()? Now, I see there is
some reasoning for keeping it in dead_items_alloc(), as both primarily
allocate memory for vacuum, but maybe we should name the function
vacuum_space_alloc() instead of dead_items_alloc() and similarly rename
dead_items_cleanup() to vacuum_space_free() (a rough sketch is below).
The other idea would be to bring begin_parallel_vacuum() back into
lazy_scan_heap(), but I personally prefer keeping it where it is and
improving the function names. Would it be better to do this as a separate
0002 patch, because it might require some changes in the vacuum code path?
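Roughly, something like the following, as an entirely untested sketch (the
local-allocation details are only approximated; the renamed functions would
keep today's logic):

static void
vacuum_space_alloc(LVRelState *vacrel, int nworkers)   /* was dead_items_alloc() */
{
    /* Set up parallel vacuum (dead_items then lives in DSM) when possible */
    if (nworkers >= 0 && vacrel->nindexes > 1)
        parallel_vacuum_init(vacrel, nworkers);        /* was begin_parallel_vacuum() */

    /* Otherwise fall back to a local dead_items allocation, as today */
    if (!ParallelVacuumIsActive(vacrel))
    {
        int     max_items = dead_items_max_items(vacrel);

        vacrel->dead_items = (LVDeadItems *) palloc(max_items_to_alloc_size(max_items));
        vacrel->dead_items->max_items = max_items;
        vacrel->dead_items->num_items = 0;
    }
}

static void
vacuum_space_free(LVRelState *vacrel)                  /* was dead_items_cleanup() */
{
    if (!ParallelVacuumIsActive(vacrel))
        return;     /* local memory is freed with the memory context */

    /* End parallel mode and copy back the index statistics */
    parallel_vacuum_end(vacrel);                       /* was end_parallel_vacuum() */
}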
Minor nit:
begin_parallel_vacuum()'s comment says:
* On success (when we can launch one or more workers), will set dead_items and
* lps in vacrel for caller.But it actually doesn't know whether we can start workers. It just checks
max_parallel_maintenance_workers, no?Yes, we cannot know whether we can actually start workers when
starting parallel index vacuuming. It returns non-NULL if we request
one or more workers.
So can we adjust the comments? I think the phrase "when we can launch one or
more workers" is the cause of the confusion; can we remove it?
--
With Regards,
Amit Kapila.
On Mon, Dec 13, 2021 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Dec 11, 2021 at 8:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Dec 11, 2021 at 2:32 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-10-30 14:21:01 -0700, Andres Freund wrote:
Due to bug #17245: [1] I spent a considerably amount of time looking at vacuum
related code. And I found a few things that I think could stand improvement:Thank you for the comments.
While working on the fix for #17255 (more specifically some cleanup that Peter
suggested in the context), I noticed another thing: Initializing parallelism
as part of dead_items_alloc() is a bad idea. Even if there are comments noting
that oddity.I don't really see why we should do it this way? There's no "no-parallelism"
path in begin_parallel_vacuum() besides compute_parallel_vacuum_workers(). So
it's not like we might just discover the inability to do parallelism during
parallel initialization?Right. Also, in parallel vacuum case, it allocates the space not only
for dead items but also other data required to do parallelism like
shared bulkdeletion results etc. Originally, in PG13,
begin_parallel_vacuum() was called by lazy_scan_heap() but in PG14 it
became part of dead_items_alloc() (see b4af70cb2). I agree to change
this part so that lazy_scan_heap() calls begin_parallel_vacuum()
(whatever we rename it). I'll incorporate this change in the
refactoring patch barring any objections.It's also not particularly helpful to have a begin_parallel_vacuum() that
might not actually begin a parallel vacuum...During the development, I found that we have some begin_* functions
that don't start the actual parallel job but prepare state data for
starting parallel job and referred to _bt_begin_parallel() so I named
begin_parallel_vacuum(). But I admit that considering what the
function actually does, something like
create_parallel_vacuum_context() would be clearer.How about if we name it as parallel_vacuum_init() which will be
similar InitializeParallelDSM, ExecInitParallelPlan().
parallel_vacuum_init() sounds better.
Now, I see
there is some reasoning to keep it in dead_items_alloc as both
primarily allocate memory for vacuum but maybe we should name the
function vacuum_space_alloc instead of dead_items_alloc and similarly
rename dead_items_cleanup to vacuum_space_free. The other idea could
be to bring begin_parallel_vacuum() back in lazy_scan_heap() but I
personally prefer the idea to keep it where it is but improve function
names. Will it be better to do this as a separate patch as 0002
because this might require some change in the vacuum code path?
Yeah, if we are just renaming functions, I think we can do that in the 0001
patch. On the other hand, if we need to change the logic, it's better
to do that in a separate patch.
Minor nit:
begin_parallel_vacuum()'s comment says:
* On success (when we can launch one or more workers), will set dead_items and
* lps in vacrel for caller.But it actually doesn't know whether we can start workers. It just checks
max_parallel_maintenance_workers, no?Yes, we cannot know whether we can actually start workers when
starting parallel index vacuuming. It returns non-NULL if we request
one or more workers.So can we adjust the comments? I think the part of the sentence "when
we can launch one or more workers" seems to be the cause of confusion,
can we remove it?
Yes, we can remove it. Or replace "can launch" with "request".
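Concretely, the header comment could then read something like (wording only a
suggestion):

 * On success (when we request one or more workers), will set dead_items
 * and lps in vacrel for caller.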
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Dec 10, 2021 at 9:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 9, 2021 at 6:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Dec 9, 2021 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Agreed with the above two points.
I've attached updated patches that incorporated the above comments
too. Please review them.I have made the following minor changes to the 0001 patch: (a) An
assert was removed from dead_items_max_items() which I added back. (b)
Removed an unnecessary semicolon from one of the statements in
compute_parallel_vacuum_workers(). (c) Changed comments at a few
places. (d) moved all parallel_vacuum_* related functions together.
(e) ran pgindent and slightly modify the commit message.Let me know what you think of the attached?
Thank you for updating the patch!
The patch also moves some functions; e.g., update_index_statistics()
is moved without any code changes. I agree with moving functions for
consistency, but that makes the review hard and the patch complicated.
I think it's better to improve the parallel vacuum code and to move
functions in separate patches.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Dec 13, 2021 at 10:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Dec 10, 2021 at 9:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 9, 2021 at 6:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Dec 9, 2021 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Agreed with the above two points.
I've attached updated patches that incorporated the above comments
too. Please review them.I have made the following minor changes to the 0001 patch: (a) An
assert was removed from dead_items_max_items() which I added back. (b)
Removed an unnecessary semicolon from one of the statements in
compute_parallel_vacuum_workers(). (c) Changed comments at a few
places. (d) moved all parallel_vacuum_* related functions together.
(e) ran pgindent and slightly modify the commit message.Let me know what you think of the attached?
Thank you for updating the patch!
The patch also moves some functions, e.g., update_index_statistics()
is moved without code changes. I agree to move functions for
consistency but that makes the review hard and the patch complicated.
I think it's better to do improving the parallel vacuum code and
moving functions in separate patches.
Okay, I thought it might be better to keep all the parallel_vacuum_*
related functions together, but we can do that in a separate patch.
Feel free to submit without those changes. In fact, if we go for your
current 0002, that might not even be required, as we would move all those
functions to a new file.
--
With Regards,
Amit Kapila.
On Mon, Dec 13, 2021 at 2:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 13, 2021 at 10:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Dec 10, 2021 at 9:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 9, 2021 at 6:05 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Dec 9, 2021 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Agreed with the above two points.
I've attached updated patches that incorporated the above comments
too. Please review them.I have made the following minor changes to the 0001 patch: (a) An
assert was removed from dead_items_max_items() which I added back. (b)
Removed an unnecessary semicolon from one of the statements in
compute_parallel_vacuum_workers(). (c) Changed comments at a few
places. (d) moved all parallel_vacuum_* related functions together.
(e) ran pgindent and slightly modify the commit message.Let me know what you think of the attached?
Thank you for updating the patch!
The patch also moves some functions, e.g., update_index_statistics()
is moved without code changes. I agree to move functions for
consistency but that makes the review hard and the patch complicated.
I think it's better to do improving the parallel vacuum code and
moving functions in separate patches.Okay, I thought it might be better to keep all parallel_vacuum_*
related functions together but we can keep that in a separate patch
Feel free to submit without those changes.
I've attached the patch. I've just moved some functions back and made no
other changes.
In fact, if we go for your
current 0002 then that might not be even required as we move all those
functions to a new file.
Right. So it doesn't seem necessary.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v9-0001-Improve-parallel-vacuum-implementation.patchapplication/octet-stream; name=v9-0001-Improve-parallel-vacuum-implementation.patchDownload
From 58c0d2ab7ec8dd285b523e8c5c0e2e080839e879 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 30 Nov 2021 23:26:28 +0900
Subject: [PATCH v9] Improve parallel vacuum implementation.
Previously, in parallel vacuum, we allocated a shmem area of
IndexBulkDeleteResult only for indexes where parallel index vacuuming is
safe, and used a null-bitmap in the shmem area to access them. This logic was
too complicated for the small benefit of saving only a few bits per index.
In this commit, we allocate a dedicated shmem area for the array of
LVParallelIndStats, which includes a parallel-safety flag, the index vacuum
status, and the IndexBulkDeleteResult. There is one array element for every
index, even those indexes where parallel index vacuuming is unsafe or not
worthwhile. This makes the code clearer by removing all
bitmap-related code.
Also, add a check of each index's vacuum status after parallel index vacuum
to make sure that all indexes have been processed.
Finally, rename parallel vacuum functions to parallel_vacuum_* for
consistency.
Author: Masahiko Sawada, based on suggestions by Andres Freund
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 604 +++++++++++++--------------
src/tools/pgindent/typedefs.list | 2 +
2 files changed, 288 insertions(+), 318 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 282b44f87b..db6becfed5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -130,6 +130,7 @@
#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
@@ -181,14 +182,6 @@ typedef struct LVShared
Oid relid;
int elevel;
- /*
- * An indication for vacuum workers to perform either index vacuum or
- * index cleanup. first_time is true only if for_cleanup is true and
- * bulk-deletion is not performed yet.
- */
- bool for_cleanup;
- bool first_time;
-
/*
* Fields for both index vacuum and cleanup.
*
@@ -226,33 +219,44 @@ typedef struct LVShared
*/
pg_atomic_uint32 active_nworkers;
- /*
- * Variables to control parallel vacuum. We have a bitmap to indicate
- * which index has stats in shared memory. The set bit in the map
- * indicates that the particular index supports a parallel vacuum.
- */
- pg_atomic_uint32 idx; /* counter for vacuuming and clean up */
- uint32 offset; /* sizeof header incl. bitmap */
- bits8 bitmap[FLEXIBLE_ARRAY_MEMBER]; /* bit map of NULLs */
-
- /* Shared index statistics data follows at end of struct */
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
} LVShared;
-#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
-#define GetSharedIndStats(s) \
- ((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
-#define IndStatsIsNull(s, i) \
- (!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
+/* Status used during parallel index vacuum or cleanup */
+typedef enum LVParallelIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} LVParallelIndVacStatus;
/*
- * Struct for an index bulk-deletion statistic used for parallel vacuum. This
- * is allocated in the DSM segment.
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
*/
-typedef struct LVSharedIndStats
+typedef struct LVParallelIndStats
{
- bool updated; /* are the stats updated? */
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both leader and worker can
+ * process the index, otherwise only leader can process it.
+ */
+ LVParallelIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
IndexBulkDeleteResult istat;
-} LVSharedIndStats;
+} LVParallelIndStats;
/* Struct for maintaining a parallel vacuum state. */
typedef struct LVParallelState
@@ -262,12 +266,29 @@ typedef struct LVParallelState
/* Shared information among parallel vacuum workers */
LVShared *lvshared;
+ /*
+ * Shared index statistics among parallel vacuum workers. The array
+ * element is allocated for every index, even those indexes where parallel
+ * index vacuuming is unsafe or not worthwhile (e.g.,
+ * will_parallel_vacuum[] is false). During parallel vacuum,
+ * IndexBulkDeleteResult of each index is kept in DSM and is copied into
+ * local memory at the end of parallel vacuum.
+ */
+ LVParallelIndStats *lvpindstats;
+
/* Points to buffer usage area in DSM */
BufferUsage *buffer_usage;
/* Points to WAL usage area in DSM */
WalUsage *wal_usage;
+ /*
+ * False if the index is totally unsuitable target for all parallel
+ * processing. For example, the index could be <
+ * min_parallel_index_scan_size cutoff.
+ */
+ bool *will_parallel_vacuum;
+
/*
* The number of indexes that support parallel index bulk-deletion and
* parallel index cleanup respectively.
@@ -391,19 +412,14 @@ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
-static void do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel);
-static void do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers);
-static void do_parallel_processing(LVRelState *vacrel,
- LVShared *lvshared);
-static void do_serial_processing_for_unsafe_indexes(LVRelState *vacrel,
- LVShared *lvshared);
-static IndexBulkDeleteResult *parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_indstats,
- LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
+static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
+static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats);
+static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
+static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared,
+ LVParallelIndStats *pindstats);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
double reltuples,
@@ -425,14 +441,13 @@ static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int compute_parallel_vacuum_workers(LVRelState *vacrel,
- int nrequested,
+static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void begin_parallel_vacuum(LVRelState *vacrel, int nrequested);
-static void end_parallel_vacuum(LVRelState *vacrel);
-static LVSharedIndStats *parallel_stats_for_idx(LVShared *lvshared, int getidx);
-static bool parallel_processing_is_safe(Relation indrel, LVShared *lvshared);
+static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
+static void parallel_vacuum_end(LVRelState *vacrel);
+static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -2237,7 +2252,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_vacuum_all_indexes(vacrel);
+ parallel_vacuum_process_all_indexes(vacrel, true);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2611,76 +2626,54 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
}
/*
- * Perform lazy_vacuum_all_indexes() steps in parallel
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
*/
static void
-do_parallel_lazy_vacuum_all_indexes(LVRelState *vacrel)
+parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
{
- /* Tell parallel workers to do index vacuuming */
- vacrel->lps->lvshared->for_cleanup = false;
- vacrel->lps->lvshared->first_time = false;
-
- /*
- * We can only provide an approximate value of num_heap_tuples, at least
- * for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
+ LVParallelState *lps = vacrel->lps;
+ LVParallelIndVacStatus new_status;
+ int nworkers;
- do_parallel_vacuum_or_cleanup(vacrel,
- vacrel->lps->nindexes_parallel_bulkdel);
-}
+ Assert(!IsParallelWorker());
+ Assert(ParallelVacuumIsActive(vacrel));
+ Assert(vacrel->nindexes > 0);
-/*
- * Perform lazy_cleanup_all_indexes() steps in parallel
- */
-static void
-do_parallel_lazy_cleanup_all_indexes(LVRelState *vacrel)
-{
- int nworkers;
+ if (vacuum)
+ {
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at
+ * least for now. Matches serial VACUUM case.
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
+ vacrel->lps->lvshared->estimated_count = true;
- /*
- * If parallel vacuum is active we perform index cleanup with parallel
- * workers.
- *
- * Tell parallel workers to do index cleanup.
- */
- vacrel->lps->lvshared->for_cleanup = true;
- vacrel->lps->lvshared->first_time = (vacrel->num_index_scans == 0);
+ new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
- /*
- * Now we can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- /* Determine the number of parallel workers to launch */
- if (vacrel->lps->lvshared->first_time)
- nworkers = vacrel->lps->nindexes_parallel_cleanup +
- vacrel->lps->nindexes_parallel_condcleanup;
+ /* Determine the number of parallel workers to launch */
+ nworkers = vacrel->lps->nindexes_parallel_bulkdel;
+ }
else
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
+ {
+ /*
+ * We can provide a better estimate of total number of surviving
+ * tuples (we assume indexes are more interested in that than in the
+ * number of nominally live tuples).
+ */
+ vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
+ vacrel->lps->lvshared->estimated_count =
+ (vacrel->tupcount_pages < vacrel->rel_pages);
- do_parallel_vacuum_or_cleanup(vacrel, nworkers);
-}
+ new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process. The caller must set
- * lps->lvshared->for_cleanup to indicate whether to perform vacuum or
- * cleanup.
- */
-static void
-do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
-{
- LVParallelState *lps = vacrel->lps;
+ /* Determine the number of parallel workers to launch */
+ nworkers = vacrel->lps->nindexes_parallel_cleanup;
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
+ /* Add conditionally parallel-aware indexes if in the first time call */
+ if (vacrel->num_index_scans == 0)
+ nworkers += vacrel->lps->nindexes_parallel_condcleanup;
+ }
/* The leader process will participate */
nworkers--;
@@ -2688,21 +2681,36 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
/*
* It is possible that parallel context is initialized with fewer workers
* than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See compute_parallel_vacuum_workers.
+ * phase, so we need to consider it. See
+ * parallel_vacuum_compute_workers().
*/
nworkers = Min(nworkers, lps->pcxt->nworkers);
+ /*
+ * Set index vacuum status and mark whether parallel vacuum worker can
+ * process it.
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
+
+ Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+ pindstats->status = new_status;
+ pindstats->parallel_workers_can_process =
+ (lps->will_parallel_vacuum[i] &
+ parallel_vacuum_index_is_parallel_safe(vacrel, vacrel->indrels[i],
+ vacuum));
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(lps->lvshared->idx), 0);
+
/* Setup the shared cost-based vacuum delay and launch workers */
if (nworkers > 0)
{
+ /* Reinitialize parallel context to relaunch parallel workers */
if (vacrel->num_index_scans > 0)
- {
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Reinitialize the parallel context to relaunch parallel workers */
ReinitializeParallelDSM(lps->pcxt);
- }
/*
* Set up shared cost balance and the number of active workers for
@@ -2735,28 +2743,28 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
}
- if (lps->lvshared->for_cleanup)
+ if (vacuum)
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
else
ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
lps->pcxt->nworkers_launched),
lps->pcxt->nworkers_launched, nworkers)));
}
/* Process the indexes that can be processed by only leader process */
- do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
+ parallel_vacuum_process_unsafe_indexes(vacrel);
/*
- * Join as a parallel worker. The leader process alone processes all the
- * indexes in the case where no workers are launched.
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
*/
- do_parallel_processing(vacrel, lps->lvshared);
+ parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
/*
* Next, accumulate buffer and WAL usage. (This must wait for the workers
@@ -2771,6 +2779,21 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
+
+ if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(vacrel->indrels[i]));
+
+ pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
/*
* Carry the shared balance value to heap scan and disable shared costing
*/
@@ -2787,7 +2810,8 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
* vacuum worker processes to process the indexes in parallel.
*/
static void
-do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
+parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
+ LVParallelIndStats *pindstats)
{
/*
* Increment the active worker count if we are able to launch any worker.
@@ -2799,39 +2823,28 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
for (;;)
{
int idx;
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
+ LVParallelIndStats *pis;
/* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(lvshared->idx), 1);
+ idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
/* Done for all indexes? */
if (idx >= vacrel->nindexes)
break;
- /* Get the index statistics space from DSM, if any */
- shared_istat = parallel_stats_for_idx(lvshared, idx);
-
- /* Skip indexes not participating in parallelism */
- if (shared_istat == NULL)
- continue;
-
- indrel = vacrel->indrels[idx];
+ pis = &(pindstats[idx]);
/*
- * Skip processing indexes that are unsafe for workers (these are
- * processed in do_serial_processing_for_unsafe_indexes() by leader)
+ * Skip processing index that is unsafe for workers or has an
+ * unsuitable target for parallel index vacuum (this is processed in
+ * parallel_vacuum_process_unsafe_indexes() by the leader).
*/
- if (!parallel_processing_is_safe(indrel, lvshared))
+ if (!pis->parallel_workers_can_process)
continue;
/* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ shared, pis);
}
/*
@@ -2847,15 +2860,16 @@ do_parallel_processing(LVRelState *vacrel, LVShared *lvshared)
*
* Handles index vacuuming (or index cleanup) for indexes that are not
* parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing for_cleanup processing right now.
+ * on details like whether we're performing index cleanup right now.
*
* Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by compute_parallel_vacuum_workers(). These indexes never get a
- * slot for statistics in DSM.
+ * enforced by parallel_vacuum_compute_workers().
*/
static void
-do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
+parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
{
+ LVParallelState *lps = vacrel->lps;
+
Assert(!IsParallelWorker());
/*
@@ -2866,28 +2880,15 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
for (int idx = 0; idx < vacrel->nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
- Relation indrel;
- IndexBulkDeleteResult *istat;
-
- shared_istat = parallel_stats_for_idx(lvshared, idx);
- indrel = vacrel->indrels[idx];
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
- /*
- * We're only here for the indexes that parallel workers won't
- * process. Note that the shared_istat test ensures that we process
- * indexes that fell under initial size cutoff.
- */
- if (shared_istat != NULL &&
- parallel_processing_is_safe(indrel, lvshared))
+ /* Skip, indexes that are safe for workers */
+ if (pindstats->parallel_workers_can_process)
continue;
/* Do vacuum or cleanup of the index */
- istat = vacrel->indstats[idx];
- vacrel->indstats[idx] = parallel_process_one_index(indrel, istat,
- lvshared,
- shared_istat,
- vacrel);
+ parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
+ lps->lvshared, pindstats);
}
/*
@@ -2904,29 +2905,37 @@ do_serial_processing_for_unsafe_indexes(LVRelState *vacrel, LVShared *lvshared)
* statistics returned from ambulkdelete and amvacuumcleanup to the DSM
* segment.
*/
-static IndexBulkDeleteResult *
-parallel_process_one_index(Relation indrel,
- IndexBulkDeleteResult *istat,
- LVShared *lvshared,
- LVSharedIndStats *shared_istat,
- LVRelState *vacrel)
+static void
+parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
+ LVShared *shared, LVParallelIndStats *pindstats)
{
+ IndexBulkDeleteResult *istat = NULL;
IndexBulkDeleteResult *istat_res;
/*
* Update the pointer to the corresponding bulk-deletion result if someone
* has already updated it
*/
- if (shared_istat && shared_istat->updated && istat == NULL)
- istat = &shared_istat->istat;
+ if (pindstats->istat_updated)
+ istat = &(pindstats->istat);
- /* Do vacuum or cleanup of the index */
- if (lvshared->for_cleanup)
- istat_res = lazy_cleanup_one_index(indrel, istat, lvshared->reltuples,
- lvshared->estimated_count, vacrel);
- else
- istat_res = lazy_vacuum_one_index(indrel, istat, lvshared->reltuples,
- vacrel);
+ switch (pindstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = lazy_vacuum_one_index(indrel, istat,
+ shared->reltuples, vacrel);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = lazy_cleanup_one_index(indrel, istat,
+ shared->reltuples,
+ shared->estimated_count,
+ vacrel);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ pindstats->status,
+ RelationGetRelationName(indrel));
+ }
/*
* Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2949,20 @@ parallel_process_one_index(Relation indrel,
* Since all vacuum workers write the bulk-deletion result at different
* slots we can write them without locking.
*/
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
{
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;
/* Free the locally-allocated bulk-deletion result */
pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
}
- return istat_res;
+ /*
+ * Update the status to completed. No need to lock here since each worker
+ * touches different indexes.
+ */
+ pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
}
/*
@@ -2987,7 +2997,7 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- do_parallel_lazy_cleanup_all_indexes(vacrel);
+ parallel_vacuum_process_all_indexes(vacrel, false);
}
}
@@ -3520,7 +3530,7 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- begin_parallel_vacuum(vacrel, nworkers);
+ parallel_vacuum_begin(vacrel, nworkers);
/* If parallel mode started, vacrel->dead_items allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
@@ -3552,7 +3562,7 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- end_parallel_vacuum(vacrel);
+ parallel_vacuum_end(vacrel);
}
/*
@@ -3758,7 +3768,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* vacuum.
*/
static int
-compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
+parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
bool *will_parallel_vacuum)
{
int nindexes_parallel = 0;
@@ -3781,6 +3791,7 @@ compute_parallel_vacuum_workers(LVRelState *vacrel, int nrequested,
Relation indrel = vacrel->indrels[idx];
uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ /* Skip index that is not a suitable target for parallel index vacuum */
if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
continue;
@@ -3855,7 +3866,7 @@ update_index_statistics(LVRelState *vacrel)
* VACUUM is currently active.
*/
static void
-begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
+parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
{
LVParallelState *lps;
Relation *indrels = vacrel->indrels;
@@ -3863,10 +3874,12 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
ParallelContext *pcxt;
LVShared *shared;
LVDeadItems *dead_items;
+ LVParallelIndStats *pindstats;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
bool *will_parallel_vacuum;
int max_items;
+ Size est_pindstats_len;
Size est_shared_len;
Size est_dead_items_len;
int nindexes_mwm = 0;
@@ -3884,8 +3897,7 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
* Compute the number of parallel vacuum workers to launch
*/
will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = compute_parallel_vacuum_workers(vacrel,
- nrequested,
+ parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
will_parallel_vacuum);
if (parallel_workers <= 0)
{
@@ -3901,48 +3913,21 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
parallel_workers);
Assert(pcxt->nworkers > 0);
lps->pcxt = pcxt;
+ lps->will_parallel_vacuum = will_parallel_vacuum;
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- /* Skip indexes that don't participate in parallel vacuum */
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- est_shared_len = add_size(est_shared_len, sizeof(LVSharedIndStats));
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
+ est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = sizeof(LVShared);
shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
/* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
max_items = dead_items_max_items(vacrel);
- est_dead_items_len = MAXALIGN(max_items_to_alloc_size(max_items));
+ est_dead_items_len = max_items_to_alloc_size(max_items);
shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
@@ -3973,6 +3958,41 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
InitializeParallelDSM(pcxt);
+ /* Prepare index vacuum stats */
+ pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ if (!will_parallel_vacuum[idx])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ lps->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ lps->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ lps->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
+ lps->lvpindstats = pindstats;
+
/* Prepare shared information */
shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
MemSet(shared, 0, est_shared_len);
@@ -3986,21 +4006,6 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
pg_atomic_init_u32(&(shared->cost_balance), 0);
pg_atomic_init_u32(&(shared->active_nworkers), 0);
pg_atomic_init_u32(&(shared->idx), 0);
- shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
-
- /*
- * Initialize variables for shared index statistics, set NULL bitmap and
- * the size of stats for each index.
- */
- memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
- for (int idx = 0; idx < nindexes; idx++)
- {
- if (!will_parallel_vacuum[idx])
- continue;
-
- /* Set NOT NULL as this index does support parallelism */
- shared->bitmap[idx >> 3] |= 1 << (idx & 0x07);
- }
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
lps->lvshared = shared;
@@ -4038,8 +4043,6 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
}
- pfree(will_parallel_vacuum);
-
/* Success -- set dead_items and lps in leader's vacrel state */
vacrel->dead_items = dead_items;
vacrel->lps = lps;
@@ -4055,7 +4058,7 @@ begin_parallel_vacuum(LVRelState *vacrel, int nrequested)
* context, but that won't be safe (see ExitParallelMode).
*/
static void
-end_parallel_vacuum(LVRelState *vacrel)
+parallel_vacuum_end(LVRelState *vacrel)
{
IndexBulkDeleteResult **indstats = vacrel->indstats;
LVParallelState *lps = vacrel->lps;
@@ -4066,21 +4069,12 @@ end_parallel_vacuum(LVRelState *vacrel)
/* Copy the updated statistics */
for (int idx = 0; idx < nindexes; idx++)
{
- LVSharedIndStats *shared_istat;
-
- shared_istat = parallel_stats_for_idx(lps->lvshared, idx);
-
- /*
- * Skip index -- it must have been processed by the leader, from
- * inside do_serial_processing_for_unsafe_indexes()
- */
- if (shared_istat == NULL)
- continue;
+ LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
- if (shared_istat->updated)
+ if (pindstats->istat_updated)
{
indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &shared_istat->istat, sizeof(IndexBulkDeleteResult));
+ memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
}
else
indstats[idx] = NULL;
@@ -4090,72 +4084,43 @@ end_parallel_vacuum(LVRelState *vacrel)
ExitParallelMode();
/* Deactivate parallel vacuum */
+ pfree(lps->will_parallel_vacuum);
pfree(lps);
vacrel->lps = NULL;
}
/*
- * Return shared memory statistics for index at offset 'getidx', if any
- *
- * Returning NULL indicates that compute_parallel_vacuum_workers() determined
- * that the index is a totally unsuitable target for all parallel processing
- * up front. For example, the index could be < min_parallel_index_scan_size
- * cutoff.
- */
-static LVSharedIndStats *
-parallel_stats_for_idx(LVShared *lvshared, int getidx)
-{
- char *p;
-
- if (IndStatsIsNull(lvshared, getidx))
- return NULL;
-
- p = (char *) GetSharedIndStats(lvshared);
- for (int idx = 0; idx < getidx; idx++)
- {
- if (IndStatsIsNull(lvshared, idx))
- continue;
-
- p += sizeof(LVSharedIndStats);
- }
-
- return (LVSharedIndStats *) p;
-}
-
-/*
- * Returns false, if the given index can't participate in parallel index
- * vacuum or parallel index cleanup
+ * Returns false, if the given index can't participate in the next execution of
+ * parallel index vacuum or parallel index cleanup.
*/
static bool
-parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
+parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
+ bool vacuum)
{
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+ uint8 vacoptions;
- /* first_time must be true only if for_cleanup is true */
- Assert(lvshared->for_cleanup || !lvshared->first_time);
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
- if (lvshared->for_cleanup)
- {
- /* Skip, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
- /*
- * Skip, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). See the
- * comments for option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know
- * when indexes support parallel cleanup conditionally.
- */
- if (!lvshared->first_time &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
- }
- else if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) == 0)
- {
- /* Skip if the index does not support parallel bulk deletion */
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally, but we
+ * have already processed the index (for bulkdelete). We do this to avoid
+ * the need to invoke workers when parallel index cleanup doesn't need to
+ * scan the index. See the comments for option
+ * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
+ * parallel cleanup conditionally.
+ */
+ if (vacrel->num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
return false;
- }
return true;
}
@@ -4171,6 +4136,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
{
Relation rel;
Relation *indrels;
+ LVParallelIndStats *lvpindstats;
LVShared *lvshared;
LVDeadItems *dead_items;
BufferUsage *buffer_usage;
@@ -4190,10 +4156,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
false);
elevel = lvshared->elevel;
- if (lvshared->for_cleanup)
- elog(DEBUG1, "starting parallel vacuum worker for cleanup");
- else
- elog(DEBUG1, "starting parallel vacuum worker for bulk delete");
+ elog(DEBUG1, "starting parallel vacuum worker");
/* Set debug_query_string for individual workers */
sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
@@ -4214,6 +4177,11 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
Assert(nindexes > 0);
+ /* Set index statistics */
+ lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
/* Set dead_items space (set as worker's vacrel dead_items below) */
dead_items = (LVDeadItems *) shm_toc_lookup(toc,
PARALLEL_VACUUM_KEY_DEAD_ITEMS,
@@ -4259,7 +4227,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
InstrStartParallelQuery();
/* Process indexes to perform vacuum/cleanup */
- do_parallel_processing(&vacrel, lvshared);
+ parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
/* Report buffer/WAL usage during parallel execution */
buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f41ef0d2bc..0c61ccbdd0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1307,6 +1307,8 @@ LSEG
LUID
LVDeadTuples
LVPagePruneState
+LVParallelIndStats
+LVParallelIndVacStatus
LVParallelState
LVRelState
LVSavedErrInfo
--
2.24.3 (Apple Git-128)
On Monday, December 13, 2021 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Dec 13, 2021 at 2:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 13, 2021 at 10:33 AM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:
On Fri, Dec 10, 2021 at 9:08 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Thu, Dec 9, 2021 at 6:05 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:
On Thu, Dec 9, 2021 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
Agreed with the above two points.
I've attached updated patches that incorporated the above comments
too. Please review them.
I have made the following minor changes to the 0001 patch: (a) An
assert was removed from dead_items_max_items() which I added back. (b)
Removed an unnecessary semicolon from one of the statements in
compute_parallel_vacuum_workers(). (c) Changed comments at a few
places. (d) moved all parallel_vacuum_* related functions together.
(e) ran pgindent and slightly modify the commit message.
Let me know what you think of the attached?
Thank you for updating the patch!
The patch also moves some functions, e.g., update_index_statistics()
is moved without code changes. I agree to move functions for
consistency but that makes the review hard and the patch complicated.
I think it's better to do improving the parallel vacuum code and
moving functions in separate patches.
Okay, I thought it might be better to keep all parallel_vacuum_*
related functions together but we can keep that in a separate patch
Feel free to submit without those changes.
I've attached the patch. I've just moved some functions back but not
done other changes.
Thanks for your patch.
I tested your patch and tried some cases, like large indexes, different types of indexes, it worked well.
Besides, I noticed a typo as follows:
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
"PARALLEL_VACUUM_KEY_STATS" should be "PARALLEL_VACUUM_KEY_INDEX_STATS".
Regards,
Tang
On Tuesday, December 14, 2021 10:11 AM Tang, Haiying wrote:
On Monday, December 13, 2021 2:12 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:
I've attached the patch. I've just moved some functions back but not
done other changes.
Thanks for your patch.
I tested your patch and tried some cases, like large indexes, different types of
indexes, it worked well.
+1, the patch looks good to me.
Best regards,
Hou zj
On Tue, Dec 14, 2021 at 7:40 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
On Monday, December 13, 2021 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Dec 13, 2021 at 2:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 13, 2021 at 10:33 AM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:
On Fri, Dec 10, 2021 at 9:08 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Thu, Dec 9, 2021 at 6:05 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:
On Thu, Dec 9, 2021 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
Agreed with the above two points.
I've attached updated patches that incorporated the above comments
too. Please review them.
I have made the following minor changes to the 0001 patch: (a) An
assert was removed from dead_items_max_items() which I added back. (b)
Removed an unnecessary semicolon from one of the statements in
compute_parallel_vacuum_workers(). (c) Changed comments at a few
places. (d) moved all parallel_vacuum_* related functions together.
(e) ran pgindent and slightly modify the commit message.
Let me know what you think of the attached?
Thank you for updating the patch!
The patch also moves some functions, e.g., update_index_statistics()
is moved without code changes. I agree to move functions for
consistency but that makes the review hard and the patch complicated.
I think it's better to do improving the parallel vacuum code and
moving functions in separate patches.
Okay, I thought it might be better to keep all parallel_vacuum_*
related functions together but we can keep that in a separate patch
Feel free to submit without those changes.
I've attached the patch. I've just moved some functions back but not
done other changes.
Thanks for your patch.
I tested your patch and tried some cases, like large indexes, different types of indexes, it worked well.
Besides, I noticed a typo as follows:
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
"PARALLEL_VACUUM_KEY_STATS" should be "PARALLEL_VACUUM_KEY_INDEX_STATS".
Thanks, I can take care of this before committing. The v9-0001* looks
good to me as well, so, I am planning to commit that tomorrow unless I
see more comments or any objection to that. There is still pending
work related to moving parallel vacuum code to a separate file and a
few other pending comments that are still under discussion. We can
take care of those in subsequent patches. Do, let me know if you or
others think differently?
--
With Regards,
Amit Kapila.
On Mon, Dec 13, 2021 at 7:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I can take care of this before committing. The v9-0001* looks
good to me as well, so, I am planning to commit that tomorrow unless I
see more comments or any objection to that.
I would like to thank both Masahiko and yourself for working on this.
It's important.
There is still pending
work related to moving parallel vacuum code to a separate file and a
few other pending comments that are still under discussion. We can
take care of those in subsequent patches. Do, let me know if you or
others think differently?
I'm +1 on moving it into a new file. I think that that division makes
perfect sense. It will make the design of parallel VACUUM easier to
understand. I believe that index vacuuming (whether or not it involves
parallel workers) ought to be a more or less distinct operation to
heap vacuuming. With a distinct autovacuum schedule (well, the
schedule would be related, but still distinct).
--
Peter Geoghegan
On Wed, Dec 15, 2021 at 8:23 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Dec 13, 2021 at 7:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thanks, I can take care of this before committing. The v9-0001* looks
good to me as well, so, I am planning to commit that tomorrow unless I
see more comments or any objection to that.
I would like to thank both Masahiko and yourself for working on this.
It's important.
There is still pending
work related to moving parallel vacuum code to a separate file and a
few other pending comments that are still under discussion. We can
take care of those in subsequent patches. Do, let me know if you or
others think differently?
I'm +1 on moving it into a new file. I think that that division makes
perfect sense. It will make the design of parallel VACUUM easier to
understand.
Agreed and thanks for your support.
--
With Regards,
Amit Kapila.
On Tue, Dec 14, 2021 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Dec 14, 2021 at 7:40 AM tanghy.fnst@fujitsu.com
<tanghy.fnst@fujitsu.com> wrote:
On Monday, December 13, 2021 2:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Dec 13, 2021 at 2:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 13, 2021 at 10:33 AM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:
On Fri, Dec 10, 2021 at 9:08 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Thu, Dec 9, 2021 at 6:05 PM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:
On Thu, Dec 9, 2021 at 7:44 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
Agreed with the above two points.
I've attached updated patches that incorporated the above comments
too. Please review them.
I have made the following minor changes to the 0001 patch: (a) An
assert was removed from dead_items_max_items() which I added back. (b)
Removed an unnecessary semicolon from one of the statements in
compute_parallel_vacuum_workers(). (c) Changed comments at a few
places. (d) moved all parallel_vacuum_* related functions together.
(e) ran pgindent and slightly modify the commit message.
Let me know what you think of the attached?
Thank you for updating the patch!
The patch also moves some functions, e.g., update_index_statistics()
is moved without code changes. I agree to move functions for
consistency but that makes the review hard and the patch complicated.
I think it's better to do improving the parallel vacuum code and
moving functions in separate patches.
Okay, I thought it might be better to keep all parallel_vacuum_*
related functions together but we can keep that in a separate patch
Feel free to submit without those changes.
I've attached the patch. I've just moved some functions back but not
done other changes.
Thanks for your patch.
I tested your patch and tried some cases, like large indexes, different types of indexes, it worked well.
Besides, I noticed a typo as follows:
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
"PARALLEL_VACUUM_KEY_STATS" should be "PARALLEL_VACUUM_KEY_INDEX_STATS".
Thanks, I can take care of this before committing. The v9-0001* looks
good to me as well, so, I am planning to commit that tomorrow unless I
see more comments or any objection to that.
Thanks!
There is still pending
work related to moving parallel vacuum code to a separate file and a
few other pending comments that are still under discussion. We can
take care of those in subsequent patches. Do, let me know if you or
others think differently?
I'm on the same page.
I've attached an updated patch. The patch incorporated several changes
from the last version:
* Rename parallel_vacuum_begin() to parallel_vacuum_init()
* Unify the terminology; use "index bulk-deletion" and "index cleanup"
instead of "index vacuum" and "index cleanup".
* Fix the comment of parallel_vacuum_init() pointed out by Andres
* Fix a typo that is left in commit 22bd3cbe0c (pointed out by Hou)
Please review it.
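For reviewers' orientation, the caller side (vacuumlazy.c today, and in
principle any table AM with indexes) is expected to drive the new
vacuumparallel.c API roughly as sketched below. Only
parallel_vacuum_bulkdel_all_indexes() appears verbatim in the hunks quoted
in this thread, and parallel_vacuum_init() is the rename mentioned above;
the remaining call names and all argument lists are illustrative
approximations, not the exact signatures from the attached patch:

/*
 * Sketch only: variable names mirror those used in vacuumlazy.c, and the
 * argument lists are approximate.
 */
ParallelVacuumState *pvs;

/* Enter parallel mode; allocate dead-items array and per-index stats in DSM */
pvs = parallel_vacuum_init(rel, indrels, nindexes, nrequested_workers,
                           max_items, elevel, bstrategy);

/* ... first heap pass fills the shared dead-items array ... */

/* Index bulk-deletion; workers are launched for parallel-safe indexes */
parallel_vacuum_bulkdel_all_indexes(pvs, old_live_tuples);

/* ... second heap pass marks the collected LP_DEAD items LP_UNUSED ... */

/* Index cleanup, then copy index stats back locally and exit parallel mode */
parallel_vacuum_cleanup_all_indexes(pvs, new_rel_tuples, num_index_scans);
parallel_vacuum_end(pvs, indstats);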
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v8-0001-Move-parallel-vacuum-code-to-vacuumparallel.c.patch
From d77f78756449ed3069bc8194baa6046d5cbf4071 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 15 Dec 2021 16:49:01 +0900
Subject: [PATCH v8] Move parallel vacuum code to vacuumparallel.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previously, parallel vacuum was specific to lazy vacuum, i.e., heap
table AM. But the job that parallel vacuum does isn’t really specific
to heap.
This commit moves parallel vacuum related code to a new file
commands/vacuumparallel.c so that any table AM supporting indexes can
utilize parallel vacuum in order to call index AM
callbacks (ambulkdelete and amvacuumcleanup) with parallel workers.
With that, it also moves some vacuum-related functions and structures to
commands/vacuum.c so that both lazy vacuum and parallel vacuum can
refer to them.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 1170 ++-----------------------
src/backend/access/transam/parallel.c | 2 +-
src/backend/commands/Makefile | 1 +
src/backend/commands/vacuum.c | 148 ++++
src/backend/commands/vacuumparallel.c | 1097 +++++++++++++++++++++++
src/include/access/heapam.h | 1 -
src/include/commands/vacuum.h | 41 +
src/tools/pgindent/typedefs.list | 2 +
8 files changed, 1353 insertions(+), 1109 deletions(-)
create mode 100644 src/backend/commands/vacuumparallel.c
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index db6becfed5..dd1f2ed4d3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -40,7 +40,6 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
-#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -120,23 +119,11 @@
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
-/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
-
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -149,155 +136,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * LVDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
- * Each TID points to an LP_DEAD line pointer from a heap page that has been
- * processed by lazy_scan_prune.
- *
- * Also needed by lazy_vacuum_heap_rel, which marks the same LP_DEAD line
- * pointers as LP_UNUSED during second heap pass.
- */
-typedef struct LVDeadItems
-{
- int max_items; /* # slots allocated in array */
- int num_items; /* current # of entries */
-
- /* Sorted array of TIDs to delete from indexes */
- ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
-} LVDeadItems;
-
-#define MAXDEADITEMS(avail_mem) \
- (((avail_mem) - offsetof(LVDeadItems, items)) / sizeof(ItemPointerData))
-
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /* Counter for vacuuming and cleanup */
- pg_atomic_uint32 idx;
-} LVShared;
-
-/* Status used during parallel index vacuum or cleanup */
-typedef enum LVParallelIndVacStatus
-{
- PARALLEL_INDVAC_STATUS_INITIAL = 0,
- PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
- PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
- PARALLEL_INDVAC_STATUS_COMPLETED
-} LVParallelIndVacStatus;
-
-/*
- * Struct for index vacuum statistics of an index that is used for parallel vacuum.
- * This includes the status of parallel index vacuum as well as index statistics.
- */
-typedef struct LVParallelIndStats
-{
- /*
- * The following two fields are set by leader process before executing
- * parallel index vacuum or parallel index cleanup. These fields are not
- * fixed for the entire VACUUM operation. They are only fixed for an
- * individual parallel index vacuum and cleanup.
- *
- * parallel_workers_can_process is true if both leader and worker can
- * process the index, otherwise only leader can process it.
- */
- LVParallelIndVacStatus status;
- bool parallel_workers_can_process;
-
- /*
- * Individual worker or leader stores the result of index vacuum or
- * cleanup.
- */
- bool istat_updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVParallelIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /*
- * Shared index statistics among parallel vacuum workers. The array
- * element is allocated for every index, even those indexes where parallel
- * index vacuuming is unsafe or not worthwhile (e.g.,
- * will_parallel_vacuum[] is false). During parallel vacuum,
- * IndexBulkDeleteResult of each index is kept in DSM and is copied into
- * local memory at the end of parallel vacuum.
- */
- LVParallelIndStats *lvpindstats;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * False if the index is totally unsuitable target for all parallel
- * processing. For example, the index could be <
- * min_parallel_index_scan_size cutoff.
- */
- bool *will_parallel_vacuum;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -315,9 +153,9 @@ typedef struct LVRelState
bool do_index_cleanup;
bool do_rel_truncate;
- /* Buffer access strategy and parallel state */
+ /* Buffer access strategy and parallel vacuum state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumState *pvs;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -339,9 +177,14 @@ typedef struct LVRelState
VacErrPhase phase;
/*
- * State managed by lazy_scan_heap() follows
+ * State managed by lazy_scan_heap() follows.
+ *
+ * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
+ * Each TID points to an LP_DEAD line pointer from a heap page that has been
+ * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
+ * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
@@ -413,13 +256,6 @@ static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
-static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats);
-static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared,
- LVParallelIndStats *pindstats);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
double reltuples,
@@ -433,21 +269,12 @@ static bool should_attempt_truncation(LVRelState *vacrel);
static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
-static int dead_items_max_items(LVRelState *vacrel);
static inline Size max_items_to_alloc_size(int max_items);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
-static void parallel_vacuum_end(LVRelState *vacrel);
-static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -905,7 +732,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
BlockNumber nblocks,
blkno,
next_unskippable_block,
@@ -2040,7 +1867,7 @@ retry:
*/
if (lpdead_items > 0)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
Assert(!prunestate->all_visible);
@@ -2083,7 +1910,6 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(!IsParallelWorker());
Assert(vacrel->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
@@ -2212,7 +2038,6 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
{
bool allindexes = true;
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -2251,8 +2076,21 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
}
else
{
- /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to parallel variant. Since parallel vacuum will
+ * set the error context on an error we temporarily disable setting our
+ * error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2404,7 +2242,7 @@ static int
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
int index, Buffer *vmbuffer)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
@@ -2625,353 +2463,12 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process.
- */
-static void
-parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
-{
- LVParallelState *lps = vacrel->lps;
- LVParallelIndVacStatus new_status;
- int nworkers;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- if (vacuum)
- {
- /*
- * We can only provide an approximate value of num_heap_tuples, at
- * least for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_bulkdel;
- }
- else
- {
- /*
- * We can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- /* Add conditionally parallel-aware indexes if in the first time call */
- if (vacrel->num_index_scans == 0)
- nworkers += vacrel->lps->nindexes_parallel_condcleanup;
- }
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See
- * parallel_vacuum_compute_workers().
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- /*
- * Set index vacuum status and mark whether parallel vacuum worker can
- * process it.
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
-
- Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
- pindstats->status = new_status;
- pindstats->parallel_workers_can_process =
- (lps->will_parallel_vacuum[i] &
- parallel_vacuum_index_is_parallel_safe(vacrel, vacrel->indrels[i],
- vacuum));
- }
-
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- /* Reinitialize parallel context to relaunch parallel workers */
- if (vacrel->num_index_scans > 0)
- ReinitializeParallelDSM(lps->pcxt);
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (vacuum)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- parallel_vacuum_process_unsafe_indexes(vacrel);
-
- /*
- * Join as a parallel worker. The leader process alone processes all
- * parallel-safe indexes in the case where no workers are launched.
- */
- parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Reset all index status back to initial (while checking that we have
- * processed all indexes).
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
-
- if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
- elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
- RelationGetRelationName(vacrel->indrels[i]));
-
- pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVParallelIndStats *pis;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- pis = &(pindstats[idx]);
-
- /*
- * Skip processing index that is unsafe for workers or has an
- * unsuitable target for parallel index vacuum (this is processed in
- * parallel_vacuum_process_unsafe_indexes() by the leader).
- */
- if (!pis->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- shared, pis);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing index cleanup right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by parallel_vacuum_compute_workers().
- */
-static void
-parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- /* Skip, indexes that are safe for workers */
- if (pindstats->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- lps->lvshared, pindstats);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static void
-parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared, LVParallelIndStats *pindstats)
-{
- IndexBulkDeleteResult *istat = NULL;
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (pindstats->istat_updated)
- istat = &(pindstats->istat);
-
- switch (pindstats->status)
- {
- case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
- istat_res = lazy_vacuum_one_index(indrel, istat,
- shared->reltuples, vacrel);
- break;
- case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
- istat_res = lazy_cleanup_one_index(indrel, istat,
- shared->reltuples,
- shared->estimated_count,
- vacrel);
- break;
- default:
- elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
- pindstats->status,
- RelationGetRelationName(indrel));
- }
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (!pindstats->istat_updated && istat_res != NULL)
- {
- memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
- pindstats->istat_updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
- }
-
- /*
- * Update the status to completed. No need to lock here since each worker
- * touches different indexes.
- */
- pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
-}
-
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
/* Report that we are now cleaning up indexes */
@@ -2996,8 +2493,23 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
}
else
{
- /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, false);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to the parallel variant. Since parallel vacuum
+ * will set its own error context on an error, we temporarily disable
+ * setting ours.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_cleanup_all_indexes(vacrel->pvs, vacrel->new_rel_tuples,
+ (vacrel->tupcount_pages < vacrel->rel_pages),
+ vacrel->num_index_scans);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
}
}
@@ -3045,13 +2557,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_items);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_items->num_items),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ istat = bulkdel_one_index(&ivinfo, istat, vacrel->dead_items);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3086,7 +2592,6 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
ivinfo.strategy = vacrel->bstrategy;
@@ -3102,24 +2607,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
- istat = index_vacuum_cleanup(&ivinfo, istat);
-
- if (istat)
- {
- ereport(elevel,
- (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
- RelationGetRelationName(indrel),
- istat->num_index_tuples,
- istat->num_pages),
- errdetail("%.0f index row versions were removed.\n"
- "%u index pages were newly deleted.\n"
- "%u index pages are currently deleted, of which %u are currently reusable.\n"
- "%s.",
- istat->tuples_removed,
- istat->pages_newly_deleted,
- istat->pages_deleted, istat->pages_free,
- pg_rusage_show(&ru0))));
- }
+ istat = cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3455,8 +2943,6 @@ dead_items_max_items(LVRelState *vacrel)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- Assert(!IsParallelWorker());
-
if (vacrel->nindexes > 0)
{
BlockNumber rel_pages = vacrel->rel_pages;
@@ -3481,19 +2967,6 @@ dead_items_max_items(LVRelState *vacrel)
return (int) max_items;
}
-/*
- * Returns the total required space for VACUUM's dead_items array given a
- * max_items value returned by dead_items_max_items
- */
-static inline Size
-max_items_to_alloc_size(int max_items)
-{
- Assert(max_items >= MaxHeapTuplesPerPage);
- Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
-
- return offsetof(LVDeadItems, items) + sizeof(ItemPointerData) * max_items;
-}
-
/*
* Allocate dead_items (either using palloc, or in dynamic shared memory).
* Sets dead_items in vacrel for caller.
@@ -3504,9 +2977,12 @@ max_items_to_alloc_size(int max_items)
static void
dead_items_alloc(LVRelState *vacrel, int nworkers)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
int max_items;
+ max_items = dead_items_max_items(vacrel);
+ Assert(max_items >= MaxHeapTuplesPerPage);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3530,16 +3006,21 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- parallel_vacuum_begin(vacrel, nworkers);
+ vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
+ vacrel->nindexes, nworkers,
+ max_items, elevel,
+ vacrel->bstrategy);
- /* If parallel mode started, vacrel->dead_items allocated in DSM */
+ /* If parallel mode started, dead_items space is allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs);
return;
+ }
}
/* Serial VACUUM case */
- max_items = dead_items_max_items(vacrel);
- dead_items = (LVDeadItems *) palloc(max_items_to_alloc_size(max_items));
+ dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3562,75 +3043,8 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- parallel_vacuum_end(vacrel);
-}
-
-/*
- * lazy_tid_reaped() -- is a particular tid deletable?
- *
- * This has the right signature to be an IndexBulkDeleteCallback.
- *
- * Assumes dead_items array is sorted (in ascending TID order).
- */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
-{
- LVDeadItems *dead_items = (LVDeadItems *) state;
- int64 litem,
- ritem,
- item;
- ItemPointer res;
-
- litem = itemptr_encode(&dead_items->items[0]);
- ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
- item = itemptr_encode(itemptr);
-
- /*
- * Doing a simple bound check before bsearch() is useful to avoid the
- * extra cost of bsearch(), especially if dead items on the heap are
- * concentrated in a certain range. Since this function is called for
- * every index tuple, it pays to be really fast.
- */
- if (item < litem || item > ritem)
- return false;
-
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) dead_items->items,
- dead_items->num_items,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
-
- return (res != NULL);
-}
-
-/*
- * Comparator routines for use with qsort() and bsearch().
- */
-static int
-vac_cmp_itemptr(const void *left, const void *right)
-{
- BlockNumber lblk,
- rblk;
- OffsetNumber loff,
- roff;
-
- lblk = ItemPointerGetBlockNumber((ItemPointer) left);
- rblk = ItemPointerGetBlockNumber((ItemPointer) right);
-
- if (lblk < rblk)
- return -1;
- if (lblk > rblk)
- return 1;
-
- loff = ItemPointerGetOffsetNumber((ItemPointer) left);
- roff = ItemPointerGetOffsetNumber((ItemPointer) right);
-
- if (loff < roff)
- return -1;
- if (loff > roff)
- return 1;
-
- return 0;
+ parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
+ vacrel->pvs = NULL;
}
/*
@@ -3754,77 +3168,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
return all_visible;
}
-/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
- */
-static int
-parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* Skip index that is not a suitable target for parallel index vacuum */
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- continue;
-
- will_parallel_vacuum[idx] = true;
-
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
@@ -3835,7 +3178,7 @@ update_index_statistics(LVRelState *vacrel)
int nindexes = vacrel->nindexes;
IndexBulkDeleteResult **indstats = vacrel->indstats;
- Assert(!IsInParallelMode());
+ Assert(!ParallelVacuumIsActive(vacrel));
for (int idx = 0; idx < nindexes; idx++)
{
@@ -3857,393 +3200,6 @@ update_index_statistics(LVRelState *vacrel)
}
}
-/*
- * Try to enter parallel mode and create a parallel context. Then initialize
- * shared memory state.
- *
- * On success (when we can launch one or more workers), will set dead_items and
- * lps in vacrel for caller. A set lps in vacrel state indicates that parallel
- * VACUUM is currently active.
- */
-static void
-parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
-{
- LVParallelState *lps;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- LVDeadItems *dead_items;
- LVParallelIndStats *pindstats;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- bool *will_parallel_vacuum;
- int max_items;
- Size est_pindstats_len;
- Size est_shared_len;
- Size est_dead_items_len;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
- will_parallel_vacuum);
- if (parallel_workers <= 0)
- {
- /* Can't perform vacuum in parallel -- lps not set in vacrel */
- pfree(will_parallel_vacuum);
- return;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
- lps->will_parallel_vacuum = will_parallel_vacuum;
-
- /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
- est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
- shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = sizeof(LVShared);
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
- max_items = dead_items_max_items(vacrel);
- est_dead_items_len = max_items_to_alloc_size(max_items);
- shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare index vacuum stats */
- pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
- lps->lvpindstats = pindstats;
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
- MemSet(shared, 0, est_shared_len);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead_items space */
- dead_items = (LVDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
- MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- /* Success -- set dead_items and lps in leader's vacrel state */
- vacrel->dead_items = dead_items;
- vacrel->lps = lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-parallel_vacuum_end(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- if (pindstats->istat_updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps->will_parallel_vacuum);
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Returns false, if the given index can't participate in the next execution of
- * parallel index vacuum or parallel index cleanup.
- */
-static bool
-parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum)
-{
- uint8 vacoptions;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* In parallel vacuum case, check if it supports parallel bulk-deletion */
- if (vacuum)
- return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
-
- /* Not safe, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Not safe, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). We do this to avoid
- * the need to invoke workers when parallel index cleanup doesn't need to
- * scan the index. See the comments for option
- * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
- * parallel cleanup conditionally.
- */
- if (vacrel->num_index_scans > 0 &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVParallelIndStats *lvpindstats;
- LVShared *lvshared;
- LVDeadItems *dead_items;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- /*
- * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
- * don't support parallel vacuum for autovacuum as of now.
- */
- Assert(MyProc->statusFlags == PROC_IN_VACUUM);
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- elog(DEBUG1, "starting parallel vacuum worker");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set index statistics */
- lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_INDEX_STATS,
- false);
-
- /* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (LVDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_items = dead_items;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
/*
* Error context callback for errors occurring during vacuum.
*/
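
For reference, here is a minimal standalone sketch (not code from the patch) of
the work-claiming loop that parallel_vacuum_process_safe_indexes() above and
parallel_vacuum_safe_indexes() in the new file both rely on: every participant,
leader included, fetch-and-adds a shared counter to claim the next index and
skips indexes that only the leader may process. C11 atomics and pthreads stand
in for pg_atomic_fetch_add_u32() and DSM, and names such as participant() and
worker_can_process[] are invented for the example.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NINDEXES 8
#define NWORKERS 3

static atomic_uint next_idx;                    /* plays the role of the shared "idx" counter */
static bool worker_can_process[NINDEXES];       /* like parallel_workers_can_process */
static int claimed_by[NINDEXES];                /* which participant handled each index */

static void *
participant(void *arg)
{
    int         id = (int) (intptr_t) arg;

    for (;;)
    {
        unsigned int idx = atomic_fetch_add(&next_idx, 1);

        if (idx >= NINDEXES)
            break;              /* every index has been claimed */

        /* leader-only indexes are handled by a separate loop in the patch */
        if (!worker_can_process[idx])
            continue;

        claimed_by[idx] = id;   /* stand-in for vacuuming/cleaning the index */
    }
    return NULL;
}

int
main(void)
{
    pthread_t   threads[NWORKERS];

    for (int i = 0; i < NINDEXES; i++)
        worker_can_process[i] = (i % 3 != 0);   /* pretend every third index is leader-only */

    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&threads[i], NULL, participant, (void *) (intptr_t) i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);

    for (int i = 0; i < NINDEXES; i++)
    {
        if (worker_can_process[i])
            printf("index %d claimed by participant %d\n", i, claimed_by[i]);
        else
            printf("index %d left for the leader\n", i);
    }
    return 0;
}
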
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f573..ae7c7133dd 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -14,7 +14,6 @@
#include "postgres.h"
-#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/parallel.h"
#include "access/session.h"
@@ -25,6 +24,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..48f7348f91 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
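
Before the vacuum.c changes, a rough standalone illustration (again, not the
patch's code) of the degree-of-parallelism policy implemented by
parallel_vacuum_compute_workers(), which the patch moves out of vacuumlazy.c
into the new vacuumparallel.c: take the larger of the bulkdel-capable and
cleanup-capable index counts, reserve one index for the leader, honor an
explicit worker request, and cap the result by
max_parallel_maintenance_workers. The per-index filtering on
min_parallel_index_scan_size and amparallelvacuumoptions is omitted here;
inputs are plain counts rather than Relation data.

#include <stdio.h>

static int
compute_workers(int nindexes_bulkdel, int nindexes_cleanup,
                int nrequested, int max_maintenance_workers)
{
    int         nindexes_parallel;
    int         parallel_workers;

    nindexes_parallel = (nindexes_bulkdel > nindexes_cleanup) ?
        nindexes_bulkdel : nindexes_cleanup;

    /* the leader process takes one index itself */
    nindexes_parallel--;
    if (nindexes_parallel <= 0)
        return 0;

    /* honor an explicit request, otherwise use the index count */
    parallel_workers = (nrequested > 0 && nrequested < nindexes_parallel) ?
        nrequested : nindexes_parallel;

    /* cap by max_parallel_maintenance_workers */
    return (parallel_workers < max_maintenance_workers) ?
        parallel_workers : max_maintenance_workers;
}

int
main(void)
{
    /* 4 indexes support parallel bulkdel, 2 support cleanup, no explicit request */
    printf("%d workers\n", compute_workers(4, 2, 0, 8));    /* prints "3 workers" */
    return 0;
}
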
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15b44..2eb73bf1ce 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -32,6 +32,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
+#include "catalog/index.h"
#include "catalog/pg_database.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_namespace.h"
@@ -51,6 +52,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -89,6 +91,8 @@ static void vac_truncate_clog(TransactionId frozenXID,
static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
+static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_cmp_itemptr(const void *left, const void *right);
/*
* Primary entry point for manual VACUUM and ANALYZE commands
@@ -2258,3 +2262,147 @@ get_vacoptval_from_boolean(DefElem *def)
{
return defGetBoolean(def) ? VACOPTVALUE_ENABLED : VACOPTVALUE_DISABLED;
}
+
+/*
+ * bulkdel_one_index() -- bulk-deletion for index relation.
+ *
+ * Calls index AM's ambulkdelete routine.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+bulkdel_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ /* Do bulk deletion */
+ istat = index_bulk_delete(ivinfo, istat, vac_tid_reaped,
+ (void *) dead_items);
+
+ ereport(ivinfo->message_level,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ RelationGetRelationName(ivinfo->index),
+ dead_items->num_items),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+ return istat;
+}
+
+/*
+ * cleanup_one_index() -- do post-vacuum cleanup for index relation.
+ *
+ * Calls index AM's amvacuumcleanup routine.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ istat = index_vacuum_cleanup(ivinfo, istat);
+
+ if (istat)
+ {
+ ereport(ivinfo->message_level,
+ (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
+ RelationGetRelationName(ivinfo->index),
+ istat->num_index_tuples,
+ istat->num_pages),
+ errdetail("%.0f index row versions were removed.\n"
+ "%u index pages were newly deleted.\n"
+ "%u index pages are currently deleted, of which %u are currently reusable.\n"
+ "%s.",
+ istat->tuples_removed,
+ istat->pages_newly_deleted,
+ istat->pages_deleted, istat->pages_free,
+ pg_rusage_show(&ru0))));
+ }
+
+ return istat;
+}
+
+/*
+ * Returns the total required space for VACUUM's dead_items array given a
+ * max_items value.
+ */
+inline Size
+vac_max_items_to_alloc_size(int max_items)
+{
+ Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
+
+ return offsetof(VacDeadItems, items) + sizeof(ItemPointerData) * max_items;
+}
+
+/*
+ * vac_tid_reaped() -- is a particular tid deletable?
+ *
+ * This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ * Assumes dead_items array is sorted (in ascending TID order).
+ */
+static bool
+vac_tid_reaped(ItemPointer itemptr, void *state)
+{
+ VacDeadItems *dead_items = (VacDeadItems *) state;
+ int64 litem,
+ ritem,
+ item;
+ ItemPointer res;
+
+ litem = itemptr_encode(&dead_items->items[0]);
+ ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
+ item = itemptr_encode(itemptr);
+
+ /*
+ * Doing a simple bound check before bsearch() is useful to avoid the
+ * extra cost of bsearch(), especially if dead items on the heap are
+ * concentrated in a certain range. Since this function is called for
+ * every index tuple, it pays to be really fast.
+ */
+ if (item < litem || item > ritem)
+ return false;
+
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_items->items,
+ dead_items->num_items,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
+
+ return (res != NULL);
+}
+
+/*
+ * Comparator routines for use with qsort() and bsearch().
+ */
+static int
+vac_cmp_itemptr(const void *left, const void *right)
+{
+ BlockNumber lblk,
+ rblk;
+ OffsetNumber loff,
+ roff;
+
+ lblk = ItemPointerGetBlockNumber((ItemPointer) left);
+ rblk = ItemPointerGetBlockNumber((ItemPointer) right);
+
+ if (lblk < rblk)
+ return -1;
+ if (lblk > rblk)
+ return 1;
+
+ loff = ItemPointerGetOffsetNumber((ItemPointer) left);
+ roff = ItemPointerGetOffsetNumber((ItemPointer) right);
+
+ if (loff < roff)
+ return -1;
+ if (loff > roff)
+ return 1;
+
+ return 0;
+}
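
The vac_tid_reaped() callback above keeps the cheap bounds check in front of
bsearch() because the callback is invoked once per index tuple. A
self-contained sketch of that lookup strategy, with TIDs modeled as plain
int64 keys instead of ItemPointerData (an assumption made only for this
example; the real code encodes TIDs via itemptr_encode()):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int
cmp_key(const void *a, const void *b)
{
    int64_t     l = *(const int64_t *) a;
    int64_t     r = *(const int64_t *) b;

    return (l > r) - (l < r);
}

static bool
tid_reaped(int64_t key, const int64_t *dead, size_t ndead)
{
    /* cheap range check first, since this runs once per index tuple */
    if (ndead == 0 || key < dead[0] || key > dead[ndead - 1])
        return false;

    /* dead[] must be sorted in ascending order, as VACUUM guarantees */
    return bsearch(&key, dead, ndead, sizeof(int64_t), cmp_key) != NULL;
}

int
main(void)
{
    int64_t     dead[] = {3, 8, 15, 42};

    printf("%d %d %d\n",
           tid_reaped(15, dead, 4),     /* 1: present in the array */
           tid_reaped(16, dead, 4),     /* 0: in range but not present */
           tid_reaped(99, dead, 4));    /* 0: rejected by the range check */
    return 0;
}
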
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000000..86f53bc003
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,1097 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelVacuumState.
+ *
+ * In a parallel vacuum, we perform both index bulk deletion and index cleanup
+ * with parallel worker processes. Individual indexes are processed by one
+ * vacuum process. ParallelVacuumState contains shared information as well
+ * as the memory space for storing dead items allocated in the DSM segment.
+ * When starting either parallel index bulk-deletion or index cleanup, we
+ * launch parallel worker processes. Once all indexes are processed, the
+ * parallel worker processes exit. On the next pass, the parallel context
+ * is re-initialized so that the same DSM can be used for multiple passes of
+ * index bulk-deletion and index cleanup. At the end of a parallel vacuum,
+ * ParallelVacuumState is destroyed and the index statistics are returned so
+ * that we can update them after exiting from parallel mode.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/genam.h"
+#include "access/parallel.h"
+#include "access/table.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/elog.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel vacuum workers, allocated in the DSM
+ * segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the parallel vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * reltuples is the total number of input heap tuples. We set either old
+ * live tuples in the index vacuum case or the new live tuples in the
+ * index cleanup case.
+ *
+ * estimated_count is true if reltuples is an estimated value. (Note that
+ * reltuples could be -1 in this case, indicating we have no idea.)
+ */
+ double reltuples;
+ bool estimated_count;
+
+ /*
+ * In single process vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for heap scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process vacuum.
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
+} PVShared;
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} PVIndVacStatus;
+
+/*
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
+ */
+typedef struct PVIndStats
+{
+ /*
+ * The following two fields are set by the leader process before executing
+ * parallel index bulk-deletion or parallel index cleanup. They are not
+ * fixed for the entire VACUUM operation; they apply only to an individual
+ * parallel index bulk-deletion or cleanup pass.
+ *
+ * parallel_workers_can_process is true if both the leader and workers can
+ * process the index; otherwise only the leader can process it.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * An individual worker or the leader stores the result of index
+ * bulk-deletion or cleanup here.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/* Struct for maintaining a parallel vacuum state. */
+typedef struct ParallelVacuumState
+{
+ /* NULL for worker processes */
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /*
+ * Shared index statistics among parallel vacuum workers. The array
+ * element is allocated for every index, even those indexes where parallel
+ * index vacuuming is unsafe or not worthwhile (e.g.,
+ * will_parallel_vacuum[] is false). During parallel vacuum,
+ * IndexBulkDeleteResult of each index is kept in DSM and is copied into
+ * local memory at the end of parallel vacuum.
+ */
+ PVIndStats *indstats;
+
+ /* Shared dead items space among parallel vacuum workers */
+ VacDeadItems *dead_items;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * False if the index is a totally unsuitable target for all parallel
+ * processing, for example because it is smaller than the
+ * min_parallel_index_scan_size cutoff.
+ */
+ bool *will_parallel_vacuum;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion and
+ * parallel index cleanup respectively.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /* True if we need to reinitialize parallel DSM before launching workers */
+ bool first_time;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+
+ /* Error reporting state */
+ char *relnamespace;
+ char *relname;
+ char *indname;
+ PVIndVacStatus status;
+} ParallelVacuumState;
+
+static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum);
+static void parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
+ bool have_done_bulkdel);
+static void parallel_vacuum_safe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats);
+static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, bool bulkdel,
+ bool have_done_bulkdel);
+static void parallel_vacuum_error_callback(void *arg);
+
+/*
+ * Try to enter parallel mode and create a parallel context. Then initialize
+ * shared memory state.
+ *
+ * On success, return parallel vacuum state. Otherwise return NULL.
+ */
+ParallelVacuumState *
+parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel, BufferAccessStrategy bstrategy)
+{
+ ParallelVacuumState *pvs;
+ ParallelContext *pcxt;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ PVIndStats *indstats;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ bool *will_parallel_vacuum;
+ Size est_indstats_len;
+ Size est_shared_len;
+ Size est_dead_items_len;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(nrequested_workers >= 0);
+ Assert(nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
+ parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
+ nrequested_workers,
+ will_parallel_vacuum);
+ if (parallel_workers <= 0)
+ {
+ /* Can't perform vacuum in parallel -- return NULL */
+ pfree(will_parallel_vacuum);
+ return NULL;
+ }
+
+ pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
+ pvs->indrels = indrels;
+ pvs->nindexes = nindexes;
+ pvs->will_parallel_vacuum = will_parallel_vacuum;
+ pvs->first_time = true;
+ pvs->bstrategy = bstrategy;
+
+ /*
+ * Set error traceback information. The other fields will be filled in
+ * while vacuuming indexes.
+ */
+ pvs->relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs->relname = pstrdup(RelationGetRelationName(rel));
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvs->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_INDEX_STATS */
+ est_indstats_len = mul_size(sizeof(PVIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = sizeof(PVShared);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
+ est_dead_items_len = vac_max_items_to_alloc_size(max_items);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats_len);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ if (!will_parallel_vacuum[idx])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvs->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvs->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvs->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvs->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
+ MemSet(shared, 0, est_shared_len);
+ shared->relid = RelationGetRelid(rel);
+ shared->elevel = elevel;
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvs->shared = shared;
+
+ /* Prepare the dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
+ dead_items->max_items = max_items;
+ dead_items->num_items = 0;
+ MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
+ pvs->dead_items = dead_items;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvs->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvs->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ /* Success -- return parallel vacuum state */
+ return pvs;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed during parallel mode, copy the
+ * updated index statistics from DSM into local memory and then later use that
+ * to update the index statistics. One might think that we can exit from
+ * parallel mode, update the index statistics and then destroy parallel
+ * context, but that won't be safe (see ExitParallelMode).
+ */
+void
+parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats)
+{
+ Assert(!IsParallelWorker());
+
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->istat_updated)
+ {
+ istats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(istats[i], &indstats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ istats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvs->pcxt);
+ ExitParallelMode();
+
+ pfree(pvs->will_parallel_vacuum);
+ pfree(pvs->relnamespace);
+ pfree(pvs->relname);
+ pfree(pvs);
+}
+
+/* Returns the dead items space */
+VacDeadItems *
+parallel_vacuum_get_dead_items(ParallelVacuumState *pvs)
+{
+ return pvs->dead_items;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now.
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = true;
+
+ /* have_done_bulkdel is not used in parallel bulkdel cases */
+ parallel_vacuum_all_indexes(pvs, true, false);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ *
+ * have_done_bulkdel is true if the caller has done index bulk-deletion one
+ * or more times in the vacuum execution.
+ */
+void
+parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
+ bool estimated_count, bool have_done_bulkdel)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can provide a better estimate of total number of surviving tuples
+ * (we assume indexes are more interested in that than in the number of
+ * nominally live tuples).
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_all_indexes(pvs, false, have_done_bulkdel);
+}
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * bulk-deletion and index cleanup can be executed with parallel workers.
+ * The index is eligible for parallel vacuum iff its size is greater than
+ * min_parallel_index_scan_size as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that user requested. If
+ * nrequested is 0, we compute the parallel degree based on nindexes, that is
+ * the number of indexes that support parallel vacuum. This function also
+ * sets will_parallel_vacuum to remember indexes that participate in parallel
+ * vacuum.
+ */
+static int
+parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow performing parallel operation in standalone backend or
+ * when parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ /*
+ * Compute the number of indexes that can participate in parallel vacuum.
+ */
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* Skip index that is not a suitable target for parallel index vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ will_parallel_vacuum[i] = true;
+
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
+
+/*
+ * Perform index bulk-deletion or index cleanup with parallel workers. This
+ * function must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool bulkdel,
+ bool have_done_bulkdel)
+{
+ int nworkers;
+ ErrorContextCallback errcallback;
+ PVIndVacStatus new_status;
+
+ Assert(!IsParallelWorker());
+
+ if (bulkdel)
+ {
+ new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = pvs->nindexes_parallel_bulkdel;
+ }
+ else
+ {
+ new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = pvs->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if no bulk-deletion has been done yet */
+ if (!have_done_bulkdel)
+ nworkers += pvs->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * The parallel context may have been initialized with fewer workers than
+ * the number of indexes that need a separate worker in the current phase,
+ * so cap the request accordingly. See
+ * parallel_vacuum_compute_workers().
+ */
+ nworkers = Min(nworkers, pvs->pcxt->nworkers);
+
+ /*
+ * Set index vacuum status and mark whether parallel vacuum worker can
+ * process it.
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+ indstats->status = new_status;
+ indstats->parallel_workers_can_process =
+ (pvs->will_parallel_vacuum[i] &
+ parallel_vacuum_index_is_parallel_safe(pvs->indrels[i], bulkdel,
+ have_done_bulkdel));
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvs->shared->idx), 0);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ if (bulkdel)
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ }
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Vacuum the indexes that can be processed by only leader process */
+ parallel_vacuum_unsafe_indexes(pvs);
+
+ /*
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_safe_indexes(pvs);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * vacuumed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(pvs->indrels[i]));
+
+ indstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+}
+
+/*
+ * Index bulk-deletion/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to vacuum the indexes in parallel.
+ */
+static void
+parallel_vacuum_safe_indexes(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+ PVIndStats *indstats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvs->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvs->nindexes)
+ break;
+
+ indstats = &(pvs->indstats[idx]);
+
+ /*
+ * Skip vacuuming index that is unsafe for workers or has an
+ * unsuitable target for parallel index vacuum (this is vacuumed in
+ * parallel_vacuum_unsafe_indexes() by the leader).
+ */
+ if (!indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[idx], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Perform parallel vacuuming of indexes in leader process.
+ *
+ * Handles index bulk-deletion (or index cleanup) for indexes that are not
+ * parallel safe. It's possible that this will vary for a given index, based
+ * on details like whether we're performing index cleanup right now.
+ *
+ * Also performs vacuuming of smaller indexes that fell under the size cutoff
+ * enforced by parallel_vacuum_compute_workers().
+ */
+static void
+parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ /* Skip indexes that are safe for workers */
+ if (indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[i], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Do bulk-deletion or cleanup of an index either by the leader process or by
+ * one of the worker processes. After vacuuming the index, this function copies the index
+ * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel, PVIndStats *indstats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result if someone
+ * has already updated it
+ */
+ if (indstats->istat_updated)
+ istat = &(indstats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvs->shared->elevel;
+ ivinfo.estimated_count = pvs->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvs->shared->reltuples;
+ ivinfo.strategy = pvs->bstrategy;
+
+ /* Update error traceback information */
+ pvs->indname = pstrdup(RelationGetRelationName(indrel));
+ pvs->status = indstats->status;
+
+ switch (indstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = bulkdel_one_index(&ivinfo, istat, pvs->dead_items);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ indstats->status,
+ RelationGetRelationName(indrel));
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete and
+ * amvacuumcleanup to the DSM segment if it's the first cycle because it's
+ * allocated locally and it's possible that an index will be vacuumed by a
+ * different vacuum process in the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result on the DSM segment and
+ * pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result at different
+ * slots we can write them without locking.
+ */
+ if (!indstats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(indstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ indstats->istat_updated = true;
+
+ /* Free the locally-allocated bulk-deletion result */
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each worker
+ * touches different indexes.
+ */
+ indstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+
+ /* Reset error traceback information */
+ pvs->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+ pfree(pvs->indname);
+ pvs->indname = NULL;
+}
+
+/*
+ * Returns false, if the given index can't participate in the next execution of
+ * parallel index vacuum.
+ */
+static bool
+parallel_vacuum_index_is_parallel_safe(Relation indrel, bool bulkdel,
+ bool have_done_bulkdel)
+{
+ uint8 vacoptions;
+
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (bulkdel)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally, but we
+ * have already processed the index (for bulkdelete). We do this to avoid
+ * the need to invoke workers when parallel index cleanup doesn't need to
+ * scan the index. See the comments for option
+ * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
+ * parallel cleanup conditionally.
+ */
+ if (have_done_bulkdel &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index bulk-deletion or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ ParallelVacuumState pvs;
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ int nindexes;
+ char *sharedquery;
+ ErrorContextCallback errcallback;
+
+ /*
+ * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
+ * don't support parallel vacuum for autovacuum as of now.
+ */
+ Assert(MyProc->statusFlags == PROC_IN_VACUUM);
+
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED, false);
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the leader process. It's
+ * okay because the lock mode does not conflict among the parallel
+ * workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should match
+ * the leader's order.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
+ /* Set parallel vacuum state */
+ pvs.indrels = indrels;
+ pvs.nindexes = nindexes;
+ pvs.indstats = indstats;
+ pvs.shared = shared;
+ pvs.dead_items = dead_items;
+ pvs.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs.relname = pstrdup(RelationGetRelationName(rel));
+
+ /* These fields will be filled during index vacuum or cleanup */
+ pvs.indname = NULL;
+ pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
+
+ /* Each parallel VACUUM worker gets its own access strategy */
+ pvs.bstrategy = GetAccessStrategy(BAS_VACUUM);
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_safe_indexes(&pvs);
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvs.bstrategy);
+}
+
+/*
+ * Error context callback for errors occurring during parallel index vacuum.
+ */
+static void
+parallel_vacuum_error_callback(void *arg)
+{
+ ParallelVacuumState *errinfo = arg;
+
+ switch (errinfo->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while cleanup index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_INITIAL:
+ case PARALLEL_INDVAC_STATUS_COMPLETED:
+ default:
+ break;
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 417dd288e5..f3fb1e93a5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4cfd52eaf4..7ed58af9d8 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -62,6 +64,9 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Abstract type for parallel vacuum state */
+typedef struct ParallelVacuumState ParallelVacuumState;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -230,6 +235,21 @@ typedef struct VacuumParams
int nworkers;
} VacuumParams;
+/*
+ * VacDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
+ */
+typedef struct VacDeadItems
+{
+ int max_items; /* # slots allocated in array */
+ int num_items; /* current # of entries */
+
+ /* Sorted array of TIDs to delete from indexes */
+ ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
+} VacDeadItems;
+
+#define MAXDEADITEMS(avail_mem) \
+ (((avail_mem) - offsetof(VacDeadItems, items)) / sizeof(ItemPointerData))
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for PostGIS */
extern int vacuum_freeze_min_age;
@@ -282,6 +302,27 @@ extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
bits32 options, bool verbose,
LOCKMODE lmode);
+extern IndexBulkDeleteResult *bulkdel_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items);
+extern IndexBulkDeleteResult *cleanup_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat);
+extern Size vac_max_items_to_alloc_size(int max_items);
+
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumState *parallel_vacuum_init(Relation rel, Relation *indrels,
+ int nindexes, int nrequested_workers,
+ int max_items, int elevel,
+ BufferAccessStrategy bstrategy);
+extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
+extern VacDeadItems *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs);
+extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples);
+extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ bool estimated_count,
+ bool no_bulkdel_call);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c61ccbdd0..469c7c2dd7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1810,6 +1810,7 @@ ParallelSlotResultHandler
ParallelState
ParallelTableScanDesc
ParallelTableScanDescData
+ParallelVacuumState
ParallelWorkerContext
ParallelWorkerInfo
Param
@@ -2800,6 +2801,7 @@ UserMapping
UserOpts
VacAttrStats
VacAttrStatsP
+VacDeadItems
VacErrPhase
VacOptValue
VacuumParams
--
2.24.3 (Apple Git-128)
On Wed, Dec 15, 2021 at 1:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached an updated patch. The patch incorporated several changes
from the last version:
* Rename parallel_vacuum_begin() to parallel_vacuum_init()
* Unify the terminology; use "index bulk-deletion" and "index cleanup"
instead of "index vacuum" and "index cleanup".
I am not sure it is a good idea to do this as part of the main patch
as the intention of that is to just refactor parallel vacuum code. I
suggest doing this as a separate patch. Also, can we move the common
code to be shared between vacuumparallel.c and vacuumlazy.c as a
separate patch?
Few other comments and questions:
============================
1. /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to parallel variant. Since parallel vacuum will
+ * set the error context on an error we temporarily disable setting our
+ * error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
Is this change because you want a separate error callback for parallel
vacuum? If so, I suggest we can discuss this as a separate patch from
the refactoring patch.
2. Is introducing bulkdel_one_index/cleanup_one_index related to new
error context, or "Unify the terminology" task? Is there any other
reason for the same?
3. Why did you introduce
parallel_vacuum_bulkdel_all_indexes()/parallel_vacuum_cleanup_all_indexes()?
Is it because of your task "Unify the terminology"?
4.
@@ -3086,7 +2592,6 @@ lazy_cleanup_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
This seems like an unrelated change.
--
With Regards,
Amit Kapila.
On Wed, Dec 15, 2021 4:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Dec 14, 2021 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
There is still pending
work related to moving parallel vacuum code to a separate file and a
few other pending comments that are still under discussion. We can
take care of those in subsequent patches. Do let me know if you or
others think differently?
I'm on the same page.
I've attached an updated patch. The patch incorporated several changes from
the last version:
* Rename parallel_vacuum_begin() to parallel_vacuum_init()
* Unify the terminology; use "index bulk-deletion" and "index cleanup"
instead of "index vacuum" and "index cleanup".
* Fix the comment of parallel_vacuum_init() pointed out by Andres
* Fix a typo that is left in commit 22bd3cbe0c (pointed out by Hou)
Please review it.
Thanks for updating the patch.
Here are a few comments:
1)
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while cleanup index \"%s\" of relation \"%s.%s\"",
I noticed the current code uses the error msg "While cleaning up index xxx", which seems a
little different from the patch's; maybe we can use the previous one?
2)
static inline Size max_items_to_alloc_size(int max_items);
This old function declaration can be deleted.
3)
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
I think we need to remove LVShared, LVSharedIndStats, LVDeadItems and
LVParallelState from typedefs.list and add PVShared and PVIndStats to the file.
Best regards,
Hou zj
On Thu, Dec 16, 2021 at 1:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 15, 2021 at 1:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached an updated patch. The patch incorporated several changes
from the last version:
* Rename parallel_vacuum_begin() to parallel_vacuum_init()
* Unify the terminology; use "index bulk-deletion" and "index cleanup"
instead of "index vacuum" and "index cleanup".
I am not sure it is a good idea to do this as part of the main patch
as the intention of that is to just refactor parallel vacuum code. I
suggest doing this as a separate patch.
Okay.
Also, can we move the common
code to be shared between vacuumparallel.c and vacuumlazy.c as a
separate patch?
You mean vac_tid_reaped() and vac_cmp_itemptr() etc.? If so, do both
vacuumparallel.c and vacuumlazy.c have the same functions?
Few other comments and questions:
============================
1. /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to parallel variant. Since parallel vacuum will
+ * set the error context on an error we temporarily disable setting our
+ * error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
Is this change because you want a separate error callback for parallel
vacuum? If so, I suggest we can discuss this as a separate patch from
the refactoring patch.
Because it seems natural to me that the leader and worker use the same
error callback.
Okay, I'll remove that change in the next version patch.
2. Is introducing bulkdel_one_index/cleanup_one_index related to new
error context, or "Unify the terminology" task? Is there any other
reason for the same?
Because otherwise both vacuumlazy.c and vacuumparallel.c will have the
same functions.
3. Why did you introduce
parallel_vacuum_bulkdel_all_indexes()/parallel_vacuum_cleanup_all_indexes()?
Is it because of your task "Unify the terminology"?
This is because parallel bulk-deletion and cleanup require different
numbers of inputs (num_table_tuples etc.) and the caller
(vacuumlazy.c) cannot set them directly to ParallelVacuumState.
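To illustrate, here is a rough sketch (not taken from any posted patch; variable
names such as old_live_tuples, new_rel_tuples, and num_index_scans are borrowed
from the vacuumlazy.c context only for illustration) of how a table AM caller
could drive the exposed API:

    /* set up parallel vacuum and get the shared dead_items space */
    pvs = parallel_vacuum_init(rel, indrels, nindexes, nrequested_workers,
                               max_items, elevel, bstrategy);
    dead_items = parallel_vacuum_get_dead_items(pvs);

    /* ... the heap scan fills dead_items ... */

    /* index bulk-deletion pass: needs the old live tuple count */
    parallel_vacuum_bulkdel_all_indexes(pvs, old_live_tuples);

    /* final index cleanup pass: needs new reltuples plus two extra flags */
    parallel_vacuum_cleanup_all_indexes(pvs, new_rel_tuples, estimated_count,
                                        num_index_scans == 0);

    /* copy index stats back and shut down workers */
    parallel_vacuum_end(pvs, indstats);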
4.
@@ -3086,7 +2592,6 @@ lazy_cleanup_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
This seems like an unrelated change.
Yes, but I think it's an unnecessary break so we can change it
together. Should it be done in a separate patch?
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Dec 16, 2021 at 6:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Dec 16, 2021 at 1:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 15, 2021 at 1:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached an updated patch. The patch incorporated several changes
from the last version:
* Rename parallel_vacuum_begin() to parallel_vacuum_init()
* Unify the terminology; use "index bulk-deletion" and "index cleanup"
instead of "index vacuum" and "index cleanup".
I am not sure it is a good idea to do this as part of the main patch
as the intention of that is to just refactor parallel vacuum code. I
suggest doing this as a separate patch.
Okay.
Also, can we move the common
code to be shared between vacuumparallel.c and vacuumlazy.c as a
separate patch?
You mean vac_tid_reaped() and vac_cmp_itemptr() etc.? If so, do both
vacuumparallel.c and vacuumlazy.c have the same functions?
Why would that be required? I think both can call the common exposed
function like the one you have in your patch bulkdel_one_index or if
we directly move lazy_vacuum_one_index as part of common code. Similar
for cleanup function.
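For instance, as a rough sketch only (the helper is named bulkdel_one_index in
the v8 patch and vac_bulkdel_one_index in v9), both files could end up with
calls like:

    /* vacuumlazy.c, serial index vacuuming */
    istat = vac_bulkdel_one_index(&ivinfo, istat, vacrel->dead_items);

    /* vacuumparallel.c, leader or worker */
    istat_res = vac_bulkdel_one_index(&ivinfo, istat, pvs->dead_items);

and similarly for the cleanup counterpart.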
Few other comments and questions:
============================
1. /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to parallel variant. Since parallel vacuum will
+ * set the error context on an error we temporarily disable setting our
+ * error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
Is this change because you want a separate error callback for parallel
vacuum? If so, I suggest we can discuss this as a separate patch from
the refactoring patch.
Because it seems natural to me that the leader and worker use the same
error callback.
Okay, I'll remove that change in the next version patch.
2. Is introducing bulkdel_one_index/cleanup_one_index related to new
error context, or "Unify the terminology" task? Is there any other
reason for the same?
Because otherwise both vacuumlazy.c and vacuumparallel.c will have the
same functions.
3. Why did you introduce
parallel_vacuum_bulkdel_all_indexes()/parallel_vacuum_cleanup_all_indexes()?
Is it because of your task "Unify the terminology"?
This is because parallel bulk-deletion and cleanup require different
numbers of inputs (num_table_tuples etc.) and the caller
(vacuumlazy.c) cannot set them directly to ParallelVacuumState.
oh, yeah, the other possibility could be to have a common structure
that can be used for both cases. I am not sure if that is better than
what you have.
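Purely as a hypothetical sketch of that alternative (not something in any posted
patch), such a common structure could look like:

    typedef struct PVIndexPassParams    /* hypothetical name */
    {
        bool    bulkdel;            /* bulk-deletion pass or cleanup pass? */
        long    num_table_tuples;   /* old_live_tuples or new_rel_tuples */
        bool    estimated_count;    /* meaningful for cleanup only */
        bool    no_bulkdel_call;    /* meaningful for cleanup only */
    } PVIndexPassParams;

    extern void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs,
                                                    PVIndexPassParams *params);

That would keep a single entry point at the cost of flags the callee has to
interpret.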
4.
@@ -3086,7 +2592,6 @@ lazy_cleanup_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
This seems like an unrelated change.
Yes, but I think it's an unnecessary break so we can change it
together. Should it be done in a separate patch?
Isn't this just spurious line removal which shouldn't be part of any patch?
--
With Regards,
Amit Kapila.
On Thu, Dec 16, 2021 at 10:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 16, 2021 at 6:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Thu, Dec 16, 2021 at 1:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 15, 2021 at 1:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached an updated patch. The patch incorporated several changes
from the last version:
* Rename parallel_vacuum_begin() to parallel_vacuum_init()
* Unify the terminology; use "index bulk-deletion" and "index cleanup"
instead of "index vacuum" and "index cleanup".
I am not sure it is a good idea to do this as part of the main patch
as the intention of that is to just refactor parallel vacuum code. I
suggest doing this as a separate patch.
Okay.
Also, can we move the common
code to be shared between vacuumparallel.c and vacuumlazy.c as a
separate patch?
You mean vac_tid_reaped() and vac_cmp_itemptr() etc.? If so, do both
vacuumparallel.c and vacuumlazy.c have the same functions?
Why would that be required? I think both can call the common exposed
function like the one you have in your patch bulkdel_one_index or if
we directly move lazy_vacuum_one_index as part of common code. Similar
for cleanup function.
Understood.
Few other comments and questions:
============================
1. /* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ LVSavedErrInfo saved_err_info;
+
+ /*
+ * Outsource everything to parallel variant. Since parallel vacuum will
+ * set the error context on an error we temporarily disable setting our
+ * error context.
+ */
+ update_vacuum_error_info(vacrel, &saved_err_info,
+ VACUUM_ERRCB_PHASE_UNKNOWN,
+ InvalidBlockNumber, InvalidOffsetNumber);
+
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
+
+ /* Revert to the previous phase information for error traceback */
+ restore_vacuum_error_info(vacrel, &saved_err_info);
Is this change because you want a separate error callback for parallel
vacuum? If so, I suggest we can discuss this as a separate patch from
the refactoring patch.
Because it seems natural to me that the leader and worker use the same
error callback.
Okay, I'll remove that change in the next version patch.
2. Is introducing bulkdel_one_index/cleanup_one_index related to new
error context, or "Unify the terminology" task? Is there any other
reason for the same?
Because otherwise both vacuumlazy.c and vacuumparallel.c will have the
same functions.
3. Why did you introduce
parallel_vacuum_bulkdel_all_indexes()/parallel_vacuum_cleanup_all_indexes()?
Is it because of your task "Unify the terminology"?
This is because parallel bulk-deletion and cleanup require different
numbers of inputs (num_table_tuples etc.) and the caller
(vacuumlazy.c) cannot set them directly to ParallelVacuumState.
oh, yeah, the other possibility could be to have a common structure
that can be used for both cases. I am not sure if that is better than
what you have.
Yes, I left them as they are in an updated patch for now. But we can
change them if others think it’s not a good idea.
4.
@@ -3086,7 +2592,6 @@ lazy_cleanup_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
ivinfo.report_progress = false;
ivinfo.estimated_count = estimated_count;
ivinfo.message_level = elevel;
-
ivinfo.num_heap_tuples = reltuples;
This seems like an unrelated change.
Yes, but I think it's an unnecessary break so we can change it
together. Should it be done in a separate patch?
Isn't this just spurious line removal which shouldn't be part of any patch?
Okay.
I've attached updated patches. The first patch just moves the common
functions for index bulk-deletion and cleanup to vacuum.c, and the
second patch moves the parallel vacuum code to vacuumparallel.c. The
comments I got so far are incorporated into these patches. Please
review them.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v9-0001-Move-index-vacuum-routines-to-vacuum.c.patch (application/octet-stream)
From 852b98fab2bbea9e45a6d05551ca0b79249ab50e Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 17 Dec 2021 12:15:33 +0900
Subject: [PATCH v9 1/2] Move index vacuum routines to vacuum.c
This commit moves these routines to vacuum.c so that other table AMs
can use them.
An upcoming patch moves parallel vacuum code out of vacuumlazy.c, and
both lazy vacuum and parallel vacuum use these index vacuum functions.
---
src/backend/access/heap/vacuumlazy.c | 166 ++++-----------------------
src/backend/commands/vacuum.c | 148 ++++++++++++++++++++++++
src/include/commands/vacuum.h | 22 ++++
src/tools/pgindent/typedefs.list | 2 +-
4 files changed, 193 insertions(+), 145 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index db6becfed5..66cd6b7721 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -149,26 +149,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * LVDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
- * Each TID points to an LP_DEAD line pointer from a heap page that has been
- * processed by lazy_scan_prune.
- *
- * Also needed by lazy_vacuum_heap_rel, which marks the same LP_DEAD line
- * pointers as LP_UNUSED during second heap pass.
- */
-typedef struct LVDeadItems
-{
- int max_items; /* # slots allocated in array */
- int num_items; /* current # of entries */
-
- /* Sorted array of TIDs to delete from indexes */
- ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
-} LVDeadItems;
-
-#define MAXDEADITEMS(avail_mem) \
- (((avail_mem) - offsetof(LVDeadItems, items)) / sizeof(ItemPointerData))
-
/*
* Shared information among parallel workers. So this is allocated in the DSM
* segment.
@@ -339,9 +319,14 @@ typedef struct LVRelState
VacErrPhase phase;
/*
- * State managed by lazy_scan_heap() follows
+ * State managed by lazy_scan_heap() follows.
+ *
+ * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
+ * Each TID points to an LP_DEAD line pointer from a heap page that has been
+ * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
+ * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
@@ -434,11 +419,8 @@ static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
static int dead_items_max_items(LVRelState *vacrel);
-static inline Size max_items_to_alloc_size(int max_items);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
@@ -905,7 +887,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
BlockNumber nblocks,
blkno,
next_unskippable_block,
@@ -2040,7 +2022,7 @@ retry:
*/
if (lpdead_items > 0)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
Assert(!prunestate->all_visible);
@@ -2404,7 +2386,7 @@ static int
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
int index, Buffer *vmbuffer)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
@@ -3045,13 +3027,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_items);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_items->num_items),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ istat = vac_bulkdel_one_index(&ivinfo, istat, (void *) vacrel->dead_items);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3102,24 +3078,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
- istat = index_vacuum_cleanup(&ivinfo, istat);
-
- if (istat)
- {
- ereport(elevel,
- (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
- RelationGetRelationName(indrel),
- istat->num_index_tuples,
- istat->num_pages),
- errdetail("%.0f index row versions were removed.\n"
- "%u index pages were newly deleted.\n"
- "%u index pages are currently deleted, of which %u are currently reusable.\n"
- "%s.",
- istat->tuples_removed,
- istat->pages_newly_deleted,
- istat->pages_deleted, istat->pages_free,
- pg_rusage_show(&ru0))));
- }
+ istat = vac_cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3481,19 +3440,6 @@ dead_items_max_items(LVRelState *vacrel)
return (int) max_items;
}
-/*
- * Returns the total required space for VACUUM's dead_items array given a
- * max_items value returned by dead_items_max_items
- */
-static inline Size
-max_items_to_alloc_size(int max_items)
-{
- Assert(max_items >= MaxHeapTuplesPerPage);
- Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
-
- return offsetof(LVDeadItems, items) + sizeof(ItemPointerData) * max_items;
-}
-
/*
* Allocate dead_items (either using palloc, or in dynamic shared memory).
* Sets dead_items in vacrel for caller.
@@ -3504,7 +3450,7 @@ max_items_to_alloc_size(int max_items)
static void
dead_items_alloc(LVRelState *vacrel, int nworkers)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
int max_items;
/*
@@ -3539,7 +3485,7 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
/* Serial VACUUM case */
max_items = dead_items_max_items(vacrel);
- dead_items = (LVDeadItems *) palloc(max_items_to_alloc_size(max_items));
+ dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3565,74 +3511,6 @@ dead_items_cleanup(LVRelState *vacrel)
parallel_vacuum_end(vacrel);
}
-/*
- * lazy_tid_reaped() -- is a particular tid deletable?
- *
- * This has the right signature to be an IndexBulkDeleteCallback.
- *
- * Assumes dead_items array is sorted (in ascending TID order).
- */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
-{
- LVDeadItems *dead_items = (LVDeadItems *) state;
- int64 litem,
- ritem,
- item;
- ItemPointer res;
-
- litem = itemptr_encode(&dead_items->items[0]);
- ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
- item = itemptr_encode(itemptr);
-
- /*
- * Doing a simple bound check before bsearch() is useful to avoid the
- * extra cost of bsearch(), especially if dead items on the heap are
- * concentrated in a certain range. Since this function is called for
- * every index tuple, it pays to be really fast.
- */
- if (item < litem || item > ritem)
- return false;
-
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) dead_items->items,
- dead_items->num_items,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
-
- return (res != NULL);
-}
-
-/*
- * Comparator routines for use with qsort() and bsearch().
- */
-static int
-vac_cmp_itemptr(const void *left, const void *right)
-{
- BlockNumber lblk,
- rblk;
- OffsetNumber loff,
- roff;
-
- lblk = ItemPointerGetBlockNumber((ItemPointer) left);
- rblk = ItemPointerGetBlockNumber((ItemPointer) right);
-
- if (lblk < rblk)
- return -1;
- if (lblk > rblk)
- return 1;
-
- loff = ItemPointerGetOffsetNumber((ItemPointer) left);
- roff = ItemPointerGetOffsetNumber((ItemPointer) right);
-
- if (loff < roff)
- return -1;
- if (loff > roff)
- return 1;
-
- return 0;
-}
-
/*
* Check if every tuple in the given page is visible to all current and future
* transactions. Also return the visibility_cutoff_xid which is the highest
@@ -3873,7 +3751,7 @@ parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
int nindexes = vacrel->nindexes;
ParallelContext *pcxt;
LVShared *shared;
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
LVParallelIndStats *pindstats;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
@@ -3927,7 +3805,7 @@ parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
/* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
max_items = dead_items_max_items(vacrel);
- est_dead_items_len = max_items_to_alloc_size(max_items);
+ est_dead_items_len = vac_max_items_to_alloc_size(max_items);
shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
@@ -4011,8 +3889,8 @@ parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
lps->lvshared = shared;
/* Prepare the dead_items space */
- dead_items = (LVDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
dead_items->max_items = max_items;
dead_items->num_items = 0;
MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
@@ -4138,7 +4016,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVParallelIndStats *lvpindstats;
LVShared *lvshared;
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
int nindexes;
@@ -4183,9 +4061,9 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
false);
/* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (LVDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
/* Set cost-based vacuum delay */
VacuumCostActive = (VacuumCostDelay > 0);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15b44..0b590bb16a 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -32,6 +32,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
+#include "catalog/index.h"
#include "catalog/pg_database.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_namespace.h"
@@ -51,6 +52,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -89,6 +91,8 @@ static void vac_truncate_clog(TransactionId frozenXID,
static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
+static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_cmp_itemptr(const void *left, const void *right);
/*
* Primary entry point for manual VACUUM and ANALYZE commands
@@ -2258,3 +2262,147 @@ get_vacoptval_from_boolean(DefElem *def)
{
return defGetBoolean(def) ? VACOPTVALUE_ENABLED : VACOPTVALUE_DISABLED;
}
+
+/*
+ * vac_bulkdel_one_index() -- bulk-deletion for index relation.
+ *
+ * Calls index AM's ambulkdelete routine.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+vac_bulkdel_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ /* Do bulk deletion */
+ istat = index_bulk_delete(ivinfo, istat, vac_tid_reaped,
+ (void *) dead_items);
+
+ ereport(ivinfo->message_level,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ RelationGetRelationName(ivinfo->index),
+ dead_items->num_items),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+ return istat;
+}
+
+/*
+ * vac_cleanup_one_index() -- do post-vacuum cleanup for index relation.
+ *
+ * Calls index AM's amvacuumcleanup routine.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+vac_cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ istat = index_vacuum_cleanup(ivinfo, istat);
+
+ if (istat)
+ {
+ ereport(ivinfo->message_level,
+ (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
+ RelationGetRelationName(ivinfo->index),
+ istat->num_index_tuples,
+ istat->num_pages),
+ errdetail("%.0f index row versions were removed.\n"
+ "%u index pages were newly deleted.\n"
+ "%u index pages are currently deleted, of which %u are currently reusable.\n"
+ "%s.",
+ istat->tuples_removed,
+ istat->pages_newly_deleted,
+ istat->pages_deleted, istat->pages_free,
+ pg_rusage_show(&ru0))));
+ }
+
+ return istat;
+}
+
+/*
+ * Returns the total required space for VACUUM's dead_items array given a
+ * max_items value.
+ */
+inline Size
+vac_max_items_to_alloc_size(int max_items)
+{
+ Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
+
+ return offsetof(VacDeadItems, items) + sizeof(ItemPointerData) * max_items;
+}
+
+/*
+ * vac_tid_reaped() -- is a particular tid deletable?
+ *
+ * This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ * Assumes dead_items array is sorted (in ascending TID order).
+ */
+static bool
+vac_tid_reaped(ItemPointer itemptr, void *state)
+{
+ VacDeadItems *dead_items = (VacDeadItems *) state;
+ int64 litem,
+ ritem,
+ item;
+ ItemPointer res;
+
+ litem = itemptr_encode(&dead_items->items[0]);
+ ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
+ item = itemptr_encode(itemptr);
+
+ /*
+ * Doing a simple bound check before bsearch() is useful to avoid the
+ * extra cost of bsearch(), especially if dead items on the heap are
+ * concentrated in a certain range. Since this function is called for
+ * every index tuple, it pays to be really fast.
+ */
+ if (item < litem || item > ritem)
+ return false;
+
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_items->items,
+ dead_items->num_items,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
+
+ return (res != NULL);
+}
+
+/*
+ * Comparator routines for use with qsort() and bsearch().
+ */
+static int
+vac_cmp_itemptr(const void *left, const void *right)
+{
+ BlockNumber lblk,
+ rblk;
+ OffsetNumber loff,
+ roff;
+
+ lblk = ItemPointerGetBlockNumber((ItemPointer) left);
+ rblk = ItemPointerGetBlockNumber((ItemPointer) right);
+
+ if (lblk < rblk)
+ return -1;
+ if (lblk > rblk)
+ return 1;
+
+ loff = ItemPointerGetOffsetNumber((ItemPointer) left);
+ roff = ItemPointerGetOffsetNumber((ItemPointer) right);
+
+ if (loff < roff)
+ return -1;
+ if (loff > roff)
+ return 1;
+
+ return 0;
+}
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4cfd52eaf4..97bffa8ff1 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,7 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/genam.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -230,6 +231,21 @@ typedef struct VacuumParams
int nworkers;
} VacuumParams;
+/*
+ * VacDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
+ */
+typedef struct VacDeadItems
+{
+ int max_items; /* # slots allocated in array */
+ int num_items; /* current # of entries */
+
+ /* Sorted array of TIDs to delete from indexes */
+ ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
+} VacDeadItems;
+
+#define MAXDEADITEMS(avail_mem) \
+ (((avail_mem) - offsetof(VacDeadItems, items)) / sizeof(ItemPointerData))
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for PostGIS */
extern int vacuum_freeze_min_age;
@@ -282,6 +298,12 @@ extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
bits32 options, bool verbose,
LOCKMODE lmode);
+extern IndexBulkDeleteResult *vac_bulkdel_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items);
+extern IndexBulkDeleteResult *vac_cleanup_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat);
+extern Size vac_max_items_to_alloc_size(int max_items);
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c61ccbdd0..9863508791 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1305,7 +1305,6 @@ LPVOID
LPWSTR
LSEG
LUID
-LVDeadTuples
LVPagePruneState
LVParallelIndStats
LVParallelIndVacStatus
@@ -2800,6 +2799,7 @@ UserMapping
UserOpts
VacAttrStats
VacAttrStatsP
+VacDeadItems
VacErrPhase
VacOptValue
VacuumParams
--
2.24.3 (Apple Git-128)
v9-0002-Move-parallel-vacuum-code-to-vacuumparallel.c.patch (application/octet-stream)
From 908d1b1254169b32fbadbb415d046db1918a7d80 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 15 Dec 2021 16:49:01 +0900
Subject: [PATCH v9 2/2] Move parallel vacuum code to vacuumparallel.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previously, parallel vacuum was specific to lazy vacuum, i.e., the heap
table AM. But the job that parallel vacuum does isn’t really specific
to heap.
This commit moves parallel vacuum related code to a new file
commands/vacuumparallel.c so that any table AM supporting indexes can
utilize parallel vacuum in order to call index AM
callbacks (ambulkdelete and amvacuumcleanup) with parallel workers.
It also moves some vacuum-related functions and structures to
commands/vacuum.c so that both lazy vacuum and parallel vacuum can
refer to them.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 990 +---------------------
src/backend/access/transam/parallel.c | 2 +-
src/backend/commands/Makefile | 1 +
src/backend/commands/vacuumparallel.c | 1092 +++++++++++++++++++++++++
src/include/access/heapam.h | 1 -
src/include/commands/vacuum.h | 19 +
src/tools/pgindent/typedefs.list | 9 +-
7 files changed, 1138 insertions(+), 976 deletions(-)
create mode 100644 src/backend/commands/vacuumparallel.c
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 66cd6b7721..ae2e07a7b7 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -40,7 +40,6 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
-#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -120,23 +119,11 @@
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
-/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
-
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -149,135 +136,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /* Counter for vacuuming and cleanup */
- pg_atomic_uint32 idx;
-} LVShared;
-
-/* Status used during parallel index vacuum or cleanup */
-typedef enum LVParallelIndVacStatus
-{
- PARALLEL_INDVAC_STATUS_INITIAL = 0,
- PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
- PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
- PARALLEL_INDVAC_STATUS_COMPLETED
-} LVParallelIndVacStatus;
-
-/*
- * Struct for index vacuum statistics of an index that is used for parallel vacuum.
- * This includes the status of parallel index vacuum as well as index statistics.
- */
-typedef struct LVParallelIndStats
-{
- /*
- * The following two fields are set by leader process before executing
- * parallel index vacuum or parallel index cleanup. These fields are not
- * fixed for the entire VACUUM operation. They are only fixed for an
- * individual parallel index vacuum and cleanup.
- *
- * parallel_workers_can_process is true if both leader and worker can
- * process the index, otherwise only leader can process it.
- */
- LVParallelIndVacStatus status;
- bool parallel_workers_can_process;
-
- /*
- * Individual worker or leader stores the result of index vacuum or
- * cleanup.
- */
- bool istat_updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVParallelIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /*
- * Shared index statistics among parallel vacuum workers. The array
- * element is allocated for every index, even those indexes where parallel
- * index vacuuming is unsafe or not worthwhile (e.g.,
- * will_parallel_vacuum[] is false). During parallel vacuum,
- * IndexBulkDeleteResult of each index is kept in DSM and is copied into
- * local memory at the end of parallel vacuum.
- */
- LVParallelIndStats *lvpindstats;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * False if the index is totally unsuitable target for all parallel
- * processing. For example, the index could be <
- * min_parallel_index_scan_size cutoff.
- */
- bool *will_parallel_vacuum;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -295,9 +153,9 @@ typedef struct LVRelState
bool do_index_cleanup;
bool do_rel_truncate;
- /* Buffer access strategy and parallel state */
+ /* Buffer access strategy and parallel vacuum state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumState *pvs;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -398,13 +256,6 @@ static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
-static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats);
-static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared,
- LVParallelIndStats *pindstats);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
double reltuples,
@@ -418,18 +269,11 @@ static bool should_attempt_truncation(LVRelState *vacrel);
static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
-static int dead_items_max_items(LVRelState *vacrel);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
-static void parallel_vacuum_end(LVRelState *vacrel);
-static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -2065,7 +1909,6 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(!IsParallelWorker());
Assert(vacrel->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
@@ -2194,7 +2037,6 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
{
bool allindexes = true;
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -2234,7 +2076,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2607,353 +2449,12 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process.
- */
-static void
-parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
-{
- LVParallelState *lps = vacrel->lps;
- LVParallelIndVacStatus new_status;
- int nworkers;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- if (vacuum)
- {
- /*
- * We can only provide an approximate value of num_heap_tuples, at
- * least for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_bulkdel;
- }
- else
- {
- /*
- * We can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- /* Add conditionally parallel-aware indexes if in the first time call */
- if (vacrel->num_index_scans == 0)
- nworkers += vacrel->lps->nindexes_parallel_condcleanup;
- }
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See
- * parallel_vacuum_compute_workers().
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- /*
- * Set index vacuum status and mark whether parallel vacuum worker can
- * process it.
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
-
- Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
- pindstats->status = new_status;
- pindstats->parallel_workers_can_process =
- (lps->will_parallel_vacuum[i] &
- parallel_vacuum_index_is_parallel_safe(vacrel, vacrel->indrels[i],
- vacuum));
- }
-
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- /* Reinitialize parallel context to relaunch parallel workers */
- if (vacrel->num_index_scans > 0)
- ReinitializeParallelDSM(lps->pcxt);
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (vacuum)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- parallel_vacuum_process_unsafe_indexes(vacrel);
-
- /*
- * Join as a parallel worker. The leader process alone processes all
- * parallel-safe indexes in the case where no workers are launched.
- */
- parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Reset all index status back to initial (while checking that we have
- * processed all indexes).
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
-
- if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
- elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
- RelationGetRelationName(vacrel->indrels[i]));
-
- pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVParallelIndStats *pis;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- pis = &(pindstats[idx]);
-
- /*
- * Skip processing index that is unsafe for workers or has an
- * unsuitable target for parallel index vacuum (this is processed in
- * parallel_vacuum_process_unsafe_indexes() by the leader).
- */
- if (!pis->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- shared, pis);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing index cleanup right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by parallel_vacuum_compute_workers().
- */
-static void
-parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- /* Skip, indexes that are safe for workers */
- if (pindstats->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- lps->lvshared, pindstats);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static void
-parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared, LVParallelIndStats *pindstats)
-{
- IndexBulkDeleteResult *istat = NULL;
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (pindstats->istat_updated)
- istat = &(pindstats->istat);
-
- switch (pindstats->status)
- {
- case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
- istat_res = lazy_vacuum_one_index(indrel, istat,
- shared->reltuples, vacrel);
- break;
- case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
- istat_res = lazy_cleanup_one_index(indrel, istat,
- shared->reltuples,
- shared->estimated_count,
- vacrel);
- break;
- default:
- elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
- pindstats->status,
- RelationGetRelationName(indrel));
- }
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (!pindstats->istat_updated && istat_res != NULL)
- {
- memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
- pindstats->istat_updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
- }
-
- /*
- * Update the status to completed. No need to lock here since each worker
- * touches different indexes.
- */
- pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
-}
-
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
/* Report that we are now cleaning up indexes */
@@ -2979,7 +2480,9 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, false);
+ parallel_vacuum_cleanup_all_indexes(vacrel->pvs, vacrel->new_rel_tuples,
+ (vacrel->tupcount_pages < vacrel->rel_pages),
+ vacrel->num_index_scans);
}
}
@@ -3414,8 +2917,6 @@ dead_items_max_items(LVRelState *vacrel)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- Assert(!IsParallelWorker());
-
if (vacrel->nindexes > 0)
{
BlockNumber rel_pages = vacrel->rel_pages;
@@ -3453,6 +2954,9 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
VacDeadItems *dead_items;
int max_items;
+ max_items = dead_items_max_items(vacrel);
+ Assert(max_items >= MaxHeapTuplesPerPage);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3476,15 +2980,20 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- parallel_vacuum_begin(vacrel, nworkers);
+ vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
+ vacrel->nindexes, nworkers,
+ max_items, elevel,
+ vacrel->bstrategy);
- /* If parallel mode started, vacrel->dead_items allocated in DSM */
+ /* If parallel mode started, dead_items space is allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs);
return;
+ }
}
/* Serial VACUUM case */
- max_items = dead_items_max_items(vacrel);
dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3508,7 +3017,8 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- parallel_vacuum_end(vacrel);
+ parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
+ vacrel->pvs = NULL;
}
/*
@@ -3632,77 +3142,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
return all_visible;
}
-/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
- */
-static int
-parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* Skip index that is not a suitable target for parallel index vacuum */
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- continue;
-
- will_parallel_vacuum[idx] = true;
-
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
@@ -3713,7 +3152,7 @@ update_index_statistics(LVRelState *vacrel)
int nindexes = vacrel->nindexes;
IndexBulkDeleteResult **indstats = vacrel->indstats;
- Assert(!IsInParallelMode());
+ Assert(!ParallelVacuumIsActive(vacrel));
for (int idx = 0; idx < nindexes; idx++)
{
@@ -3735,393 +3174,6 @@ update_index_statistics(LVRelState *vacrel)
}
}
-/*
- * Try to enter parallel mode and create a parallel context. Then initialize
- * shared memory state.
- *
- * On success (when we can launch one or more workers), will set dead_items and
- * lps in vacrel for caller. A set lps in vacrel state indicates that parallel
- * VACUUM is currently active.
- */
-static void
-parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
-{
- LVParallelState *lps;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- VacDeadItems *dead_items;
- LVParallelIndStats *pindstats;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- bool *will_parallel_vacuum;
- int max_items;
- Size est_pindstats_len;
- Size est_shared_len;
- Size est_dead_items_len;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
- will_parallel_vacuum);
- if (parallel_workers <= 0)
- {
- /* Can't perform vacuum in parallel -- lps not set in vacrel */
- pfree(will_parallel_vacuum);
- return;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
- lps->will_parallel_vacuum = will_parallel_vacuum;
-
- /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
- est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
- shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = sizeof(LVShared);
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
- max_items = dead_items_max_items(vacrel);
- est_dead_items_len = vac_max_items_to_alloc_size(max_items);
- shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare index vacuum stats */
- pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
- lps->lvpindstats = pindstats;
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
- MemSet(shared, 0, est_shared_len);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead_items space */
- dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
- MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- /* Success -- set dead_items and lps in leader's vacrel state */
- vacrel->dead_items = dead_items;
- vacrel->lps = lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-parallel_vacuum_end(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- if (pindstats->istat_updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps->will_parallel_vacuum);
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Returns false, if the given index can't participate in the next execution of
- * parallel index vacuum or parallel index cleanup.
- */
-static bool
-parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum)
-{
- uint8 vacoptions;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* In parallel vacuum case, check if it supports parallel bulk-deletion */
- if (vacuum)
- return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
-
- /* Not safe, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Not safe, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). We do this to avoid
- * the need to invoke workers when parallel index cleanup doesn't need to
- * scan the index. See the comments for option
- * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
- * parallel cleanup conditionally.
- */
- if (vacrel->num_index_scans > 0 &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVParallelIndStats *lvpindstats;
- LVShared *lvshared;
- VacDeadItems *dead_items;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- /*
- * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
- * don't support parallel vacuum for autovacuum as of now.
- */
- Assert(MyProc->statusFlags == PROC_IN_VACUUM);
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- elog(DEBUG1, "starting parallel vacuum worker");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set index statistics */
- lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_INDEX_STATS,
- false);
-
- /* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (VacDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_items = dead_items;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
/*
* Error context callback for errors occurring during vacuum.
*/
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f573..ae7c7133dd 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -14,7 +14,6 @@
#include "postgres.h"
-#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/parallel.h"
#include "access/session.h"
@@ -25,6 +24,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..48f7348f91 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000000..556acf66ec
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,1092 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelVacuumState.
+ *
+ * In a parallel vacuum, we perform both index bulk deletion and index cleanup
+ * with parallel worker processes. Individual indexes are processed by one
+ * vacuum process. ParalleVacuumState contains shared information as well
+ * as the memory space for storing dead items albulklocated in the DSM segment.
+ * When starting either parallel index bulk-deletion or index cleanup, we
+ * launch parallel worker processes. Once all index are processed, the
+ * parallel worker processes exit. In the next time, the parallel context
+ * is re-initialized so that the same DSM can be used for multiple passes of
+ * index bulk-deletion and index cleanup. At the end of a parallel vacuum,
+ * ParallelVacuumState is destroyed while returning index statistics so
+ * that we can update them after exiting from the parallel mode.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/genam.h"
+#include "access/parallel.h"
+#include "access/table.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/elog.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel workers. So this is allocated in the DSM
+ * segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the parallel vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * reltuples is the total number of input heap tuples. We set either old
+ * live tuples in the index vacuum case or the new live tuples in the
+ * index cleanup case.
+ *
+ * estimated_count is true if reltuples is an estimated value. (Note that
+ * reltuples could be -1 in this case, indicating we have no idea.)
+ */
+ double reltuples;
+ bool estimated_count;
+
+ /*
+ * In single process vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for heap scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process vacuum.
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Counter used to hand out the next index to be vacuumed or cleaned up */
+ pg_atomic_uint32 idx;
+} PVShared;
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} PVIndVacStatus;
+
+/*
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
+ */
+typedef struct PVIndStats
+{
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both leader and worker can
+ * process the index, otherwise only leader can process it.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/* Struct for maintaining a parallel vacuum state. */
+typedef struct ParallelVacuumState
+{
+ /* NULL for worker processes */
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /*
+ * Shared index statistics among parallel vacuum workers. The array
+ * element is allocated for every index, even those indexes where parallel
+ * index vacuuming is unsafe or not worthwhile (e.g.,
+ * will_parallel_vacuum[] is false). During parallel vacuum,
+ * IndexBulkDeleteResult of each index is kept in DSM and is copied into
+ * local memory at the end of parallel vacuum.
+ */
+ PVIndStats *indstats;
+
+ /* Shared dead items space among parallel vacuum workers */
+ VacDeadItems *dead_items;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * False if the corresponding index is an unsuitable target for all parallel
+ * processing; for example, the index could be smaller than the
+ * min_parallel_index_scan_size cutoff.
+ */
+ bool *will_parallel_vacuum;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion and
+ * parallel index cleanup respectively.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /*
+ * True only before the first pass of index vacuum/cleanup; on later passes
+ * the parallel DSM must be reinitialized before relaunching workers.
+ */
+ bool first_time;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+
+ /*
+ * Error reporting state. The error callback is set only for worker
+ * processes during parallel index vacuum.
+ */
+ char *relnamespace;
+ char *relname;
+ char *indname;
+ PVIndVacStatus status;
+} ParallelVacuumState;
+
+static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum);
+static void parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool vacuum,
+ bool have_done_bulkdel);
+static void parallel_vacuum_safe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats);
+static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, bool vacuum,
+ bool have_done_bulkdel);
+static void parallel_vacuum_error_callback(void *arg);
+
+/*
+ * Try to enter parallel mode and create a parallel context. Then initialize
+ * shared memory state.
+ *
+ * On success, return parallel vacuum state. Otherwise return NULL.
+ */
+ParallelVacuumState *
+parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel, BufferAccessStrategy bstrategy)
+{
+ ParallelVacuumState *pvs;
+ ParallelContext *pcxt;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ PVIndStats *indstats;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ bool *will_parallel_vacuum;
+ Size est_indstats_len;
+ Size est_shared_len;
+ Size est_dead_items_len;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(nrequested_workers >= 0);
+ Assert(nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
+ parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
+ nrequested_workers,
+ will_parallel_vacuum);
+ if (parallel_workers <= 0)
+ {
+ /* Can't perform vacuum in parallel -- return NULL */
+ pfree(will_parallel_vacuum);
+ return NULL;
+ }
+
+ pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
+ pvs->indrels = indrels;
+ pvs->nindexes = nindexes;
+ pvs->will_parallel_vacuum = will_parallel_vacuum;
+ pvs->first_time = true;
+ pvs->bstrategy = bstrategy;
+
+ /*
+ * Set error traceback information. The remaining fields are filled while
+ * vacuuming indexes.
+ */
+ pvs->relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs->relname = pstrdup(RelationGetRelationName(rel));
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvs->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_INDEX_STATS */
+ est_indstats_len = mul_size(sizeof(PVIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = sizeof(PVShared);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
+ est_dead_items_len = vac_max_items_to_alloc_size(max_items);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats_len);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ if (!will_parallel_vacuum[idx])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvs->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvs->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvs->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvs->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
+ MemSet(shared, 0, est_shared_len);
+ shared->relid = RelationGetRelid(rel);
+ shared->elevel = elevel;
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvs->shared = shared;
+
+ /* Prepare the dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
+ dead_items->max_items = max_items;
+ dead_items->num_items = 0;
+ MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
+ pvs->dead_items = dead_items;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvs->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvs->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ /* Success -- return parallel vacuum state */
+ return pvs;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed during parallel mode, copy the
+ * updated index statistics from DSM into local memory and then later use that
+ * to update the index statistics. One might think that we can exit from
+ * parallel mode, update the index statistics and then destroy parallel
+ * context, but that won't be safe (see ExitParallelMode).
+ */
+void
+parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats)
+{
+ Assert(!IsParallelWorker());
+
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->istat_updated)
+ {
+ istats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(istats[i], &indstats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ istats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvs->pcxt);
+ ExitParallelMode();
+
+ pfree(pvs->will_parallel_vacuum);
+ pfree(pvs->relnamespace);
+ pfree(pvs->relname);
+ pfree(pvs);
+}
+
+/* Returns the dead items space */
+VacDeadItems *
+parallel_vacuum_get_dead_items(ParallelVacuumState *pvs)
+{
+ return pvs->dead_items;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now.
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = true;
+
+ /* have_done_bulkdel is not used in parallel bulkdel cases */
+ parallel_vacuum_all_indexes(pvs, true, false);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ *
+ * have_done_bulkdel is true if the caller has done index bulk-deletion one
+ * or more times during this vacuum operation.
+ */
+void
+parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
+ bool estimated_count, bool have_done_bulkdel)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can provide a better estimate of total number of surviving tuples
+ * (we assume indexes are more interested in that than in the number of
+ * nominally live tuples).
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_all_indexes(pvs, false, have_done_bulkdel);
+}
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * vacuum and index cleanup can be executed with parallel workers.
+ * The index is eligible for parallel vacuum iff its size is greater than
+ * min_parallel_index_scan_size as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that user requested. If
+ * nrequested is 0, we compute the parallel degree based on nindexes, that is
+ * the number of indexes that support parallel vacuum. This function also
+ * sets will_parallel_vacuum to remember indexes that participate in parallel
+ * vacuum.
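+ *
+ * For example, if three indexes are eligible for parallel bulk-deletion
+ * (and no more are eligible for cleanup) and nrequested is 0, the leader
+ * takes one index and we request Min(2, max_parallel_maintenance_workers)
+ * workers.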
+ */
+static int
+parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow performing parallel operation in standalone backend or
+ * when parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ /*
+ * Compute the number of indexes that can participate in parallel vacuum.
+ */
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* Skip index that is not a suitable target for parallel index vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ will_parallel_vacuum[i] = true;
+
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
+
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool vacuum,
+ bool have_done_bulkdel)
+{
+ int nworkers;
+ PVIndVacStatus new_status;
+
+ Assert(!IsParallelWorker());
+
+ if (vacuum)
+ {
+ new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = pvs->nindexes_parallel_bulkdel;
+ }
+ else
+ {
+ new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = pvs->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if bulk-deletion has not been done yet */
+ if (!have_done_bulkdel)
+ nworkers += pvs->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * than the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See
+ * parallel_vacuum_compute_workers().
+ */
+ nworkers = Min(nworkers, pvs->pcxt->nworkers);
+
+ /*
+ * Set index vacuum status and mark whether parallel vacuum worker can
+ * process it.
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+ indstats->status = new_status;
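+ /*
+ * A worker may take this index only if it is a suitable target
+ * (will_parallel_vacuum) and its access method supports parallel
+ * processing in the current phase; otherwise the leader handles it in
+ * parallel_vacuum_unsafe_indexes().
+ */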
+ indstats->parallel_workers_can_process =
+ (pvs->will_parallel_vacuum[i] &
+ parallel_vacuum_index_is_parallel_safe(pvs->indrels[i], vacuum,
+ have_done_bulkdel));
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvs->shared->idx), 0);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ if (vacuum)
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ }
+
+ /* Vacuum the indexes that can be processed only by the leader process */
+ parallel_vacuum_unsafe_indexes(pvs);
+
+ /*
+ * Join as a parallel worker. The leader process alone vacuums all
+ * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_safe_indexes(pvs);
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * vacuumed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(pvs->indrels[i]));
+
+ indstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+
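+ /* The parallel DSM will need to be reinitialized before launching workers again */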
+ pvs->first_time = false;
+}
+
+/*
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to vacuum the indexes in parallel.
+ */
+static void
+parallel_vacuum_safe_indexes(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+ PVIndStats *indstats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvs->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvs->nindexes)
+ break;
+
+ indstats = &(pvs->indstats[idx]);
+
+ /*
+ * Skip vacuuming indexes that are unsafe for workers or are unsuitable
+ * targets for parallel index vacuum (these are vacuumed in
+ * parallel_vacuum_unsafe_indexes() by the leader).
+ */
+ if (!indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[idx], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Perform index vacuuming or index cleanup in the leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe. It's possible that this will vary for a given index, based
+ * on details like whether we're performing index cleanup right now.
+ *
+ * Also performs vacuuming of smaller indexes that fell under the size cutoff
+ * enforced by parallel_vacuum_compute_workers().
+ */
+static void
+parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ /* Skip indexes that are safe for workers */
+ if (indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[i], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Vacuum or cleanup index either by leader process or by one of the worker
+ * processes. After vacuuming the index this function copies the index
+ * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel, PVIndStats *indstats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result if someone
+ * has already updated it
+ */
+ if (indstats->istat_updated)
+ istat = &(indstats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvs->shared->elevel;
+ ivinfo.estimated_count = pvs->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvs->shared->reltuples;
+ ivinfo.strategy = pvs->bstrategy;
+
+ /* Update error traceback information */
+ pvs->indname = pstrdup(RelationGetRelationName(indrel));
+ pvs->status = indstats->status;
+
+ switch (indstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = vac_bulkdel_one_index(&ivinfo, istat, pvs->dead_items);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = vac_cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ indstats->status,
+ RelationGetRelationName(indrel));
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete and
+ * amvacuumcleanup to the DSM segment if it's the first cycle because they
+ * allocate locally and it's possible that an index will be vacuumed by a
+ * different vacuum process the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result on the DSM segment and
+ * pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result at different
+ * slots we can write them without locking.
+ */
+ if (!indstats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(indstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ indstats->istat_updated = true;
+
+ /* Free the locally-allocated bulk-deletion result */
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each worker
+ * touches different indexes.
+ */
+ indstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+
+ /* Reset error traceback information */
+ pvs->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+ pfree(pvs->indname);
+ pvs->indname = NULL;
+}
+
+/*
+ * Returns false if the given index can't participate in the next execution of
+ * parallel index vacuum or parallel index cleanup.
+ */
+static bool
+parallel_vacuum_index_is_parallel_safe(Relation indrel, bool vacuum,
+ bool have_done_bulkdel)
+{
+ uint8 vacoptions;
+
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally, but we
+ * have already processed the index (for bulkdelete). We do this to avoid
+ * the need to invoke workers when parallel index cleanup doesn't need to
+ * scan the index. See the comments for option
+ * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
+ * parallel cleanup conditionally.
+ */
+ if (have_done_bulkdel &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index vacuum or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ ParallelVacuumState pvs;
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ int nindexes;
+ char *sharedquery;
+ ErrorContextCallback errcallback;
+
+ /*
+ * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
+ * don't support parallel vacuum for autovacuum as of now.
+ */
+ Assert(MyProc->statusFlags == PROC_IN_VACUUM);
+
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED, false);
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the leader process. It's
+ * okay because the lock mode does not conflict among the parallel
+ * workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should
+ * match the leader's order.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
+ /* Set parallel vacuum state */
+ pvs.indrels = indrels;
+ pvs.nindexes = nindexes;
+ pvs.indstats = indstats;
+ pvs.shared = shared;
+ pvs.dead_items = dead_items;
+ pvs.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs.relname = pstrdup(RelationGetRelationName(rel));
+
+ /* These fields will be filled during index vacuum or cleanup */
+ pvs.indname = NULL;
+ pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
+
+ /* Each parallel VACUUM worker gets its own access strategy */
+ pvs.bstrategy = GetAccessStrategy(BAS_VACUUM);
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_safe_indexes(&pvs);
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvs.bstrategy);
+}
+
+/*
+ * Error context callback for errors occurring during parallel index vacuum.
+ */
+static void
+parallel_vacuum_error_callback(void *arg)
+{
+ ParallelVacuumState *errinfo = arg;
+
+ switch (errinfo->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_INITIAL:
+ case PARALLEL_INDVAC_STATUS_COMPLETED:
+ default:
+ break;
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 417dd288e5..f3fb1e93a5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 97bffa8ff1..8bda1cc38d 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -16,6 +16,7 @@
#include "access/htup.h"
#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -63,6 +64,9 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Abstract type for parallel vacuum state */
+typedef struct ParallelVacuumState ParallelVacuumState;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -305,6 +309,21 @@ extern IndexBulkDeleteResult *vac_cleanup_one_index(IndexVacuumInfo *ivinfo,
IndexBulkDeleteResult *istat);
extern Size vac_max_items_to_alloc_size(int max_items);
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumState *parallel_vacuum_init(Relation rel, Relation *indrels,
+ int nindexes, int nrequested_workers,
+ int max_items, int elevel,
+ BufferAccessStrategy bstrategy);
+extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
+extern VacDeadItems *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs);
+extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples);
+extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ bool estimated_count,
+ bool no_bulkdel_call);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
+
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
VacuumParams *params, List *va_cols, bool in_outer_xact,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9863508791..f093605472 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1306,13 +1306,8 @@ LPWSTR
LSEG
LUID
LVPagePruneState
-LVParallelIndStats
-LVParallelIndVacStatus
-LVParallelState
LVRelState
LVSavedErrInfo
-LVShared
-LVSharedIndStats
LWLock
LWLockHandle
LWLockMode
@@ -1775,7 +1770,10 @@ PTIterationArray
PTOKEN_PRIVILEGES
PTOKEN_USER
PUTENVPROC
+PVIndStats
+PvIndVacStatus
PVOID
+PVShared
PX_Alias
PX_Cipher
PX_Combo
@@ -1809,6 +1807,7 @@ ParallelSlotResultHandler
ParallelState
ParallelTableScanDesc
ParallelTableScanDescData
+ParallelVacuumState
ParallelWorkerContext
ParallelWorkerInfo
Param
--
2.24.3 (Apple Git-128)
On Thu, Dec 16, 2021 at 4:27 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Wed, Dec 15, 2021 4:03 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Dec 14, 2021 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
There is still pending
work related to moving parallel vacuum code to a separate file and a
few other pending comments that are still under discussion. We can
take care of those in subsequent patches. Do, let me know if you or
others think differently?
I'm on the same page.
I've attached an updated patch. The patch incorporated several changes from
the last version:
* Rename parallel_vacuum_begin() to parallel_vacuum_init()
* Unify the terminology; use "index bulk-deletion" and "index cleanup"
instead of "index vacuum" and "index cleanup".
* Fix the comment of parallel_vacuum_init() pointed out by Andres
* Fix a typo that is left in commit 22bd3cbe0c (pointed out by Hou)
Please review it.
Thanks for updating the patch.
Here are a few comments:
Thank you for the comments!
1)
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while cleanup index \"%s\" of relation \"%s.%s\"",
I noticed current code uses error msg "While cleaning up index xxx" which seems a little
different from the patch's; maybe we can use the previous one?
Right. Fixed.
2)
static inline Size max_items_to_alloc_size(int max_items);
This old function declaration can be deleted.
Removed.
3)
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
I think we need to remove LVShared, LVSharedIndStats, LVDeadItems and
LVParallelState from typedefs.list and add PVShared and PVIndStats to the file.
Fixed.
These comments are incorporated into the patch I just submitted [1].
Regards,
[1]: /messages/by-id/CAD21AoB66GqEjHttbRd0_fy9hnBPJp8kBCWnMq87mG6V_BODSA@mail.gmail.com
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Dec 17, 2021 at 10:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches. The first patch just moves common
function for index bulk-deletion and cleanup to vacuum.c. And the
second patch moves parallel vacuum code to vacuumparallel.c. The
comments I got so far are incorporated into these patches. Please
review them.
Thanks, it is helpful for the purpose of review.
Few comments:
=============
1.
+ * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
+ * Each TID points to an LP_DEAD line pointer from a heap page that has been
+ * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
+ * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
Isn't it better to keep these comments atop the structure VacDeadItems
declaration?
2. What is the reason for not moving
lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they
can be called from vacuumlazy.c and vacuumparallel.c? Without this
refactoring patch, I think both leader and workers set the same error
context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index
vacuuming? Is it because you want a separate context phase for a
parallel vacuum? The other thing related to this is that if we have to
do the way you have it here then we don't need pg_rusage_init() in
functions lazy_vacuum_one_index/lazy_cleanup_one_index.
3.
@@ -3713,7 +3152,7 @@ update_index_statistics(LVRelState *vacrel)
int nindexes = vacrel->nindexes;
IndexBulkDeleteResult **indstats = vacrel->indstats;
- Assert(!IsInParallelMode());
+ Assert(!ParallelVacuumIsActive(vacrel));
I think we can retain the older Assert. If we do that then I think we
don't need to define ParallelVacuumIsActive in vacuumlazy.c.
--
With Regards,
Amit Kapila.
On Sat, Dec 18, 2021 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Dec 17, 2021 at 10:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached updated patches. The first patch just moves common
function for index bulk-deletion and cleanup to vacuum.c. And the
second patch moves parallel vacuum code to vacuumparallel.c. The
comments I got so far are incorporated into these patches. Please
review them.
Thanks, it is helpful for the purpose of review.
Few comments:
=============
1.
+ * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
+ * Each TID points to an LP_DEAD line pointer from a heap page that has been
+ * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
+ * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
Isn't it better to keep these comments atop the structure VacDeadItems
declaration?
I think LP_DEAD and LP_UNUSED stuff are specific to heap. Given moving
VacDeadItems to vacuum.c, I thought it's better to keep it as generic
TID storage.
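To make the genericity concrete, here is a minimal sketch (not part of the patches; the toyam_* names are hypothetical, only the parallel_vacuum_* entry points and VacDeadItems come from these patches) of how a non-heap table AM could drive the same machinery:

static void
toyam_vacuum_indexes(Relation rel, Relation *indrels, int nindexes,
                     int nrequested_workers, int max_items, int elevel,
                     BufferAccessStrategy bstrategy)
{
    ParallelVacuumState *pvs;
    VacDeadItems *dead_items;
    IndexBulkDeleteResult **istats;

    pvs = parallel_vacuum_init(rel, indrels, nindexes, nrequested_workers,
                               max_items, elevel, bstrategy);

    /* Fill the shared TID array, in ascending TID order (AM-specific) */
    dead_items = parallel_vacuum_get_dead_items(pvs);
    toyam_collect_dead_tids(rel, dead_items);   /* hypothetical helper */

    /* ambulkdelete for all indexes, possibly with parallel workers */
    parallel_vacuum_bulkdel_all_indexes(pvs, /* num_table_tuples */ 0);

    /* ... AM-specific second pass to reclaim the dead TIDs ... */

    /* amvacuumcleanup for all indexes, then collect the index stats */
    istats = (IndexBulkDeleteResult **)
        palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
    parallel_vacuum_cleanup_all_indexes(pvs, /* num_table_tuples */ 0,
                                        /* estimated_count */ true,
                                        /* no_bulkdel_call */ false);
    parallel_vacuum_end(pvs, istats);
}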
2. What is the reason for not moving
lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they
can be called from vacuumlazy.c and vacuumparallel.c? Without this
refactoring patch, I think both leader and workers set the same error
context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index
vacuuming? Is it because you want a separate context phase for a
parallel vacuum?
Since the phases defined as VacErrPhase like
VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc.
and error callback function, vacuum_error_callback(), are specific to
heap, I thought it'd not be a good idea to move
lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and
vacuumparallel.c can use the phases and error callback function.
The other thing related to this is that if we have to
do the way you have it here then we don't need pg_rusage_init() in
functions lazy_vacuum_one_index/lazy_cleanup_one_index.
Right. It should be removed.
3.
@@ -3713,7 +3152,7 @@ update_index_statistics(LVRelState *vacrel)
int nindexes = vacrel->nindexes;
IndexBulkDeleteResult **indstats = vacrel->indstats;
- Assert(!IsInParallelMode());
+ Assert(!ParallelVacuumIsActive(vacrel));
I think we can retain the older Assert. If we do that then I think we
don't need to define ParallelVacuumIsActive in vacuumlazy.c.
Right, will fix in the next version patch.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Dec 20, 2021 at 8:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Dec 18, 2021 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few comments:
=============
1.
+ * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
+ * Each TID points to an LP_DEAD line pointer from a heap page that has been
+ * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
+ * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
Isn't it better to keep these comments atop the structure VacDeadItems
declaration?
I think LP_DEAD and LP_UNUSED stuff are specific to heap. Given moving
VacDeadItems to vacuum.c, I thought it's better to keep it as generic
TID storage.
Okay, that makes sense.
2. What is the reason for not moving
lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they
can be called from vacuumlazy.c and vacuumparallel.c? Without this
refactoring patch, I think both leader and workers set the same error
context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index
vacuuming? Is it because you want a separate context phase for a
parallel vacuum?
Since the phases defined as VacErrPhase like
VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc.
and error callback function, vacuum_error_callback(), are specific to
heap, I thought it'd not be a good idea to move
lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and
vacuumparallel.c can use the phases and error callback function.
How about exposing it via heapam.h? We have already exposed a few
things via heapam.h (see /* in heap/vacuumlazy.c */). In the current
proposal, we need to have separate callbacks and phases for index
vacuuming so that it can be used by both vacuumlazy.c and
vacuumparallel.c which might not be a good idea.
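Just to illustrate the shape of it (a rough sketch reusing the current signature from vacuumlazy.c, not a concrete proposal for the exact declaration), heapam.h could grow something like:
/* in heap/vacuumlazy.c */
extern IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
                                                    IndexBulkDeleteResult *istat,
                                                    double reltuples,
                                                    struct LVRelState *vacrel);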
--
With Regards,
Amit Kapila.
On Mon, Dec 20, 2021 at 1:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 20, 2021 at 8:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Dec 18, 2021 at 3:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few comments:
=============
1.
+ * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
+ * Each TID points to an LP_DEAD line pointer from a heap page that has been
+ * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
+ * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
Isn't it better to keep these comments atop the structure VacDeadItems
declaration?
I think LP_DEAD and LP_UNUSED stuff are specific to heap. Given moving
VacDeadItems to vacuum.c, I thought it's better to keep it as generic
TID storage.
Okay, that makes sense.
2. What is the reason for not moving
lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they
can be called from vacuumlazy.c and vacuumparallel.c? Without this
refactoring patch, I think both leader and workers set the same error
context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index
vacuuming? Is it because you want a separate context phase for a
parallel vacuum?
Since the phases defined as VacErrPhase like
VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc.
and error callback function, vacuum_error_callback(), are specific to
heap, I thought it'd not be a good idea to move
lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and
vacuumparallel.c can use the phases and error callback function.
How about exposing it via heapam.h? We have already exposed a few
things via heapam.h (see /* in heap/vacuumlazy.c */). In the current
proposal, we need to have separate callbacks and phases for index
vacuuming so that it can be used by both vacuumlazy.c and
vacuumparallel.c which might not be a good idea.
Yeah, but if we expose VacErrPhase and vacuum_error_callback(), we
need to also expose LVRelState and vacuumparallel.c also uses it,
which seems not a good idea. So we will need to introduce a new struct
dedicated to the error callback function. Is that right?
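To visualize that alternative (names are made up; just a rough sketch of a struct carrying only what the callback needs rather than the whole LVRelState, with the errcontext strings taken from the patch):

typedef struct IndexVacErrInfo
{
    char       *relnamespace;   /* schema of the table being vacuumed */
    char       *relname;        /* table name */
    char       *indname;        /* index currently being processed */
    bool        doing_cleanup;  /* false = bulk-deletion, true = cleanup */
} IndexVacErrInfo;

static void
index_vacuum_error_callback(void *arg)
{
    IndexVacErrInfo *errinfo = (IndexVacErrInfo *) arg;

    if (errinfo->doing_cleanup)
        errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",
                   errinfo->indname, errinfo->relnamespace, errinfo->relname);
    else
        errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
                   errinfo->indname, errinfo->relnamespace, errinfo->relname);
}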
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Dec 20, 2021 at 6:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Dec 20, 2021 at 1:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
2. What is the reason for not moving
lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they
can be called from vacuumlazy.c and vacuumparallel.c? Without this
refactoring patch, I think both leader and workers set the same error
context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index
vacuuming? Is it because you want a separate context phase for a
parallel vacuum?
Since the phases defined as VacErrPhase like
VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc.
and error callback function, vacuum_error_callback(), are specific to
heap, I thought it'd not be a good idea to move
lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and
vacuumparallel.c can use the phases and error callback function.
How about exposing it via heapam.h? We have already exposed a few
things via heapam.h (see /* in heap/vacuumlazy.c */). In the current
proposal, we need to have separate callbacks and phases for index
vacuuming so that it can be used by both vacuumlazy.c and
vacuumparallel.c which might not be a good idea.
Yeah, but if we expose VacErrPhase and vacuum_error_callback(), we
need to also expose LVRelState and vacuumparallel.c also uses it,
which seems not a good idea. So we will need to introduce a new struct
dedicated to the error callback function. Is that right?
Right, but that also doesn't sound good to me. I think it is better to
keep a separate error context for parallel vacuum workers as you have
in the patch. However, let's add some comments to indicate that if
there is a change in one of the context functions, the other should be
changed. BTW, if we go with that then we should set the correct phase
for workers as well?
--
With Regards,
Amit Kapila.
On Tue, Dec 21, 2021 at 12:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 20, 2021 at 6:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Dec 20, 2021 at 1:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
2. What is the reason for not moving
lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they
can be called from vacuumlazy.c and vacuumparallel.c? Without this
refactoring patch, I think both leader and workers set the same error
context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index
vacuuming? Is it because you want a separate context phase for a
parallel vacuum?
Since the phases defined as VacErrPhase like
VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc.
and error callback function, vacuum_error_callback(), are specific to
heap, I thought it'd not be a good idea to move
lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and
vacuumparallel.c can use the phases and error callback function.
How about exposing it via heapam.h? We have already exposed a few
things via heapam.h (see /* in heap/vacuumlazy.c */). In the current
proposal, we need to have separate callbacks and phases for index
vacuuming so that it can be used by both vacuumlazy.c and
vacuumparallel.c which might not be a good idea.
Yeah, but if we expose VacErrPhase and vacuum_error_callback(), we
need to also expose LVRelState and vacuumparallel.c also uses it,
which seems not a good idea. So we will need to introduce a new struct
dedicated to the error callback function. Is that right?
Right, but that also doesn't sound good to me.
Me too.
I think it is better to
keep a separate error context for parallel vacuum workers as you have
in the patch. However, let's add some comments to indicate that if
there is a change in one of the context functions, the other should be
changed.
Agreed.
BTW, if we go with that then we should set the correct phase
for workers as well?
If we have separate error context for the leader (vacuumlazy.c) and
workers (vacuumparallel.c), workers don't necessarily need to have the
phases such as VACUUM_ERRCB_PHASE_VACUUM_INDEX and
VACUUM_ERRCB_PHASE_INDEX_CLEANUP. They can use PVIndVacStatus in the
error callback function as the patch does.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Dec 21, 2021 at 10:05 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Dec 21, 2021 at 12:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 20, 2021 at 6:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
BTW, if we go with that then we should set the correct phase
for workers as well?
If we have separate error context for the leader (vacuumlazy.c) and
workers (vacuumparallel.c), workers don't necessarily need to have the
phases such as VACUUM_ERRCB_PHASE_VACUUM_INDEX and
VACUUM_ERRCB_PHASE_INDEX_CLEANUP. They can use PVIndVacStatus in the
error callback function as the patch does.
Okay. One minor point, let's change comments atop vacuum.c considering
the movement of new functions.
--
With Regards,
Amit Kapila.
On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Dec 21, 2021 at 10:05 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Dec 21, 2021 at 12:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Dec 20, 2021 at 6:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
BTW, if we go with that then we should set the correct phase
for workers as well?
If we have separate error context for the leader (vacuumlazy.c) and
workers (vacuumparallel.c), workers don't necessarily need to have the
phases such as VACUUM_ERRCB_PHASE_VACUUM_INDEX and
VACUUM_ERRCB_PHASE_INDEX_CLEANUP. They can use PVIndVacStatus in the
error callback function as the patch does.
Okay. One minor point, let's change comments atop vacuum.c considering
the movement of new functions.
Thank you for the comment. Agreed.
I've attached updated version patches. Please review them.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v10-0001-Move-index-vacuum-routines-to-vacuum.c.patch (application/octet-stream)
From 98fc8658ef42b3e2a9e9d11014b3ba00af16f4d5 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 17 Dec 2021 12:15:33 +0900
Subject: [PATCH v10 1/2] Move index vacuum routines to vacuum.c
This commit moves these routines to vacuum.c so that other table AMs
can use them.
An upcoming patch moves parallel vacuum code out of vacuumlazy.c, and
both lazy vacuum and parallel vacuum use these index vacuum functions.
---
src/backend/access/heap/vacuumlazy.c | 172 ++++-----------------------
src/backend/commands/vacuum.c | 156 +++++++++++++++++++++++-
src/include/commands/vacuum.h | 22 ++++
src/tools/pgindent/typedefs.list | 2 +-
4 files changed, 197 insertions(+), 155 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index db6becfed5..4f11f63156 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -149,26 +149,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * LVDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
- * Each TID points to an LP_DEAD line pointer from a heap page that has been
- * processed by lazy_scan_prune.
- *
- * Also needed by lazy_vacuum_heap_rel, which marks the same LP_DEAD line
- * pointers as LP_UNUSED during second heap pass.
- */
-typedef struct LVDeadItems
-{
- int max_items; /* # slots allocated in array */
- int num_items; /* current # of entries */
-
- /* Sorted array of TIDs to delete from indexes */
- ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
-} LVDeadItems;
-
-#define MAXDEADITEMS(avail_mem) \
- (((avail_mem) - offsetof(LVDeadItems, items)) / sizeof(ItemPointerData))
-
/*
* Shared information among parallel workers. So this is allocated in the DSM
* segment.
@@ -339,9 +319,14 @@ typedef struct LVRelState
VacErrPhase phase;
/*
- * State managed by lazy_scan_heap() follows
+ * State managed by lazy_scan_heap() follows.
+ *
+ * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
+ * Each TID points to an LP_DEAD line pointer from a heap page that has been
+ * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
+ * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
@@ -434,11 +419,8 @@ static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
static int dead_items_max_items(LVRelState *vacrel);
-static inline Size max_items_to_alloc_size(int max_items);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
@@ -905,7 +887,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
BlockNumber nblocks,
blkno,
next_unskippable_block,
@@ -2040,7 +2022,7 @@ retry:
*/
if (lpdead_items > 0)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
Assert(!prunestate->all_visible);
@@ -2404,7 +2386,7 @@ static int
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
int index, Buffer *vmbuffer)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
@@ -3019,11 +3001,8 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
double reltuples, LVRelState *vacrel)
{
IndexVacuumInfo ivinfo;
- PGRUsage ru0;
LVSavedErrInfo saved_err_info;
- pg_rusage_init(&ru0);
-
ivinfo.index = indrel;
ivinfo.analyze_only = false;
ivinfo.report_progress = false;
@@ -3045,13 +3024,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_items);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_items->num_items),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ istat = vac_bulkdel_one_index(&ivinfo, istat, (void *) vacrel->dead_items);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3076,11 +3049,8 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
LVRelState *vacrel)
{
IndexVacuumInfo ivinfo;
- PGRUsage ru0;
LVSavedErrInfo saved_err_info;
- pg_rusage_init(&ru0);
-
ivinfo.index = indrel;
ivinfo.analyze_only = false;
ivinfo.report_progress = false;
@@ -3102,24 +3072,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
- istat = index_vacuum_cleanup(&ivinfo, istat);
-
- if (istat)
- {
- ereport(elevel,
- (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
- RelationGetRelationName(indrel),
- istat->num_index_tuples,
- istat->num_pages),
- errdetail("%.0f index row versions were removed.\n"
- "%u index pages were newly deleted.\n"
- "%u index pages are currently deleted, of which %u are currently reusable.\n"
- "%s.",
- istat->tuples_removed,
- istat->pages_newly_deleted,
- istat->pages_deleted, istat->pages_free,
- pg_rusage_show(&ru0))));
- }
+ istat = vac_cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3481,19 +3434,6 @@ dead_items_max_items(LVRelState *vacrel)
return (int) max_items;
}
-/*
- * Returns the total required space for VACUUM's dead_items array given a
- * max_items value returned by dead_items_max_items
- */
-static inline Size
-max_items_to_alloc_size(int max_items)
-{
- Assert(max_items >= MaxHeapTuplesPerPage);
- Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
-
- return offsetof(LVDeadItems, items) + sizeof(ItemPointerData) * max_items;
-}
-
/*
* Allocate dead_items (either using palloc, or in dynamic shared memory).
* Sets dead_items in vacrel for caller.
@@ -3504,7 +3444,7 @@ max_items_to_alloc_size(int max_items)
static void
dead_items_alloc(LVRelState *vacrel, int nworkers)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
int max_items;
/*
@@ -3539,7 +3479,7 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
/* Serial VACUUM case */
max_items = dead_items_max_items(vacrel);
- dead_items = (LVDeadItems *) palloc(max_items_to_alloc_size(max_items));
+ dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3565,74 +3505,6 @@ dead_items_cleanup(LVRelState *vacrel)
parallel_vacuum_end(vacrel);
}
-/*
- * lazy_tid_reaped() -- is a particular tid deletable?
- *
- * This has the right signature to be an IndexBulkDeleteCallback.
- *
- * Assumes dead_items array is sorted (in ascending TID order).
- */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
-{
- LVDeadItems *dead_items = (LVDeadItems *) state;
- int64 litem,
- ritem,
- item;
- ItemPointer res;
-
- litem = itemptr_encode(&dead_items->items[0]);
- ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
- item = itemptr_encode(itemptr);
-
- /*
- * Doing a simple bound check before bsearch() is useful to avoid the
- * extra cost of bsearch(), especially if dead items on the heap are
- * concentrated in a certain range. Since this function is called for
- * every index tuple, it pays to be really fast.
- */
- if (item < litem || item > ritem)
- return false;
-
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) dead_items->items,
- dead_items->num_items,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
-
- return (res != NULL);
-}
-
-/*
- * Comparator routines for use with qsort() and bsearch().
- */
-static int
-vac_cmp_itemptr(const void *left, const void *right)
-{
- BlockNumber lblk,
- rblk;
- OffsetNumber loff,
- roff;
-
- lblk = ItemPointerGetBlockNumber((ItemPointer) left);
- rblk = ItemPointerGetBlockNumber((ItemPointer) right);
-
- if (lblk < rblk)
- return -1;
- if (lblk > rblk)
- return 1;
-
- loff = ItemPointerGetOffsetNumber((ItemPointer) left);
- roff = ItemPointerGetOffsetNumber((ItemPointer) right);
-
- if (loff < roff)
- return -1;
- if (loff > roff)
- return 1;
-
- return 0;
-}
-
/*
* Check if every tuple in the given page is visible to all current and future
* transactions. Also return the visibility_cutoff_xid which is the highest
@@ -3873,7 +3745,7 @@ parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
int nindexes = vacrel->nindexes;
ParallelContext *pcxt;
LVShared *shared;
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
LVParallelIndStats *pindstats;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
@@ -3927,7 +3799,7 @@ parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
/* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
max_items = dead_items_max_items(vacrel);
- est_dead_items_len = max_items_to_alloc_size(max_items);
+ est_dead_items_len = vac_max_items_to_alloc_size(max_items);
shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
@@ -4011,8 +3883,8 @@ parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
lps->lvshared = shared;
/* Prepare the dead_items space */
- dead_items = (LVDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
dead_items->max_items = max_items;
dead_items->num_items = 0;
MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
@@ -4138,7 +4010,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVParallelIndStats *lvpindstats;
LVShared *lvshared;
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
int nindexes;
@@ -4183,9 +4055,9 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
false);
/* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (LVDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
/* Set cost-based vacuum delay */
VacuumCostActive = (VacuumCostDelay > 0);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15b44..77de35f4eb 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -3,10 +3,10 @@
* vacuum.c
* The postgres vacuum cleaner.
*
- * This file now includes only control and dispatch code for VACUUM and
- * ANALYZE commands. Regular VACUUM is implemented in vacuumlazy.c,
- * ANALYZE in analyze.c, and VACUUM FULL is a variant of CLUSTER, handled
- * in cluster.c.
+ * This file includes control and dispatch code for VACUUM and ANALYZE
+ * commands, and index vacuum code. VACUUM for heap AM is implemented in
+ * vacuumlazy.c, ANALYZE in analyze.c, and VACUUM FULL is a variant of
+ * CLUSTER, handled in cluster.c.
*
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
@@ -32,6 +32,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
+#include "catalog/index.h"
#include "catalog/pg_database.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_namespace.h"
@@ -51,6 +52,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -89,6 +91,8 @@ static void vac_truncate_clog(TransactionId frozenXID,
static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
+static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_cmp_itemptr(const void *left, const void *right);
/*
* Primary entry point for manual VACUUM and ANALYZE commands
@@ -2258,3 +2262,147 @@ get_vacoptval_from_boolean(DefElem *def)
{
return defGetBoolean(def) ? VACOPTVALUE_ENABLED : VACOPTVALUE_DISABLED;
}
+
+/*
+ * vac_bulkdel_one_index() -- bulk-deletion for index relation.
+ *
+ * Calls index AM's ambulkdelete routine.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+vac_bulkdel_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ /* Do bulk deletion */
+ istat = index_bulk_delete(ivinfo, istat, vac_tid_reaped,
+ (void *) dead_items);
+
+ ereport(ivinfo->message_level,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ RelationGetRelationName(ivinfo->index),
+ dead_items->num_items),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+ return istat;
+}
+
+/*
+ * vac_cleanup_one_index() -- do post-vacuum cleanup for index relation.
+ *
+ * Calls index AM's amvacuumcleanup routine.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+vac_cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ istat = index_vacuum_cleanup(ivinfo, istat);
+
+ if (istat)
+ {
+ ereport(ivinfo->message_level,
+ (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
+ RelationGetRelationName(ivinfo->index),
+ istat->num_index_tuples,
+ istat->num_pages),
+ errdetail("%.0f index row versions were removed.\n"
+ "%u index pages were newly deleted.\n"
+ "%u index pages are currently deleted, of which %u are currently reusable.\n"
+ "%s.",
+ istat->tuples_removed,
+ istat->pages_newly_deleted,
+ istat->pages_deleted, istat->pages_free,
+ pg_rusage_show(&ru0))));
+ }
+
+ return istat;
+}
+
+/*
+ * Returns the total required space for VACUUM's dead_items array given a
+ * max_items value.
+ */
+inline Size
+vac_max_items_to_alloc_size(int max_items)
+{
+ Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
+
+ return offsetof(VacDeadItems, items) + sizeof(ItemPointerData) * max_items;
+}
+
+/*
+ * vac_tid_reaped() -- is a particular tid deletable?
+ *
+ * This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ * Assumes dead_items array is sorted (in ascending TID order).
+ */
+static bool
+vac_tid_reaped(ItemPointer itemptr, void *state)
+{
+ VacDeadItems *dead_items = (VacDeadItems *) state;
+ int64 litem,
+ ritem,
+ item;
+ ItemPointer res;
+
+ litem = itemptr_encode(&dead_items->items[0]);
+ ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
+ item = itemptr_encode(itemptr);
+
+ /*
+ * Doing a simple bound check before bsearch() is useful to avoid the
+ * extra cost of bsearch(), especially if dead items on the heap are
+ * concentrated in a certain range. Since this function is called for
+ * every index tuple, it pays to be really fast.
+ */
+ if (item < litem || item > ritem)
+ return false;
+
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_items->items,
+ dead_items->num_items,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
+
+ return (res != NULL);
+}
+
+/*
+ * Comparator routines for use with qsort() and bsearch().
+ */
+static int
+vac_cmp_itemptr(const void *left, const void *right)
+{
+ BlockNumber lblk,
+ rblk;
+ OffsetNumber loff,
+ roff;
+
+ lblk = ItemPointerGetBlockNumber((ItemPointer) left);
+ rblk = ItemPointerGetBlockNumber((ItemPointer) right);
+
+ if (lblk < rblk)
+ return -1;
+ if (lblk > rblk)
+ return 1;
+
+ loff = ItemPointerGetOffsetNumber((ItemPointer) left);
+ roff = ItemPointerGetOffsetNumber((ItemPointer) right);
+
+ if (loff < roff)
+ return -1;
+ if (loff > roff)
+ return 1;
+
+ return 0;
+}
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4cfd52eaf4..97bffa8ff1 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,7 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/genam.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -230,6 +231,21 @@ typedef struct VacuumParams
int nworkers;
} VacuumParams;
+/*
+ * VacDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
+ */
+typedef struct VacDeadItems
+{
+ int max_items; /* # slots allocated in array */
+ int num_items; /* current # of entries */
+
+ /* Sorted array of TIDs to delete from indexes */
+ ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
+} VacDeadItems;
+
+#define MAXDEADITEMS(avail_mem) \
+ (((avail_mem) - offsetof(VacDeadItems, items)) / sizeof(ItemPointerData))
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for PostGIS */
extern int vacuum_freeze_min_age;
@@ -282,6 +298,12 @@ extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
bits32 options, bool verbose,
LOCKMODE lmode);
+extern IndexBulkDeleteResult *vac_bulkdel_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items);
+extern IndexBulkDeleteResult *vac_cleanup_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat);
+extern Size vac_max_items_to_alloc_size(int max_items);
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c61ccbdd0..9863508791 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1305,7 +1305,6 @@ LPVOID
LPWSTR
LSEG
LUID
-LVDeadTuples
LVPagePruneState
LVParallelIndStats
LVParallelIndVacStatus
@@ -2800,6 +2799,7 @@ UserMapping
UserOpts
VacAttrStats
VacAttrStatsP
+VacDeadItems
VacErrPhase
VacOptValue
VacuumParams
--
2.24.3 (Apple Git-128)
v10-0002-Move-parallel-vacuum-code-to-vacuumparallel.c.patch (application/octet-stream)
From cf61f51c57d0e1f656b30faddd4fa2894d3888b3 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 15 Dec 2021 16:49:01 +0900
Subject: [PATCH v10 2/2] Move parallel vacuum code to vacuumparallel.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previously, parallel vacuum was specific to lazy vacuum, i.e., heap
table AM. But the job that parallel vacuum does isn’t really specific
to heap.
This commit moves parallel vacuum related code to a new file
commands/vacuumparallel.c so that any table AM supporting indexes can
utilize parallel vacuum in order to call index AM
callbacks (ambulkdelete and amvacuumcleanup) with parallel workers.
It also moves some vacuum-related functions and structures to
commands/vacuum.c so that both lazy vacuum and parallel vacuum can
refer to them.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 990 +---------------------
src/backend/access/transam/parallel.c | 2 +-
src/backend/commands/Makefile | 1 +
src/backend/commands/vacuum.c | 4 +-
src/backend/commands/vacuumparallel.c | 1094 +++++++++++++++++++++++++
src/include/access/heapam.h | 1 -
src/include/commands/vacuum.h | 19 +
src/tools/pgindent/typedefs.list | 9 +-
8 files changed, 1143 insertions(+), 977 deletions(-)
create mode 100644 src/backend/commands/vacuumparallel.c
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4f11f63156..f2446166a0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -40,7 +40,6 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
-#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -120,23 +119,11 @@
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
-/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
-
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -149,135 +136,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /* Counter for vacuuming and cleanup */
- pg_atomic_uint32 idx;
-} LVShared;
-
-/* Status used during parallel index vacuum or cleanup */
-typedef enum LVParallelIndVacStatus
-{
- PARALLEL_INDVAC_STATUS_INITIAL = 0,
- PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
- PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
- PARALLEL_INDVAC_STATUS_COMPLETED
-} LVParallelIndVacStatus;
-
-/*
- * Struct for index vacuum statistics of an index that is used for parallel vacuum.
- * This includes the status of parallel index vacuum as well as index statistics.
- */
-typedef struct LVParallelIndStats
-{
- /*
- * The following two fields are set by leader process before executing
- * parallel index vacuum or parallel index cleanup. These fields are not
- * fixed for the entire VACUUM operation. They are only fixed for an
- * individual parallel index vacuum and cleanup.
- *
- * parallel_workers_can_process is true if both leader and worker can
- * process the index, otherwise only leader can process it.
- */
- LVParallelIndVacStatus status;
- bool parallel_workers_can_process;
-
- /*
- * Individual worker or leader stores the result of index vacuum or
- * cleanup.
- */
- bool istat_updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVParallelIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /*
- * Shared index statistics among parallel vacuum workers. The array
- * element is allocated for every index, even those indexes where parallel
- * index vacuuming is unsafe or not worthwhile (e.g.,
- * will_parallel_vacuum[] is false). During parallel vacuum,
- * IndexBulkDeleteResult of each index is kept in DSM and is copied into
- * local memory at the end of parallel vacuum.
- */
- LVParallelIndStats *lvpindstats;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * False if the index is totally unsuitable target for all parallel
- * processing. For example, the index could be <
- * min_parallel_index_scan_size cutoff.
- */
- bool *will_parallel_vacuum;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -295,9 +153,9 @@ typedef struct LVRelState
bool do_index_cleanup;
bool do_rel_truncate;
- /* Buffer access strategy and parallel state */
+ /* Buffer access strategy and parallel vacuum state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumState *pvs;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -398,13 +256,6 @@ static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
-static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats);
-static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared,
- LVParallelIndStats *pindstats);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
double reltuples,
@@ -418,18 +269,11 @@ static bool should_attempt_truncation(LVRelState *vacrel);
static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
-static int dead_items_max_items(LVRelState *vacrel);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
-static void parallel_vacuum_end(LVRelState *vacrel);
-static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -2065,7 +1909,6 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(!IsParallelWorker());
Assert(vacrel->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
@@ -2194,7 +2037,6 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
{
bool allindexes = true;
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -2234,7 +2076,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2607,353 +2449,12 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process.
- */
-static void
-parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
-{
- LVParallelState *lps = vacrel->lps;
- LVParallelIndVacStatus new_status;
- int nworkers;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- if (vacuum)
- {
- /*
- * We can only provide an approximate value of num_heap_tuples, at
- * least for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_bulkdel;
- }
- else
- {
- /*
- * We can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- /* Add conditionally parallel-aware indexes if in the first time call */
- if (vacrel->num_index_scans == 0)
- nworkers += vacrel->lps->nindexes_parallel_condcleanup;
- }
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See
- * parallel_vacuum_compute_workers().
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- /*
- * Set index vacuum status and mark whether parallel vacuum worker can
- * process it.
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
-
- Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
- pindstats->status = new_status;
- pindstats->parallel_workers_can_process =
- (lps->will_parallel_vacuum[i] &
- parallel_vacuum_index_is_parallel_safe(vacrel, vacrel->indrels[i],
- vacuum));
- }
-
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- /* Reinitialize parallel context to relaunch parallel workers */
- if (vacrel->num_index_scans > 0)
- ReinitializeParallelDSM(lps->pcxt);
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (vacuum)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- parallel_vacuum_process_unsafe_indexes(vacrel);
-
- /*
- * Join as a parallel worker. The leader process alone processes all
- * parallel-safe indexes in the case where no workers are launched.
- */
- parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Reset all index status back to initial (while checking that we have
- * processed all indexes).
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
-
- if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
- elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
- RelationGetRelationName(vacrel->indrels[i]));
-
- pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVParallelIndStats *pis;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- pis = &(pindstats[idx]);
-
- /*
- * Skip processing index that is unsafe for workers or has an
- * unsuitable target for parallel index vacuum (this is processed in
- * parallel_vacuum_process_unsafe_indexes() by the leader).
- */
- if (!pis->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- shared, pis);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing index cleanup right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by parallel_vacuum_compute_workers().
- */
-static void
-parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- /* Skip, indexes that are safe for workers */
- if (pindstats->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- lps->lvshared, pindstats);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static void
-parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared, LVParallelIndStats *pindstats)
-{
- IndexBulkDeleteResult *istat = NULL;
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (pindstats->istat_updated)
- istat = &(pindstats->istat);
-
- switch (pindstats->status)
- {
- case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
- istat_res = lazy_vacuum_one_index(indrel, istat,
- shared->reltuples, vacrel);
- break;
- case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
- istat_res = lazy_cleanup_one_index(indrel, istat,
- shared->reltuples,
- shared->estimated_count,
- vacrel);
- break;
- default:
- elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
- pindstats->status,
- RelationGetRelationName(indrel));
- }
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (!pindstats->istat_updated && istat_res != NULL)
- {
- memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
- pindstats->istat_updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
- }
-
- /*
- * Update the status to completed. No need to lock here since each worker
- * touches different indexes.
- */
- pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
-}
-
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
/* Report that we are now cleaning up indexes */
@@ -2979,7 +2480,9 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, false);
+ parallel_vacuum_cleanup_all_indexes(vacrel->pvs, vacrel->new_rel_tuples,
+ (vacrel->tupcount_pages < vacrel->rel_pages),
+ vacrel->num_index_scans);
}
}
@@ -3408,8 +2911,6 @@ dead_items_max_items(LVRelState *vacrel)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- Assert(!IsParallelWorker());
-
if (vacrel->nindexes > 0)
{
BlockNumber rel_pages = vacrel->rel_pages;
@@ -3447,6 +2948,9 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
VacDeadItems *dead_items;
int max_items;
+ max_items = dead_items_max_items(vacrel);
+ Assert(max_items >= MaxHeapTuplesPerPage);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3470,15 +2974,20 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- parallel_vacuum_begin(vacrel, nworkers);
+ vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
+ vacrel->nindexes, nworkers,
+ max_items, elevel,
+ vacrel->bstrategy);
- /* If parallel mode started, vacrel->dead_items allocated in DSM */
+ /* If parallel mode started, dead_items space is allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs);
return;
+ }
}
/* Serial VACUUM case */
- max_items = dead_items_max_items(vacrel);
dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3502,7 +3011,8 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- parallel_vacuum_end(vacrel);
+ parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
+ vacrel->pvs = NULL;
}
/*
@@ -3626,77 +3136,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
return all_visible;
}
-/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
- */
-static int
-parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* Skip index that is not a suitable target for parallel index vacuum */
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- continue;
-
- will_parallel_vacuum[idx] = true;
-
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
@@ -3729,395 +3168,10 @@ update_index_statistics(LVRelState *vacrel)
}
}
-/*
- * Try to enter parallel mode and create a parallel context. Then initialize
- * shared memory state.
- *
- * On success (when we can launch one or more workers), will set dead_items and
- * lps in vacrel for caller. A set lps in vacrel state indicates that parallel
- * VACUUM is currently active.
- */
-static void
-parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
-{
- LVParallelState *lps;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- VacDeadItems *dead_items;
- LVParallelIndStats *pindstats;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- bool *will_parallel_vacuum;
- int max_items;
- Size est_pindstats_len;
- Size est_shared_len;
- Size est_dead_items_len;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
- will_parallel_vacuum);
- if (parallel_workers <= 0)
- {
- /* Can't perform vacuum in parallel -- lps not set in vacrel */
- pfree(will_parallel_vacuum);
- return;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
- lps->will_parallel_vacuum = will_parallel_vacuum;
-
- /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
- est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
- shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = sizeof(LVShared);
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
- max_items = dead_items_max_items(vacrel);
- est_dead_items_len = vac_max_items_to_alloc_size(max_items);
- shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare index vacuum stats */
- pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
- lps->lvpindstats = pindstats;
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
- MemSet(shared, 0, est_shared_len);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead_items space */
- dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
- MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- /* Success -- set dead_items and lps in leader's vacrel state */
- vacrel->dead_items = dead_items;
- vacrel->lps = lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-parallel_vacuum_end(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- if (pindstats->istat_updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps->will_parallel_vacuum);
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Returns false, if the given index can't participate in the next execution of
- * parallel index vacuum or parallel index cleanup.
- */
-static bool
-parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum)
-{
- uint8 vacoptions;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* In parallel vacuum case, check if it supports parallel bulk-deletion */
- if (vacuum)
- return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
-
- /* Not safe, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Not safe, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). We do this to avoid
- * the need to invoke workers when parallel index cleanup doesn't need to
- * scan the index. See the comments for option
- * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
- * parallel cleanup conditionally.
- */
- if (vacrel->num_index_scans > 0 &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVParallelIndStats *lvpindstats;
- LVShared *lvshared;
- VacDeadItems *dead_items;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- /*
- * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
- * don't support parallel vacuum for autovacuum as of now.
- */
- Assert(MyProc->statusFlags == PROC_IN_VACUUM);
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- elog(DEBUG1, "starting parallel vacuum worker");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set index statistics */
- lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_INDEX_STATS,
- false);
-
- /* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (VacDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_items = dead_items;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
/*
* Error context callback for errors occurring during vacuum.
+ * The error context messages match parallel vacuum error context messages.
+ * If you change this function, also change parallel_vacuum_error_callback().
*/
static void
vacuum_error_callback(void *arg)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f573..ae7c7133dd 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -14,7 +14,6 @@
#include "postgres.h"
-#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/parallel.h"
#include "access/session.h"
@@ -25,6 +24,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..48f7348f91 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 77de35f4eb..a69e52a5c1 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -5,8 +5,8 @@
*
* This file includes control and dispatch code for VACUUM and ANALYZE
* commands, and index vacuum code. VACUUM for heap AM is implemented in
- * vacuumlazy.c, ANALYZE in analyze.c, and VACUUM FULL is a variant of
- * CLUSTER, handled in cluster.c.
+ * vacuumlazy.c, parallel vacuum is implemented in vacuumparallel.c, ANALYZE
+ * in analyze.c, and VACUUM FULL is a variant of CLUSTER, handled in cluster.c.
*
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000000..46f7d57fd9
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,1094 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelVacuumState.
+ *
+ * In a parallel vacuum, we perform both index bulk deletion and index cleanup
+ * with parallel worker processes. Individual indexes are processed by one
+ * vacuum process. ParallelVacuumState contains shared information as well
+ * as the memory space for storing dead items allocated in the DSM segment.
+ * When starting either parallel index bulk-deletion or index cleanup, we
+ * launch parallel worker processes. Once all indexes are processed, the
+ * parallel worker processes exit. The next time, the parallel context is
+ * re-initialized so that the same DSM can be used for multiple passes of
+ * index bulk-deletion and index cleanup. At the end of a parallel vacuum,
+ * ParallelVacuumState is destroyed while returning the index statistics so
+ * that the caller can update them after exiting from parallel mode.
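+ *
+ * As a rough sketch, the leader's call sequence with this API is:
+ * parallel_vacuum_init() once, then parallel_vacuum_bulkdel_all_indexes()
+ * and/or parallel_vacuum_cleanup_all_indexes() one or more times (the dead
+ * items space obtained via parallel_vacuum_get_dead_items() is filled by the
+ * caller before each bulk-deletion pass), and finally parallel_vacuum_end().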
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/genam.h"
+#include "access/parallel.h"
+#include "access/table.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/elog.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel workers. So this is allocated in the DSM
+ * segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the parallel vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * reltuples is the total number of input heap tuples. We set either old
+ * live tuples in the index vacuum case or the new live tuples in the
+ * index cleanup case.
+ *
+ * estimated_count is true if reltuples is an estimated value. (Note that
+ * reltuples could be -1 in this case, indicating we have no idea.)
+ */
+ double reltuples;
+ bool estimated_count;
+
+ /*
+ * In single process vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for heap scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process vacuum.
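+ *
+ * For example, given the computation in parallel_vacuum_init(), with
+ * maintenance_work_mem of 256MB, 4 planned workers and 2 indexes that use
+ * maintenance_work_mem, each such worker gets 256MB / Min(4, 2) = 128MB.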
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
+} PVShared;
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} PVIndVacStatus;
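+
+/*
+ * A summary of the flow implemented below: the leader sets each index's
+ * status to NEED_BULKDELETE or NEED_CLEANUP in parallel_vacuum_all_indexes(),
+ * whichever process vacuums the index sets it to COMPLETED in
+ * parallel_vacuum_one_index(), and the leader errors out if any index is not
+ * COMPLETED before resetting all statuses back to INITIAL.
+ */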
+
+/*
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
+ */
+typedef struct PVIndStats
+{
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both leader and worker can
+ * process the index, otherwise only leader can process it.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/* Struct for maintaining a parallel vacuum state. */
+typedef struct ParallelVacuumState
+{
+ /* NULL for worker processes */
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /*
+ * Shared index statistics among parallel vacuum workers. The array
+ * element is allocated for every index, even those indexes where parallel
+ * index vacuuming is unsafe or not worthwhile (e.g.,
+ * will_parallel_vacuum[] is false). During parallel vacuum,
+ * IndexBulkDeleteResult of each index is kept in DSM and is copied into
+ * local memory at the end of parallel vacuum.
+ */
+ PVIndStats *indstats;
+
+ /* Shared dead items space among parallel vacuum workers */
+ VacDeadItems *dead_items;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * False if the index is totally unsuitable target for all parallel
+ * processing. For example, the index could be <
+ * min_parallel_index_scan_size cutoff.
+ */
+ bool *will_parallel_vacuum;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion and
+ * parallel index cleanup respectively.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /* True if we need to reinitialize parallel DSM before launching workers */
+ bool first_time;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+
+ /*
+ * Error reporting state. The error callback is set only for worker
+ * processes during parallel index vacuum.
+ */
+ char *relnamespace;
+ char *relname;
+ char *indname;
+ PVIndVacStatus status;
+} ParallelVacuumState;
+
+static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum);
+static void parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool vacuum,
+ bool have_done_bulkdel);
+static void parallel_vacuum_safe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats);
+static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, bool vacuum,
+ bool have_done_bulkdel);
+static void parallel_vacuum_error_callback(void *arg);
+
+/*
+ * Try to enter parallel mode and create a parallel context. Then initialize
+ * shared memory state.
+ *
+ * On success, return parallel vacuum state. Otherwise return NULL.
+ */
+ParallelVacuumState *
+parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel, BufferAccessStrategy bstrategy)
+{
+ ParallelVacuumState *pvs;
+ ParallelContext *pcxt;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ PVIndStats *indstats;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ bool *will_parallel_vacuum;
+ Size est_indstats_len;
+ Size est_shared_len;
+ Size est_dead_items_len;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(nrequested_workers >= 0);
+ Assert(nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
+ parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
+ nrequested_workers,
+ will_parallel_vacuum);
+ if (parallel_workers <= 0)
+ {
+ /* Can't perform vacuum in parallel -- return NULL */
+ pfree(will_parallel_vacuum);
+ return NULL;
+ }
+
+ pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
+ pvs->indrels = indrels;
+ pvs->nindexes = nindexes;
+ pvs->will_parallel_vacuum = will_parallel_vacuum;
+ pvs->first_time = true;
+ pvs->bstrategy = bstrategy;
+
+ /*
+ * Set error traceback information. Other fields will be filled while
+ * vacuuming indexes.
+ */
+ pvs->relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs->relname = pstrdup(RelationGetRelationName(rel));
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvs->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_INDEX_STATS */
+ est_indstats_len = mul_size(sizeof(PVIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = sizeof(PVShared);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
+ est_dead_items_len = vac_max_items_to_alloc_size(max_items);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats_len);
+ for (int idx = 0; idx < nindexes; idx++)
+ {
+ Relation indrel = indrels[idx];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ if (!will_parallel_vacuum[idx])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvs->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvs->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvs->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvs->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
+ MemSet(shared, 0, est_shared_len);
+ shared->relid = RelationGetRelid(rel);
+ shared->elevel = elevel;
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvs->shared = shared;
+
+ /* Prepare the dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
+ dead_items->max_items = max_items;
+ dead_items->num_items = 0;
+ MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
+ pvs->dead_items = dead_items;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvs->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvs->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ /* Success -- return parallel vacuum state */
+ return pvs;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed during parallel mode, copy the
+ * updated index statistics from DSM into local memory and then later use that
+ * to update the index statistics. One might think that we can exit from
+ * parallel mode, update the index statistics and then destroy parallel
+ * context, but that won't be safe (see ExitParallelMode).
+ */
+void
+parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats)
+{
+ Assert(!IsParallelWorker());
+
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->istat_updated)
+ {
+ istats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(istats[i], &indstats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ istats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvs->pcxt);
+ ExitParallelMode();
+
+ pfree(pvs->will_parallel_vacuum);
+ pfree(pvs->relnamespace);
+ pfree(pvs->relname);
+ pfree(pvs);
+}
+
+/* Returns the dead items space */
+VacDeadItems *
+parallel_vacuum_get_dead_items(ParallelVacuumState *pvs)
+{
+ return pvs->dead_items;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now.
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = true;
+
+ /* have_done_bulkdel is not used in parallel bulkdel cases */
+ parallel_vacuum_all_indexes(pvs, true, false);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ *
+ * have_done_bulkdel is true if the caller has done index bulk-deletion one
+ * or more times in the vacuum execution.
+ */
+void
+parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
+ bool estimated_count, bool have_done_bulkdel)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can provide a better estimate of total number of surviving tuples
+ * (we assume indexes are more interested in that than in the number of
+ * nominally live tuples).
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_all_indexes(pvs, false, have_done_bulkdel);
+}
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * vacuum and index cleanup can be executed with parallel workers.
+ * The index is eligible for parallel vacuum iff its size is greater than
+ * min_parallel_index_scan_size as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that user requested. If
+ * nrequested is 0, we compute the parallel degree based on nindexes, that is
+ * the number of indexes that support parallel vacuum. This function also
+ * sets will_parallel_vacuum to remember indexes that participate in parallel
+ * vacuum.
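+ *
+ * As a worked example of the logic below: with 4 indexes that support
+ * parallel bulk-deletion and 3 that support (possibly conditional) parallel
+ * cleanup, nindexes_parallel is Max(4, 3) - 1 = 3 after reserving one index
+ * for the leader, so with nrequested = 0 we request
+ * Min(3, max_parallel_maintenance_workers) workers.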
+ */
+static int
+parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow performing parallel operation in standalone backend or
+ * when parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ /*
+ * Compute the number of indexes that can participate in parallel vacuum.
+ */
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* Skip index that is not a suitable target for parallel index vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ will_parallel_vacuum[i] = true;
+
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
+
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_all_indexes(ParallelVacuumState *pvs, bool vacuum,
+ bool have_done_bulkdel)
+{
+ int nworkers;
+ PVIndVacStatus new_status;
+
+ Assert(!IsParallelWorker());
+
+ if (vacuum)
+ {
+ new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = pvs->nindexes_parallel_bulkdel;
+ }
+ else
+ {
+ new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = pvs->nindexes_parallel_cleanup;
+
+		/* Add conditionally parallel-aware indexes if no bulk-deletion has been done yet */
+ if (!have_done_bulkdel)
+ nworkers += pvs->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * than the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See
+ * parallel_vacuum_compute_workers().
+ */
+ nworkers = Min(nworkers, pvs->pcxt->nworkers);
+
+ /*
+ * Set index vacuum status and mark whether parallel vacuum worker can
+ * process it.
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+ indstats->status = new_status;
+ indstats->parallel_workers_can_process =
+ (pvs->will_parallel_vacuum[i] &
+ parallel_vacuum_index_is_parallel_safe(pvs->indrels[i], vacuum,
+ have_done_bulkdel));
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvs->shared->idx), 0);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ if (vacuum)
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ }
+
+	/* Vacuum the indexes that can be processed only by the leader process */
+ parallel_vacuum_unsafe_indexes(pvs);
+
+ /*
+	 * Join as a parallel worker.  The leader process alone vacuums all
+	 * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_safe_indexes(pvs);
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * vacuumed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(pvs->indrels[i]));
+
+ indstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+
+ pvs->first_time = false;
+}
+
+/*
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to vacuum the indexes in parallel.
+ */
+static void
+parallel_vacuum_safe_indexes(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+ PVIndStats *indstats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvs->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvs->nindexes)
+ break;
+
+ indstats = &(pvs->indstats[idx]);
+
+ /*
+		 * Skip vacuuming indexes that are unsafe for workers or are
+		 * unsuitable targets for parallel index vacuum (these are vacuumed
+		 * by the leader in parallel_vacuum_unsafe_indexes()).
+ */
+ if (!indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[idx], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Perform parallel vacuuming of indexes in leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe. It's possible that this will vary for a given index, based
+ * on details like whether we're performing index cleanup right now.
+ *
+ * Also performs vacuuming of smaller indexes that fell under the size cutoff
+ * enforced by parallel_vacuum_compute_workers().
+ */
+static void
+parallel_vacuum_unsafe_indexes(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ /* Skip indexes that are safe for workers */
+ if (indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_one_index(pvs, pvs->indrels[i], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Vacuum or cleanup index either by leader process or by one of the worker
+ * processes. After vacuuming the index, this function copies the index
+ * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_one_index(ParallelVacuumState *pvs, Relation indrel, PVIndStats *indstats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result if someone
+ * has already updated it
+ */
+ if (indstats->istat_updated)
+ istat = &(indstats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvs->shared->elevel;
+ ivinfo.estimated_count = pvs->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvs->shared->reltuples;
+ ivinfo.strategy = pvs->bstrategy;
+
+ /* Update error traceback information */
+ pvs->indname = pstrdup(RelationGetRelationName(indrel));
+ pvs->status = indstats->status;
+
+ switch (indstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = vac_bulkdel_one_index(&ivinfo, istat, pvs->dead_items);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = vac_cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ indstats->status,
+ RelationGetRelationName(indrel));
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete and
+ * amvacuumcleanup to the DSM segment if it's the first cycle because they
+ * allocate locally and it's possible that an index will be vacuumed by a
+ * different vacuum process the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result on the DSM segment and
+ * pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result at different
+ * slots we can write them without locking.
+ */
+ if (!indstats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(indstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ indstats->istat_updated = true;
+
+ /* Free the locally-allocated bulk-deletion result */
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each worker
+ * touches different indexes.
+ */
+ indstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+
+ /* Reset error traceback information */
+ pvs->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+ pfree(pvs->indname);
+ pvs->indname = NULL;
+}
+
+/*
+ * Returns false if the given index can't participate in the next execution of
+ * parallel index vacuum or parallel index cleanup.
+ */
+static bool
+parallel_vacuum_index_is_parallel_safe(Relation indrel, bool vacuum,
+ bool have_done_bulkdel)
+{
+ uint8 vacoptions;
+
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally, but we
+ * have already processed the index (for bulkdelete). We do this to avoid
+ * the need to invoke workers when parallel index cleanup doesn't need to
+ * scan the index. See the comments for option
+ * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
+ * parallel cleanup conditionally.
+ */
+ if (have_done_bulkdel &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index vacuum or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ ParallelVacuumState pvs;
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ int nindexes;
+ char *sharedquery;
+ ErrorContextCallback errcallback;
+
+ /*
+ * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
+ * don't support parallel vacuum for autovacuum as of now.
+ */
+ Assert(MyProc->statusFlags == PROC_IN_VACUUM);
+
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED, false);
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the leader process. It's
+ * okay because the lock mode does not conflict among the parallel
+ * workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should
+ * match the leader's order.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
+ /* Set parallel vacuum state */
+ pvs.indrels = indrels;
+ pvs.nindexes = nindexes;
+ pvs.indstats = indstats;
+ pvs.shared = shared;
+ pvs.dead_items = dead_items;
+ pvs.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs.relname = pstrdup(RelationGetRelationName(rel));
+
+ /* These fields will be filled during index vacuum or cleanup */
+ pvs.indname = NULL;
+ pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
+
+ /* Each parallel VACUUM worker gets its own access strategy */
+ pvs.bstrategy = GetAccessStrategy(BAS_VACUUM);
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_safe_indexes(&pvs);
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvs.bstrategy);
+}
+
+/*
+ * Error context callback for errors occurring during parallel index vacuum.
+ * The error context messages match lazy vacuum error context messages. If you
+ * change this function, also change vacuum_error_callback().
+ */
+static void
+parallel_vacuum_error_callback(void *arg)
+{
+ ParallelVacuumState *errinfo = arg;
+
+ switch (errinfo->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_INITIAL:
+ case PARALLEL_INDVAC_STATUS_COMPLETED:
+ default:
+ break;
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 417dd288e5..f3fb1e93a5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 97bffa8ff1..8bda1cc38d 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -16,6 +16,7 @@
#include "access/htup.h"
#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -63,6 +64,9 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Abstract type for parallel vacuum state */
+typedef struct ParallelVacuumState ParallelVacuumState;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -305,6 +309,21 @@ extern IndexBulkDeleteResult *vac_cleanup_one_index(IndexVacuumInfo *ivinfo,
IndexBulkDeleteResult *istat);
extern Size vac_max_items_to_alloc_size(int max_items);
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumState *parallel_vacuum_init(Relation rel, Relation *indrels,
+ int nindexes, int nrequested_workers,
+ int max_items, int elevel,
+ BufferAccessStrategy bstrategy);
+extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
+extern VacDeadItems *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs);
+extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples);
+extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ bool estimated_count,
+ bool no_bulkdel_call);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
+
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
VacuumParams *params, List *va_cols, bool in_outer_xact,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9863508791..f093605472 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1306,13 +1306,8 @@ LPWSTR
LSEG
LUID
LVPagePruneState
-LVParallelIndStats
-LVParallelIndVacStatus
-LVParallelState
LVRelState
LVSavedErrInfo
-LVShared
-LVSharedIndStats
LWLock
LWLockHandle
LWLockMode
@@ -1775,7 +1770,10 @@ PTIterationArray
PTOKEN_PRIVILEGES
PTOKEN_USER
PUTENVPROC
+PVIndStats
+PvIndVacStatus
PVOID
+PVShared
PX_Alias
PX_Cipher
PX_Combo
@@ -1809,6 +1807,7 @@ ParallelSlotResultHandler
ParallelState
ParallelTableScanDesc
ParallelTableScanDescData
+ParallelVacuumState
ParallelWorkerContext
ParallelWorkerInfo
Param
--
2.24.3 (Apple Git-128)
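As a quick orientation for the vacuumparallel.c API exported above, the following is a minimal sketch of the call sequence a table AM's vacuum code is expected to follow. It is not part of the patch: the function name my_vacuum_rel and its arguments are made up, the placeholder argument values would come from the caller's own state, and the sequence simply mirrors the heap call sites in vacuumlazy.c.

#include "postgres.h"

#include "commands/vacuum.h"
#include "utils/rel.h"

/*
 * Hypothetical table AM vacuum routine showing the intended use of the
 * vacuumparallel.c entry points.  The table scan itself and all error
 * handling are omitted.
 */
static void
my_vacuum_rel(Relation rel, Relation *indrels, int nindexes,
              int nrequested_workers, int max_items, int elevel,
              BufferAccessStrategy bstrategy)
{
    ParallelVacuumState *pvs;
    VacDeadItems *dead_items;
    IndexBulkDeleteResult **indstats;

    indstats = palloc0(nindexes * sizeof(IndexBulkDeleteResult *));

    /* Enter parallel mode and set up the DSM segment */
    pvs = parallel_vacuum_init(rel, indrels, nindexes, nrequested_workers,
                               max_items, elevel, bstrategy);

    /*
     * parallel_vacuum_init() can return NULL when no workers can be used
     * (the heap code checks ParallelVacuumIsActive() for this); a real
     * caller must then fall back to serial index vacuuming.
     */
    if (pvs == NULL)
        return;

    /* The dead_items array lives in DSM so that workers can see it */
    dead_items = parallel_vacuum_get_dead_items(pvs);

    /* ... scan the table, filling dead_items ... */

    /* ambulkdelete for all indexes, using workers where possible */
    parallel_vacuum_bulkdel_all_indexes(pvs, /* num_table_tuples */ 0);

    /* ... second table pass, further vacuum cycles as needed ... */

    /* amvacuumcleanup for all indexes */
    parallel_vacuum_cleanup_all_indexes(pvs, /* num_table_tuples */ 0,
                                        /* estimated_count */ true,
                                        /* no_bulkdel_call */ false);

    /* Copy index statistics back out of DSM and exit parallel mode */
    parallel_vacuum_end(pvs, indstats);
}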
On Tue, Dec 21, 2021 at 11:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thank you for the comment. Agreed.
I've attached updated version patches. Please review them.
These look mostly good to me. Please find attached the slightly edited
version of the 0001 patch. I have modified comments, ran pgindent, and
modified the commit message. I'll commit this tomorrow if you are fine
with it.
--
With Regards,
Amit Kapila.
Attachments:
v11-0001-Move-index-vacuum-routines-to-vacuum.c.patch
From 5d34bf904d52969e830dca55264c7d560a6cf02f Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Fri, 17 Dec 2021 12:15:33 +0900
Subject: [PATCH v11] Move index vacuum routines to vacuum.c.
An upcoming patch moves parallel vacuum code out of vacuumlazy.c. This
code restructuring will allow both lazy vacuum and parallel vacuum to use
index vacuum functions.
Author: Masahiko Sawada
Reviewed-by: Hou Zhijie, Amit Kapila
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 173 +++++------------------------------
src/backend/commands/vacuum.c | 154 ++++++++++++++++++++++++++++++-
src/include/commands/vacuum.h | 22 +++++
src/tools/pgindent/typedefs.list | 2 +-
4 files changed, 196 insertions(+), 155 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index db6becf..d8f1217 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -150,26 +150,6 @@ typedef enum
} VacErrPhase;
/*
- * LVDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
- * Each TID points to an LP_DEAD line pointer from a heap page that has been
- * processed by lazy_scan_prune.
- *
- * Also needed by lazy_vacuum_heap_rel, which marks the same LP_DEAD line
- * pointers as LP_UNUSED during second heap pass.
- */
-typedef struct LVDeadItems
-{
- int max_items; /* # slots allocated in array */
- int num_items; /* current # of entries */
-
- /* Sorted array of TIDs to delete from indexes */
- ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
-} LVDeadItems;
-
-#define MAXDEADITEMS(avail_mem) \
- (((avail_mem) - offsetof(LVDeadItems, items)) / sizeof(ItemPointerData))
-
-/*
* Shared information among parallel workers. So this is allocated in the DSM
* segment.
*/
@@ -339,9 +319,15 @@ typedef struct LVRelState
VacErrPhase phase;
/*
- * State managed by lazy_scan_heap() follows
+ * State managed by lazy_scan_heap() follows.
+ *
+ * dead_items stores TIDs whose index tuples are deleted by index
+ * vacuuming. Each TID points to an LP_DEAD line pointer from a heap page
+ * that has been processed by lazy_scan_prune. Also needed by
+ * lazy_vacuum_heap_rel, which marks the same LP_DEAD line pointers as
+ * LP_UNUSED during second heap pass.
*/
- LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
+ VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages skipped due to a pin */
@@ -434,11 +420,8 @@ static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
static int dead_items_max_items(LVRelState *vacrel);
-static inline Size max_items_to_alloc_size(int max_items);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
-static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
@@ -905,7 +888,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
static void
lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
BlockNumber nblocks,
blkno,
next_unskippable_block,
@@ -2040,7 +2023,7 @@ retry:
*/
if (lpdead_items > 0)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
ItemPointerData tmp;
Assert(!prunestate->all_visible);
@@ -2404,7 +2387,7 @@ static int
lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
int index, Buffer *vmbuffer)
{
- LVDeadItems *dead_items = vacrel->dead_items;
+ VacDeadItems *dead_items = vacrel->dead_items;
Page page = BufferGetPage(buffer);
OffsetNumber unused[MaxHeapTuplesPerPage];
int uncnt = 0;
@@ -3019,11 +3002,8 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
double reltuples, LVRelState *vacrel)
{
IndexVacuumInfo ivinfo;
- PGRUsage ru0;
LVSavedErrInfo saved_err_info;
- pg_rusage_init(&ru0);
-
ivinfo.index = indrel;
ivinfo.analyze_only = false;
ivinfo.report_progress = false;
@@ -3045,13 +3025,7 @@ lazy_vacuum_one_index(Relation indrel, IndexBulkDeleteResult *istat,
InvalidBlockNumber, InvalidOffsetNumber);
/* Do bulk deletion */
- istat = index_bulk_delete(&ivinfo, istat, lazy_tid_reaped,
- (void *) vacrel->dead_items);
-
- ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
- vacrel->indname, vacrel->dead_items->num_items),
- errdetail_internal("%s", pg_rusage_show(&ru0))));
+ istat = vac_bulkdel_one_index(&ivinfo, istat, (void *) vacrel->dead_items);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3076,11 +3050,8 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
LVRelState *vacrel)
{
IndexVacuumInfo ivinfo;
- PGRUsage ru0;
LVSavedErrInfo saved_err_info;
- pg_rusage_init(&ru0);
-
ivinfo.index = indrel;
ivinfo.analyze_only = false;
ivinfo.report_progress = false;
@@ -3102,24 +3073,7 @@ lazy_cleanup_one_index(Relation indrel, IndexBulkDeleteResult *istat,
VACUUM_ERRCB_PHASE_INDEX_CLEANUP,
InvalidBlockNumber, InvalidOffsetNumber);
- istat = index_vacuum_cleanup(&ivinfo, istat);
-
- if (istat)
- {
- ereport(elevel,
- (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
- RelationGetRelationName(indrel),
- istat->num_index_tuples,
- istat->num_pages),
- errdetail("%.0f index row versions were removed.\n"
- "%u index pages were newly deleted.\n"
- "%u index pages are currently deleted, of which %u are currently reusable.\n"
- "%s.",
- istat->tuples_removed,
- istat->pages_newly_deleted,
- istat->pages_deleted, istat->pages_free,
- pg_rusage_show(&ru0))));
- }
+ istat = vac_cleanup_one_index(&ivinfo, istat);
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
@@ -3482,19 +3436,6 @@ dead_items_max_items(LVRelState *vacrel)
}
/*
- * Returns the total required space for VACUUM's dead_items array given a
- * max_items value returned by dead_items_max_items
- */
-static inline Size
-max_items_to_alloc_size(int max_items)
-{
- Assert(max_items >= MaxHeapTuplesPerPage);
- Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
-
- return offsetof(LVDeadItems, items) + sizeof(ItemPointerData) * max_items;
-}
-
-/*
* Allocate dead_items (either using palloc, or in dynamic shared memory).
* Sets dead_items in vacrel for caller.
*
@@ -3504,7 +3445,7 @@ max_items_to_alloc_size(int max_items)
static void
dead_items_alloc(LVRelState *vacrel, int nworkers)
{
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
int max_items;
/*
@@ -3539,7 +3480,7 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
/* Serial VACUUM case */
max_items = dead_items_max_items(vacrel);
- dead_items = (LVDeadItems *) palloc(max_items_to_alloc_size(max_items));
+ dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3566,74 +3507,6 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * lazy_tid_reaped() -- is a particular tid deletable?
- *
- * This has the right signature to be an IndexBulkDeleteCallback.
- *
- * Assumes dead_items array is sorted (in ascending TID order).
- */
-static bool
-lazy_tid_reaped(ItemPointer itemptr, void *state)
-{
- LVDeadItems *dead_items = (LVDeadItems *) state;
- int64 litem,
- ritem,
- item;
- ItemPointer res;
-
- litem = itemptr_encode(&dead_items->items[0]);
- ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
- item = itemptr_encode(itemptr);
-
- /*
- * Doing a simple bound check before bsearch() is useful to avoid the
- * extra cost of bsearch(), especially if dead items on the heap are
- * concentrated in a certain range. Since this function is called for
- * every index tuple, it pays to be really fast.
- */
- if (item < litem || item > ritem)
- return false;
-
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) dead_items->items,
- dead_items->num_items,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
-
- return (res != NULL);
-}
-
-/*
- * Comparator routines for use with qsort() and bsearch().
- */
-static int
-vac_cmp_itemptr(const void *left, const void *right)
-{
- BlockNumber lblk,
- rblk;
- OffsetNumber loff,
- roff;
-
- lblk = ItemPointerGetBlockNumber((ItemPointer) left);
- rblk = ItemPointerGetBlockNumber((ItemPointer) right);
-
- if (lblk < rblk)
- return -1;
- if (lblk > rblk)
- return 1;
-
- loff = ItemPointerGetOffsetNumber((ItemPointer) left);
- roff = ItemPointerGetOffsetNumber((ItemPointer) right);
-
- if (loff < roff)
- return -1;
- if (loff > roff)
- return 1;
-
- return 0;
-}
-
-/*
* Check if every tuple in the given page is visible to all current and future
* transactions. Also return the visibility_cutoff_xid which is the highest
* xmin amongst the visible tuples. Set *all_frozen to true if every tuple
@@ -3873,7 +3746,7 @@ parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
int nindexes = vacrel->nindexes;
ParallelContext *pcxt;
LVShared *shared;
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
LVParallelIndStats *pindstats;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
@@ -3927,7 +3800,7 @@ parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
/* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
max_items = dead_items_max_items(vacrel);
- est_dead_items_len = max_items_to_alloc_size(max_items);
+ est_dead_items_len = vac_max_items_to_alloc_size(max_items);
shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
@@ -4011,8 +3884,8 @@ parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
lps->lvshared = shared;
/* Prepare the dead_items space */
- dead_items = (LVDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
dead_items->max_items = max_items;
dead_items->num_items = 0;
MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
@@ -4138,7 +4011,7 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
Relation *indrels;
LVParallelIndStats *lvpindstats;
LVShared *lvshared;
- LVDeadItems *dead_items;
+ VacDeadItems *dead_items;
BufferUsage *buffer_usage;
WalUsage *wal_usage;
int nindexes;
@@ -4183,9 +4056,9 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
false);
/* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (LVDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
/* Set cost-based vacuum delay */
VacuumCostActive = (VacuumCostDelay > 0);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5c4bc15..3b481bc 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -3,10 +3,12 @@
* vacuum.c
* The postgres vacuum cleaner.
*
- * This file now includes only control and dispatch code for VACUUM and
- * ANALYZE commands. Regular VACUUM is implemented in vacuumlazy.c,
- * ANALYZE in analyze.c, and VACUUM FULL is a variant of CLUSTER, handled
- * in cluster.c.
+ * This file includes (a) control and dispatch code for VACUUM and ANALYZE
+ * commands, (b) code to compute various vacuum thresholds, and (c) index
+ * vacuum code.
+ *
+ * VACUUM for heap AM is implemented in vacuumlazy.c, ANALYZE in analyze.c, and
+ * VACUUM FULL is a variant of CLUSTER, handled in cluster.c.
*
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
@@ -32,6 +34,7 @@
#include "access/transam.h"
#include "access/xact.h"
#include "catalog/namespace.h"
+#include "catalog/index.h"
#include "catalog/pg_database.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_namespace.h"
@@ -51,6 +54,7 @@
#include "utils/fmgroids.h"
#include "utils/guc.h"
#include "utils/memutils.h"
+#include "utils/pg_rusage.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -89,6 +93,8 @@ static void vac_truncate_clog(TransactionId frozenXID,
static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
+static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static int vac_cmp_itemptr(const void *left, const void *right);
/*
* Primary entry point for manual VACUUM and ANALYZE commands
@@ -2258,3 +2264,143 @@ get_vacoptval_from_boolean(DefElem *def)
{
return defGetBoolean(def) ? VACOPTVALUE_ENABLED : VACOPTVALUE_DISABLED;
}
+
+/*
+ * vac_bulkdel_one_index() -- bulk-deletion for index relation.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+vac_bulkdel_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ /* Do bulk deletion */
+ istat = index_bulk_delete(ivinfo, istat, vac_tid_reaped,
+ (void *) dead_items);
+
+ ereport(ivinfo->message_level,
+ (errmsg("scanned index \"%s\" to remove %d row versions",
+ RelationGetRelationName(ivinfo->index),
+ dead_items->num_items),
+ errdetail_internal("%s", pg_rusage_show(&ru0))));
+
+ return istat;
+}
+
+/*
+ * vac_cleanup_one_index() -- do post-vacuum cleanup for index relation.
+ *
+ * Returns bulk delete stats derived from input stats
+ */
+IndexBulkDeleteResult *
+vac_cleanup_one_index(IndexVacuumInfo *ivinfo, IndexBulkDeleteResult *istat)
+{
+ PGRUsage ru0;
+
+ pg_rusage_init(&ru0);
+
+ istat = index_vacuum_cleanup(ivinfo, istat);
+
+ if (istat)
+ {
+ ereport(ivinfo->message_level,
+ (errmsg("index \"%s\" now contains %.0f row versions in %u pages",
+ RelationGetRelationName(ivinfo->index),
+ istat->num_index_tuples,
+ istat->num_pages),
+ errdetail("%.0f index row versions were removed.\n"
+ "%u index pages were newly deleted.\n"
+ "%u index pages are currently deleted, of which %u are currently reusable.\n"
+ "%s.",
+ istat->tuples_removed,
+ istat->pages_newly_deleted,
+ istat->pages_deleted, istat->pages_free,
+ pg_rusage_show(&ru0))));
+ }
+
+ return istat;
+}
+
+/*
+ * Returns the total required space for VACUUM's dead_items array given a
+ * max_items value.
+ */
+inline Size
+vac_max_items_to_alloc_size(int max_items)
+{
+ Assert(max_items <= MAXDEADITEMS(MaxAllocSize));
+
+ return offsetof(VacDeadItems, items) + sizeof(ItemPointerData) * max_items;
+}
+
+/*
+ * vac_tid_reaped() -- is a particular tid deletable?
+ *
+ * This has the right signature to be an IndexBulkDeleteCallback.
+ *
+ * Assumes dead_items array is sorted (in ascending TID order).
+ */
+static bool
+vac_tid_reaped(ItemPointer itemptr, void *state)
+{
+ VacDeadItems *dead_items = (VacDeadItems *) state;
+ int64 litem,
+ ritem,
+ item;
+ ItemPointer res;
+
+ litem = itemptr_encode(&dead_items->items[0]);
+ ritem = itemptr_encode(&dead_items->items[dead_items->num_items - 1]);
+ item = itemptr_encode(itemptr);
+
+ /*
+ * Doing a simple bound check before bsearch() is useful to avoid the
+ * extra cost of bsearch(), especially if dead items on the heap are
+ * concentrated in a certain range. Since this function is called for
+ * every index tuple, it pays to be really fast.
+ */
+ if (item < litem || item > ritem)
+ return false;
+
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_items->items,
+ dead_items->num_items,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
+
+ return (res != NULL);
+}
+
+/*
+ * Comparator routines for use with qsort() and bsearch().
+ */
+static int
+vac_cmp_itemptr(const void *left, const void *right)
+{
+ BlockNumber lblk,
+ rblk;
+ OffsetNumber loff,
+ roff;
+
+ lblk = ItemPointerGetBlockNumber((ItemPointer) left);
+ rblk = ItemPointerGetBlockNumber((ItemPointer) right);
+
+ if (lblk < rblk)
+ return -1;
+ if (lblk > rblk)
+ return 1;
+
+ loff = ItemPointerGetOffsetNumber((ItemPointer) left);
+ roff = ItemPointerGetOffsetNumber((ItemPointer) right);
+
+ if (loff < roff)
+ return -1;
+ if (loff > roff)
+ return 1;
+
+ return 0;
+}
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 4cfd52e..97bffa8 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,7 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/genam.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -230,6 +231,21 @@ typedef struct VacuumParams
int nworkers;
} VacuumParams;
+/*
+ * VacDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
+ */
+typedef struct VacDeadItems
+{
+ int max_items; /* # slots allocated in array */
+ int num_items; /* current # of entries */
+
+ /* Sorted array of TIDs to delete from indexes */
+ ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
+} VacDeadItems;
+
+#define MAXDEADITEMS(avail_mem) \
+ (((avail_mem) - offsetof(VacDeadItems, items)) / sizeof(ItemPointerData))
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for PostGIS */
extern int vacuum_freeze_min_age;
@@ -282,6 +298,12 @@ extern bool vacuum_is_relation_owner(Oid relid, Form_pg_class reltuple,
extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
bits32 options, bool verbose,
LOCKMODE lmode);
+extern IndexBulkDeleteResult *vac_bulkdel_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat,
+ VacDeadItems *dead_items);
+extern IndexBulkDeleteResult *vac_cleanup_one_index(IndexVacuumInfo *ivinfo,
+ IndexBulkDeleteResult *istat);
+extern Size vac_max_items_to_alloc_size(int max_items);
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c61ccb..9863508 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1305,7 +1305,6 @@ LPVOID
LPWSTR
LSEG
LUID
-LVDeadTuples
LVPagePruneState
LVParallelIndStats
LVParallelIndVacStatus
@@ -2800,6 +2799,7 @@ UserMapping
UserOpts
VacAttrStats
VacAttrStatsP
+VacDeadItems
VacErrPhase
VacOptValue
VacuumParams
--
1.8.3.1
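To see what callers gain from this refactoring, here is a small sketch (not from the patch) of how the two helpers moved into vacuum.c are meant to be invoked once an IndexVacuumInfo has been filled in. The wrapper function and variable names are illustrative only, and the field values are placeholders a real caller would take from its own vacuum state.

#include "postgres.h"

#include "commands/vacuum.h"

/*
 * Illustrative only: run bulk deletion and then cleanup on a single index
 * using the helpers this patch moves into vacuum.c.
 */
static IndexBulkDeleteResult *
vacuum_then_cleanup_one_index(Relation indrel, VacDeadItems *dead_items,
                              double reltuples, BufferAccessStrategy bstrategy)
{
    IndexVacuumInfo ivinfo;
    IndexBulkDeleteResult *istat = NULL;

    ivinfo.index = indrel;
    ivinfo.analyze_only = false;
    ivinfo.report_progress = false;
    ivinfo.message_level = DEBUG2;
    ivinfo.estimated_count = true;
    ivinfo.num_heap_tuples = reltuples;
    ivinfo.strategy = bstrategy;

    /* ambulkdelete: remove index entries pointing at TIDs in dead_items */
    istat = vac_bulkdel_one_index(&ivinfo, istat, dead_items);

    /* amvacuumcleanup: post-vacuum cleanup and statistics for the index */
    istat = vac_cleanup_one_index(&ivinfo, istat);

    return istat;
}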
On Tue, Dec 21, 2021 at 10:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Dec 21, 2021 at 11:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thank you for the comment. Agreed.
I've attached updated version patches. Please review them.
These look mostly good to me. Please find attached the slightly edited
version of the 0001 patch. I have modified comments, ran pgindent, and
modified the commit message. I'll commit this tomorrow if you are fine
with it.
Thank you for the patch! It looks good to me.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Wed, Dec 22, 2021 8:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Dec 21, 2021 at 10:24 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Tue, Dec 21, 2021 at 11:24 AM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:
On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
Thank you for the comment. Agreed.
I've attached updated version patches. Please review them.
These look mostly good to me. Please find attached the slightly edited
version of the 0001 patch. I have modified comments, ran pgindent, and
modified the commit message. I'll commit this tomorrow if you are fine
with it.
It looks good to me.
+1, the patch passes check-world and looks good to me.
Best regards,
Hou zj
On Tue, Dec 21, 2021 at 10:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Dec 21, 2021 at 11:24 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Thank you for the comment. Agreed.
I've attached updated version patches. Please review them.
These look mostly good to me. Please find attached the slightly edited
version of the 0001 patch. I have modified comments, ran pgindent, and
modified the commit message. I'll commit this tomorrow if you are fine
with it.
Thank you for committing the first patch.
I've attached an updated version of the second patch. Please review it.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
v11-0001-Move-parallel-vacuum-code-to-vacuumparallel.c.patch
From c30d9a1d53746e6b2399ddb918f15c8ac8d2bf7c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 22 Dec 2021 12:10:33 +0900
Subject: [PATCH v11] Move parallel vacuum code to vacuumparallel.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previously, parallel vacuum was specific to lazy vacuum, i.e., heap
table AM. But the job that parallel vacuum does isn’t really specific
to heap.
This commit moves parallel vacuum related code to a new file
commands/vacuumparallel.c so that any table AM supporting indexes can
utilize parallel vacuum in order to call index AM
callbacks (ambulkdelete and amvacuumcleanup) with parallel workers.
With that, it also moves some vacuum related functions and structures to
commands/vacuum.c so that both lazy vacuum and parallel vacuum can
refer to them.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 992 +---------------------
src/backend/access/transam/parallel.c | 2 +-
src/backend/commands/Makefile | 1 +
src/backend/commands/vacuum.c | 5 +-
src/backend/commands/vacuumparallel.c | 1098 +++++++++++++++++++++++++
src/include/access/heapam.h | 1 -
src/include/commands/vacuum.h | 19 +
src/tools/pgindent/typedefs.list | 9 +-
8 files changed, 1149 insertions(+), 978 deletions(-)
create mode 100644 src/backend/commands/vacuumparallel.c
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d8f1217504..a307dce707 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -40,7 +40,6 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
-#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -120,23 +119,11 @@
*/
#define PREFETCH_SIZE ((BlockNumber) 32)
-/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
-
/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -149,135 +136,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /* Counter for vacuuming and cleanup */
- pg_atomic_uint32 idx;
-} LVShared;
-
-/* Status used during parallel index vacuum or cleanup */
-typedef enum LVParallelIndVacStatus
-{
- PARALLEL_INDVAC_STATUS_INITIAL = 0,
- PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
- PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
- PARALLEL_INDVAC_STATUS_COMPLETED
-} LVParallelIndVacStatus;
-
-/*
- * Struct for index vacuum statistics of an index that is used for parallel vacuum.
- * This includes the status of parallel index vacuum as well as index statistics.
- */
-typedef struct LVParallelIndStats
-{
- /*
- * The following two fields are set by leader process before executing
- * parallel index vacuum or parallel index cleanup. These fields are not
- * fixed for the entire VACUUM operation. They are only fixed for an
- * individual parallel index vacuum and cleanup.
- *
- * parallel_workers_can_process is true if both leader and worker can
- * process the index, otherwise only leader can process it.
- */
- LVParallelIndVacStatus status;
- bool parallel_workers_can_process;
-
- /*
- * Individual worker or leader stores the result of index vacuum or
- * cleanup.
- */
- bool istat_updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVParallelIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /*
- * Shared index statistics among parallel vacuum workers. The array
- * element is allocated for every index, even those indexes where parallel
- * index vacuuming is unsafe or not worthwhile (e.g.,
- * will_parallel_vacuum[] is false). During parallel vacuum,
- * IndexBulkDeleteResult of each index is kept in DSM and is copied into
- * local memory at the end of parallel vacuum.
- */
- LVParallelIndStats *lvpindstats;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * False if the index is totally unsuitable target for all parallel
- * processing. For example, the index could be <
- * min_parallel_index_scan_size cutoff.
- */
- bool *will_parallel_vacuum;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -295,9 +153,9 @@ typedef struct LVRelState
bool do_index_cleanup;
bool do_rel_truncate;
- /* Buffer access strategy and parallel state */
+ /* Buffer access strategy and parallel vacuum state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumState *pvs;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -399,13 +257,6 @@ static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
-static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats);
-static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared,
- LVParallelIndStats *pindstats);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
double reltuples,
@@ -419,18 +270,11 @@ static bool should_attempt_truncation(LVRelState *vacrel);
static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
-static int dead_items_max_items(LVRelState *vacrel);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
-static void parallel_vacuum_end(LVRelState *vacrel);
-static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -2066,7 +1910,6 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(!IsParallelWorker());
Assert(vacrel->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
@@ -2195,7 +2038,6 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
{
bool allindexes = true;
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -2235,7 +2077,7 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2608,353 +2450,12 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
return false;
}
-/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process.
- */
-static void
-parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
-{
- LVParallelState *lps = vacrel->lps;
- LVParallelIndVacStatus new_status;
- int nworkers;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- if (vacuum)
- {
- /*
- * We can only provide an approximate value of num_heap_tuples, at
- * least for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_bulkdel;
- }
- else
- {
- /*
- * We can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- /* Add conditionally parallel-aware indexes if in the first time call */
- if (vacrel->num_index_scans == 0)
- nworkers += vacrel->lps->nindexes_parallel_condcleanup;
- }
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See
- * parallel_vacuum_compute_workers().
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- /*
- * Set index vacuum status and mark whether parallel vacuum worker can
- * process it.
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
-
- Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
- pindstats->status = new_status;
- pindstats->parallel_workers_can_process =
- (lps->will_parallel_vacuum[i] &
- parallel_vacuum_index_is_parallel_safe(vacrel, vacrel->indrels[i],
- vacuum));
- }
-
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- /* Reinitialize parallel context to relaunch parallel workers */
- if (vacrel->num_index_scans > 0)
- ReinitializeParallelDSM(lps->pcxt);
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (vacuum)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- parallel_vacuum_process_unsafe_indexes(vacrel);
-
- /*
- * Join as a parallel worker. The leader process alone processes all
- * parallel-safe indexes in the case where no workers are launched.
- */
- parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Reset all index status back to initial (while checking that we have
- * processed all indexes).
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
-
- if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
- elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
- RelationGetRelationName(vacrel->indrels[i]));
-
- pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVParallelIndStats *pis;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- pis = &(pindstats[idx]);
-
- /*
- * Skip processing index that is unsafe for workers or has an
- * unsuitable target for parallel index vacuum (this is processed in
- * parallel_vacuum_process_unsafe_indexes() by the leader).
- */
- if (!pis->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- shared, pis);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing index cleanup right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by parallel_vacuum_compute_workers().
- */
-static void
-parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- /* Skip, indexes that are safe for workers */
- if (pindstats->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- lps->lvshared, pindstats);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static void
-parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared, LVParallelIndStats *pindstats)
-{
- IndexBulkDeleteResult *istat = NULL;
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (pindstats->istat_updated)
- istat = &(pindstats->istat);
-
- switch (pindstats->status)
- {
- case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
- istat_res = lazy_vacuum_one_index(indrel, istat,
- shared->reltuples, vacrel);
- break;
- case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
- istat_res = lazy_cleanup_one_index(indrel, istat,
- shared->reltuples,
- shared->estimated_count,
- vacrel);
- break;
- default:
- elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
- pindstats->status,
- RelationGetRelationName(indrel));
- }
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (!pindstats->istat_updated && istat_res != NULL)
- {
- memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
- pindstats->istat_updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
- }
-
- /*
- * Update the status to completed. No need to lock here since each worker
- * touches different indexes.
- */
- pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
-}
-
/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
/* Report that we are now cleaning up indexes */
@@ -2980,7 +2481,9 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, false);
+ parallel_vacuum_cleanup_all_indexes(vacrel->pvs, vacrel->new_rel_tuples,
+ (vacrel->tupcount_pages < vacrel->rel_pages),
+ vacrel->num_index_scans);
}
}
@@ -3409,8 +2912,6 @@ dead_items_max_items(LVRelState *vacrel)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- Assert(!IsParallelWorker());
-
if (vacrel->nindexes > 0)
{
BlockNumber rel_pages = vacrel->rel_pages;
@@ -3448,6 +2949,9 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
VacDeadItems *dead_items;
int max_items;
+ max_items = dead_items_max_items(vacrel);
+ Assert(max_items >= MaxHeapTuplesPerPage);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3471,15 +2975,20 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- parallel_vacuum_begin(vacrel, nworkers);
+ vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
+ vacrel->nindexes, nworkers,
+ max_items, elevel,
+ vacrel->bstrategy);
- /* If parallel mode started, vacrel->dead_items allocated in DSM */
+ /* If parallel mode started, dead_items space is allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs);
return;
+ }
}
/* Serial VACUUM case */
- max_items = dead_items_max_items(vacrel);
dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3503,7 +3012,8 @@ dead_items_cleanup(LVRelState *vacrel)
* End parallel mode before updating index statistics as we cannot write
* during parallel mode.
*/
- parallel_vacuum_end(vacrel);
+ parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
+ vacrel->pvs = NULL;
}
/*
@@ -3627,77 +3137,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
return all_visible;
}
-/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
- */
-static int
-parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* Skip index that is not a suitable target for parallel index vacuum */
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- continue;
-
- will_parallel_vacuum[idx] = true;
-
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
/*
* Update index statistics in pg_class if the statistics are accurate.
*/
@@ -3731,394 +3170,9 @@ update_index_statistics(LVRelState *vacrel)
}
/*
- * Try to enter parallel mode and create a parallel context. Then initialize
- * shared memory state.
- *
- * On success (when we can launch one or more workers), will set dead_items and
- * lps in vacrel for caller. A set lps in vacrel state indicates that parallel
- * VACUUM is currently active.
- */
-static void
-parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
-{
- LVParallelState *lps;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- VacDeadItems *dead_items;
- LVParallelIndStats *pindstats;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- bool *will_parallel_vacuum;
- int max_items;
- Size est_pindstats_len;
- Size est_shared_len;
- Size est_dead_items_len;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
- will_parallel_vacuum);
- if (parallel_workers <= 0)
- {
- /* Can't perform vacuum in parallel -- lps not set in vacrel */
- pfree(will_parallel_vacuum);
- return;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
- lps->will_parallel_vacuum = will_parallel_vacuum;
-
- /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
- est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
- shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = sizeof(LVShared);
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
- max_items = dead_items_max_items(vacrel);
- est_dead_items_len = vac_max_items_to_alloc_size(max_items);
- shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare index vacuum stats */
- pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
- lps->lvpindstats = pindstats;
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
- MemSet(shared, 0, est_shared_len);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead_items space */
- dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
- MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- /* Success -- set dead_items and lps in leader's vacrel state */
- vacrel->dead_items = dead_items;
- vacrel->lps = lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-parallel_vacuum_end(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- if (pindstats->istat_updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps->will_parallel_vacuum);
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Returns false, if the given index can't participate in the next execution of
- * parallel index vacuum or parallel index cleanup.
- */
-static bool
-parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum)
-{
- uint8 vacoptions;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* In parallel vacuum case, check if it supports parallel bulk-deletion */
- if (vacuum)
- return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
-
- /* Not safe, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Not safe, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). We do this to avoid
- * the need to invoke workers when parallel index cleanup doesn't need to
- * scan the index. See the comments for option
- * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
- * parallel cleanup conditionally.
- */
- if (vacrel->num_index_scans > 0 &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVParallelIndStats *lvpindstats;
- LVShared *lvshared;
- VacDeadItems *dead_items;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- /*
- * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
- * don't support parallel vacuum for autovacuum as of now.
- */
- Assert(MyProc->statusFlags == PROC_IN_VACUUM);
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- elog(DEBUG1, "starting parallel vacuum worker");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set index statistics */
- lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_INDEX_STATS,
- false);
-
- /* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (VacDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_items = dead_items;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
-/*
- * Error context callback for errors occurring during vacuum.
+ * Error context callback for errors occurring during vacuum. The error
+ * context messages match the messages set in parallel vacuum. If you
+ * change this function, also change parallel_vacuum_error_callback().
*/
static void
vacuum_error_callback(void *arg)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f573..ae7c7133dd 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -14,7 +14,6 @@
#include "postgres.h"
-#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/parallel.h"
#include "access/session.h"
@@ -25,6 +24,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0ae4..48f7348f91 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 3b481bcf86..c94c187d36 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -7,8 +7,9 @@
* commands, (b) code to compute various vacuum thresholds, and (c) index
* vacuum code.
*
- * VACUUM for heap AM is implemented in vacuumlazy.c, ANALYZE in analyze.c, and
- * VACUUM FULL is a variant of CLUSTER, handled in cluster.c.
+ * VACUUM for heap AM is implemented in vacuumlazy.c, parallel vacuum in
+ * vacuumparallel.c, ANALYZE in analyze.c, and VACUUM FULL is a variant of
+ * CLUSTER, handled in cluster.c.
*
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000000..f802431d4c
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,1098 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using
+ * and tearing down a ParallelVacuumState.
+ *
+ * In a parallel vacuum, we perform both index bulk deletion and index cleanup
+ * with parallel worker processes. Individual indexes are processed by one
+ * vacuum process. ParallelVacuumState contains shared information as well
+ * as the memory space for storing dead items allocated in the DSM segment.
+ * When starting either parallel index bulk-deletion or index cleanup, we
+ * launch parallel worker processes. Once all indexes are processed, the
+ * parallel worker processes exit. For the next pass, the parallel context
+ * is re-initialized so that the same DSM can be used for multiple passes of
+ * index bulk-deletion and index cleanup. At the end of a parallel vacuum,
+ * ParallelVacuumState is destroyed while returning index statistics so
+ * that we can update them after exiting from the parallel mode.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/genam.h"
+#include "access/parallel.h"
+#include "access/table.h"
+#include "access/transam.h"
+#include "access/xact.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "miscadmin.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "storage/lmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/elog.h"
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id, we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel workers. So this is allocated in the DSM
+ * segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the parallel vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * reltuples is the total number of input heap tuples. We set either old
+ * live tuples in the index vacuum case or the new live tuples in the
+ * index cleanup case.
+ *
+ * estimated_count is true if reltuples is an estimated value. (Note that
+ * reltuples could be -1 in this case, indicating we have no idea.)
+ */
+ double reltuples;
+ bool estimated_count;
+
+ /*
+ * In single process vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for heap scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process vacuum.
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
+} PVShared;
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} PVIndVacStatus;
+
+/*
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
+ */
+typedef struct PVIndStats
+{
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both the leader and workers can
+ * process the index; otherwise only the leader can process it.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/* Struct for maintaining a parallel vacuum state. */
+typedef struct ParallelVacuumState
+{
+ /* NULL for worker processes */
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /*
+ * Shared index statistics among parallel vacuum workers. The array
+ * element is allocated for every index, even those indexes where parallel
+ * index vacuuming is unsafe or not worthwhile (e.g.,
+ * will_parallel_vacuum[] is false). During parallel vacuum,
+ * IndexBulkDeleteResult of each index is kept in DSM and is copied into
+ * local memory at the end of parallel vacuum.
+ */
+ PVIndStats *indstats;
+
+ /* Shared dead items space among parallel vacuum workers */
+ VacDeadItems *dead_items;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * False if the index is a totally unsuitable target for all parallel
+ * processing. For example, the index could be smaller than the
+ * min_parallel_index_scan_size cutoff.
+ */
+ bool *will_parallel_vacuum;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion and
+ * parallel index cleanup respectively.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /* True if we need to reinitialize parallel DSM before launching workers */
+ bool first_time;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+
+ /*
+ * Error reporting state. The error callback is set only for worker
+ * processes during parallel index vacuum.
+ */
+ char *relnamespace;
+ char *relname;
+ char *indname;
+ PVIndVacStatus status;
+} ParallelVacuumState;
+
+static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum);
+static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum,
+ bool have_done_bulkdel);
+static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_process_unsafe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats);
+static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, bool vacuum,
+ bool have_done_bulkdel);
+static void parallel_vacuum_error_callback(void *arg);
+
+/*
+ * Try to enter parallel mode and create a parallel context. Then initialize
+ * shared memory state.
+ *
+ * On success, return parallel vacuum state. Otherwise return NULL.
+ */
+ParallelVacuumState *
+parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel, BufferAccessStrategy bstrategy)
+{
+ ParallelVacuumState *pvs;
+ ParallelContext *pcxt;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ PVIndStats *indstats;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ bool *will_parallel_vacuum;
+ Size est_indstats_len;
+ Size est_shared_len;
+ Size est_dead_items_len;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(nrequested_workers >= 0);
+ Assert(nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
+ parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
+ nrequested_workers,
+ will_parallel_vacuum);
+ if (parallel_workers <= 0)
+ {
+ /* Can't perform vacuum in parallel -- return NULL */
+ pfree(will_parallel_vacuum);
+ return NULL;
+ }
+
+ pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
+ pvs->indrels = indrels;
+ pvs->nindexes = nindexes;
+ pvs->will_parallel_vacuum = will_parallel_vacuum;
+ pvs->first_time = true;
+ pvs->bstrategy = bstrategy;
+
+ /*
+ * Set error traceback information. Other fields will be filled while
+ * vacuuming indexes.
+ */
+ pvs->relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs->relname = pstrdup(RelationGetRelationName(rel));
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvs->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_INDEX_STATS */
+ est_indstats_len = mul_size(sizeof(PVIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = sizeof(PVShared);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
+ est_dead_items_len = vac_max_items_to_alloc_size(max_items);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats_len);
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ if (!will_parallel_vacuum[i])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvs->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvs->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvs->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvs->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
+ MemSet(shared, 0, est_shared_len);
+ shared->relid = RelationGetRelid(rel);
+ shared->elevel = elevel;
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvs->shared = shared;
+
+ /* Prepare the dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
+ dead_items->max_items = max_items;
+ dead_items->num_items = 0;
+ MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
+ pvs->dead_items = dead_items;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvs->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvs->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ /* Success -- return parallel vacuum state */
+ return pvs;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed during parallel mode, copy the
+ * updated index statistics from DSM into local memory and then later use that
+ * to update the index statistics. One might think that we can exit from
+ * parallel mode, update the index statistics and then destroy parallel
+ * context, but that won't be safe (see ExitParallelMode).
+ */
+void
+parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats)
+{
+ Assert(!IsParallelWorker());
+
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->istat_updated)
+ {
+ istats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(istats[i], &indstats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ istats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvs->pcxt);
+ ExitParallelMode();
+
+ pfree(pvs->will_parallel_vacuum);
+ pfree(pvs->relnamespace);
+ pfree(pvs->relname);
+ pfree(pvs);
+}
+
+/* Returns the dead items space */
+VacDeadItems *
+parallel_vacuum_get_dead_items(ParallelVacuumState *pvs)
+{
+ return pvs->dead_items;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now.
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = true;
+
+ /* have_done_bulkdel is not used in parallel bulkdel cases */
+ parallel_vacuum_process_all_indexes(pvs, true, false);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ *
+ * have_done_bulkdel must be true if the caller has done index bulk-deletion
+ * one or more times in the vacuum execution.
+ */
+void
+parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
+ bool estimated_count, bool have_done_bulkdel)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can provide a better estimate of total number of surviving tuples
+ * (we assume indexes are more interested in that than in the number of
+ * nominally live tuples).
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_process_all_indexes(pvs, false, have_done_bulkdel);
+}
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * vacuum and index cleanup can be executed with parallel workers.
+ * The index is eligible for parallel vacuum iff its size is greater than
+ * min_parallel_index_scan_size as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that user requested. If
+ * nrequested is 0, we compute the parallel degree based on nindexes, that is
+ * the number of indexes that support parallel vacuum. This function also
+ * sets will_parallel_vacuum to remember indexes that participate in parallel
+ * vacuum.
+ */
+static int
+parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow performing parallel operation in standalone backend or
+ * when parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ /*
+ * Compute the number of indexes that can participate in parallel vacuum.
+ */
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* Skip index that is not a suitable target for parallel index vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ will_parallel_vacuum[i] = true;
+
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
+
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ *
+ * have_done_bulkdel must be true if the caller has done index bulk-deletion
+ * one or more times in the vacuum execution.
+ */
+static void
+parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, bool vacuum,
+ bool have_done_bulkdel)
+{
+ int nworkers;
+ PVIndVacStatus new_status;
+
+ Assert(!IsParallelWorker());
+
+ if (vacuum)
+ {
+ new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = pvs->nindexes_parallel_bulkdel;
+ }
+ else
+ {
+ new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = pvs->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if no bulk-deletion has run yet */
+ if (!have_done_bulkdel)
+ nworkers += pvs->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * It is possible that parallel context is initialized with fewer workers
+ * than the number of indexes that need a separate worker in the current
+ * phase, so we need to consider it. See
+ * parallel_vacuum_compute_workers().
+ */
+ nworkers = Min(nworkers, pvs->pcxt->nworkers);
+
+ /*
+ * Set index vacuum status and mark whether parallel vacuum worker can
+ * process it.
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+ indstats->status = new_status;
+ indstats->parallel_workers_can_process =
+ (pvs->will_parallel_vacuum[i] &&
+ parallel_vacuum_index_is_parallel_safe(pvs->indrels[i], vacuum,
+ have_done_bulkdel));
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvs->shared->idx), 0);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize parallel context to relaunch parallel workers */
+ if (!pvs->first_time)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ if (vacuum)
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ }
+
+ /* Vacuum the indexes that can be processed by only leader process */
+ parallel_vacuum_process_unsafe_indexes(pvs);
+
+ /*
+ * Join as a parallel worker. The leader process alone processes all
+ * parallel-safe indexes in the case where no workers are launched.
+ */
+ parallel_vacuum_process_safe_indexes(pvs);
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * vacuumed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(pvs->indrels[i]));
+
+ indstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+
+ pvs->first_time = false;
+}
+
+/*
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to vacuum the indexes in parallel.
+ */
+static void
+parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+ PVIndStats *indstats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvs->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvs->nindexes)
+ break;
+
+ indstats = &(pvs->indstats[idx]);
+
+ /*
+ * Skip vacuuming indexes that are unsafe for workers or that are
+ * unsuitable targets for parallel index vacuum (these are vacuumed in
+ * parallel_vacuum_process_unsafe_indexes() by the leader).
+ */
+ if (!indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_process_one_index(pvs, pvs->indrels[idx], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Perform parallel vacuuming of indexes in leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe. It's possible that this will vary for a given index, based
+ * on details like whether we're performing index cleanup right now.
+ *
+ * Also performs vacuuming of smaller indexes that fell under the size cutoff
+ * enforced by parallel_vacuum_compute_workers().
+ */
+static void
+parallel_vacuum_process_unsafe_indexes(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ /* Skip indexes that are safe for workers */
+ if (indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_process_one_index(pvs, pvs->indrels[i], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Vacuum or cleanup index either by leader process or by one of the worker
+ * process. After vacuuming the index this function copies the index
+ * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result if someone
+ * has already updated it
+ */
+ if (indstats->istat_updated)
+ istat = &(indstats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvs->shared->elevel;
+ ivinfo.estimated_count = pvs->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvs->shared->reltuples;
+ ivinfo.strategy = pvs->bstrategy;
+
+ /* Update error traceback information */
+ pvs->indname = pstrdup(RelationGetRelationName(indrel));
+ pvs->status = indstats->status;
+
+ switch (indstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = vac_bulkdel_one_index(&ivinfo, istat, pvs->dead_items);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = vac_cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ indstats->status,
+ RelationGetRelationName(indrel));
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete and
+ * amvacuumcleanup to the DSM segment if it's the first cycle because they
+ * allocate locally and it's possible that an index will be vacuumed by a
+ * different vacuum process the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result on the DSM segment and
+ * pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result at different
+ * slots we can write them without locking.
+ */
+ if (!indstats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(indstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ indstats->istat_updated = true;
+
+ /* Free the locally-allocated bulk-deletion result */
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each worker
+ * touches different indexes.
+ */
+ indstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+
+ /* Reset error traceback information */
+ pvs->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+ pfree(pvs->indname);
+ pvs->indname = NULL;
+}
+
+/*
+ * Returns false, if the given index can't participate in the next execution of
+ * parallel index vacuum or parallel index cleanup.
+ */
+static bool
+parallel_vacuum_index_is_parallel_safe(Relation indrel, bool vacuum,
+ bool have_done_bulkdel)
+{
+ uint8 vacoptions;
+
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe, if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally, but we
+ * have already processed the index (for bulkdelete). We do this to avoid
+ * the need to invoke workers when parallel index cleanup doesn't need to
+ * scan the index. See the comments for option
+ * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
+ * parallel cleanup conditionally.
+ */
+ if (have_done_bulkdel &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index vacuum or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ ParallelVacuumState pvs;
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ int nindexes;
+ char *sharedquery;
+ ErrorContextCallback errcallback;
+
+ /*
+ * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
+ * don't support parallel vacuum for autovacuum as of now.
+ */
+ Assert(MyProc->statusFlags == PROC_IN_VACUUM);
+
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED, false);
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the leader process. It's
+ * okay because the lock mode does not conflict among the parallel
+ * workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should
+ * match the leader's order.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
+ /* Set parallel vacuum state */
+ pvs.indrels = indrels;
+ pvs.nindexes = nindexes;
+ pvs.indstats = indstats;
+ pvs.shared = shared;
+ pvs.dead_items = dead_items;
+ pvs.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs.relname = pstrdup(RelationGetRelationName(rel));
+
+ /* These fields will be filled during index vacuum or cleanup */
+ pvs.indname = NULL;
+ pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
+
+ /* Each parallel VACUUM worker gets its own access strategy */
+ pvs.bstrategy = GetAccessStrategy(BAS_VACUUM);
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvs.bstrategy);
+}
+
+/*
+ * Error context callback for errors occurring during parallel index vacuum.
+ * The error context messages match the messages set in lazy vacuum error
+ * context. If you change this function, also change vacuum_error_callback().
+ */
+static void
+parallel_vacuum_error_callback(void *arg)
+{
+ ParallelVacuumState *errinfo = arg;
+
+ switch (errinfo->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_INITIAL:
+ case PARALLEL_INDVAC_STATUS_COMPLETED:
+ default:
+ break;
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 417dd288e5..f3fb1e93a5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 97bffa8ff1..8bda1cc38d 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -16,6 +16,7 @@
#include "access/htup.h"
#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -63,6 +64,9 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Abstract type for parallel vacuum state */
+typedef struct ParallelVacuumState ParallelVacuumState;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -305,6 +309,21 @@ extern IndexBulkDeleteResult *vac_cleanup_one_index(IndexVacuumInfo *ivinfo,
IndexBulkDeleteResult *istat);
extern Size vac_max_items_to_alloc_size(int max_items);
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumState *parallel_vacuum_init(Relation rel, Relation *indrels,
+ int nindexes, int nrequested_workers,
+ int max_items, int elevel,
+ BufferAccessStrategy bstrategy);
+extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
+extern VacDeadItems *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs);
+extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples);
+extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ bool estimated_count,
+ bool no_bulkdel_call);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
+
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
VacuumParams *params, List *va_cols, bool in_outer_xact,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9863508791..f093605472 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1306,13 +1306,8 @@ LPWSTR
LSEG
LUID
LVPagePruneState
-LVParallelIndStats
-LVParallelIndVacStatus
-LVParallelState
LVRelState
LVSavedErrInfo
-LVShared
-LVSharedIndStats
LWLock
LWLockHandle
LWLockMode
@@ -1775,7 +1770,10 @@ PTIterationArray
PTOKEN_PRIVILEGES
PTOKEN_USER
PUTENVPROC
+PVIndStats
+PVIndVacStatus
PVOID
+PVShared
PX_Alias
PX_Cipher
PX_Combo
@@ -1809,6 +1807,7 @@ ParallelSlotResultHandler
ParallelState
ParallelTableScanDesc
ParallelTableScanDescData
+ParallelVacuumState
ParallelWorkerContext
ParallelWorkerInfo
Param
--
2.24.3 (Apple Git-128)
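For reference, a rough usage sketch of the API this patch exports from
vacuumparallel.c, pieced together from the vacuumlazy.c hunks and the
vacuum.h declarations above. The surrounding variables (rel, indrels,
num_table_tuples, and so on) are assumed to be in scope in the caller;
this is only an illustration of the call sequence, not code from the
patch:

	ParallelVacuumState *pvs;
	VacDeadItems *dead_items;

	/* Enter parallel mode and set up the DSM; NULL means fall back to serial */
	pvs = parallel_vacuum_init(rel, indrels, nindexes, nrequested_workers,
							   max_items, elevel, bstrategy);
	if (pvs != NULL)
	{
		/* The dead_items array lives in DSM so that workers can see it */
		dead_items = parallel_vacuum_get_dead_items(pvs);

		/* ... scan the table and fill dead_items, then for each pass ... */
		parallel_vacuum_bulkdel_all_indexes(pvs, num_table_tuples);

		/* After the last pass, clean up all indexes */
		parallel_vacuum_cleanup_all_indexes(pvs, num_table_tuples,
											estimated_count,
											have_done_bulkdel);

		/* Copy index stats out of DSM, destroy the context, exit parallel mode */
		parallel_vacuum_end(pvs, indstats);
	}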
On Wed, Dec 22, 2021 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Dec 21, 2021 at 10:24 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Tue, Dec 21, 2021 at 11:24 AM Masahiko Sawada
<sawada.mshk@gmail.com> wrote:
On Tue, Dec 21, 2021 at 2:04 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
Thank you for the comment. Agreed.
I've attached updated versions of the patches. Please review them.
These look mostly good to me. Please find attached the slightly edited
version of the 0001 patch. I have modified comments, ran pgindent, and
modified the commit message. I'll commit this tomorrow if you are fine
with it.
Thank you for committing the first patch.
I've attached an updated version of the second patch. Please review it.
Regards,
Hi,
The patch looks mostly good to me.
I only have a few comments.
1)
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ Assert(!IsParallelWorker());
+
Would it be better to also put Assert(pvs != NULL) here? Because we removed
the Assert(ParallelVacuumIsActive(vacrel)) check in the old function.
2)
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
It might be better to keep the header files in alphabetical order.
:
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
Best regards,
Hou zj
On Wed, Dec 22, 2021 at 5:39 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
On Wed, Dec 22, 2021 11:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Dec 21, 2021 at 10:24 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
The patch looks mostly good to me.
I only have a few comments.
1)
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples)
+{
+ Assert(!IsParallelWorker());
+
Would it be better to also put Assert(pvs != NULL) here? Because we removed
the Assert(ParallelVacuumIsActive(vacrel)) check in the old function.
I am not sure if that is helpful or not because there is only one
caller of it which checks pvs before calling this function.
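For reference, that single call site in the attached patch looks roughly like
the following simplified sketch (not the exact code) of lazy_vacuum_all_indexes(),
where ParallelVacuumIsActive(vacrel) is just ((vacrel)->pvs != NULL):

if (!ParallelVacuumIsActive(vacrel))
{
    /* Serial path: vacuum each index in turn */
    for (int idx = 0; idx < vacrel->nindexes; idx++)
        vacrel->indstats[idx] = lazy_vacuum_one_index(vacrel->indrels[idx],
                                                      vacrel->indstats[idx],
                                                      vacrel->old_live_tuples,
                                                      vacrel);
}
else
{
    /* Parallel path: pvs is known to be non-NULL because of the check above */
    parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples,
                                        vacrel->num_index_scans);
}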
2)
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
It might be better to keep the header files in alphabetical order:
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
Right, I'll take care of this as I am already making some other edits
in the patch.
--
With Regards,
Amit Kapila.
On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 22, 2021 at 5:39 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
2)
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
It might be better to keep the header files in alphabetical order:
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
Right, I'll take care of this as I am already making some other edits
in the patch.
Fixed this and made a few other changes in the patch: (a) passed down
the num_index_scans information to the parallel APIs, which they use to
decide whether to reinitialize the DSM and whether to consider
conditional parallel vacuum cleanup (a rough sketch of this follows
below); (b) got rid of the first-time variable in ParallelVacuumState,
as it is not needed once we have num_index_scans; (c) removed quite a
few unnecessary includes from vacuumparallel.c; (d) stopped setting
unnecessary error callback info in ParallelVacuumState in the leader
backend; (e) changed/added comments in quite a few places.
Can you please verify the changes in the attached version?
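To make (a) a bit more concrete, here is a rough sketch (not the exact code in
the attached patch) of how num_index_scans can drive both decisions inside
parallel_vacuum_process_all_indexes(), mirroring the logic that previously
lived in vacuumlazy.c:

static void
parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scans,
                                    bool vacuum)
{
    int     nworkers;

    if (vacuum)
        nworkers = pvs->nindexes_parallel_bulkdel;
    else
    {
        nworkers = pvs->nindexes_parallel_cleanup;

        /* Conditionally parallel-safe indexes join only the first cleanup pass */
        if (num_index_scans == 0)
            nworkers += pvs->nindexes_parallel_condcleanup;
    }

    /* The leader process will participate */
    nworkers--;
    nworkers = Min(nworkers, pvs->pcxt->nworkers);

    if (nworkers > 0)
    {
        /* Reuse the same DSM on later passes instead of a first-time flag */
        if (num_index_scans > 0)
            ReinitializeParallelDSM(pvs->pcxt);

        ReinitializeParallelWorkers(pvs->pcxt, nworkers);
        LaunchParallelWorkers(pvs->pcxt);
    }

    /* status setup, unsafe-index handling, and waiting elided in this sketch */
}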
--
With Regards,
Amit Kapila.
Attachments:
v12-0001-Move-parallel-vacuum-code-to-vacuumparallel.c.patch (application/octet-stream)
From f9705a8d502e1f9f0c4ea4feccecf41155163b8b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 22 Dec 2021 12:10:33 +0900
Subject: [PATCH v12] Move parallel vacuum code to vacuumparallel.c
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previously, parallel vacuum was specific to lazy vacuum, i.e., the heap
table AM. But the job that parallel vacuum does isn’t really specific
to heap.
This commit moves parallel vacuum related code to a new file
commands/vacuumparallel.c so that any table AM supporting indexes can
utilize parallel vacuum in order to call index AM
callbacks (ambulkdelete and amvacuumcleanup) with parallel workers.
It also moves some vacuum-related functions and structures to
commands/vacuum.c so that both lazy vacuum and parallel vacuum can
refer to them.
Suggestion from Andres Freund.
Discussion: https://www.postgresql.org/message-id/20211030212101.ae3qcouatwmy7tbr%40alap3.anarazel.de
---
src/backend/access/heap/vacuumlazy.c | 1002 +------------------------------
src/backend/access/transam/parallel.c | 2 +-
src/backend/commands/Makefile | 1 +
src/backend/commands/vacuum.c | 5 +-
src/backend/commands/vacuumparallel.c | 1068 +++++++++++++++++++++++++++++++++
src/include/access/heapam.h | 1 -
src/include/commands/vacuum.h | 20 +
src/tools/pgindent/typedefs.list | 9 +-
8 files changed, 1125 insertions(+), 983 deletions(-)
create mode 100644 src/backend/commands/vacuumparallel.c
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d8f1217..cd603e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -40,7 +40,6 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
-#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
#include "access/xact.h"
@@ -121,22 +120,10 @@
#define PREFETCH_SIZE ((BlockNumber) 32)
/*
- * DSM keys for parallel vacuum. Unlike other parallel execution code, since
- * we don't need to worry about DSM keys conflicting with plan_node_id we can
- * use small integers.
- */
-#define PARALLEL_VACUUM_KEY_SHARED 1
-#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
-#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
-#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
-#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
-#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
-
-/*
* Macro to check if we are in a parallel vacuum. If true, we are in the
* parallel mode and the DSM segment is initialized.
*/
-#define ParallelVacuumIsActive(vacrel) ((vacrel)->lps != NULL)
+#define ParallelVacuumIsActive(vacrel) ((vacrel)->pvs != NULL)
/* Phases of vacuum during which we report error context. */
typedef enum
@@ -149,135 +136,6 @@ typedef enum
VACUUM_ERRCB_PHASE_TRUNCATE
} VacErrPhase;
-/*
- * Shared information among parallel workers. So this is allocated in the DSM
- * segment.
- */
-typedef struct LVShared
-{
- /*
- * Target table relid and log level. These fields are not modified during
- * the lazy vacuum.
- */
- Oid relid;
- int elevel;
-
- /*
- * Fields for both index vacuum and cleanup.
- *
- * reltuples is the total number of input heap tuples. We set either old
- * live tuples in the index vacuum case or the new live tuples in the
- * index cleanup case.
- *
- * estimated_count is true if reltuples is an estimated value. (Note that
- * reltuples could be -1 in this case, indicating we have no idea.)
- */
- double reltuples;
- bool estimated_count;
-
- /*
- * In single process lazy vacuum we could consume more memory during index
- * vacuuming or cleanup apart from the memory for heap scanning. In
- * parallel vacuum, since individual vacuum workers can consume memory
- * equal to maintenance_work_mem, the new maintenance_work_mem for each
- * worker is set such that the parallel operation doesn't consume more
- * memory than single process lazy vacuum.
- */
- int maintenance_work_mem_worker;
-
- /*
- * Shared vacuum cost balance. During parallel vacuum,
- * VacuumSharedCostBalance points to this value and it accumulates the
- * balance of each parallel vacuum worker.
- */
- pg_atomic_uint32 cost_balance;
-
- /*
- * Number of active parallel workers. This is used for computing the
- * minimum threshold of the vacuum cost balance before a worker sleeps for
- * cost-based delay.
- */
- pg_atomic_uint32 active_nworkers;
-
- /* Counter for vacuuming and cleanup */
- pg_atomic_uint32 idx;
-} LVShared;
-
-/* Status used during parallel index vacuum or cleanup */
-typedef enum LVParallelIndVacStatus
-{
- PARALLEL_INDVAC_STATUS_INITIAL = 0,
- PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
- PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
- PARALLEL_INDVAC_STATUS_COMPLETED
-} LVParallelIndVacStatus;
-
-/*
- * Struct for index vacuum statistics of an index that is used for parallel vacuum.
- * This includes the status of parallel index vacuum as well as index statistics.
- */
-typedef struct LVParallelIndStats
-{
- /*
- * The following two fields are set by leader process before executing
- * parallel index vacuum or parallel index cleanup. These fields are not
- * fixed for the entire VACUUM operation. They are only fixed for an
- * individual parallel index vacuum and cleanup.
- *
- * parallel_workers_can_process is true if both leader and worker can
- * process the index, otherwise only leader can process it.
- */
- LVParallelIndVacStatus status;
- bool parallel_workers_can_process;
-
- /*
- * Individual worker or leader stores the result of index vacuum or
- * cleanup.
- */
- bool istat_updated; /* are the stats updated? */
- IndexBulkDeleteResult istat;
-} LVParallelIndStats;
-
-/* Struct for maintaining a parallel vacuum state. */
-typedef struct LVParallelState
-{
- ParallelContext *pcxt;
-
- /* Shared information among parallel vacuum workers */
- LVShared *lvshared;
-
- /*
- * Shared index statistics among parallel vacuum workers. The array
- * element is allocated for every index, even those indexes where parallel
- * index vacuuming is unsafe or not worthwhile (e.g.,
- * will_parallel_vacuum[] is false). During parallel vacuum,
- * IndexBulkDeleteResult of each index is kept in DSM and is copied into
- * local memory at the end of parallel vacuum.
- */
- LVParallelIndStats *lvpindstats;
-
- /* Points to buffer usage area in DSM */
- BufferUsage *buffer_usage;
-
- /* Points to WAL usage area in DSM */
- WalUsage *wal_usage;
-
- /*
- * False if the index is totally unsuitable target for all parallel
- * processing. For example, the index could be <
- * min_parallel_index_scan_size cutoff.
- */
- bool *will_parallel_vacuum;
-
- /*
- * The number of indexes that support parallel index bulk-deletion and
- * parallel index cleanup respectively.
- */
- int nindexes_parallel_bulkdel;
- int nindexes_parallel_cleanup;
- int nindexes_parallel_condcleanup;
-} LVParallelState;
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -295,9 +153,9 @@ typedef struct LVRelState
bool do_index_cleanup;
bool do_rel_truncate;
- /* Buffer access strategy and parallel state */
+ /* Buffer access strategy and parallel vacuum state */
BufferAccessStrategy bstrategy;
- LVParallelState *lps;
+ ParallelVacuumState *pvs;
/* rel's initial relfrozenxid and relminmxid */
TransactionId relfrozenxid;
@@ -399,13 +257,6 @@ static bool lazy_check_needs_freeze(Buffer buf, bool *hastup,
LVRelState *vacrel);
static bool lazy_check_wraparound_failsafe(LVRelState *vacrel);
static void lazy_cleanup_all_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum);
-static void parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats);
-static void parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel);
-static void parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared,
- LVParallelIndStats *pindstats);
static IndexBulkDeleteResult *lazy_vacuum_one_index(Relation indrel,
IndexBulkDeleteResult *istat,
double reltuples,
@@ -419,18 +270,11 @@ static bool should_attempt_truncation(LVRelState *vacrel);
static void lazy_truncate_heap(LVRelState *vacrel);
static BlockNumber count_nondeletable_pages(LVRelState *vacrel,
bool *lock_waiter_detected);
-static int dead_items_max_items(LVRelState *vacrel);
static void dead_items_alloc(LVRelState *vacrel, int nworkers);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static int parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum);
static void update_index_statistics(LVRelState *vacrel);
-static void parallel_vacuum_begin(LVRelState *vacrel, int nrequested);
-static void parallel_vacuum_end(LVRelState *vacrel);
-static bool parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
LVSavedErrInfo *saved_vacrel,
@@ -1601,7 +1445,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
/*
* Free resources managed by dead_items_alloc. This will end parallel
- * mode when needed (it must end before we update index statistics).
+ * mode when needed (it must end before updating index statistics as we
+ * can't write in parallel mode).
*/
dead_items_cleanup(vacrel);
@@ -2066,7 +1911,6 @@ lazy_vacuum(LVRelState *vacrel)
/* Should not end up here with no indexes */
Assert(vacrel->nindexes > 0);
- Assert(!IsParallelWorker());
Assert(vacrel->lpdead_item_pages > 0);
if (!vacrel->do_index_vacuuming)
@@ -2195,7 +2039,6 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
{
bool allindexes = true;
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
Assert(vacrel->do_index_vacuuming);
Assert(vacrel->do_index_cleanup);
@@ -2235,7 +2078,8 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, true);
+ parallel_vacuum_bulkdel_all_indexes(vacrel->pvs, vacrel->old_live_tuples,
+ vacrel->num_index_scans);
/*
* Do a postcheck to consider applying wraparound failsafe now. Note
@@ -2609,352 +2453,11 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
}
/*
- * Perform index vacuum or index cleanup with parallel workers. This function
- * must be used by the parallel vacuum leader process.
- */
-static void
-parallel_vacuum_process_all_indexes(LVRelState *vacrel, bool vacuum)
-{
- LVParallelState *lps = vacrel->lps;
- LVParallelIndVacStatus new_status;
- int nworkers;
-
- Assert(!IsParallelWorker());
- Assert(ParallelVacuumIsActive(vacrel));
- Assert(vacrel->nindexes > 0);
-
- if (vacuum)
- {
- /*
- * We can only provide an approximate value of num_heap_tuples, at
- * least for now. Matches serial VACUUM case.
- */
- vacrel->lps->lvshared->reltuples = vacrel->old_live_tuples;
- vacrel->lps->lvshared->estimated_count = true;
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_bulkdel;
- }
- else
- {
- /*
- * We can provide a better estimate of total number of surviving
- * tuples (we assume indexes are more interested in that than in the
- * number of nominally live tuples).
- */
- vacrel->lps->lvshared->reltuples = vacrel->new_rel_tuples;
- vacrel->lps->lvshared->estimated_count =
- (vacrel->tupcount_pages < vacrel->rel_pages);
-
- new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
-
- /* Determine the number of parallel workers to launch */
- nworkers = vacrel->lps->nindexes_parallel_cleanup;
-
- /* Add conditionally parallel-aware indexes if in the first time call */
- if (vacrel->num_index_scans == 0)
- nworkers += vacrel->lps->nindexes_parallel_condcleanup;
- }
-
- /* The leader process will participate */
- nworkers--;
-
- /*
- * It is possible that parallel context is initialized with fewer workers
- * than the number of indexes that need a separate worker in the current
- * phase, so we need to consider it. See
- * parallel_vacuum_compute_workers().
- */
- nworkers = Min(nworkers, lps->pcxt->nworkers);
-
- /*
- * Set index vacuum status and mark whether parallel vacuum worker can
- * process it.
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(vacrel->lps->lvpindstats[i]);
-
- Assert(pindstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
- pindstats->status = new_status;
- pindstats->parallel_workers_can_process =
- (lps->will_parallel_vacuum[i] &
- parallel_vacuum_index_is_parallel_safe(vacrel, vacrel->indrels[i],
- vacuum));
- }
-
- /* Reset the parallel index processing counter */
- pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
- /* Setup the shared cost-based vacuum delay and launch workers */
- if (nworkers > 0)
- {
- /* Reinitialize parallel context to relaunch parallel workers */
- if (vacrel->num_index_scans > 0)
- ReinitializeParallelDSM(lps->pcxt);
-
- /*
- * Set up shared cost balance and the number of active workers for
- * vacuum delay. We need to do this before launching workers as
- * otherwise, they might not see the updated values for these
- * parameters.
- */
- pg_atomic_write_u32(&(lps->lvshared->cost_balance), VacuumCostBalance);
- pg_atomic_write_u32(&(lps->lvshared->active_nworkers), 0);
-
- /*
- * The number of workers can vary between bulkdelete and cleanup
- * phase.
- */
- ReinitializeParallelWorkers(lps->pcxt, nworkers);
-
- LaunchParallelWorkers(lps->pcxt);
-
- if (lps->pcxt->nworkers_launched > 0)
- {
- /*
- * Reset the local cost values for leader backend as we have
- * already accumulated the remaining balance of heap.
- */
- VacuumCostBalance = 0;
- VacuumCostBalanceLocal = 0;
-
- /* Enable shared cost balance for leader backend */
- VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
- VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
- }
-
- if (vacuum)
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
- "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- else
- ereport(elevel,
- (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
- "launched %d parallel vacuum workers for index cleanup (planned: %d)",
- lps->pcxt->nworkers_launched),
- lps->pcxt->nworkers_launched, nworkers)));
- }
-
- /* Process the indexes that can be processed by only leader process */
- parallel_vacuum_process_unsafe_indexes(vacrel);
-
- /*
- * Join as a parallel worker. The leader process alone processes all
- * parallel-safe indexes in the case where no workers are launched.
- */
- parallel_vacuum_process_safe_indexes(vacrel, lps->lvshared, lps->lvpindstats);
-
- /*
- * Next, accumulate buffer and WAL usage. (This must wait for the workers
- * to finish, or we might get incomplete data.)
- */
- if (nworkers > 0)
- {
- /* Wait for all vacuum workers to finish */
- WaitForParallelWorkersToFinish(lps->pcxt);
-
- for (int i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
- }
-
- /*
- * Reset all index status back to initial (while checking that we have
- * processed all indexes).
- */
- for (int i = 0; i < vacrel->nindexes; i++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[i]);
-
- if (pindstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
- elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
- RelationGetRelationName(vacrel->indrels[i]));
-
- pindstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
- }
-
- /*
- * Carry the shared balance value to heap scan and disable shared costing
- */
- if (VacuumSharedCostBalance)
- {
- VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
- VacuumSharedCostBalance = NULL;
- VacuumActiveNWorkers = NULL;
- }
-}
-
-/*
- * Index vacuum/cleanup routine used by the leader process and parallel
- * vacuum worker processes to process the indexes in parallel.
- */
-static void
-parallel_vacuum_process_safe_indexes(LVRelState *vacrel, LVShared *shared,
- LVParallelIndStats *pindstats)
-{
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- /* Loop until all indexes are vacuumed */
- for (;;)
- {
- int idx;
- LVParallelIndStats *pis;
-
- /* Get an index number to process */
- idx = pg_atomic_fetch_add_u32(&(shared->idx), 1);
-
- /* Done for all indexes? */
- if (idx >= vacrel->nindexes)
- break;
-
- pis = &(pindstats[idx]);
-
- /*
- * Skip processing index that is unsafe for workers or has an
- * unsuitable target for parallel index vacuum (this is processed in
- * parallel_vacuum_process_unsafe_indexes() by the leader).
- */
- if (!pis->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- shared, pis);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Perform parallel processing of indexes in leader process.
- *
- * Handles index vacuuming (or index cleanup) for indexes that are not
- * parallel safe. It's possible that this will vary for a given index, based
- * on details like whether we're performing index cleanup right now.
- *
- * Also performs processing of smaller indexes that fell under the size cutoff
- * enforced by parallel_vacuum_compute_workers().
- */
-static void
-parallel_vacuum_process_unsafe_indexes(LVRelState *vacrel)
-{
- LVParallelState *lps = vacrel->lps;
-
- Assert(!IsParallelWorker());
-
- /*
- * Increment the active worker count if we are able to launch any worker.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
-
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- /* Skip, indexes that are safe for workers */
- if (pindstats->parallel_workers_can_process)
- continue;
-
- /* Do vacuum or cleanup of the index */
- parallel_vacuum_process_one_index(vacrel, vacrel->indrels[idx],
- lps->lvshared, pindstats);
- }
-
- /*
- * We have completed the index vacuum so decrement the active worker
- * count.
- */
- if (VacuumActiveNWorkers)
- pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
-}
-
-/*
- * Vacuum or cleanup index either by leader process or by one of the worker
- * process. After processing the index this function copies the index
- * statistics returned from ambulkdelete and amvacuumcleanup to the DSM
- * segment.
- */
-static void
-parallel_vacuum_process_one_index(LVRelState *vacrel, Relation indrel,
- LVShared *shared, LVParallelIndStats *pindstats)
-{
- IndexBulkDeleteResult *istat = NULL;
- IndexBulkDeleteResult *istat_res;
-
- /*
- * Update the pointer to the corresponding bulk-deletion result if someone
- * has already updated it
- */
- if (pindstats->istat_updated)
- istat = &(pindstats->istat);
-
- switch (pindstats->status)
- {
- case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
- istat_res = lazy_vacuum_one_index(indrel, istat,
- shared->reltuples, vacrel);
- break;
- case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
- istat_res = lazy_cleanup_one_index(indrel, istat,
- shared->reltuples,
- shared->estimated_count,
- vacrel);
- break;
- default:
- elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
- pindstats->status,
- RelationGetRelationName(indrel));
- }
-
- /*
- * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first cycle because they
- * allocate locally and it's possible that an index will be vacuumed by a
- * different vacuum process the next cycle. Copying the result normally
- * happens only the first time an index is vacuumed. For any additional
- * vacuum pass, we directly point to the result on the DSM segment and
- * pass it to vacuum index APIs so that workers can update it directly.
- *
- * Since all vacuum workers write the bulk-deletion result at different
- * slots we can write them without locking.
- */
- if (!pindstats->istat_updated && istat_res != NULL)
- {
- memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
- pindstats->istat_updated = true;
-
- /* Free the locally-allocated bulk-deletion result */
- pfree(istat_res);
- }
-
- /*
- * Update the status to completed. No need to lock here since each worker
- * touches different indexes.
- */
- pindstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
-}
-
-/*
* lazy_cleanup_all_indexes() -- cleanup all indexes of relation.
*/
static void
lazy_cleanup_all_indexes(LVRelState *vacrel)
{
- Assert(!IsParallelWorker());
Assert(vacrel->nindexes > 0);
/* Report that we are now cleaning up indexes */
@@ -2980,7 +2483,9 @@ lazy_cleanup_all_indexes(LVRelState *vacrel)
else
{
/* Outsource everything to parallel variant */
- parallel_vacuum_process_all_indexes(vacrel, false);
+ parallel_vacuum_cleanup_all_indexes(vacrel->pvs, vacrel->new_rel_tuples,
+ vacrel->num_index_scans,
+ (vacrel->tupcount_pages < vacrel->rel_pages));
}
}
@@ -3409,8 +2914,6 @@ dead_items_max_items(LVRelState *vacrel)
autovacuum_work_mem != -1 ?
autovacuum_work_mem : maintenance_work_mem;
- Assert(!IsParallelWorker());
-
if (vacrel->nindexes > 0)
{
BlockNumber rel_pages = vacrel->rel_pages;
@@ -3448,6 +2951,9 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
VacDeadItems *dead_items;
int max_items;
+ max_items = dead_items_max_items(vacrel);
+ Assert(max_items >= MaxHeapTuplesPerPage);
+
/*
* Initialize state for a parallel vacuum. As of now, only one worker can
* be used for an index, so we invoke parallelism only if there are at
@@ -3471,15 +2977,20 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
vacrel->relname)));
}
else
- parallel_vacuum_begin(vacrel, nworkers);
+ vacrel->pvs = parallel_vacuum_init(vacrel->rel, vacrel->indrels,
+ vacrel->nindexes, nworkers,
+ max_items, elevel,
+ vacrel->bstrategy);
- /* If parallel mode started, vacrel->dead_items allocated in DSM */
+ /* If parallel mode started, dead_items space is allocated in DSM */
if (ParallelVacuumIsActive(vacrel))
+ {
+ vacrel->dead_items = parallel_vacuum_get_dead_items(vacrel->pvs);
return;
+ }
}
/* Serial VACUUM case */
- max_items = dead_items_max_items(vacrel);
dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
dead_items->max_items = max_items;
dead_items->num_items = 0;
@@ -3499,11 +3010,9 @@ dead_items_cleanup(LVRelState *vacrel)
return;
}
- /*
- * End parallel mode before updating index statistics as we cannot write
- * during parallel mode.
- */
- parallel_vacuum_end(vacrel);
+ /* End parallel mode */
+ parallel_vacuum_end(vacrel->pvs, vacrel->indstats);
+ vacrel->pvs = NULL;
}
/*
@@ -3628,77 +3137,6 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
}
/*
- * Compute the number of parallel worker processes to request. Both index
- * vacuum and index cleanup can be executed with parallel workers. The index
- * is eligible for parallel vacuum iff its size is greater than
- * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt performance.
- *
- * nrequested is the number of parallel workers that user requested. If
- * nrequested is 0, we compute the parallel degree based on nindexes, that is
- * the number of indexes that support parallel vacuum. This function also
- * sets will_parallel_vacuum to remember indexes that participate in parallel
- * vacuum.
- */
-static int
-parallel_vacuum_compute_workers(LVRelState *vacrel, int nrequested,
- bool *will_parallel_vacuum)
-{
- int nindexes_parallel = 0;
- int nindexes_parallel_bulkdel = 0;
- int nindexes_parallel_cleanup = 0;
- int parallel_workers;
-
- /*
- * We don't allow performing parallel operation in standalone backend or
- * when parallelism is disabled.
- */
- if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
- return 0;
-
- /*
- * Compute the number of indexes that can participate in parallel vacuum.
- */
- for (int idx = 0; idx < vacrel->nindexes; idx++)
- {
- Relation indrel = vacrel->indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* Skip index that is not a suitable target for parallel index vacuum */
- if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
- RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
- continue;
-
- will_parallel_vacuum[idx] = true;
-
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- nindexes_parallel_bulkdel++;
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- nindexes_parallel_cleanup++;
- }
-
- nindexes_parallel = Max(nindexes_parallel_bulkdel,
- nindexes_parallel_cleanup);
-
- /* The leader process takes one index */
- nindexes_parallel--;
-
- /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
-
- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
-
- return parallel_workers;
-}
-
-/*
* Update index statistics in pg_class if the statistics are accurate.
*/
static void
@@ -3731,394 +3169,10 @@ update_index_statistics(LVRelState *vacrel)
}
/*
- * Try to enter parallel mode and create a parallel context. Then initialize
- * shared memory state.
- *
- * On success (when we can launch one or more workers), will set dead_items and
- * lps in vacrel for caller. A set lps in vacrel state indicates that parallel
- * VACUUM is currently active.
- */
-static void
-parallel_vacuum_begin(LVRelState *vacrel, int nrequested)
-{
- LVParallelState *lps;
- Relation *indrels = vacrel->indrels;
- int nindexes = vacrel->nindexes;
- ParallelContext *pcxt;
- LVShared *shared;
- VacDeadItems *dead_items;
- LVParallelIndStats *pindstats;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- bool *will_parallel_vacuum;
- int max_items;
- Size est_pindstats_len;
- Size est_shared_len;
- Size est_dead_items_len;
- int nindexes_mwm = 0;
- int parallel_workers = 0;
- int querylen;
-
- /*
- * A parallel vacuum must be requested and there must be indexes on the
- * relation
- */
- Assert(nrequested >= 0);
- Assert(nindexes > 0);
-
- /*
- * Compute the number of parallel vacuum workers to launch
- */
- will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
- parallel_workers = parallel_vacuum_compute_workers(vacrel, nrequested,
- will_parallel_vacuum);
- if (parallel_workers <= 0)
- {
- /* Can't perform vacuum in parallel -- lps not set in vacrel */
- pfree(will_parallel_vacuum);
- return;
- }
-
- lps = (LVParallelState *) palloc0(sizeof(LVParallelState));
-
- EnterParallelMode();
- pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
- parallel_workers);
- Assert(pcxt->nworkers > 0);
- lps->pcxt = pcxt;
- lps->will_parallel_vacuum = will_parallel_vacuum;
-
- /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */
- est_pindstats_len = mul_size(sizeof(LVParallelIndStats), nindexes);
- shm_toc_estimate_chunk(&pcxt->estimator, est_pindstats_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
- est_shared_len = sizeof(LVShared);
- shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
- max_items = dead_items_max_items(vacrel);
- est_dead_items_len = vac_max_items_to_alloc_size(max_items);
- shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /*
- * Estimate space for BufferUsage and WalUsage --
- * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
- *
- * If there are no extensions loaded that care, we could skip this. We
- * have no way of knowing whether anyone's looking at pgBufferUsage or
- * pgWalUsage, so do it unconditionally.
- */
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- shm_toc_estimate_chunk(&pcxt->estimator,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_estimate_keys(&pcxt->estimator, 1);
-
- /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
- if (debug_query_string)
- {
- querylen = strlen(debug_query_string);
- shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
- shm_toc_estimate_keys(&pcxt->estimator, 1);
- }
- else
- querylen = 0; /* keep compiler quiet */
-
- InitializeParallelDSM(pcxt);
-
- /* Prepare index vacuum stats */
- pindstats = (LVParallelIndStats *) shm_toc_allocate(pcxt->toc, est_pindstats_len);
- for (int idx = 0; idx < nindexes; idx++)
- {
- Relation indrel = indrels[idx];
- uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /*
- * Cleanup option should be either disabled, always performing in
- * parallel or conditionally performing in parallel.
- */
- Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
- Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
-
- if (!will_parallel_vacuum[idx])
- continue;
-
- if (indrel->rd_indam->amusemaintenanceworkmem)
- nindexes_mwm++;
-
- /*
- * Remember the number of indexes that support parallel operation for
- * each phase.
- */
- if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
- lps->nindexes_parallel_bulkdel++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
- lps->nindexes_parallel_cleanup++;
- if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
- lps->nindexes_parallel_condcleanup++;
- }
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, pindstats);
- lps->lvpindstats = pindstats;
-
- /* Prepare shared information */
- shared = (LVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
- MemSet(shared, 0, est_shared_len);
- shared->relid = RelationGetRelid(vacrel->rel);
- shared->elevel = elevel;
- shared->maintenance_work_mem_worker =
- (nindexes_mwm > 0) ?
- maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
- maintenance_work_mem;
-
- pg_atomic_init_u32(&(shared->cost_balance), 0);
- pg_atomic_init_u32(&(shared->active_nworkers), 0);
- pg_atomic_init_u32(&(shared->idx), 0);
-
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
- lps->lvshared = shared;
-
- /* Prepare the dead_items space */
- dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
- MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
-
- /*
- * Allocate space for each worker's BufferUsage and WalUsage; no need to
- * initialize
- */
- buffer_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(BufferUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
- lps->buffer_usage = buffer_usage;
- wal_usage = shm_toc_allocate(pcxt->toc,
- mul_size(sizeof(WalUsage), pcxt->nworkers));
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
- lps->wal_usage = wal_usage;
-
- /* Store query string for workers */
- if (debug_query_string)
- {
- char *sharedquery;
-
- sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
- memcpy(sharedquery, debug_query_string, querylen + 1);
- sharedquery[querylen] = '\0';
- shm_toc_insert(pcxt->toc,
- PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
- }
-
- /* Success -- set dead_items and lps in leader's vacrel state */
- vacrel->dead_items = dead_items;
- vacrel->lps = lps;
-}
-
-/*
- * Destroy the parallel context, and end parallel mode.
- *
- * Since writes are not allowed during parallel mode, copy the
- * updated index statistics from DSM into local memory and then later use that
- * to update the index statistics. One might think that we can exit from
- * parallel mode, update the index statistics and then destroy parallel
- * context, but that won't be safe (see ExitParallelMode).
- */
-static void
-parallel_vacuum_end(LVRelState *vacrel)
-{
- IndexBulkDeleteResult **indstats = vacrel->indstats;
- LVParallelState *lps = vacrel->lps;
- int nindexes = vacrel->nindexes;
-
- Assert(!IsParallelWorker());
-
- /* Copy the updated statistics */
- for (int idx = 0; idx < nindexes; idx++)
- {
- LVParallelIndStats *pindstats = &(lps->lvpindstats[idx]);
-
- if (pindstats->istat_updated)
- {
- indstats[idx] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
- memcpy(indstats[idx], &pindstats->istat, sizeof(IndexBulkDeleteResult));
- }
- else
- indstats[idx] = NULL;
- }
-
- DestroyParallelContext(lps->pcxt);
- ExitParallelMode();
-
- /* Deactivate parallel vacuum */
- pfree(lps->will_parallel_vacuum);
- pfree(lps);
- vacrel->lps = NULL;
-}
-
-/*
- * Returns false, if the given index can't participate in the next execution of
- * parallel index vacuum or parallel index cleanup.
- */
-static bool
-parallel_vacuum_index_is_parallel_safe(LVRelState *vacrel, Relation indrel,
- bool vacuum)
-{
- uint8 vacoptions;
-
- vacoptions = indrel->rd_indam->amparallelvacuumoptions;
-
- /* In parallel vacuum case, check if it supports parallel bulk-deletion */
- if (vacuum)
- return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
-
- /* Not safe, if the index does not support parallel cleanup */
- if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
- return false;
-
- /*
- * Not safe, if the index supports parallel cleanup conditionally, but we
- * have already processed the index (for bulkdelete). We do this to avoid
- * the need to invoke workers when parallel index cleanup doesn't need to
- * scan the index. See the comments for option
- * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
- * parallel cleanup conditionally.
- */
- if (vacrel->num_index_scans > 0 &&
- ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
- return false;
-
- return true;
-}
-
-/*
- * Perform work within a launched parallel process.
- *
- * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report progress information.
- */
-void
-parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
-{
- Relation rel;
- Relation *indrels;
- LVParallelIndStats *lvpindstats;
- LVShared *lvshared;
- VacDeadItems *dead_items;
- BufferUsage *buffer_usage;
- WalUsage *wal_usage;
- int nindexes;
- char *sharedquery;
- LVRelState vacrel;
- ErrorContextCallback errcallback;
-
- /*
- * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
- * don't support parallel vacuum for autovacuum as of now.
- */
- Assert(MyProc->statusFlags == PROC_IN_VACUUM);
-
- lvshared = (LVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED,
- false);
- elevel = lvshared->elevel;
-
- elog(DEBUG1, "starting parallel vacuum worker");
-
- /* Set debug_query_string for individual workers */
- sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
- debug_query_string = sharedquery;
- pgstat_report_activity(STATE_RUNNING, debug_query_string);
-
- /*
- * Open table. The lock mode is the same as the leader process. It's
- * okay because the lock mode does not conflict among the parallel
- * workers.
- */
- rel = table_open(lvshared->relid, ShareUpdateExclusiveLock);
-
- /*
- * Open all indexes. indrels are sorted in order by OID, which should be
- * matched to the leader's one.
- */
- vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
- Assert(nindexes > 0);
-
- /* Set index statistics */
- lvpindstats = (LVParallelIndStats *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_INDEX_STATS,
- false);
-
- /* Set dead_items space (set as worker's vacrel dead_items below) */
- dead_items = (VacDeadItems *) shm_toc_lookup(toc,
- PARALLEL_VACUUM_KEY_DEAD_ITEMS,
- false);
-
- /* Set cost-based vacuum delay */
- VacuumCostActive = (VacuumCostDelay > 0);
- VacuumCostBalance = 0;
- VacuumPageHit = 0;
- VacuumPageMiss = 0;
- VacuumPageDirty = 0;
- VacuumCostBalanceLocal = 0;
- VacuumSharedCostBalance = &(lvshared->cost_balance);
- VacuumActiveNWorkers = &(lvshared->active_nworkers);
-
- vacrel.rel = rel;
- vacrel.indrels = indrels;
- vacrel.nindexes = nindexes;
- /* Each parallel VACUUM worker gets its own access strategy */
- vacrel.bstrategy = GetAccessStrategy(BAS_VACUUM);
- vacrel.indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
-
- if (lvshared->maintenance_work_mem_worker > 0)
- maintenance_work_mem = lvshared->maintenance_work_mem_worker;
-
- /*
- * Initialize vacrel for use as error callback arg by parallel worker.
- */
- vacrel.relnamespace = get_namespace_name(RelationGetNamespace(rel));
- vacrel.relname = pstrdup(RelationGetRelationName(rel));
- vacrel.indname = NULL;
- vacrel.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
- vacrel.dead_items = dead_items;
-
- /* Setup error traceback support for ereport() */
- errcallback.callback = vacuum_error_callback;
- errcallback.arg = &vacrel;
- errcallback.previous = error_context_stack;
- error_context_stack = &errcallback;
-
- /* Prepare to track buffer usage during parallel execution */
- InstrStartParallelQuery();
-
- /* Process indexes to perform vacuum/cleanup */
- parallel_vacuum_process_safe_indexes(&vacrel, lvshared, lvpindstats);
-
- /* Report buffer/WAL usage during parallel execution */
- buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
- wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
- InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
- &wal_usage[ParallelWorkerNumber]);
-
- /* Pop the error context stack */
- error_context_stack = errcallback.previous;
-
- vac_close_indexes(nindexes, indrels, RowExclusiveLock);
- table_close(rel, ShareUpdateExclusiveLock);
- FreeAccessStrategy(vacrel.bstrategy);
- pfree(vacrel.indstats);
-}
-
-/*
- * Error context callback for errors occurring during vacuum.
+ * Error context callback for errors occurring during vacuum. The error
+ * context messages for index phases should match the messages set in parallel
+ * vacuum. If you change this function for those phases, change
+ * parallel_vacuum_error_callback() as well.
*/
static void
vacuum_error_callback(void *arg)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index bb1881f..ae7c713 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -14,7 +14,6 @@
#include "postgres.h"
-#include "access/heapam.h"
#include "access/nbtree.h"
#include "access/parallel.h"
#include "access/session.h"
@@ -25,6 +24,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/vacuum.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index e8504f0..48f7348 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -59,6 +59,7 @@ OBJS = \
typecmds.o \
user.o \
vacuum.o \
+ vacuumparallel.o \
variable.o \
view.o
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 3b481bc..c94c187 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -7,8 +7,9 @@
* commands, (b) code to compute various vacuum thresholds, and (c) index
* vacuum code.
*
- * VACUUM for heap AM is implemented in vacuumlazy.c, ANALYZE in analyze.c, and
- * VACUUM FULL is a variant of CLUSTER, handled in cluster.c.
+ * VACUUM for heap AM is implemented in vacuumlazy.c, parallel vacuum in
+ * vacuumparallel.c, ANALYZE in analyze.c, and VACUUM FULL is a variant of
+ * CLUSTER, handled in cluster.c.
*
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c
new file mode 100644
index 0000000..5dd70c5
--- /dev/null
+++ b/src/backend/commands/vacuumparallel.c
@@ -0,0 +1,1068 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuumparallel.c
+ * Support routines for parallel vacuum execution.
+ *
+ * This file contains routines that are intended to support setting up, using,
+ * and tearing down a ParallelVacuumState.
+ *
+ * In a parallel vacuum, we perform both index bulk deletion and index cleanup
+ * with parallel worker processes. Individual indexes are processed by one
+ * vacuum process. ParallelVacuumState contains shared information as well as
+ * the memory space for storing dead items allocated in the DSM segment. We
+ * launch parallel worker processes at the start of parallel index
+ * bulk-deletion and index cleanup and once all indexes are processed, the
+ * parallel worker processes exit. Each time we process indexes in parallel,
+ * the parallel context is re-initialized so that the same DSM can be used for
+ * multiple passes of index bulk-deletion and index cleanup.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/backend/commands/vacuumparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/amapi.h"
+#include "access/table.h"
+#include "catalog/index.h"
+#include "commands/vacuum.h"
+#include "optimizer/paths.h"
+#include "pgstat.h"
+#include "storage/bufmgr.h"
+#include "tcop/tcopprot.h"
+#include "utils/lsyscache.h"
+
+/*
+ * DSM keys for parallel vacuum. Unlike other parallel execution code, since
+ * we don't need to worry about DSM keys conflicting with plan_node_id we can
+ * use small integers.
+ */
+#define PARALLEL_VACUUM_KEY_SHARED 1
+#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT 3
+#define PARALLEL_VACUUM_KEY_BUFFER_USAGE 4
+#define PARALLEL_VACUUM_KEY_WAL_USAGE 5
+#define PARALLEL_VACUUM_KEY_INDEX_STATS 6
+
+/*
+ * Shared information among parallel workers. So this is allocated in the DSM
+ * segment.
+ */
+typedef struct PVShared
+{
+ /*
+ * Target table relid and log level. These fields are not modified during
+ * the parallel vacuum.
+ */
+ Oid relid;
+ int elevel;
+
+ /*
+ * Fields for both index vacuum and cleanup.
+ *
+ * reltuples is the total number of input heap tuples. We set either old
+ * live tuples in the index vacuum case or the new live tuples in the
+ * index cleanup case.
+ *
+ * estimated_count is true if reltuples is an estimated value. (Note that
+ * reltuples could be -1 in this case, indicating we have no idea.)
+ */
+ double reltuples;
+ bool estimated_count;
+
+ /*
+ * In single process vacuum we could consume more memory during index
+ * vacuuming or cleanup apart from the memory for heap scanning. In
+ * parallel vacuum, since individual vacuum workers can consume memory
+ * equal to maintenance_work_mem, the new maintenance_work_mem for each
+ * worker is set such that the parallel operation doesn't consume more
+ * memory than single process vacuum.
+ */
+ int maintenance_work_mem_worker;
+
+ /*
+ * Shared vacuum cost balance. During parallel vacuum,
+ * VacuumSharedCostBalance points to this value and it accumulates the
+ * balance of each parallel vacuum worker.
+ */
+ pg_atomic_uint32 cost_balance;
+
+ /*
+ * Number of active parallel workers. This is used for computing the
+ * minimum threshold of the vacuum cost balance before a worker sleeps for
+ * cost-based delay.
+ */
+ pg_atomic_uint32 active_nworkers;
+
+ /* Counter for vacuuming and cleanup */
+ pg_atomic_uint32 idx;
+} PVShared;
+
+/* Status used during parallel index vacuum or cleanup */
+typedef enum PVIndVacStatus
+{
+ PARALLEL_INDVAC_STATUS_INITIAL = 0,
+ PARALLEL_INDVAC_STATUS_NEED_BULKDELETE,
+ PARALLEL_INDVAC_STATUS_NEED_CLEANUP,
+ PARALLEL_INDVAC_STATUS_COMPLETED
+} PVIndVacStatus;
+
+/*
+ * Struct for index vacuum statistics of an index that is used for parallel vacuum.
+ * This includes the status of parallel index vacuum as well as index statistics.
+ */
+typedef struct PVIndStats
+{
+ /*
+ * The following two fields are set by leader process before executing
+ * parallel index vacuum or parallel index cleanup. These fields are not
+ * fixed for the entire VACUUM operation. They are only fixed for an
+ * individual parallel index vacuum and cleanup.
+ *
+ * parallel_workers_can_process is true if both leader and worker can
+ * process the index, otherwise only leader can process it.
+ */
+ PVIndVacStatus status;
+ bool parallel_workers_can_process;
+
+ /*
+ * Individual worker or leader stores the result of index vacuum or
+ * cleanup.
+ */
+ bool istat_updated; /* are the stats updated? */
+ IndexBulkDeleteResult istat;
+} PVIndStats;
+
+/* Struct for maintaining a parallel vacuum state. */
+typedef struct ParallelVacuumState
+{
+ /* NULL for worker processes */
+ ParallelContext *pcxt;
+
+ /* Target indexes */
+ Relation *indrels;
+ int nindexes;
+
+ /* Shared information among parallel vacuum workers */
+ PVShared *shared;
+
+ /*
+ * Shared index statistics among parallel vacuum workers. The array
+ * element is allocated for every index, even those indexes where parallel
+ * index vacuuming is unsafe or not worthwhile (e.g.,
+ * will_parallel_vacuum[] is false). During parallel vacuum,
+ * IndexBulkDeleteResult of each index is kept in DSM and is copied into
+ * local memory at the end of parallel vacuum.
+ */
+ PVIndStats *indstats;
+
+ /* Shared dead items space among parallel vacuum workers */
+ VacDeadItems *dead_items;
+
+ /* Points to buffer usage area in DSM */
+ BufferUsage *buffer_usage;
+
+ /* Points to WAL usage area in DSM */
+ WalUsage *wal_usage;
+
+ /*
+ * False if the index is totally unsuitable target for all parallel
+ * processing. For example, the index could be <
+ * min_parallel_index_scan_size cutoff.
+ */
+ bool *will_parallel_vacuum;
+
+ /*
+ * The number of indexes that support parallel index bulk-deletion and
+ * parallel index cleanup respectively.
+ */
+ int nindexes_parallel_bulkdel;
+ int nindexes_parallel_cleanup;
+ int nindexes_parallel_condcleanup;
+
+ /* Buffer access strategy used by leader process */
+ BufferAccessStrategy bstrategy;
+
+ /*
+ * Error reporting state. The error callback is set only for worker
+ * processes during parallel index vacuum.
+ */
+ char *relnamespace;
+ char *relname;
+ char *indname;
+ PVIndVacStatus status;
+} ParallelVacuumState;
+
+static int parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum);
+static void parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scans,
+ bool vacuum);
+static void parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_process_unsafe_indexes(ParallelVacuumState *pvs);
+static void parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats);
+static bool parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
+ bool vacuum);
+static void parallel_vacuum_error_callback(void *arg);
+
+/*
+ * Try to enter parallel mode and create a parallel context. Then initialize
+ * shared memory state.
+ *
+ * On success, return parallel vacuum state. Otherwise return NULL.
+ */
+ParallelVacuumState *
+parallel_vacuum_init(Relation rel, Relation *indrels, int nindexes,
+ int nrequested_workers, int max_items,
+ int elevel, BufferAccessStrategy bstrategy)
+{
+ ParallelVacuumState *pvs;
+ ParallelContext *pcxt;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ PVIndStats *indstats;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ bool *will_parallel_vacuum;
+ Size est_indstats_len;
+ Size est_shared_len;
+ Size est_dead_items_len;
+ int nindexes_mwm = 0;
+ int parallel_workers = 0;
+ int querylen;
+
+ /*
+ * A parallel vacuum must be requested and there must be indexes on the
+ * relation
+ */
+ Assert(nrequested_workers >= 0);
+ Assert(nindexes > 0);
+
+ /*
+ * Compute the number of parallel vacuum workers to launch
+ */
+ will_parallel_vacuum = (bool *) palloc0(sizeof(bool) * nindexes);
+ parallel_workers = parallel_vacuum_compute_workers(indrels, nindexes,
+ nrequested_workers,
+ will_parallel_vacuum);
+ if (parallel_workers <= 0)
+ {
+ /* Can't perform vacuum in parallel -- return NULL */
+ pfree(will_parallel_vacuum);
+ return NULL;
+ }
+
+ pvs = (ParallelVacuumState *) palloc0(sizeof(ParallelVacuumState));
+ pvs->indrels = indrels;
+ pvs->nindexes = nindexes;
+ pvs->will_parallel_vacuum = will_parallel_vacuum;
+ pvs->bstrategy = bstrategy;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "parallel_vacuum_main",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+ pvs->pcxt = pcxt;
+
+ /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_INDEX_STATS */
+ est_indstats_len = mul_size(sizeof(PVIndStats), nindexes);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_indstats_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = sizeof(PVShared);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate size for dead_items -- PARALLEL_VACUUM_KEY_DEAD_ITEMS */
+ est_dead_items_len = vac_max_items_to_alloc_size(max_items);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_dead_items_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate space for BufferUsage and WalUsage --
+ * PARALLEL_VACUUM_KEY_BUFFER_USAGE and PARALLEL_VACUUM_KEY_WAL_USAGE.
+ *
+ * If there are no extensions loaded that care, we could skip this. We
+ * have no way of knowing whether anyone's looking at pgBufferUsage or
+ * pgWalUsage, so do it unconditionally.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Finally, estimate PARALLEL_VACUUM_KEY_QUERY_TEXT space */
+ if (debug_query_string)
+ {
+ querylen = strlen(debug_query_string);
+ shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+ else
+ querylen = 0; /* keep compiler quiet */
+
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare index vacuum stats */
+ indstats = (PVIndStats *) shm_toc_allocate(pcxt->toc, est_indstats_len);
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /*
+ * Cleanup option should be either disabled, always performing in
+ * parallel or conditionally performing in parallel.
+ */
+ Assert(((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0));
+ Assert(vacoptions <= VACUUM_OPTION_MAX_VALID_VALUE);
+
+ if (!will_parallel_vacuum[i])
+ continue;
+
+ if (indrel->rd_indam->amusemaintenanceworkmem)
+ nindexes_mwm++;
+
+ /*
+ * Remember the number of indexes that support parallel operation for
+ * each phase.
+ */
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ pvs->nindexes_parallel_bulkdel++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0)
+ pvs->nindexes_parallel_cleanup++;
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0)
+ pvs->nindexes_parallel_condcleanup++;
+ }
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_INDEX_STATS, indstats);
+ pvs->indstats = indstats;
+
+ /* Prepare shared information */
+ shared = (PVShared *) shm_toc_allocate(pcxt->toc, est_shared_len);
+ MemSet(shared, 0, est_shared_len);
+ shared->relid = RelationGetRelid(rel);
+ shared->elevel = elevel;
+ shared->maintenance_work_mem_worker =
+ (nindexes_mwm > 0) ?
+ maintenance_work_mem / Min(parallel_workers, nindexes_mwm) :
+ maintenance_work_mem;
+
+ pg_atomic_init_u32(&(shared->cost_balance), 0);
+ pg_atomic_init_u32(&(shared->active_nworkers), 0);
+ pg_atomic_init_u32(&(shared->idx), 0);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_SHARED, shared);
+ pvs->shared = shared;
+
+ /* Prepare the dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
+ est_dead_items_len);
+ dead_items->max_items = max_items;
+ dead_items->num_items = 0;
+ MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
+ pvs->dead_items = dead_items;
+
+ /*
+ * Allocate space for each worker's BufferUsage and WalUsage; no need to
+ * initialize
+ */
+ buffer_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);
+ pvs->buffer_usage = buffer_usage;
+ wal_usage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_WAL_USAGE, wal_usage);
+ pvs->wal_usage = wal_usage;
+
+ /* Store query string for workers */
+ if (debug_query_string)
+ {
+ char *sharedquery;
+
+ sharedquery = (char *) shm_toc_allocate(pcxt->toc, querylen + 1);
+ memcpy(sharedquery, debug_query_string, querylen + 1);
+ sharedquery[querylen] = '\0';
+ shm_toc_insert(pcxt->toc,
+ PARALLEL_VACUUM_KEY_QUERY_TEXT, sharedquery);
+ }
+
+ /* Success -- return parallel vacuum state */
+ return pvs;
+}
+
+/*
+ * Destroy the parallel context, and end parallel mode.
+ *
+ * Since writes are not allowed during parallel mode, copy the
+ * updated index statistics from DSM into local memory and then later use that
+ * to update the index statistics. One might think that we can exit from
+ * parallel mode, update the index statistics and then destroy parallel
+ * context, but that won't be safe (see ExitParallelMode).
+ */
+void
+parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats)
+{
+ Assert(!IsParallelWorker());
+
+ /* Copy the updated statistics */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->istat_updated)
+ {
+ istats[i] = (IndexBulkDeleteResult *) palloc0(sizeof(IndexBulkDeleteResult));
+ memcpy(istats[i], &indstats->istat, sizeof(IndexBulkDeleteResult));
+ }
+ else
+ istats[i] = NULL;
+ }
+
+ DestroyParallelContext(pvs->pcxt);
+ ExitParallelMode();
+
+ pfree(pvs->will_parallel_vacuum);
+ pfree(pvs);
+}
+
+/* Returns the dead items space */
+VacDeadItems *
+parallel_vacuum_get_dead_items(ParallelVacuumState *pvs)
+{
+ return pvs->dead_items;
+}
+
+/*
+ * Do parallel index bulk-deletion with parallel workers.
+ */
+void
+parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
+ int num_index_scans)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can only provide an approximate value of num_heap_tuples, at least
+ * for now.
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = true;
+
+ parallel_vacuum_process_all_indexes(pvs, num_index_scans, true);
+}
+
+/*
+ * Do parallel index cleanup with parallel workers.
+ */
+void
+parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs, long num_table_tuples,
+ int num_index_scans, bool estimated_count)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * We can provide a better estimate of total number of surviving tuples
+ * (we assume indexes are more interested in that than in the number of
+ * nominally live tuples).
+ */
+ pvs->shared->reltuples = num_table_tuples;
+ pvs->shared->estimated_count = estimated_count;
+
+ parallel_vacuum_process_all_indexes(pvs, num_index_scans, false);
+}
+
+/*
+ * Compute the number of parallel worker processes to request. Both index
+ * vacuum and index cleanup can be executed with parallel workers.
+ * An index is eligible for parallel vacuum iff its size is at least
+ * min_parallel_index_scan_size, as invoking workers for very small indexes
+ * can hurt performance.
+ *
+ * nrequested is the number of parallel workers that user requested. If
+ * nrequested is 0, we compute the parallel degree based on nindexes, that is
+ * the number of indexes that support parallel vacuum. This function also
+ * sets will_parallel_vacuum to remember indexes that participate in parallel
+ * vacuum.
+ */
+static int
+parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int nrequested,
+ bool *will_parallel_vacuum)
+{
+ int nindexes_parallel = 0;
+ int nindexes_parallel_bulkdel = 0;
+ int nindexes_parallel_cleanup = 0;
+ int parallel_workers;
+
+ /*
+ * We don't allow performing parallel operations in a standalone backend or
+ * when parallelism is disabled.
+ */
+ if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
+ return 0;
+
+ /*
+ * Compute the number of indexes that can participate in parallel vacuum.
+ */
+ for (int i = 0; i < nindexes; i++)
+ {
+ Relation indrel = indrels[i];
+ uint8 vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* Skip index that is not a suitable target for parallel index vacuum */
+ if (vacoptions == VACUUM_OPTION_NO_PARALLEL ||
+ RelationGetNumberOfBlocks(indrel) < min_parallel_index_scan_size)
+ continue;
+
+ will_parallel_vacuum[i] = true;
+
+ if ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0)
+ nindexes_parallel_bulkdel++;
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) != 0) ||
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ nindexes_parallel_cleanup++;
+ }
+
+ nindexes_parallel = Max(nindexes_parallel_bulkdel,
+ nindexes_parallel_cleanup);
+
+ /* The leader process takes one index */
+ nindexes_parallel--;
+
+ /* No index supports parallel vacuum */
+ if (nindexes_parallel <= 0)
+ return 0;
+
+ /* Compute the parallel degree */
+ parallel_workers = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;
+
+ /* Cap by max_parallel_maintenance_workers */
+ parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+
+ return parallel_workers;
+}
+
+/*
+ * Perform index vacuum or index cleanup with parallel workers. This function
+ * must be used by the parallel vacuum leader process.
+ */
+static void
+parallel_vacuum_process_all_indexes(ParallelVacuumState *pvs, int num_index_scans,
+ bool vacuum)
+{
+ int nworkers;
+ PVIndVacStatus new_status;
+
+ Assert(!IsParallelWorker());
+
+ if (vacuum)
+ {
+ new_status = PARALLEL_INDVAC_STATUS_NEED_BULKDELETE;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = pvs->nindexes_parallel_bulkdel;
+ }
+ else
+ {
+ new_status = PARALLEL_INDVAC_STATUS_NEED_CLEANUP;
+
+ /* Determine the number of parallel workers to launch */
+ nworkers = pvs->nindexes_parallel_cleanup;
+
+ /* Add conditionally parallel-aware indexes if this is the first call */
+ if (num_index_scans == 0)
+ nworkers += pvs->nindexes_parallel_condcleanup;
+ }
+
+ /* The leader process will participate */
+ nworkers--;
+
+ /*
+ * The parallel context might have been initialized with fewer workers than
+ * the number of indexes that need a separate worker in the current phase,
+ * so cap the worker count accordingly. See
+ * parallel_vacuum_compute_workers().
+ */
+ nworkers = Min(nworkers, pvs->pcxt->nworkers);
+
+ /*
+ * Set index vacuum status and mark whether parallel vacuum worker can
+ * process it.
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ Assert(indstats->status == PARALLEL_INDVAC_STATUS_INITIAL);
+ indstats->status = new_status;
+ indstats->parallel_workers_can_process =
+ (pvs->will_parallel_vacuum[i] &&
+ parallel_vacuum_index_is_parallel_safe(pvs->indrels[i],
+ num_index_scans,
+ vacuum));
+ }
+
+ /* Reset the parallel index processing counter */
+ pg_atomic_write_u32(&(pvs->shared->idx), 0);
+
+ /* Setup the shared cost-based vacuum delay and launch workers */
+ if (nworkers > 0)
+ {
+ /* Reinitialize parallel context to relaunch parallel workers */
+ if (num_index_scans > 0)
+ ReinitializeParallelDSM(pvs->pcxt);
+
+ /*
+ * Set up shared cost balance and the number of active workers for
+ * vacuum delay. We need to do this before launching workers as
+ * otherwise, they might not see the updated values for these
+ * parameters.
+ */
+ pg_atomic_write_u32(&(pvs->shared->cost_balance), VacuumCostBalance);
+ pg_atomic_write_u32(&(pvs->shared->active_nworkers), 0);
+
+ /*
+ * The number of workers can vary between bulkdelete and cleanup
+ * phase.
+ */
+ ReinitializeParallelWorkers(pvs->pcxt, nworkers);
+
+ LaunchParallelWorkers(pvs->pcxt);
+
+ if (pvs->pcxt->nworkers_launched > 0)
+ {
+ /*
+ * Reset the local cost values for leader backend as we have
+ * already accumulated the remaining balance of heap.
+ */
+ VacuumCostBalance = 0;
+ VacuumCostBalanceLocal = 0;
+
+ /* Enable shared cost balance for leader backend */
+ VacuumSharedCostBalance = &(pvs->shared->cost_balance);
+ VacuumActiveNWorkers = &(pvs->shared->active_nworkers);
+ }
+
+ if (vacuum)
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index vacuuming (planned: %d)",
+ "launched %d parallel vacuum workers for index vacuuming (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ else
+ ereport(pvs->shared->elevel,
+ (errmsg(ngettext("launched %d parallel vacuum worker for index cleanup (planned: %d)",
+ "launched %d parallel vacuum workers for index cleanup (planned: %d)",
+ pvs->pcxt->nworkers_launched),
+ pvs->pcxt->nworkers_launched, nworkers)));
+ }
+
+ /* Vacuum the indexes that can be processed only by the leader process */
+ parallel_vacuum_process_unsafe_indexes(pvs);
+
+ /*
+ * Join as a parallel worker. The leader alone processes all parallel-safe
+ * indexes if no workers were launched.
+ */
+ parallel_vacuum_process_safe_indexes(pvs);
+
+ /*
+ * Next, accumulate buffer and WAL usage. (This must wait for the workers
+ * to finish, or we might get incomplete data.)
+ */
+ if (nworkers > 0)
+ {
+ /* Wait for all vacuum workers to finish */
+ WaitForParallelWorkersToFinish(pvs->pcxt);
+
+ for (int i = 0; i < pvs->pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&pvs->buffer_usage[i], &pvs->wal_usage[i]);
+ }
+
+ /*
+ * Reset all index status back to initial (while checking that we have
+ * vacuumed all indexes).
+ */
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ if (indstats->status != PARALLEL_INDVAC_STATUS_COMPLETED)
+ elog(ERROR, "parallel index vacuum on index \"%s\" is not completed",
+ RelationGetRelationName(pvs->indrels[i]));
+
+ indstats->status = PARALLEL_INDVAC_STATUS_INITIAL;
+ }
+
+ /*
+ * Carry the shared balance value to heap scan and disable shared costing
+ */
+ if (VacuumSharedCostBalance)
+ {
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }
+}
+
+/*
+ * Index vacuum/cleanup routine used by the leader process and parallel
+ * vacuum worker processes to vacuum the indexes in parallel.
+ */
+static void
+parallel_vacuum_process_safe_indexes(ParallelVacuumState *pvs)
+{
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+ PVIndStats *indstats;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(pvs->shared->idx), 1);
+
+ /* Done for all indexes? */
+ if (idx >= pvs->nindexes)
+ break;
+
+ indstats = &(pvs->indstats[idx]);
+
+ /*
+ * Skip indexes that are unsafe for workers or are not suitable targets for
+ * parallel index vacuum (these are vacuumed by the leader in
+ * parallel_vacuum_process_unsafe_indexes()).
+ */
+ if (!indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_process_one_index(pvs, pvs->indrels[idx], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Vacuum indexes that can be processed only by the leader process.
+ *
+ * Handles index vacuuming (or index cleanup) for indexes that are not
+ * parallel safe. It's possible that this will vary for a given index, based
+ * on details like whether we're performing index cleanup right now.
+ *
+ * Also performs vacuuming of smaller indexes that fell under the size cutoff
+ * enforced by parallel_vacuum_compute_workers().
+ */
+static void
+parallel_vacuum_process_unsafe_indexes(ParallelVacuumState *pvs)
+{
+ Assert(!IsParallelWorker());
+
+ /*
+ * Increment the active worker count if we are able to launch any worker.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_add_fetch_u32(VacuumActiveNWorkers, 1);
+
+ for (int i = 0; i < pvs->nindexes; i++)
+ {
+ PVIndStats *indstats = &(pvs->indstats[i]);
+
+ /* Skip indexes that are safe for workers */
+ if (indstats->parallel_workers_can_process)
+ continue;
+
+ /* Do vacuum or cleanup of the index */
+ parallel_vacuum_process_one_index(pvs, pvs->indrels[i], indstats);
+ }
+
+ /*
+ * We have completed the index vacuum so decrement the active worker
+ * count.
+ */
+ if (VacuumActiveNWorkers)
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+/*
+ * Vacuum or cleanup an index either by the leader process or by one of the
+ * worker processes. After processing the index, this function copies the
+ * index statistics returned from ambulkdelete or amvacuumcleanup to the DSM
+ * segment.
+ */
+static void
+parallel_vacuum_process_one_index(ParallelVacuumState *pvs, Relation indrel,
+ PVIndStats *indstats)
+{
+ IndexBulkDeleteResult *istat = NULL;
+ IndexBulkDeleteResult *istat_res;
+ IndexVacuumInfo ivinfo;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result if someone
+ * has already updated it
+ */
+ if (indstats->istat_updated)
+ istat = &(indstats->istat);
+
+ ivinfo.index = indrel;
+ ivinfo.analyze_only = false;
+ ivinfo.report_progress = false;
+ ivinfo.message_level = pvs->shared->elevel;
+ ivinfo.estimated_count = pvs->shared->estimated_count;
+ ivinfo.num_heap_tuples = pvs->shared->reltuples;
+ ivinfo.strategy = pvs->bstrategy;
+
+ /* Update error traceback information */
+ pvs->indname = pstrdup(RelationGetRelationName(indrel));
+ pvs->status = indstats->status;
+
+ switch (indstats->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ istat_res = vac_bulkdel_one_index(&ivinfo, istat, pvs->dead_items);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ istat_res = vac_cleanup_one_index(&ivinfo, istat);
+ break;
+ default:
+ elog(ERROR, "unexpected parallel vacuum index status %d for index \"%s\"",
+ indstats->status,
+ RelationGetRelationName(indrel));
+ }
+
+ /*
+ * Copy the index bulk-deletion result returned from ambulkdelete and
+ * amvacuumcleanup to the DSM segment if it's the first cycle because they
+ * allocate locally and it's possible that an index will be vacuumed by a
+ * different vacuum process the next cycle. Copying the result normally
+ * happens only the first time an index is vacuumed. For any additional
+ * vacuum pass, we directly point to the result on the DSM segment and
+ * pass it to vacuum index APIs so that workers can update it directly.
+ *
+ * Since all vacuum workers write the bulk-deletion result at different
+ * slots we can write them without locking.
+ */
+ if (!indstats->istat_updated && istat_res != NULL)
+ {
+ memcpy(&(indstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ indstats->istat_updated = true;
+
+ /* Free the locally-allocated bulk-deletion result */
+ pfree(istat_res);
+ }
+
+ /*
+ * Update the status to completed. No need to lock here since each worker
+ * touches different indexes.
+ */
+ indstats->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+
+ /* Reset error traceback information */
+ pvs->status = PARALLEL_INDVAC_STATUS_COMPLETED;
+ pfree(pvs->indname);
+ pvs->indname = NULL;
+}
+
+/*
+ * Returns false if the given index can't participate in the next execution of
+ * parallel index vacuum or parallel index cleanup.
+ */
+static bool
+parallel_vacuum_index_is_parallel_safe(Relation indrel, int num_index_scans,
+ bool vacuum)
+{
+ uint8 vacoptions;
+
+ vacoptions = indrel->rd_indam->amparallelvacuumoptions;
+
+ /* In parallel vacuum case, check if it supports parallel bulk-deletion */
+ if (vacuum)
+ return ((vacoptions & VACUUM_OPTION_PARALLEL_BULKDEL) != 0);
+
+ /* Not safe if the index does not support parallel cleanup */
+ if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
+ return false;
+
+ /*
+ * Not safe if the index supports parallel cleanup conditionally, but we
+ * have already processed the index (for bulkdelete). We do this to avoid
+ * the need to invoke workers when parallel index cleanup doesn't need to
+ * scan the index. See the comments for option
+ * VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes support
+ * parallel cleanup conditionally.
+ */
+ if (num_index_scans > 0 &&
+ ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) != 0))
+ return false;
+
+ return true;
+}
+
+/*
+ * Perform work within a launched parallel process.
+ *
+ * Since parallel vacuum workers perform only index vacuum or index cleanup,
+ * we don't need to report progress information.
+ */
+void
+parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
+{
+ ParallelVacuumState pvs;
+ Relation rel;
+ Relation *indrels;
+ PVIndStats *indstats;
+ PVShared *shared;
+ VacDeadItems *dead_items;
+ BufferUsage *buffer_usage;
+ WalUsage *wal_usage;
+ int nindexes;
+ char *sharedquery;
+ ErrorContextCallback errcallback;
+
+ /*
+ * A parallel vacuum worker must have only PROC_IN_VACUUM flag since we
+ * don't support parallel vacuum for autovacuum as of now.
+ */
+ Assert(MyProc->statusFlags == PROC_IN_VACUUM);
+
+ elog(DEBUG1, "starting parallel vacuum worker");
+
+ shared = (PVShared *) shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_SHARED, false);
+
+ /* Set debug_query_string for individual workers */
+ sharedquery = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_QUERY_TEXT, true);
+ debug_query_string = sharedquery;
+ pgstat_report_activity(STATE_RUNNING, debug_query_string);
+
+ /*
+ * Open table. The lock mode is the same as the one the leader process
+ * uses. It's okay because the lock mode does not conflict among the
+ * parallel workers.
+ */
+ rel = table_open(shared->relid, ShareUpdateExclusiveLock);
+
+ /*
+ * Open all indexes. indrels are sorted in order by OID, which should
+ * match the leader's order.
+ */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &indrels);
+ Assert(nindexes > 0);
+
+ if (shared->maintenance_work_mem_worker > 0)
+ maintenance_work_mem = shared->maintenance_work_mem_worker;
+
+ /* Set index statistics */
+ indstats = (PVIndStats *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_INDEX_STATS,
+ false);
+
+ /* Set dead_items space */
+ dead_items = (VacDeadItems *) shm_toc_lookup(toc,
+ PARALLEL_VACUUM_KEY_DEAD_ITEMS,
+ false);
+
+ /* Set cost-based vacuum delay */
+ VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumCostBalance = 0;
+ VacuumPageHit = 0;
+ VacuumPageMiss = 0;
+ VacuumPageDirty = 0;
+ VacuumCostBalanceLocal = 0;
+ VacuumSharedCostBalance = &(shared->cost_balance);
+ VacuumActiveNWorkers = &(shared->active_nworkers);
+
+ /* Set parallel vacuum state */
+ pvs.indrels = indrels;
+ pvs.nindexes = nindexes;
+ pvs.indstats = indstats;
+ pvs.shared = shared;
+ pvs.dead_items = dead_items;
+ pvs.relnamespace = get_namespace_name(RelationGetNamespace(rel));
+ pvs.relname = pstrdup(RelationGetRelationName(rel));
+
+ /* These fields will be filled during index vacuum or cleanup */
+ pvs.indname = NULL;
+ pvs.status = PARALLEL_INDVAC_STATUS_INITIAL;
+
+ /* Each parallel VACUUM worker gets its own access strategy */
+ pvs.bstrategy = GetAccessStrategy(BAS_VACUUM);
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = parallel_vacuum_error_callback;
+ errcallback.arg = &pvs;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+
+ /* Report buffer/WAL usage during parallel execution */
+ buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
+ wal_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_WAL_USAGE, false);
+ InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber],
+ &wal_usage[ParallelWorkerNumber]);
+
+ /* Pop the error context stack */
+ error_context_stack = errcallback.previous;
+
+ vac_close_indexes(nindexes, indrels, RowExclusiveLock);
+ table_close(rel, ShareUpdateExclusiveLock);
+ FreeAccessStrategy(pvs.bstrategy);
+}
+
+/*
+ * Error context callback for errors occurring during parallel index vacuum.
+ * The error context messages should match the messages set in the lazy vacuum
+ * error context. If you change this function, change vacuum_error_callback()
+ * as well.
+ */
+static void
+parallel_vacuum_error_callback(void *arg)
+{
+ ParallelVacuumState *errinfo = arg;
+
+ switch (errinfo->status)
+ {
+ case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE:
+ errcontext("while vacuuming index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_NEED_CLEANUP:
+ errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",
+ errinfo->indname,
+ errinfo->relnamespace,
+ errinfo->relname);
+ break;
+ case PARALLEL_INDVAC_STATUS_INITIAL:
+ case PARALLEL_INDVAC_STATUS_COMPLETED:
+ default:
+ return;
+ }
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 417dd28..f3fb1e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -198,7 +198,6 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
struct VacuumParams;
extern void heap_vacuum_rel(Relation rel,
struct VacuumParams *params, BufferAccessStrategy bstrategy);
-extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple stup, Snapshot snapshot,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 97bffa8..5a36049 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -16,6 +16,7 @@
#include "access/htup.h"
#include "access/genam.h"
+#include "access/parallel.h"
#include "catalog/pg_class.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
@@ -63,6 +64,9 @@
/* value for checking vacuum flags */
#define VACUUM_OPTION_MAX_VALID_VALUE ((1 << 3) - 1)
+/* Abstract type for parallel vacuum state */
+typedef struct ParallelVacuumState ParallelVacuumState;
+
/*----------
* ANALYZE builds one of these structs for each attribute (column) that is
* to be analyzed. The struct and subsidiary data are in anl_context,
@@ -305,6 +309,22 @@ extern IndexBulkDeleteResult *vac_cleanup_one_index(IndexVacuumInfo *ivinfo,
IndexBulkDeleteResult *istat);
extern Size vac_max_items_to_alloc_size(int max_items);
+/* in commands/vacuumparallel.c */
+extern ParallelVacuumState *parallel_vacuum_init(Relation rel, Relation *indrels,
+ int nindexes, int nrequested_workers,
+ int max_items, int elevel,
+ BufferAccessStrategy bstrategy);
+extern void parallel_vacuum_end(ParallelVacuumState *pvs, IndexBulkDeleteResult **istats);
+extern VacDeadItems *parallel_vacuum_get_dead_items(ParallelVacuumState *pvs);
+extern void parallel_vacuum_bulkdel_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ int num_index_scans);
+extern void parallel_vacuum_cleanup_all_indexes(ParallelVacuumState *pvs,
+ long num_table_tuples,
+ int num_index_scans,
+ bool estimated_count);
+extern void parallel_vacuum_main(dsm_segment *seg, shm_toc *toc);
+
/* in commands/analyze.c */
extern void analyze_rel(Oid relid, RangeVar *relation,
VacuumParams *params, List *va_cols, bool in_outer_xact,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9863508..f093605 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1306,13 +1306,8 @@ LPWSTR
LSEG
LUID
LVPagePruneState
-LVParallelIndStats
-LVParallelIndVacStatus
-LVParallelState
LVRelState
LVSavedErrInfo
-LVShared
-LVSharedIndStats
LWLock
LWLockHandle
LWLockMode
@@ -1775,7 +1770,10 @@ PTIterationArray
PTOKEN_PRIVILEGES
PTOKEN_USER
PUTENVPROC
+PVIndStats
+PVIndVacStatus
PVOID
+PVShared
PX_Alias
PX_Cipher
PX_Combo
@@ -1809,6 +1807,7 @@ ParallelSlotResultHandler
ParallelState
ParallelTableScanDesc
ParallelTableScanDescData
+ParallelVacuumState
ParallelWorkerContext
ParallelWorkerInfo
Param
--
1.8.3.1
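For readers following the patch, here is a rough caller-side sketch (not part of the patch itself) of how the vacuumparallel.c API exported in the commands/vacuum.h hunk above is meant to be driven by table vacuum code such as heap's lazy vacuum. The wrapper function name and its parameter list below are invented for illustration only; the parallel_vacuum_* calls are the ones declared above.

/*
 * Hypothetical caller-side sketch only -- the wrapper name and its argument
 * list are invented here; only the parallel_vacuum_* calls come from the
 * patch.
 */
#include "postgres.h"

#include "commands/vacuum.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

static void
do_parallel_index_passes(Relation rel, Relation *indrels, int nindexes,
                         int nrequested_workers, int max_items, int elevel,
                         BufferAccessStrategy bstrategy,
                         IndexBulkDeleteResult **indstats,
                         long num_table_tuples, bool estimated_count)
{
    ParallelVacuumState *pvs;
    VacDeadItems *dead_items;
    int         num_index_scans = 0;

    /* Enter parallel mode and set up the DSM segment for workers */
    pvs = parallel_vacuum_init(rel, indrels, nindexes, nrequested_workers,
                               max_items, elevel, bstrategy);

    /* The heap scan stores dead TIDs in the shared dead_items area */
    dead_items = parallel_vacuum_get_dead_items(pvs);
    Assert(dead_items->num_items == 0);

    /*
     * Whenever dead_items fills up during the heap scan, do one round of
     * parallel index bulk-deletion; there may be several such rounds.
     */
    parallel_vacuum_bulkdel_all_indexes(pvs, num_table_tuples,
                                        num_index_scans);
    num_index_scans++;

    /* After the final heap pass, do index cleanup once */
    parallel_vacuum_cleanup_all_indexes(pvs, num_table_tuples,
                                        num_index_scans, estimated_count);

    /* Copy index stats back out of DSM, then exit parallel mode */
    parallel_vacuum_end(pvs, indstats);
}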
On Wed, Dec 22, 2021 9:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 22, 2021 at 5:39 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
2)
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
It might be better to keep the header files in alphabetical order:
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
Right, I'll take care of this as I am already making some other edits
in the patch.
Fixed this and made a few other changes in the patch, including: (a) passed
down the num_index_scans information in the parallel APIs, based on which it
can decide whether to reinitialize the DSM or consider conditional parallel
vacuum cleanup; (b) got rid of the first-time variable in ParallelVacuumState,
as it is not required if we have the num_index_scans information; (c) removed
quite a few unnecessary includes from vacuumparallel.c; (d) unnecessary error
callback info was being set in ParallelVacuumState in the leader backend; (e)
changed/added comments in quite a few places.
Can you please once verify the changes in the attached?
The changes look ok to me.
I tested the patch for multi-pass parallel vacuum cases and ran 'make check-world';
all the tests passed.
Best regards,
Hou zj
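To make point (a) above concrete, here is a condensed, illustration-only fragment that mirrors the logic of parallel_vacuum_process_all_indexes() in the patch; the helper name is invented, and the ParallelVacuumState fields are accessed as if the code lived inside vacuumparallel.c.

/*
 * Illustration only: how num_index_scans drives both the conditional-cleanup
 * decision and DSM reinitialization.  The helper name is invented; the logic
 * mirrors parallel_vacuum_process_all_indexes().
 */
static int
choose_nworkers_for_cleanup(ParallelVacuumState *pvs, int num_index_scans)
{
    int         nworkers = pvs->nindexes_parallel_cleanup;

    /*
     * Indexes advertising VACUUM_OPTION_PARALLEL_COND_CLEANUP only need a
     * worker when no bulk-deletion pass has run yet.
     */
    if (num_index_scans == 0)
        nworkers += pvs->nindexes_parallel_condcleanup;

    /* The leader takes one index itself, and we can't exceed the context */
    nworkers = Min(nworkers - 1, pvs->pcxt->nworkers);

    /* Workers were launched before; reset the DSM before relaunching them */
    if (nworkers > 0 && num_index_scans > 0)
        ReinitializeParallelDSM(pvs->pcxt);

    return Max(nworkers, 0);
}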
On Wed, Dec 22, 2021 at 10:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 22, 2021 at 5:39 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
2)
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
It might be better to keep the header files in alphabetical order:
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
Right, I'll take care of this as I am already making some other edits
in the patch.
Fixed this and made a few other changes in the patch, including: (a) passed
down the num_index_scans information in the parallel APIs, based on which it
can decide whether to reinitialize the DSM or consider conditional parallel
vacuum cleanup; (b) got rid of the first-time variable in ParallelVacuumState,
as it is not required if we have the num_index_scans information; (c) removed
quite a few unnecessary includes from vacuumparallel.c; (d) unnecessary error
callback info was being set in ParallelVacuumState in the leader backend; (e)
changed/added comments in quite a few places.
Can you please once verify the changes in the attached?
Thank you for updating the patch!
I agree with these changes and it looks good to me.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Dec 23, 2021 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Dec 22, 2021 at 10:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 22, 2021 at 5:39 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
2)
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
It might be better to keep the header files in alphabetical order:
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
Right, I'll take care of this as I am already making some other edits
in the patch.
Fixed this and made a few other changes in the patch, including: (a) passed
down the num_index_scans information in the parallel APIs, based on which it
can decide whether to reinitialize the DSM or consider conditional parallel
vacuum cleanup; (b) got rid of the first-time variable in ParallelVacuumState,
as it is not required if we have the num_index_scans information; (c) removed
quite a few unnecessary includes from vacuumparallel.c; (d) unnecessary error
callback info was being set in ParallelVacuumState in the leader backend; (e)
changed/added comments in quite a few places.
Can you please once verify the changes in the attached?
Thank you for updating the patch!
I agree with these changes and it looks good to me.
Pushed. As per my knowledge, we have addressed the improvements
raised/discussed in this thread.
--
With Regards,
Amit Kapila.
On Fri, Dec 24, 2021 at 11:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 23, 2021 at 10:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Dec 22, 2021 at 10:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 22, 2021 at 6:22 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 22, 2021 at 5:39 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
2)
+#include "utils/rel.h"
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
It might be better to keep the header files in alphabetical order:
+#include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/rel.h"
Right, I'll take care of this as I am already making some other edits
in the patch.
Fixed this and made a few other changes in the patch, including: (a) passed
down the num_index_scans information in the parallel APIs, based on which it
can decide whether to reinitialize the DSM or consider conditional parallel
vacuum cleanup; (b) got rid of the first-time variable in ParallelVacuumState,
as it is not required if we have the num_index_scans information; (c) removed
quite a few unnecessary includes from vacuumparallel.c; (d) unnecessary error
callback info was being set in ParallelVacuumState in the leader backend; (e)
changed/added comments in quite a few places.
Can you please once verify the changes in the attached?
Thank you for updating the patch!
I agree with these changes and it looks good to me.
Pushed.
Thank you for committing the patch!
As per my knowledge, we have addressed the improvements
raised/discussed in this thread.
Yeah, I think so too.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/