parallel vacuum comments
Hi,
Due to bug #17245 [1] I spent a considerable amount of time looking at vacuum-related code. And I found a few things that I think could stand improvement:
- There's pretty much no tests. This is way too complicated a feature for
that. If there had been tests for the obvious edge case of some indexes
being too small to be handled in parallel, but others needing parallelism,
the mistake leading to #17245 would have been caught during development.
- There should be error check verifying that all indexes have actually been
vacuumed. It's way too easy to have bugs leading to index vacuuming being
skipped.
- The state machine is complicated. It's very unobvious that an index needs to
be processed serially by the leader if shared_indstats == NULL.
- I'm very confused by the existence of LVShared->bitmap. Why is it worth
saving 7 bits per index for something like this (compared to a simple
array of bools)? Nor does the naming explain what it's for.
The presence of the bitmap requires stuff like SizeOfLVShared(), which
accounts for some of the bitmap size, but not all?
But even though we have this space optimized bitmap thing, we actually need
more memory allocated for each index, making this whole thing pointless.
- Imo it's pretty confusing to have functions like
lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
vacuum or index cleanup with parallel workers.", based on
lps->lvshared->for_cleanup.
- I don't like some of the new names introduced in 14. E.g.
"do_parallel_processing" is way too generic.
- On a higher level, a lot of this actually doesn't seem to belong into
vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
code is heap specific. And vacuumlazy.c is large enough without the parallel
code.
Greetings,

Andres Freund

[1] /messages/by-id/17245-ddf06aaf85735f36@postgresql.org
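For illustration, the bitmap-versus-bool-array tradeoff Andres describes can be sketched outside PostgreSQL. The bit test mirrors what the IndStatsIsNull() macro does; everything else here (names, sizes) is a hypothetical stand-in, not the actual vacuumlazy.c code:

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t bits8;

/* One bit per index, tested the way the IndStatsIsNull() macro does */
static int bitmap_is_null(const bits8 *bitmap, int i)
{
    return !(bitmap[i >> 3] & (1 << (i & 0x07)));
}

/* The bool-array alternative: one byte per index, a direct subscript */
static int boolarray_is_null(const _Bool *present, int i)
{
    return !present[i];
}
```

Both answer the same question; the bitmap saves 7 bits per index at the cost of shift/mask arithmetic and the SizeOfLVShared()-style size bookkeeping, which is the complexity being questioned above.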
On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Due to bug #17245 [1] I spent a considerable amount of time looking at vacuum-related code. And I found a few things that I think could stand improvement:

- There's pretty much no tests. This is way too complicated a feature for
that. If there had been tests for the obvious edge case of some indexes
being too small to be handled in parallel, but others needing parallelism,
the mistake leading to #17245 would have been caught during development.
Yes. We should have tests at least for such cases.
- There should be error check verifying that all indexes have actually been
vacuumed. It's way too easy to have bugs leading to index vacuuming being
skipped.
Agreed.
- The state machine is complicated. It's very unobvious that an index needs to
be processed serially by the leader if shared_indstats == NULL.
I think we can consolidate the logic that decides who (a worker or the
leader) processes the index in one function.
- I'm very confused by the existence of LVShared->bitmap. Why is it worth
saving 7 bits per index for something like this (compared to a simple
array of bools)? Nor does the naming explain what it's for.

The presence of the bitmap requires stuff like SizeOfLVShared(), which
accounts for some of the bitmap size, but not all?
Yes, it's better to account for the size of all bitmaps.
But even though we have this space optimized bitmap thing, we actually need
more memory allocated for each index, making this whole thing pointless.
Right. But is it better to change to use booleans?
- Imo it's pretty confusing to have functions like
lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
vacuum or index cleanup with parallel workers.", based on
lps->lvshared->for_cleanup.
Okay. We need to set lps->lvshared->for_cleanup to tell workers whether
to do index vacuum or index cleanup. So it might be better to pass the
for_cleanup flag down to the functions in addition to setting
lps->lvshared->for_cleanup.
- I don't like some of the new names introduced in 14. E.g.
"do_parallel_processing" is way too generic.
I listed the function names that probably need to be renamed from
that perspective:
* do_parallel_processing
* do_serial_processing_for_unsafe_indexes
* parallel_process_one_index
Is there any other function that should be renamed?
- On a higher level, a lot of this actually doesn't seem to belong into
vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
code is heap specific. And vacuumlazy.c is large enough without the parallel
code.
I haven't come up with an idea to make them more generic. Could you
elaborate on that?
I've started to write a patch for these comments.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Nov 1, 2021 at 10:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
Due to bug #17245 [1] I spent a considerable amount of time looking at vacuum-related code. And I found a few things that I think could stand improvement:

- There's pretty much no tests. This is way too complicated a feature for
that. If there had been tests for the obvious edge case of some indexes
being too small to be handled in parallel, but others needing parallelism,
the mistake leading to #17245 would have been caught during development.

Yes. We should have tests at least for such cases.
For discussion, I've written a patch only for adding some tests to
parallel vacuum. The test includes the reported case where small
indexes are not processed by the leader process as well as cases where
different kinds of indexes (i.e., different amparallelvacuumoptions)
are vacuumed or cleaned up.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
regression_tests_for_parallel_vacuum.patch (+181 −0)
On Mon, Nov 1, 2021 at 5:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
For discussion, I've written a patch only for adding some tests to
parallel vacuum. The test includes the reported case where small
indexes are not processed by the leader process as well as cases where
different kinds of indexes (i.e., different amparallelvacuumoptions)
are vacuumed or cleaned up.
I started looking at this because I want to commit something like it
alongside a fix to bug #17245.
While I tend to favor relatively heavy assertions (e.g., the
heap_page_is_all_visible() related asserts I added to
lazy_scan_prune()), the idea of having a whole shared memory area just
for assertions seems a bit too much, even to me. I tried to simplify
it by just adding a new field to LVSharedIndStats, which seemed more
natural. It took me at least 15 minutes before I realized that I was
actually repeating exactly the same mistake that led to bug #17245 in
the first place. I somehow forgot that
parallel_stats_for_idx()/IndStatsIsNull() will return NULL for any
index that has already been deemed too small to be worth processing in
parallel. Even after all that drama!
Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
dedicated shmem area for the array of LVSharedIndStats (no more
storing LVSharedIndStats entries at the end of the LVShared space in
an ad-hoc, type unsafe way). There should be one array element for
each and every index -- even those indexes where parallel index
vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
processing for "not worthwhile" indexes actually makes sense, BTW). We
can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
also add new per-index state fields to LVSharedIndStats itself. We
could directly record the status of each index (e.g., parallel unsafe,
amvacuumcleanup processing done, ambulkdelete processing done)
explicitly. All code could safely subscript the LVSharedIndStats array
directly, using idx style integers. That seems far more robust and
consistent.
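A standalone sketch of what such a per-index status array could look like. All names and enum values are illustrative stand-ins modeled on the proposal, not the committed PostgreSQL definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical explicit per-index progress, replacing the NULL-entry
 * convention of parallel_stats_for_idx()/IndStatsIsNull() */
typedef enum
{
    INDVAC_STATUS_INITIAL,
    INDVAC_STATUS_COMPLETED
} IndVacStatus;

typedef struct
{
    bool parallel_unsafe;   /* may only the leader process this index? */
    IndVacStatus status;    /* recorded for every index, safe or not */
} SharedIndStats;

/* Every index gets an array slot, so callers subscript directly */
static void mark_completed(SharedIndStats *stats, int nindexes, int idx)
{
    assert(idx >= 0 && idx < nindexes);
    stats[idx].status = INDVAC_STATUS_COMPLETED;
}

/* The "were all indexes actually vacuumed?" check Andres asked for
 * becomes a trivial scan over the array */
static bool all_indexes_processed(const SharedIndStats *stats, int nindexes)
{
    for (int i = 0; i < nindexes; i++)
        if (stats[i].status != INDVAC_STATUS_COMPLETED)
            return false;
    return true;
}
```

Because there are no NULL entries, the "small index" case that caused #17245 cannot silently fall through: an index that nobody processed stays in its initial status and is caught by the completeness check.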
I think that this PARALLEL_VACUUM_STATS refactoring is actually the
simplest way to comprehensively test parallel VACUUM. I will still
need to add tests for my fix to bug #17245, but they won't be truly
general tests. I'll have to make sure that one of the assertions in
nbtdedup.c fails when the tests are run without the fix in place, or
something like that.
--
Peter Geoghegan
On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Nov 1, 2021 at 5:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
For discussion, I've written a patch only for adding some tests to
parallel vacuum. The test includes the reported case where small
indexes are not processed by the leader process as well as cases where
different kinds of indexes (i.e., different amparallelvacuumoptions)
are vacuumed or cleaned up.

I started looking at this because I want to commit something like it
alongside a fix to bug #17245.

While I tend to favor relatively heavy assertions (e.g., the
heap_page_is_all_visible() related asserts I added to
lazy_scan_prune()), the idea of having a whole shared memory area just
for assertions seems a bit too much, even to me. I tried to simplify
it by just adding a new field to LVSharedIndStats, which seemed more
natural. It took me at least 15 minutes before I realized that I was
actually repeating exactly the same mistake that led to bug #17245 in
the first place. I somehow forgot that
parallel_stats_for_idx()/IndStatsIsNull() will return NULL for any
index that has already been deemed too small to be worth processing in
parallel. Even after all that drama!
The idea of that patch was for back branches in order to not affect
non-enable-cassert builds. That said, I agree that it's an overkill
solution.
Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
dedicated shmem area for the array of LVSharedIndStats (no more
storing LVSharedIndStats entries at the end of the LVShared space in
an ad-hoc, type unsafe way). There should be one array element for
each and every index -- even those indexes where parallel index
vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
processing for "not worthwhile" indexes actually makes sense, BTW). We
can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
also add new per-index state fields to LVSharedIndStats itself. We
could directly record the status of each index (e.g., parallel unsafe,
amvacuumcleanup processing done, ambulkdelete processing done)
explicitly. All code could safely subscript the LVSharedIndStats array
directly, using idx style integers. That seems far more robust and
consistent.
Sounds good.
During development, I wrote the patch trying to use less shared
memory, but that seems to have brought complexity (and therefore the
bug). It would not be harmful in practice even if we allocate index
statistics on DSM for unsafe indexes and "not worthwhile" indexes.
Rather, tracking bulkdelete and vacuumcleanup completion
might enable us to improve the vacuum progress reporting so that the
progress stats view shows how many indexes have been vacuumed (or
cleaned up).
Anyway, I'll write a patch accordingly.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mon, Nov 1, 2021 at 7:15 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
- Imo it's pretty confusing to have functions like
lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
vacuum or index cleanup with parallel workers.", based on
lps->lvshared->for_cleanup.

Okay. We need to set lps->lvshared->for_cleanup to tell workers whether
to do index vacuum or index cleanup. So it might be better to pass the
for_cleanup flag down to the functions in addition to setting
lps->lvshared->for_cleanup.
But, we need this information in the parallel worker as well to know
whether to perform index vacuum or clean up, so I guess we need this
information in shared memory, no?
- I don't like some of the new names introduced in 14. E.g.
"do_parallel_processing" is way too generic.

I listed the function names that probably need to be renamed from
that perspective:

* do_parallel_processing
* do_serial_processing_for_unsafe_indexes
* parallel_process_one_index

Is there any other function that should be renamed?
- On a higher level, a lot of this actually doesn't seem to belong into
vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
code is heap specific. And vacuumlazy.c is large enough without the parallel
code.

I haven't come up with an idea to make them more generic. Could you
elaborate on that?
Can we think of moving parallelism-related code to a different file
(say vacuumparallel.c)? At least that will reduce the footprint of
vacuumlazy.c.
--
With Regards,
Amit Kapila.
On Tue, Nov 2, 2021 at 2:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Anyway, I'll write a patch accordingly.
While writing a patch for these comments, I found another bug in
parallel_processing_is_safe():
/*
 * Returns false, if the given index can't participate in parallel index
 * vacuum or parallel index cleanup
 */
static bool
parallel_processing_is_safe(Relation indrel, LVShared *lvshared)
{
    uint8    vacoptions = indrel->rd_indam->amparallelvacuumoptions;

    /* first_time must be true only if for_cleanup is true */
    Assert(lvshared->for_cleanup || !lvshared->first_time);

    if (lvshared->for_cleanup)
    {
        /* Skip, if the index does not support parallel cleanup */
        if (((vacoptions & VACUUM_OPTION_PARALLEL_CLEANUP) == 0) &&
            ((vacoptions & VACUUM_OPTION_PARALLEL_COND_CLEANUP) == 0))
            return true;
It returns true in the above condition but it should return false
since the index doesn't support parallel index cleanup at all. It
seems that this bug was introduced by commit b4af70cb21 (therefore
exists only in PG14) which flipped the return values of this function
but missed one place. The index AMs that don't support parallel index
cleanup at all are affected by this bug. Among the supported index AMs
in core, hash indexes are affected, but since they just return the
number of blocks during vacuumcleanup the consequences would not be
serious.
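A minimal standalone sketch of the corrected check. The flag values below are illustrative stand-ins for the real VACUUM_OPTION_* constants, and the function is reduced to just the cleanup branch in question:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for amparallelvacuumoptions flag bits */
#define PARALLEL_BULKDEL       (1 << 0)
#define PARALLEL_COND_CLEANUP  (1 << 1)
#define PARALLEL_CLEANUP       (1 << 2)

/* Corrected logic: an index supporting neither form of parallel
 * cleanup must NOT be treated as safe for parallel cleanup */
static bool cleanup_is_parallel_safe(uint8_t vacoptions)
{
    if ((vacoptions & PARALLEL_CLEANUP) == 0 &&
        (vacoptions & PARALLEL_COND_CLEANUP) == 0)
        return false;       /* the buggy version returned true here */
    return true;
}
```

With the bug, a hash-like AM (bulkdelete support only) was wrongly reported as parallel-cleanup safe; the corrected branch excludes it.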
I've attached a patch to fix it.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
fix_parallel_processing_is_safe.patch (+1 −1)
On Tue, Nov 2, 2021 at 7:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
It returns true in the above condition but it should return false
since the index doesn't support parallel index cleanup at all. It
seems that this bug was introduced by commit b4af70cb21 (therefore
exists only in PG14) which flipped the return values of this function
but missed one place. The index AMs that don't support parallel index
cleanup at all are affected by this bug. Among the supported index AM
in the core, hash indexes are affected but since they just return the
number of blocks during vacuumcleanup it would not become a serious
consequence.

I've attached a patch to fix it.
I pushed your fix just now.
Thanks
--
Peter Geoghegan
On Wed, Nov 3, 2021 at 11:53 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Tue, Nov 2, 2021 at 7:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
It returns true in the above condition but it should return false
since the index doesn't support parallel index cleanup at all. It
seems that this bug was introduced by commit b4af70cb21 (therefore
exists only in PG14) which flipped the return values of this function
but missed one place. The index AMs that don't support parallel index
cleanup at all are affected by this bug. Among the supported index AM
in the core, hash indexes are affected but since they just return the
number of blocks during vacuumcleanup it would not become a serious
consequence.

I've attached a patch to fix it.
I pushed your fix just now.
Thanks!
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Nov 2, 2021 at 11:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
dedicated shmem area for the array of LVSharedIndStats (no more
storing LVSharedIndStats entries at the end of the LVShared space in
an ad-hoc, type unsafe way). There should be one array element for
each and every index -- even those indexes where parallel index
vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
processing for "not worthwhile" indexes actually makes sense, BTW). We
can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
also add new per-index state fields to LVSharedIndStats itself. We
could directly record the status of each index (e.g., parallel unsafe,
amvacuumcleanup processing done, ambulkdelete processing done)
explicitly. All code could safely subscript the LVSharedIndStats array
directly, using idx style integers. That seems far more robust and
consistent.

Sounds good.
During development, I wrote the patch trying to use less shared
memory, but that seems to have brought complexity (and therefore the
bug). It would not be harmful in practice even if we allocate index
statistics on DSM for unsafe indexes and "not worthwhile" indexes.
If we want to allocate index stats for all indexes in DSM then why not
consider it along the lines of buf/wal_usage, i.e., track those via
LVParallelState? And probably replace the bitmap with an array of bools
that indicates which indexes can be skipped by the parallel worker.
--
With Regards,
Amit Kapila.
On Wed, Nov 3, 2021 at 1:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 2, 2021 at 11:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
dedicated shmem area for the array of LVSharedIndStats (no more
storing LVSharedIndStats entries at the end of the LVShared space in
an ad-hoc, type unsafe way). There should be one array element for
each and every index -- even those indexes where parallel index
vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
processing for "not worthwhile" indexes actually makes sense, BTW). We
can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
also add new per-index state fields to LVSharedIndStats itself. We
could directly record the status of each index (e.g., parallel unsafe,
amvacuumcleanup processing done, ambulkdelete processing done)
explicitly. All code could safely subscript the LVSharedIndStats array
directly, using idx style integers. That seems far more robust and
consistent.

Sounds good.
During the development, I wrote the patch while considering using
fewer shared memory but it seems that it brought complexity (and
therefore the bug). It would not be harmful even if we allocate index
statistics on DSM for unsafe indexes and “not worthwhile" indexes in
practice.

If we want to allocate index stats for all indexes in DSM then why not
consider it along the lines of buf/wal_usage, i.e., track those via
LVParallelState? And probably replace the bitmap with an array of bools
that indicates which indexes can be skipped by the parallel worker.
I've attached a draft patch. The patch incorporated all comments from
Andres except for the last comment that moves parallel related code to
another file. I'd like to discuss how we split vacuumlazy.c.
Regarding tests, I'd like to add tests to check whether a vacuum with
multiple index scans (i.e., due to small maintenance_work_mem) works
fine. But a problem is that we need at least about 200,000 garbage
tuples to perform the index scan twice even with the minimum
maintenance_work_mem, and loading that many tuples takes time.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
parallel_vacuum_refactor.patch (+325 −332)
Hi,
On 2021-11-01 10:44:34 +0900, Masahiko Sawada wrote:
On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
But even though we have this space optimized bitmap thing, we actually need
more memory allocated for each index, making this whole thing pointless.

Right. But is it better to change to use booleans?
It seems very clearly better to me. We shouldn't use complicated stuff like
#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
#define GetSharedIndStats(s) \
((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
#define IndStatsIsNull(s, i) \
(!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))
without a reason / benefit.
- Imo it's pretty confusing to have functions like
lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
vacuum or index cleanup with parallel workers.", based on
lps->lvshared->for_cleanup.

Okay. We need to set lps->lvshared->for_cleanup to tell workers whether
to do index vacuum or index cleanup. So it might be better to pass the
for_cleanup flag down to the functions in addition to setting
lps->lvshared->for_cleanup.
Yup.
- I don't like some of the new names introduced in 14. E.g.
"do_parallel_processing" is way too generic.

I listed the function names that probably need to be renamed from
that perspective:

* do_parallel_processing
* do_serial_processing_for_unsafe_indexes
* parallel_process_one_index

Is there any other function that should be renamed?
parallel_processing_is_safe().
I don't like that there's three different naming patterns for parallel
things. There's do_parallel_*, there's parallel_, and there's
(begin|end|compute)_parallel_*.
- On a higher level, a lot of this actually doesn't seem to belong into
vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
code is heap specific. And vacuumlazy.c is large enough without the parallel
code.

I haven't come up with an idea to make them more generic. Could you
elaborate on that?
To me, the job that the parallel vacuum stuff does isn't really specific to
heap. Any table AM supporting indexes is going to need to do something pretty
much like it (it's calling indexam stuff). Most of the stuff in vacuumlazy.c
is very heap specific - you're not going to be able to reuse lazy_scan_heap()
or such. Before the parallel vacuum stuff, the index specific code in
vacuumlazy.c was fairly limited - but now it's a nontrivial amount of code.
Based on a quick look,
parallel_vacuum_main(), parallel_processing_is_safe(),
parallel_stats_for_idx(), end_parallel_vacuum(), begin_parallel_vacuum(),
compute_parallel_vacuum_workers(), parallel_process_one_index(),
do_serial_processing_for_unsafe_indexes(), do_parallel_processing(),
do_parallel_vacuum_or_cleanup(), do_parallel_lazy_cleanup_all_indexes(),
do_parallel_lazy_vacuum_all_indexes(),
don't really belong in vacuumlazy.c, but should be in vacuum.c or a new
file. Of course that requires a bit of work, because of the heavy reliance on
LVRelState, but afaict there's not really an intrinsic need to use that.
Greetings,
Andres Freund
On Wed, Nov 3, 2021 at 10:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached a draft patch. The patch incorporated all comments from
Andres except for the last comment that moves parallel related code to
another file. I'd like to discuss how we split vacuumlazy.c.
This looks great!
I wonder if this is okay, though:
 /* Process the indexes that can be processed by only leader process */
-    do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
+    lazy_serial_process_indexes(vacrel);

     /*
-     * Join as a parallel worker. The leader process alone processes all the
-     * indexes in the case where no workers are launched.
+     * Join as a parallel worker. The leader process alone processes all
+     * parallel-safe indexes in the case where no workers are launched.
      */
-    do_parallel_processing(vacrel, lps->lvshared);
+    lazy_parallel_process_indexes(vacrel, lps->lvshared, vacrel->lps->lvsharedindstats);

     /*
* Next, accumulate buffer and WAL usage. (This must wait for the workers
@@ -2786,6 +2734,18 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
}
Since "The leader process alone processes all parallel-safe indexes in
the case where no workers are launched" (no change there), I wonder:
how does the leader *know* that it's the leader (and so can always
process any indexes) inside its call to
lazy_parallel_process_indexes()? Or, does the leader actually process
all indexes inside lazy_serial_process_indexes() in this edge case?
(The edge case where a parallel VACUUM has no workers at all, because
they couldn't be launched by the core parallel infrastructure.)
One small thing: the new "LVSharedIndStats.parallel_safe" field seems
to be slightly misnamed. Isn't it more like
"LVSharedIndStats.parallel_workers_can_process"? The index might
actually be parallel safe *in principle*, while nevertheless being
deliberately skipped over by workers due to a deliberate up-front
choice made earlier, in compute_parallel_vacuum_workers(). Most
obvious example of this is the choice to not use parallelism for a
smaller index (say a partial index whose size is <
min_parallel_index_scan_size).
Another small thing, which is closely related to the last one:
 typedef struct LVSharedIndStats
 {
-    bool updated; /* are the stats updated? */
+    LVIndVacStatus status;
+
+    /*
+     * True if both leader and worker can process the index, otherwise only
+     * leader can process it.
+     */
+    bool parallel_safe;
+
+    bool istat_updated; /* are the stats updated? */
     IndexBulkDeleteResult istat;
 } LVSharedIndStats;
It would be nice if the new "LVSharedIndStats.parallel_safe" field (or
whatever we end up calling it) had comments that pointed out that it
isn't a fixed thing for the entire VACUUM operation -- it's only fixed
for an individual call to
parallel_lazy_vacuum_or_cleanup_all_indexes() (i.e., it's only fixed
for the "ambulkdelete portion" or the "amvacuumcleanup portion" of the
entire VACUUM).
Alternatively, you could just have two booleans, I think. You know,
one for the "ambulkdelete portion", another for the "amvacuumcleanup
portion". As I've said before, it would be nice if we only called
parallel_vacuum_main() once per VACUUM operation (and not once per
"portion"), but that's a harder and more invasive change. But I don't
think you necessarily have to go that far for it to make sense to have
two bools. Having two might allow you to make both of them immutable,
which is useful.
Regarding tests, I’d like to add tests to check if a vacuum with
multiple index scans (i.e., due to small maintenance_work_mem) works
fine. But a problem is that we need at least about 200,000 garbage
tuples to perform index scan twice even with the minimum
maintenance_work_mem. Which takes a time to load tuples.
Maybe that's okay. Do you notice that it takes a lot longer now? I did
try to keep the runtime down when I committed the fixup to the
parallel VACUUM related bug.
--
Peter Geoghegan
On Thu, Nov 4, 2021 at 12:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
Since "The leader process alone processes all parallel-safe indexes in
the case where no workers are launched" (no change there), I wonder:
how does the leader *know* that it's the leader (and so can always
process any indexes) inside its call to
lazy_parallel_process_indexes()? Or, does the leader actually process
all indexes inside lazy_serial_process_indexes() in this edge case?
(The edge case where a parallel VACUUM has no workers at all, because
they couldn't be launched by the core parallel infrastructure.)
I think that I might see a related problem. But I'm not sure, so I'll just ask:
+    /* Set data required for parallel index vacuum or cleanup */
+    prepare_parallel_index_processing(vacrel, vacuum);
+
+    /* Reset the parallel index processing counter */
+    pg_atomic_write_u32(&(lps->lvshared->idx), 0);
+
     /* Setup the shared cost-based vacuum delay and launch workers */
     if (nworkers > 0)
     {
+        /* Reinitialize the parallel context to relaunch parallel workers */
         if (vacrel->num_index_scans > 0)
-        {
-            /* Reset the parallel index processing counter */
-            pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
-            /* Reinitialize the parallel context to relaunch parallel workers */
             ReinitializeParallelDSM(lps->pcxt);
-        }
Is it okay that we don't call ReinitializeParallelDSM() just because
"nworkers == 0" this time around? I notice that there is a wait for
"nworkers_launched" workers to finish parallel processing, at the top
of ReinitializeParallelDSM(). I can see why the
"vacrel->num_index_scans > 0" test is okay, but I can't see why the
"nworkers == 0" test is okay.
I just want to be sure that we're not somehow relying on seeing state
in shared memory (in the LVSharedIndStats array) in all cases, but
finding that it is not actually there in certain rare edge cases.
Maybe this didn't matter before, because the leader didn't expect to
find this information in shared memory in any case. But that is
changed by your patch, of course, so it's something to be concerned
about.
--
Peter Geoghegan
On Thu, Nov 4, 2021 at 1:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Nov 3, 2021 at 1:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Nov 2, 2021 at 11:17 AM Masahiko Sawada<sawada.mshk@gmail.com> wrote:
On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg@bowt.ie> wrote:
Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
assert-enabled builds), we should invent PARALLEL_VACUUM_STATS --
a dedicated shmem area for the array of LVSharedIndStats (no more
storing LVSharedIndStats entries at the end of the LVShared space
in an ad-hoc, type unsafe way). There should be one array element
for each and every index -- even those indexes where parallel
index vacuuming is unsafe or not worthwhile (unsure if avoiding
parallel processing for "not worthwhile" indexes actually makes
sense, BTW). We can then get rid of the bitmap/IndStatsIsNull()
stuff entirely. We'd also add new per-index state fields to
LVSharedIndStats itself. We could directly record the status of
each index (e.g., parallel unsafe, amvacuumcleanup processing
done, ambulkdelete processing done) explicitly. All code could
safely subscript the LVSharedIndStats array directly, using idx
style integers. That seems far more robust and consistent.

Sounds good.
During the development, I wrote the patch while considering using
fewer shared memory but it seems that it brought complexity (and
therefore the bug). It would not be harmful even if we allocate
index statistics on DSM for unsafe indexes and “not worthwhile"
indexes in practice.

If we want to allocate index stats for all indexes in DSM then why not
consider it along the lines of buf/wal_usage, i.e., track those via
LVParallelState? And probably replace the bitmap with an array of bools
that indicates which indexes can be skipped by the parallel worker.

I've attached a draft patch. The patch incorporated all comments from Andres
except for the last comment that moves parallel related code to another file.
I'd like to discuss how we split vacuumlazy.c.
Hi,
I was recently reading the parallel vacuum code, and I think the patch
brings a definite improvement.
Here are a few minor comments about it.
1)
+ * Reset all index status back to invalid (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVSharedIndStats *stats = &(lps->lvsharedindstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_COMPLETED);
+ stats->status = INDVAC_STATUS_INITIAL;
+ }
Do you think it might be clearer to report an error here?
2)
+prepare_parallel_index_processing(LVRelState *vacrel, bool vacuum)
For the second parameter 'vacuum': would it be clearer if we passed an
LVIndVacStatus type instead of the boolean value?
Best regards,
Hou zj
On Fri, Nov 5, 2021 at 4:42 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Nov 3, 2021 at 10:25 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've attached a draft patch. The patch incorporated all comments from
Andres except for the last comment that moves parallel related code to
another file. I'd like to discuss how we split vacuumlazy.c.

This looks great!
I wonder if this is okay, though:
	/* Process the indexes that can be processed by only leader process */
-	do_serial_processing_for_unsafe_indexes(vacrel, lps->lvshared);
+	lazy_serial_process_indexes(vacrel);

	/*
-	 * Join as a parallel worker. The leader process alone processes all the
-	 * indexes in the case where no workers are launched.
+	 * Join as a parallel worker. The leader process alone processes all
+	 * parallel-safe indexes in the case where no workers are launched.
	 */
-	do_parallel_processing(vacrel, lps->lvshared);
+	lazy_parallel_process_indexes(vacrel, lps->lvshared, vacrel->lps->lvsharedindstats);

	/*
* Next, accumulate buffer and WAL usage. (This must wait for the workers
@@ -2786,6 +2734,18 @@ do_parallel_vacuum_or_cleanup(LVRelState *vacrel, int nworkers)
InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
	}

Since "The leader process alone processes all parallel-safe indexes in
the case where no workers are launched" (no change there), I wonder:
how does the leader *know* that it's the leader (and so can always
process any indexes) inside its call to
lazy_parallel_process_indexes()? Or, does the leader actually process
all indexes inside lazy_serial_process_indexes() in this edge case?
(The edge case where a parallel VACUUM has no workers at all, because
they couldn't be launched by the core parallel infrastructure.)
lazy_serial_process_indexes() handles only parallel-unsafe indexes
(i.e., indexes that don't support parallelism or are too small), whereas
lazy_parallel_process_indexes() handles only parallel-safe indexes.
Therefore, in the edge case, the leader processes all indexes by using
both functions.
One small thing: the new "LVSharedIndStats.parallel_safe" field seems
to be slightly misnamed. Isn't it more like
"LVSharedIndStats.parallel_workers_can_process"? The index might
actually be parallel safe *in principle*, while nevertheless being
skipped over by workers due to a deliberate up-front choice made
earlier, in compute_parallel_vacuum_workers(). The most obvious example
of this is the choice not to use parallelism for a smaller index (say a
partial index whose size is < min_parallel_index_scan_size).
Agreed.
Another small thing, which is closely related to the last one:
typedef struct LVSharedIndStats
{
-	bool		updated;		/* are the stats updated? */
+	LVIndVacStatus status;
+
+	/*
+	 * True if both leader and worker can process the index, otherwise only
+	 * leader can process it.
+	 */
+	bool		parallel_safe;
+
+	bool		istat_updated;	/* are the stats updated? */
	IndexBulkDeleteResult istat;
} LVSharedIndStats;

It would be nice if the new "LVSharedIndStats.parallel_safe" field (or
whatever we end up calling it) had a comment pointing out that it isn't
fixed for the entire VACUUM operation -- it's only fixed for an
individual call to parallel_lazy_vacuum_or_cleanup_all_indexes() (i.e.,
it's only fixed for the "ambulkdelete portion" or the "amvacuumcleanup
portion" of the entire VACUUM).
Agreed.
Alternatively, you could just have two booleans, I think. You know,
one for the "ambulkdelete portion", another for the "amvacuumcleanup
portion". As I've said before, it would be nice if we only called
parallel_vacuum_main() once per VACUUM operation (and not once per
"portion"), but that's a harder and more invasive change. But I don't
think you necessarily have to go that far for it to make sense to have
two bools. Having two might allow you to make both of them immutable,
which is useful.
If we want to make the booleans immutable, we need three booleans, since
parallel index cleanup behaves differently depending on whether
bulk-deletion has already been called. Anyway, if I understand your
suggestion correctly, it probably means to have booleans corresponding
to VACUUM_OPTION_PARALLEL_XXX flags. Does the worker itself need to
decide whether to skip conditionally-parallel-index-cleanup-safe
indexes?
Regarding tests, I'd like to add tests to check that a vacuum with
multiple index scans (e.g., due to a small maintenance_work_mem) works
fine. But a problem is that we need at least about 200,000 garbage
tuples to perform the index scan twice even with the minimum
maintenance_work_mem, and loading that many tuples takes time.

Maybe that's okay. Do you notice that it takes a lot longer now? I did
try to keep the runtime down when I committed the fixup to the
parallel VACUUM related bug.
It took 6s on my laptop (was 400ms).
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Nov 5, 2021 at 6:25 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Nov 4, 2021 at 12:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
Since "The leader process alone processes all parallel-safe indexes in
the case where no workers are launched" (no change there), I wonder:
how does the leader *know* that it's the leader (and so can always
process any indexes) inside its call to
lazy_parallel_process_indexes()? Or, does the leader actually process
all indexes inside lazy_serial_process_indexes() in this edge case?
(The edge case where a parallel VACUUM has no workers at all, because
they couldn't be launched by the core parallel infrastructure.)

I think that I might see a related problem. But I'm not sure, so I'll just ask:
+	/* Set data required for parallel index vacuum or cleanup */
+	prepare_parallel_index_processing(vacrel, vacuum);
+
+	/* Reset the parallel index processing counter */
+	pg_atomic_write_u32(&(lps->lvshared->idx), 0);
+
	/* Setup the shared cost-based vacuum delay and launch workers */
	if (nworkers > 0)
	{
+		/* Reinitialize the parallel context to relaunch parallel workers */
		if (vacrel->num_index_scans > 0)
-		{
-			/* Reset the parallel index processing counter */
-			pg_atomic_write_u32(&(lps->lvshared->idx), 0);
-
-			/* Reinitialize the parallel context to relaunch parallel workers */
			ReinitializeParallelDSM(lps->pcxt);
-		}

Is it okay that we don't call ReinitializeParallelDSM() just because
"nworkers == 0" this time around? I notice that there is a wait for
"nworkers_launched" workers to finish parallel processing, at the top
of ReinitializeParallelDSM(). I can see why the
"vacrel->num_index_scans > 0" test is okay, but I can't see why the
"nworkers == 0" test is okay.

I just want to be sure that we're not somehow relying on seeing state
in shared memory (in the LVSharedIndStats array) in all cases, but
finding that it is not actually there in certain rare edge cases.
Maybe this didn't matter before, because the leader didn't expect to
find this information in shared memory in any case. But that is
changed by your patch, of course, so it's something to be concerned
about.
If we launch workers (i.e., nworkers > 0), we wait for these workers
to finish after processing all indexes (see: we call
WaitForParallelWorkersToFinish() after lazy_parallel_process_indexes()).
So it's guaranteed that all workers have finished by the end of
parallel_lazy_vacuum_or_cleanup_all_indexes(). So even in the
second call to this function, we don't need to wait for the
"nworkers_launched" workers that were previously running to finish.
Does it make sense?
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Fri, Nov 5, 2021 at 4:00 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2021-11-01 10:44:34 +0900, Masahiko Sawada wrote:
On Sun, Oct 31, 2021 at 6:21 AM Andres Freund <andres@anarazel.de> wrote:
But even though we have this space optimized bitmap thing, we actually need
more memory allocated for each index, making this whole thing pointless.

Right. But is it better to change to use booleans?

It seems very clearly better to me. We shouldn't use complicated stuff like

#define SizeOfLVShared (offsetof(LVShared, bitmap) + sizeof(bits8))
#define GetSharedIndStats(s) \
	((LVSharedIndStats *)((char *)(s) + ((LVShared *)(s))->offset))
#define IndStatsIsNull(s, i) \
	(!(((LVShared *)(s))->bitmap[(i) >> 3] & (1 << ((i) & 0x07))))

when there's reason / benefit.
- Imo it's pretty confusing to have functions like
lazy_parallel_vacuum_indexes() (in 13, renamed in 14) that "Perform index
vacuum or index cleanup with parallel workers.", based on
lps->lvshared->for_cleanup.

Okay. We need to set lps->lvshared->for_cleanup to tell workers whether to do index vacuum or index cleanup. So it might be better to pass the for_cleanup flag down to the functions in addition to setting lps->lvshared->for_cleanup.

Yup.
- I don't like some of the new names introduced in 14. E.g.
"do_parallel_processing" is way too generic.

I listed the function names that probably need to be renamed from that perspective:

* do_parallel_processing
* do_serial_processing_for_unsafe_indexes
* parallel_process_one_index

Is there any other function that should be renamed?
parallel_processing_is_safe().
I don't like that there are three different naming patterns for parallel
things. There's do_parallel_*, there's parallel_*, and there's
(begin|end|compute)_parallel_*.

- On a higher level, a lot of this actually doesn't seem to belong into
vacuumlazy.c, but should be somewhere more generic. Pretty much none of this
code is heap specific. And vacuumlazy.c is large enough without the parallel
code.

I can't come up with an idea to make them more generic. Could you elaborate on that?

To me, the job that the parallel vacuum stuff does isn't really specific to
heap. Any table AM supporting indexes is going to need to do something pretty
much like it (it's calling indexam stuff). Most of the stuff in vacuumlazy.c
is very heap specific - you're not going to be able to reuse lazy_scan_heap()
or such. Before the parallel vacuum stuff, the index specific code in
vacuumlazy.c was fairly limited - but now it's a nontrivial amount of code.

Based on a quick look,
parallel_vacuum_main(), parallel_processing_is_safe(),
parallel_stats_for_idx(), end_parallel_vacuum(), begin_parallel_vacuum(),
compute_parallel_vacuum_workers(), parallel_process_one_index(),
do_serial_processing_for_unsafe_indexes(), do_parallel_processing(),
do_parallel_vacuum_or_cleanup(), do_parallel_lazy_cleanup_all_indexes(),
do_parallel_lazy_vacuum_all_indexes()

don't really belong in vacuumlazy.c, but should be in vacuum.c or a new
file. Of course that requires a bit of work, because of the heavy reliance on
LVRelState, but afaict there's not really an intrinsic need to use that.
Thanks for your explanation. Understood.
I'll update the patch accordingly.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Nov 9, 2021 at 9:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thanks for your explanation. Understood.
I'll update the patch accordingly.
I've attached a draft patch that refactors parallel vacuum and
separates the parallel-vacuum-related code into a new file, vacuumparallel.c.
After discussion, I'll divide the patch into logical chunks.
What I'm not yet convinced about in this patch is that vacuum.c,
vacuumlazy.c, and vacuumparallel.c all depend on the data structure used to
store dead tuples (now called VacDeadTuples, previously LVDeadTuples). I
thought it might be better to separate it so that a table AM could
use another type of data structure to store dead tuples. But since
that would add complexity, currently a table AM has to use
VacDeadTuples in order to use the parallel vacuum. Feedback is very
welcome.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments:
parallel_vacuum_refactor_v2.patch (application/octet-stream)
On Thu, Nov 11, 2021 at 8:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
currently a table AM has to use VacDeadTuples in order to use the parallel vacuum.
I think it might be better to attempt making it generic for table AMs
in a separate patch, if that is required.
Few questions/comments:
=====================
1. There are three different structures PVState,
ParallelVacuumContext, and ParallelVacuumCtl to maintain the parallel
vacuum state. Can't we merge PVState and ParallelVacuumCtl? Also, I
see that most of the fields of PVState are there in
ParallelVacuumContext except for the error info fields, so does it make
sense to directly use PVState instead? Also, it would be better to
write some more comments atop each structure to explain its usage.
2. In vacuum.c, some function names don't match the names in their
corresponding function header comments.
3.
+	INDVAC_STATUS_COMPLETED,
+} PVIndVacStatus;

The comma is not required after the last value of the enum.
--
With Regards,
Amit Kapila.